Qwen 3.5 Flash

Qwen 3.5 Flash is Alibaba Cloud's production-hosted multimodal model built on a hybrid linear-attention MoE architecture, offering a context window of 1M tokens and sub-second responsiveness for high-throughput agentic workloads. Your use is subject to Alibaba Cloud's Terms & Privacy Policies.

Vision (Image), File Input, Reasoning, Tool Use

Use with AI Gateway

index.ts

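import { streamText } from 'ai'

// The model is addressed by its AI Gateway ID.
const result = streamText({
  model: 'alibaba/qwen3.5-flash',
  prompt: 'Why is the sky blue?'
})

// Consume the stream; without this, nothing is printed.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}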


More models by Alibaba Cloud

Model	Context	Latency	Throughput	Input	Output	Cache	Web Search	Capabilities	Providers	ZDR	No Training	Release Date
alibaba/qwen3.7-flash	991K	2.3s	170tps	$0.03/M	$0.13/M	Read: $0.01/M, Write: $0.04/M	—					07/28/2026
alibaba/qwen3.7-plus		2.0s	239tps	$0.32/M	$1.28/M	Read: $0.08/M, Write: $0.5/M	—					06/02/2026
alibaba/qwen3.7-max	991K	2.5s	55tps	$2.50/M	$7.50/M	Read: $0.5/M, Write: $3.13/M	—					05/21/2026
alibaba/qwen3.6-plus		0.9s	115tps	$0.50/M	$3/M	Read: $0.1/M, Write: $0.63/M	—					04/02/2026
alibaba/qwen3-embedding-0.6b	33K			$0.01/M			—					11/14/2025
alibaba/qwen-3-235b	262K	0.5s	85tps	$0.09/M	$0.10/M		—					04/28/2025
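The per-million-token prices listed above translate into a per-request cost with simple arithmetic. The following is a minimal sketch, not an official billing formula: the `Rates` and `Usage` types and the `estimateCostUSD` helper are hypothetical names introduced here, the rates are copied from the alibaba/qwen3.7-flash row, and it assumes that cached prompt tokens are billed at the cache Read rate and counted separately from uncached input tokens. Cache Write charges are left out.

```typescript
// Per-million-token rates in USD (hypothetical shape for illustration).
interface Rates {
  inputPerM: number;
  outputPerM: number;
  cacheReadPerM?: number;
}

// Token counts for one request (hypothetical shape for illustration).
interface Usage {
  inputTokens: number;        // uncached prompt tokens
  outputTokens: number;       // generated tokens
  cachedInputTokens?: number; // prompt tokens served from cache (assumption)
}

// Rates copied from the alibaba/qwen3.7-flash row of the listing.
const QWEN_3_7_FLASH: Rates = {
  inputPerM: 0.03,
  outputPerM: 0.13,
  cacheReadPerM: 0.01,
};

function estimateCostUSD(rates: Rates, usage: Usage): number {
  const perToken = (ratePerM: number): number => ratePerM / 1_000_000;
  const cached = usage.cachedInputTokens ?? 0;
  // If no cache Read rate is listed, fall back to the full input rate.
  const cacheRate = rates.cacheReadPerM ?? rates.inputPerM;
  return (
    usage.inputTokens * perToken(rates.inputPerM) +
    usage.outputTokens * perToken(rates.outputPerM) +
    cached * perToken(cacheRate)
  );
}

// 200K uncached prompt tokens and 10K output tokens.
const cost = estimateCostUSD(QWEN_3_7_FLASH, {
  inputTokens: 200_000,
  outputTokens: 10_000,
});
console.log(cost.toFixed(4)); // prints "0.0073"
```

Swapping in another row's Input and Output values gives a like-for-like comparison across models for the same token counts.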
