Gemini 3 Flash
Gemini 3 Flash delivers Gemini 3's pro-grade reasoning at flash-level latency and cost, outperforming Gemini 2.5 Pro across most benchmarks with meaningful gains in token efficiency.
import { streamText } from 'ai'
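const result = streamText({ model: 'google/gemini-3-flash', prompt: 'Why is the sky blue?' })
// Print each text chunk as it streams in
for await (const textPart of result.textStream) process.stdout.write(textPart)
About Gemini 3 Flash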
Gemini 3 Flash is Google's speed-optimized model in the Gemini 3 generation, combining Gemini 3's reasoning depth with the efficiency profile of the Flash tier. It outperforms Gemini 2.5 Pro on most benchmarks, so a speed-tier model now surpasses the previous generation's flagship while also using tokens more efficiently than the 2.5 models. See the live metrics on this page for current throughput.
Thinking is first-class in Gemini 3 Flash. The thinkingLevel provider option controls how deeply the model reasons before answering, and includeThoughts returns a summary of its intermediate reasoning alongside the final answer. Surfacing those thoughts helps when debugging multi-step pipelines, building chain-of-thought datasets, or checking that Gemini 3 Flash works through a problem correctly. Set thinkingLevel to high when a task demands deeper inference and your latency budget allows it.
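The rule in the last sentence can be captured in a small helper that builds per-request provider options. This is a minimal sketch, assuming the AI SDK's `providerOptions.google.thinkingConfig` shape for the two options named above; the `thinkingOptions` function and the 5,000 ms cutoff are hypothetical, and only the `low` and `high` levels are modeled even though the model may accept others.

```typescript
// Only 'low' and 'high' are modeled in this sketch; check the provider docs
// for the full set of thinking levels your model version accepts.
type ThinkingLevel = 'low' | 'high'

// Assumed shape of the Google-specific provider options.
interface GoogleThinkingOptions {
  google: {
    thinkingConfig: {
      thinkingLevel: ThinkingLevel
      includeThoughts: boolean
    }
  }
}

// Hypothetical cutoff: below this latency budget, keep thinking shallow.
const DEEP_THINKING_MIN_BUDGET_MS = 5000

// Builds providerOptions for one request. Deep reasoning is chosen only when
// the task needs it AND the latency budget is at least the cutoff.
function thinkingOptions(
  needsDeepReasoning: boolean,
  latencyBudgetMs: number,
  includeThoughts = false,
): GoogleThinkingOptions {
  const level: ThinkingLevel =
    needsDeepReasoning && latencyBudgetMs >= DEEP_THINKING_MIN_BUDGET_MS ? 'high' : 'low'
  return { google: { thinkingConfig: { thinkingLevel: level, includeThoughts } } }
}

// A multi-step debugging request with a generous budget, surfacing thoughts.
const debugOptions = thinkingOptions(true, 20000, true)
// A latency-sensitive chat turn: deep reasoning is requested but not affordable.
const chatOptions = thinkingOptions(true, 800)

console.log(JSON.stringify(debugOptions))
console.log(JSON.stringify(chatOptions))
```

To use it, pass the returned object as the `providerOptions` field of a `streamText` or `generateText` call. Keeping the decision in one function makes the latency trade-off explicit and easy to tune per route.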
Because Gemini 3 Flash balances quality and throughput, it fits a wide range of traffic patterns, from low-latency chat interfaces to batch document-processing pipelines. Accessing it through AI Gateway adds observability, automatic retries, and provider failover without requiring a Google Cloud account.
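Failover ordering can be set per request. The sketch below assumes AI Gateway's `providerOptions.gateway.order` field and the provider slugs `vertex` and `google`; check the model page for the providers that actually serve Gemini 3 Flash. The `routedOptions` helper is hypothetical. It shows that gateway routing and Gemini-specific settings live under separate keys of the same `providerOptions` object.

```typescript
// Provider slugs are assumptions for this sketch.
type GatewayProvider = 'vertex' | 'google'

interface RequestProviderOptions {
  // Routing hints read by AI Gateway (assumed field name: order).
  gateway: { order: GatewayProvider[] }
  // Model-specific settings, kept under their own provider key.
  google: { thinkingConfig: { thinkingLevel: 'low' | 'high' } }
}

// Builds providerOptions that list providers in priority order, so the
// gateway can fall back to the next one if a provider fails.
function routedOptions(
  order: GatewayProvider[],
  thinkingLevel: 'low' | 'high',
): RequestProviderOptions {
  if (order.length === 0) {
    throw new Error('order must list at least one provider')
  }
  // Drop duplicate entries while preserving the caller's priority.
  const dedupedOrder = order.filter((p, i) => order.indexOf(p) === i)
  return {
    gateway: { order: dedupedOrder },
    google: { thinkingConfig: { thinkingLevel } },
  }
}

const batchOptions = routedOptions(['vertex', 'google', 'vertex'], 'low')
console.log(JSON.stringify(batchOptions))
```

Passing `routedOptions(['vertex', 'google'], 'low')` as `providerOptions` would ask the gateway to try Vertex first and fall back to the Google AI provider if that request fails.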