Gemini 3 Flash

Gemini 3 Flash delivers Gemini 3's pro-grade reasoning at flash-level latency and cost, outperforming Gemini 2.5 Pro across most benchmarks with meaningful gains in token efficiency. Your use is subject to Google's Terms & Privacy Policies.

Reasoning, Tool Use, File Input, Vision (Image), Web Search, Tiered Cost, Implicit Caching

Use with AI Gateway

TypeScript

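import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-3-flash',
  prompt: 'Why is the sky blue?'
})

// Print the response as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}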

About Gemini 3 Flash

Gemini 3 Flash is Google's speed-optimized model in the Gemini 3 generation, combining Gemini 3's reasoning depth with the efficiency profile of the Flash tier. It outperforms Gemini 2.5 Pro across most benchmarks, meaning a speed-tier model now surpasses a previous-generation flagship. Gemini 3 Flash achieves this with meaningful gains in token efficiency over the 2.5 generation. See live metrics on this page for current throughput.

Thinking is first-class in Gemini 3 Flash. The thinkingLevel and includeThoughts provider options let you surface intermediate reasoning steps. This helps when debugging multi-step pipelines, constructing chain-of-thought datasets, or validating that Gemini 3 Flash reasons through a problem correctly. Set thinkingLevel to high when the task demands deeper inference and your latency budget allows it.
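As a rough illustration of the advice above, the sketch below picks a thinkingLevel from a latency budget and bundles it with includeThoughts. The helper name buildThinkingOptions, the 5000 ms cutoff, and the 'low' fallback are hypothetical choices made for this example. The nesting under google.thinkingConfig is also an assumption about how the provider options are laid out; the text above names only thinkingLevel and includeThoughts, so check the provider documentation before relying on this shape.

```typescript
// Hypothetical helper: the names and the latency cutoff are illustrative,
// not part of any SDK.
type ThinkingLevel = 'low' | 'high'

// Assumed layout: options nested under google.thinkingConfig.
interface GoogleThinkingOptions {
  google: {
    thinkingConfig: {
      thinkingLevel: ThinkingLevel
      includeThoughts: boolean
    }
  }
}

// Choose 'high' only when the caller can afford the extra latency.
// The 5000 ms cutoff is an arbitrary assumption for this sketch.
function buildThinkingOptions(
  latencyBudgetMs: number,
  surfaceThoughts: boolean
): GoogleThinkingOptions {
  const thinkingLevel: ThinkingLevel = latencyBudgetMs >= 5000 ? 'high' : 'low'
  return {
    google: {
      thinkingConfig: { thinkingLevel, includeThoughts: surfaceThoughts }
    }
  }
}

// Example: a debugging run with a generous latency budget.
const providerOptions = buildThinkingOptions(10000, true)
console.log(JSON.stringify(providerOptions))
```

Under these assumptions, the returned object would be passed as the providerOptions field of a streamText call, next to model and prompt. Keeping the decision in one small function makes it easy to turn thoughts on for debugging runs and off for latency-sensitive traffic.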

Because Gemini 3 Flash sits at the intersection of quality and throughput, it fits a wide range of real-world traffic patterns, from low-latency chat interfaces to batch document processing pipelines. Accessing it through AI Gateway adds observability, automatic retries, and provider failover without requiring a Google Cloud account.
