
MiniMax M3

MiniMax M3 is MiniMax's first model with a 1M-token context window and native multimodal input. It targets software engineering, terminal-based tool use, and agentic web browsing, with a max output of 1M tokens per request.

Reasoning · Tool Use · Vision (Image) · File Input · Implicit Caching
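index.ts
import { streamText } from 'ai'
const result = streamText({
  model: 'minimax/minimax-m3',
  prompt: 'Why is the sky blue?',
})
// Print the response as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}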

Playground

Try out MiniMax M3 by MiniMax. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

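| Provider | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search (Per Query) | Capabilities | ZDR | No Training | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MiniMax (Legal: Terms, Privacy) | 1M | 4.5s | 47tps | $0.60/M $0.30/M | $2.40/M $1.20/M | $0.12/M $0.06/M | | | +3 | | | 05/31/2026 |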
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by MiniMax

| Model | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search (Per Query) | Capabilities | Providers | ZDR | No Training | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 205K | 0.4s | 211tps | $0.15/M | $0.60/M | $0.06/M | $0.38/M | | | blackbox, fireworks, minimax, +2 | | | 03/18/2026 |
| | 205K | 0.8s | 42tps | $0.60/M | $2.40/M | $0.06/M | $0.38/M | | +2 | minimax | | | 03/18/2026 |
| | 1M | 0.5s | 313tps | $0.07/M | $0.57/M | $0.03/M | $0.38/M | | +1 | bedrock, blackbox, deepinfra, +3 | | | 02/12/2026 |
| | 205K | 0.8s | 46tps | $0.60/M | $2.40/M | $0.03/M | $0.38/M | | +1 | minimax, novita | | | 02/12/2026 |
| | 205K | 0.5s | 308tps | $0.30/M | $1.20/M | $0.03/M | $0.38/M | | +1 | bedrock, minimax, novita | | | 10/27/2025 |
| | 205K | 0.7s | 96tps | $0.30/M | $1.20/M | $0.03/M | $0.38/M | | +1 | minimax, novita | | | 10/27/2025 |

About MiniMax M3

MiniMax M3 is built around MiniMax Sparse Attention (MSA), an attention variant that splits the key-value cache into blocks and pre-filters which blocks contribute to each query. That design supports the 1M-token context window without the quadratic compute scaling of full attention, and it lets MiniMax M3 keep prefill and decode efficient on long inputs.
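
The block pre-filtering idea described above can be illustrated with a small sketch. This is not MiniMax's MSA implementation, whose details are not given here. It is a generic top-k block selection, and the mean-key block summary, the `blockSize` and `topK` parameters, and every function name are assumptions made for illustration.

```typescript
// Illustrative only: a generic block top-k pre-filter, NOT MiniMax's actual MSA code.
type Vec = number[];

const dot = (a: Vec, b: Vec): number => a.reduce((sum, x, i) => sum + x * b[i], 0);

// Summarize each block of cached keys by its mean vector (a hypothetical choice of summary).
function blockSummaries(keys: Vec[], blockSize: number): Vec[] {
  const summaries: Vec[] = [];
  for (let start = 0; start < keys.length; start += blockSize) {
    const block = keys.slice(start, start + blockSize);
    summaries.push(block[0].map((_, d) => block.reduce((s, k) => s + k[d], 0) / block.length));
  }
  return summaries;
}

// Pre-filter: keep the indices of the topK blocks whose summary scores highest for this query.
function selectBlocks(query: Vec, summaries: Vec[], topK: number): number[] {
  return summaries
    .map((s, i) => ({ i, score: dot(query, s) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((x) => x.i)
    .sort((a, b) => a - b);
}

// Softmax attention computed only over tokens inside the selected blocks.
function sparseAttention(query: Vec, keys: Vec[], values: Vec[], blockSize: number, topK: number): Vec {
  const chosen = selectBlocks(query, blockSummaries(keys, blockSize), topK);
  const idx: number[] = [];
  for (const b of chosen) {
    const end = Math.min((b + 1) * blockSize, keys.length);
    for (let i = b * blockSize; i < end; i++) idx.push(i);
  }
  const scale = 1 / Math.sqrt(query.length);
  const logits = idx.map((i) => dot(query, keys[i]) * scale);
  const maxLogit = Math.max(...logits);
  const weights = logits.map((l) => Math.exp(l - maxLogit));
  const total = weights.reduce((s, w) => s + w, 0);
  return values[0].map((_, d) => idx.reduce((s, i, n) => s + (weights[n] / total) * values[i][d], 0));
}
```

With `topK` fixed, each query scores one summary per block and then attends to at most `topK * blockSize` tokens, which is where the savings over full attention come from in this sketch.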

Native multimodality is wired in from the start of training rather than bolted on later. MiniMax M3 accepts text, image, and video input and produces text output. The pretraining pipeline aligns visual and textual semantics directly, which carries over to multimodal coding tasks like analyzing a screenshot of a failing test and writing a patch, or reproducing a bug from a GitHub issue thread.
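
A screenshot-plus-instruction request like the failing-test case above might be assembled as follows. This is a minimal sketch: the local types mirror the AI SDK's user message parts as an assumed shape (check your SDK version), the `image` field is assumed to accept a URL string, and `buildScreenshotPrompt` and the example URL are hypothetical.

```typescript
// Local types mirroring the AI SDK's user message parts (assumed shape, not imported from the SDK).
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: string };
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

// Hypothetical helper: pair one screenshot with one text instruction in a single user turn.
function buildScreenshotPrompt(screenshotUrl: string, instruction: string): UserMessage[] {
  return [
    {
      role: 'user',
      content: [
        { type: 'text', text: instruction },
        { type: 'image', image: screenshotUrl },
      ],
    },
  ];
}

const messages = buildScreenshotPrompt(
  'https://example.com/failing-test.png', // placeholder URL
  'This screenshot shows a failing test. Propose a patch.',
);
```

The resulting `messages` array would then be passed to `streamText` in place of `prompt`, with `model` set to `minimax/minimax-m3`.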

MiniMax M3 is positioned for software engineering, terminal-based tool use, and agentic web browsing. It scores 59.0% on SWE-Bench Pro and 70.06% on OSWorld-Verified for computer use. Automatic prompt caching is enabled by default, which reduces effective cost on repeated context patterns common in agent loops.
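
To see how cached reads change the cost of an agent loop, the arithmetic can be sketched as below. The rates are read from this page's pricing row ($0.30/M input, $0.06/M cached read, $1.20/M output) and are assumptions that may not match what a given provider charges. The `Rates` type, `requestCost`, and the token counts are hypothetical.

```typescript
// Rates in USD per million tokens.
interface Rates {
  inputPerM: number;
  cachedReadPerM: number;
  outputPerM: number;
}

// Cost of one request when `cachedTokens` of the input are served from the prompt cache.
function requestCost(rates: Rates, inputTokens: number, cachedTokens: number, outputTokens: number): number {
  const uncachedTokens = inputTokens - cachedTokens;
  return (
    (uncachedTokens * rates.inputPerM +
      cachedTokens * rates.cachedReadPerM +
      outputTokens * rates.outputPerM) /
    1_000_000
  );
}

// Example values taken from this page's pricing row (assumed, subject to change).
const exampleRates: Rates = { inputPerM: 0.3, cachedReadPerM: 0.06, outputPerM: 1.2 };

// An agent step that resends a 100K-token context, 80K of which is a cache hit.
const withCache = requestCost(exampleRates, 100_000, 80_000, 2_000);
const withoutCache = requestCost(exampleRates, 100_000, 0, 2_000);
```

Under these assumed rates the cached step costs roughly $0.0132 against $0.0324 uncached, and the gap widens as the repeated prefix grows.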

What To Consider When Choosing a Provider

  • Configuration: MiniMax M3 pairs the 1M-token context window with native image and video input, which makes it a fit for workflows that reason over screenshots, design references, or long video transcripts alongside code. Route requests through AI Gateway using the AI SDK or the Chat Completions, Responses, or Messages APIs to get provider failover, observability, and unified pricing across the providers serving MiniMax M3.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
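
For the Chat Completions route with API-key authentication, a request might be built as in the sketch below. The endpoint URL is an assumption based on AI Gateway's OpenAI-compatible API and should be confirmed against the AI Gateway docs. `buildChatRequest` is a hypothetical helper, and the key is passed in by the caller rather than read from any particular environment variable.

```typescript
// Assumed OpenAI-compatible endpoint; confirm against the AI Gateway documentation.
const GATEWAY_CHAT_URL = 'https://ai-gateway.vercel.sh/v1/chat/completions';

// Hypothetical helper: builds the URL and fetch options without sending anything.
function buildChatRequest(apiKey: string, prompt: string) {
  return {
    url: GATEWAY_CHAT_URL,
    init: {
      method: 'POST',
      headers: {
        // The gateway key is the only credential; no provider keys are attached.
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'minimax/minimax-m3',
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

const chatRequest = buildChatRequest('YOUR_GATEWAY_API_KEY', 'Why is the sky blue?');
// To send it: const response = await fetch(chatRequest.url, chatRequest.init)
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before it is sent.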

When to Use MiniMax M3

Best For

  • Long-horizon coding agents: Sessions that span an entire repository without fragmenting context across requests
  • Multimodal engineering: Workflows that reason over screenshots, diagrams, or video alongside code
  • Computer-use agents: Browser and desktop automation that benefits from strong OSWorld performance
  • Terminal-driven tool chains: Agents that read command output and iterate across many steps
  • Long-video and long-document understanding: Tasks that require sustained attention over hours of input

Consider Alternatives When

  • Raw inference speed: Latency matters more than capability breadth, so consider M3-highspeed
  • Short single-turn text tasks: A smaller model is cheaper when the workload is single-turn and text-only
  • Text-only reasoning: Standard M2.7 covers the use case at lower cost when multimodal input is not needed
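
The guidance above could be encoded as a small routing helper. This is a sketch of one possible policy, not a recommendation from MiniMax. The `Workload` flags and `pickModel` are hypothetical, the highspeed identifier is only usable where a provider exposes it, and the text-only alternative is supplied by the caller because its identifier is not given on this page.

```typescript
// Hypothetical description of what a workload needs.
interface Workload {
  needsVision: boolean;
  needsLongContext: boolean;
  latencySensitive: boolean;
}

// One possible policy: speed first, then a cheaper text-only model, otherwise MiniMax M3.
function pickModel(workload: Workload, textOnlyAlternative: string): string {
  if (workload.latencySensitive) {
    return 'minimax/minimax-m3-highspeed';
  }
  if (!workload.needsVision && !workload.needsLongContext) {
    return textOnlyAlternative;
  }
  return 'minimax/minimax-m3';
}
```

The order of the checks is a design choice: here latency wins over cost, so a latency-sensitive text-only task still goes to the highspeed variant.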

Conclusion

MiniMax M3 brings a 1M-token context window, native multimodal input, and agentic coding capability into one model. For teams running long-horizon agents over full repositories, browser sessions, or video input, MiniMax M3 reduces the need to split work across multiple specialized models. Route it through AI Gateway using the AI SDK or the Chat Completions, Responses, or Messages APIs for failover and unified observability.

Frequently Asked Questions

  • What is MiniMax Sparse Attention?

    MSA is the attention variant behind MiniMax M3. It splits the key-value cache into blocks and pre-filters which blocks contribute to each query, which keeps compute manageable at the 1M-token context length.

  • What input types does MiniMax M3 accept?

    Text, image, and video input. Output is text. Multimodality is native to MiniMax M3 rather than added through a separate vision adapter.

  • What is the context window for MiniMax M3?

    MiniMax M3 supports a context window of 1M tokens and a max output of 1M tokens per request.

  • How does MiniMax M3 compare to M2.7?

    M2.7 focuses on multi-agent orchestration, dynamic tool search, and text-only enterprise workflows. MiniMax M3 extends the series with native multimodal input, the 1M-token context window, and the MSA architecture for long-context efficiency.

  • Is there a faster variant of MiniMax M3?

    Yes. Select minimax/minimax-m3-highspeed where your provider exposes it. The highspeed variant targets higher throughput with the same output behavior.

  • Does MiniMax M3 support automatic prompt caching?

    Yes. Automatic prompt caching is enabled by default, which reduces effective cost on repeated context patterns. Cached input tokens are billed at $0.06 per million where the provider exposes a cached rate.

  • How do I access MiniMax M3 through the AI SDK?

    Set the model identifier to minimax/minimax-m3 in your AI SDK configuration. AI Gateway routes the request across the providers serving MiniMax M3 with configurable failover.

  • Is Zero Data Retention available for MiniMax M3?

    Zero Data Retention is offered on a per-provider basis and is not currently available for this model. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.