About MiniMax M3

MiniMax M3 is built around MiniMax Sparse Attention (MSA), an attention variant that splits the key-value cache into blocks and pre-filters which blocks each query attends to. That design supports the 1.0M-token context window without the quadratic compute cost of full attention, and it keeps prefill and decode efficient on long inputs.
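The block pre-filtering idea can be sketched for a single query vector. This is a generic block-sparse attention sketch, not MiniMax's MSA implementation. The scoring rule (the dot product of the query with each block's mean key), the block size, and the top-k selection are all assumptions chosen for illustration.

```typescript
// Generic block-sparse attention for one query. Illustrative only; NOT MiniMax's MSA kernel.
// Assumption: each KV block is scored by dot(query, mean key of the block), and only the
// topK highest-scoring blocks take part in the softmax.

type Vec = number[];

function dot(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Split token positions [0, n) into contiguous [start, end) blocks of size blockSize.
function blockRanges(n: number, blockSize: number): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < n; start += blockSize) {
    ranges.push([start, Math.min(start + blockSize, n)]);
  }
  return ranges;
}

// Return the indices (ascending) of the topK blocks whose mean key scores highest.
// Real systems would cache one summary per block; it is recomputed here for clarity.
function selectBlocks(query: Vec, keys: Vec[], blockSize: number, topK: number): number[] {
  const scored = blockRanges(keys.length, blockSize).map(([s, e], idx) => {
    const mean = new Array<number>(query.length).fill(0);
    for (let t = s; t < e; t++) {
      for (let d = 0; d < query.length; d++) mean[d] += keys[t][d] / (e - s);
    }
    return { idx, score: dot(query, mean) };
  });
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, topK).map((x) => x.idx).sort((a, b) => a - b);
}

// Scaled softmax attention restricted to tokens inside the selected blocks.
function sparseAttention(query: Vec, keys: Vec[], values: Vec[], blockSize: number, topK: number): Vec {
  const ranges = blockRanges(keys.length, blockSize);
  const positions: number[] = [];
  for (const b of selectBlocks(query, keys, blockSize, topK)) {
    const [s, e] = ranges[b];
    for (let t = s; t < e; t++) positions.push(t);
  }
  const scale = 1 / Math.sqrt(query.length);
  const logits = positions.map((t) => dot(query, keys[t]) * scale);
  const maxLogit = Math.max(...logits); // subtract the max for numerical stability
  const weights = logits.map((l) => Math.exp(l - maxLogit));
  const total = weights.reduce((a, b) => a + b, 0);
  const out = new Array<number>(values[0].length).fill(0);
  positions.forEach((t, i) => {
    for (let d = 0; d < out.length; d++) out[d] += (weights[i] / total) * values[t][d];
  });
  return out;
}
```

With `topK` blocks of `blockSize` tokens, the softmax touches at most `topK * blockSize` positions per query instead of the full context. That bounded per-query cost is what lets sparse attention avoid quadratic scaling.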

Native multimodality is wired in from the start of training rather than bolted on later. MiniMax M3 accepts text, image, and video input and produces text output. The pretraining pipeline aligns visual and textual semantics directly, which carries over to multimodal coding tasks like analyzing a screenshot of a failing test and writing a patch, or reproducing a bug from a GitHub issue thread.

MiniMax M3 is positioned for software engineering, terminal-based tool use, and agentic web browsing. It scores 59.0% on SWE-Bench Pro and 70.06% on OSWorld-Verified for computer use. Automatic prompt caching is enabled by default, which lowers the effective cost of the repeated context that agent loops resend on every turn.
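A little arithmetic shows why caching matters for agent loops. Assume, hypothetically, that each turn resends the full history: a fixed prefix of P tokens plus d new tokens per turn. Turn i then carries P + i·d input tokens. Everything except the newest d tokens was already sent on the previous turn, so it is eligible for a cache hit. The sketch below counts that repeated share under an idealized assumption that every repeated token hits the cache. It implies no actual MiniMax or AI Gateway pricing.

```typescript
// Idealized count of cache-eligible input tokens in an agent loop that resends full history.
// Hypothetical assumptions: a fixed prefix of prefixTokens, deltaTokens appended per turn,
// and every token already sent on the previous turn is served from the prompt cache.
interface CacheStats {
  totalInputTokens: number;
  cacheableTokens: number;
  cacheableShare: number;
}

function agentLoopCacheStats(turns: number, prefixTokens: number, deltaTokens: number): CacheStats {
  let total = 0;
  let cacheable = 0;
  for (let i = 1; i <= turns; i++) {
    const sent = prefixTokens + i * deltaTokens; // full history resent on turn i
    total += sent;
    if (i > 1) cacheable += sent - deltaTokens;  // all but this turn's new tokens repeat turn i-1
  }
  return {
    totalInputTokens: total,
    cacheableTokens: cacheable,
    cacheableShare: total === 0 ? 0 : cacheable / total,
  };
}
```

With the hypothetical values of 50 turns, a 1,000-token prefix, and 100 new tokens per turn, the loop sends 177,500 input tokens. Of those, 171,500 (about 96.6%) repeat a previously sent prefix, which is the share automatic caching can target.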

What To Consider When Choosing a Provider

Configuration: MiniMax M3 pairs its 1.0M-token context window with native image and video input, which suits workflows that reason over screenshots, design references, or long video transcripts alongside code. Route requests through AI Gateway, using the AI SDK or the Chat Completions, Responses, or Messages APIs, to get provider failover, observability, and unified pricing across the providers that serve MiniMax M3.
Zero Data Retention: AI Gateway supports Zero Data Retention for this model on direct gateway requests; requests made with BYOK (bring your own key) credentials are not covered. See the documentation for configuration details.
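A multimodal request through the Chat Completions path can be sketched as a pure function that builds, but does not send, an OpenAI-style request with a text part and an image part. This might be used for the "screenshot of a failing test" case. The base URL and the model slug are assumptions to verify against the AI Gateway documentation, and `buildScreenshotRequest` is a hypothetical helper.

```typescript
// Builds (does not send) an OpenAI-compatible Chat Completions request with text + image input.
// Assumptions to verify in the AI Gateway docs: the base URL and the model slug below.
const GATEWAY_BASE_URL = "https://ai-gateway.vercel.sh/v1"; // assumed OpenAI-compatible endpoint
const MODEL_ID = "minimax/minimax-m3";                    // hypothetical model slug

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: one user message containing a question and a screenshot URL.
function buildScreenshotRequest(token: string, screenshotUrl: string, question: string): ChatRequest {
  const content: ContentPart[] = [
    { type: "text", text: question },
    { type: "image_url", image_url: { url: screenshotUrl } },
  ];
  const body = { model: MODEL_ID, messages: [{ role: "user", content }] };
  return {
    url: `${GATEWAY_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`, // gateway credential, not a provider key
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

Passing the returned `url` and `init` to `fetch` would send the request. Keeping construction separate from sending makes the payload easy to inspect and test. The AI SDK is the higher-level alternative named above.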
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
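Credential selection can be sketched as "API key first, then OIDC token", with no provider keys involved. The environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` are assumed to match those used by the AI SDK's gateway provider; check the documentation before relying on them. The helper `resolveGatewayCredential` is hypothetical, and the environment is passed in as a plain object so the sketch stays runtime-agnostic.

```typescript
// Picks the single credential AI Gateway needs: an API key if present, otherwise an OIDC token.
// Assumption: these variable names mirror the AI SDK gateway provider; verify in the docs.
type Env = Record<string, string | undefined>;

interface GatewayCredential {
  kind: "api-key" | "oidc";
  token: string;
}

// Hypothetical helper; no upstream provider credentials are read or required.
function resolveGatewayCredential(env: Env): GatewayCredential {
  const apiKey = env["AI_GATEWAY_API_KEY"];
  if (apiKey) return { kind: "api-key", token: apiKey };
  const oidc = env["VERCEL_OIDC_TOKEN"];
  if (oidc) return { kind: "oidc", token: oidc };
  throw new Error("No AI Gateway credential found: set AI_GATEWAY_API_KEY or provide an OIDC token");
}
```

Preferring an explicit API key keeps local runs predictable. The OIDC fallback covers environments where a token is issued automatically.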

When to Use MiniMax M3

Best for

Long-horizon coding agents: Sessions that span an entire repository without fragmenting context across requests
Multimodal engineering: Workflows that reason over screenshots, diagrams, or video alongside code
Computer-use agents: Browser and desktop automation that benefits from strong OSWorld performance
Terminal-driven tool chains: Agents that read command output and iterate across many steps
Long-video and long-document understanding: Tasks that require sustained attention over hours of input

Consider alternatives when

Raw inference speed: Latency matters more than capability breadth, so consider M3-highspeed
Short single-turn text tasks: A smaller model is cheaper when the workload is single-turn and text-only
Text-only reasoning: Standard M2.7 covers the use case at lower cost when multimodal input is not needed

Conclusion

MiniMax M3 combines a 1.0M-token context window, native multimodal input, and agentic coding capability in one model. For teams running long-horizon agents over full repositories, browser sessions, or video input, it reduces the need to split work across multiple specialized models. Route it through AI Gateway, using the AI SDK or the Chat Completions, Responses, or Messages APIs, for failover and unified observability.
