2M-token context
The largest production context window in the frontier tier — enough for entire codebases, video transcripts, or document libraries in a single pass.
Long-context multimodal model with a 2M-token context window and native video understanding.
Gemini 3 Pro is Google DeepMind's long-context multimodal model, released in December 2025. Its headline feature is a 2-million-token context window combined with native video understanding: it can watch hours of footage, or reason across an entire document set, in a single request.
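Fitting a document set into one window is a budgeting problem before it is a prompting one. The sketch below is a minimal illustration, not an official tool: it assumes a rough heuristic of four characters per token (real counts depend on the model's tokenizer), and the names `Doc`, `estimate_tokens`, `pack_documents`, and `build_prompt` are hypothetical. It packs documents greedily in order, skips any that would overflow the budget, keeps a reserve for your instructions, and labels each document so answers can cite it by name.

```python
from dataclasses import dataclass

# Assumption: ~4 characters per token for English prose. Real counts depend on
# the model's tokenizer, so treat this as a rough pre-filter only.
CHARS_PER_TOKEN = 4
CONTEXT_LIMIT = 2_000_000  # window size quoted on this page


@dataclass
class Doc:
    name: str
    text: str


def estimate_tokens(text: str) -> int:
    """Rough token estimate; ceiling division so any non-empty text costs >= 1."""
    return -(-len(text) // CHARS_PER_TOKEN)


def pack_documents(docs, limit=CONTEXT_LIMIT, reserve=50_000):
    """Greedy first-fit packing of documents into one request.

    `reserve` is headroom kept back for instructions and the question.
    Returns (included_names, skipped_names, estimated_tokens_used).
    """
    budget = limit - reserve
    used = 0
    included, skipped = [], []
    for doc in docs:
        cost = estimate_tokens(doc.text)
        if used + cost <= budget:
            included.append(doc.name)
            used += cost
        else:
            # Keep going: a smaller document later in the list may still fit.
            skipped.append(doc.name)
    return included, skipped, used


def build_prompt(docs, included, question):
    """Label each included document so the answer can cite it by name."""
    keep = set(included)
    parts = [f"[doc: {d.name}]\n{d.text}" for d in docs if d.name in keep]
    return "\n\n".join(parts) + f"\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [Doc("a.txt", "x" * 4_000), Doc("b.txt", "y" * 8_000_000), Doc("c.txt", "z" * 400)]
    included, skipped, used = pack_documents(docs)
    print(included, skipped, used)  # ['a.txt', 'c.txt'] ['b.txt'] 1100
```

Because packing is first-fit in list order, earlier documents win; sort by priority before packing. For anything close to the limit, check exact counts with the provider's token-counting API rather than a character heuristic.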
It is tightly integrated with Google Workspace and Vertex AI, which makes it a natural fit for teams already on Google Cloud. Pricing is aggressive for the context size, making large-context workloads cheaper than comparable frontier models.
Reads and reasons over video directly, including timestamps and on-screen text — useful for review, summarisation, and search.
First-class hooks into Google Workspace and Vertex AI, with grounding against Google Search.
Accurate citation and grounding across mixed text, image, and video inputs.
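Once the model has described a video with timestamps, making it searchable is mostly ordinary indexing on your side. A minimal sketch, assuming you prompt for one segment per line in a `MM:SS description` or `HH:MM:SS description` layout (a hypothetical convention you define in the prompt, not a documented output format): the helpers `parse_segments`, `build_index`, and `search` are illustrative names. They parse lines into start times in seconds, build a small inverted index, and return segments that mention every query word.

```python
import re
from collections import defaultdict

# Hypothetical line format requested in the prompt:
# "HH:MM:SS description" or "MM:SS description". Other lines are ignored.
LINE_RE = re.compile(r"^\s*(?:(\d+):)?(\d{1,2}):(\d{2})\s+(.+)$")
WORD_RE = re.compile(r"[a-z0-9]+")


def to_seconds(h, m, s):
    return int(h or 0) * 3600 + int(m) * 60 + int(s)


def parse_segments(model_text):
    """Turn timestamped lines into (start_seconds, description) pairs."""
    segments = []
    for line in model_text.splitlines():
        match = LINE_RE.match(line)
        if match:  # skip lines that don't follow the requested format
            h, m, s, desc = match.groups()
            segments.append((to_seconds(h, m, s), desc.strip()))
    return segments


def build_index(segments):
    """Map each lowercase word to the set of segment start times containing it."""
    index = defaultdict(set)
    for start, desc in segments:
        for word in WORD_RE.findall(desc.lower()):
            index[word].add(start)
    return index


def search(index, query):
    """Return sorted start times whose description contains every query word."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

For example, the lines `12:30 Demo of the search feature` and `1:02:03 Q&A about search latency` index to 750 and 3723 seconds, so `search(index, "search latency")` returns only `[3723]`.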
Loading hundreds of documents into a single 2M-token window to synthesise findings with citations.
2M tokens in one request
Transcribing, summarising, and making long-form video searchable by content and on-screen text.
Hours of footage reasoned over per call
Drafting in Docs, analysing Sheets, and triaging Gmail with native Workspace context.
Native Google Workspace integration
Best value inside Google Cloud. Much of the advantage comes from Workspace and Vertex integration; outside that ecosystem the edge narrows.
Large contexts are slow. Filling the 2M-token window adds real latency, so it is better for back-office analysis than sub-second chat.
No self-hosting. Like other frontier closed models, Gemini 3 Pro cannot be self-hosted or fine-tuned on its weights.
Gemini 3 Pro supports a context window of up to 2 million tokens, the largest in the current frontier tier.
Yes — it natively reads and reasons over video, including timestamps and on-screen text, not just extracted frames.
Pricing is around $7 per million input tokens and $21 per million output tokens — aggressive for the context size on offer.
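To see what these rates mean for a large-context request, here is a minimal cost estimator. It is a sketch under stated assumptions: it uses the flat $7 / $21 per-million rates quoted above, ignores any tiering or discounts the provider may apply, and `request_cost` is a hypothetical helper name.

```python
# Rates quoted on this page, in USD per million tokens. Assumed flat; check
# current pricing before budgeting, since rates change and may be tiered.
INPUT_USD_PER_M = 7.00
OUTPUT_USD_PER_M = 21.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at flat per-token rates."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


if __name__ == "__main__":
    # A full 2M-token prompt with an 8,000-token answer.
    print(round(request_cost(2_000_000, 8_000), 3))  # 14.168
```

At these rates a completely filled window costs $14 in input alone, against about $0.17 for an 8,000-token answer, so the size of the context rather than the length of the response dominates the bill for large-context workloads.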