Meta AI · Released September 25, 2024

Llama 3.2 90B Vision

Open Source · #11 Arena Rank · 90B parameters

Llama 3.2 90B Vision ranks #11 in the Chatbot Arena. Context window: 128K tokens.

Context: 128K
Input: Free (open)

Key Specifications

🏆 Arena Rank: #11
📐 Context Window: 128K tokens
📥 Input Price (per 1M tokens): Free (open)
📤 Output Price (per 1M tokens): Free (open)
🧠 Parameters: 90B
🔓 Open Source: Yes

Best For

Image understanding · visual QA · multimodal tasks

About Llama 3.2 90B Vision

Llama 3.2 90B Vision, developed by Meta AI, is a multimodal open-source model with 90 billion parameters and a 128K token context window. The model processes both text and images, enabling visual question answering, document understanding, chart analysis, and image-grounded reasoning tasks. It is part of the first Llama release with vision capabilities, extending the family beyond text-only processing: a vision encoder is integrated with the language model so that responses can reference visual elements directly. Because the weights are freely available, it can be deployed on enterprise GPU infrastructure for privacy-sensitive visual AI applications. Llama 3.2 90B Vision ranks #11 on the Chatbot Arena leaderboard, making it one of the highest-ranked open-source multimodal models available and a strong alternative to proprietary vision-language systems.
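To use the model's image-plus-text input, a prompt pairs an image with a text question. The sketch below builds such a message in the Hugging Face chat-template convention (a list of role/content dicts with typed content parts), which the Llama 3.2 Vision Instruct checkpoints follow; the helper function name is illustrative, not part of any official API.

```python
# Sketch: constructing a multimodal chat message for a Llama 3.2 Vision
# model served via the Hugging Face chat-template convention. The
# build_vision_message helper is a hypothetical convenience wrapper.

def build_vision_message(question: str) -> list[dict]:
    """Return a single-turn chat message pairing one image with a question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},                    # placeholder for the attached image
                {"type": "text", "text": question},   # the question about the image
            ],
        }
    ]

messages = build_vision_message("What trend does this chart show?")
print(messages[0]["content"][1]["text"])  # → What trend does this chart show?
```

In practice this list would be passed to a processor's `apply_chat_template` along with the actual image object before generation.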

Pricing per 1M tokens

Input Tokens

Free (open)

Output Tokens

Free (open)

Frequently Asked Questions

What is Llama 3.2 90B Vision?
Llama 3.2 90B Vision is Meta AI's open-source multimodal model with 90 billion parameters and a 128K token context window. It processes both text and images, supporting visual question answering, document understanding, chart analysis, and image-grounded reasoning, and it ranks #11 on the Chatbot Arena leaderboard — one of the highest-ranked open-source multimodal models available.
How much does Llama 3.2 90B Vision cost?
Llama 3.2 90B Vision is free to use: Meta AI releases the weights openly, so there are no per-token fees for input or output. If you run it through a hosting provider rather than your own infrastructure, that provider may charge its own inference rates.
What is Llama 3.2 90B Vision's context window?
Llama 3.2 90B Vision supports up to 128K tokens per request. A larger context window allows the model to reason over longer inputs, which matters for document analysis, code review, and multi-turn conversations.
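When planning requests against the 128K-token limit, a rough pre-check is useful before tokenizing. The sketch below uses the common 4-characters-per-token rule of thumb for English text; this heuristic is an assumption, not a property of the Llama tokenizer, so exact counts require the real tokenizer.

```python
# Rough sketch: estimating whether a document fits in a 128K-token context
# window. CHARS_PER_TOKEN = 4 is a heuristic for English prose, not an
# exact Llama tokenizer figure.

CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 4  # rough rule of thumb

def estimated_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, reserved_for_output: int = 4_000) -> bool:
    """Check that the input plus an output budget stays under the window."""
    return estimated_tokens(text) + reserved_for_output <= CONTEXT_WINDOW

doc = "word " * 50_000  # ~250K characters of filler text
print(fits_in_context(doc))  # → True (~62.5K estimated tokens + 4K budget)
```

Reserving part of the window for the model's output matters because input and generated tokens share the same context budget.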
Is Llama 3.2 90B Vision open source?
Yes — Meta AI released Llama 3.2 90B Vision as open source. That means you're free to deploy it however you want: cloud, on-prem, edge. No API lock-in.
What is Llama 3.2 90B Vision best for?
Meta AI positions Llama 3.2 90B Vision for: Image understanding, visual QA, multimodal tasks. Real-world performance will depend on your specific prompts and data, but these are the intended strengths.