𝕏 Grok-2 Beta Release

by Stephen M. Walker II, Co-Founder / CEO

Top tip

Grok-2 fine-tuning is now available, enabling Grok-2 customization for your specific use cases.

What is Grok-2?

Grok-2 is x.ai's latest language model and the successor to Grok-1.5. Released on the 𝕏 platform, Grok-2 and its smaller counterpart, Grok-2 mini, power chat assistants with improved reasoning, chat, and coding capabilities.

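| Benchmark | Grok-1.5 | Grok-2 mini‡ | Grok-2‡ |
| --- | --- | --- | --- |
| GPQA | 35.9% | 51.0% | 56.0% |
| MMLU | 81.3% | 86.2% | 87.5% |
| MMLU-Pro | 51.0% | 72.0% | 75.5% |
| MATH§ | 50.6% | 73.0% | 76.1% |
| HumanEval¶ | 74.1% | 85.7% | 88.4% |
| MMMU | 53.6% | 63.2% | 66.1% |
| MathVista | 52.8% | 68.1% | 69.0% |
| DocVQA | 85.6% | 93.2% | 93.6% |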
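As a rough illustration of the gains in the table above, the following Python sketch computes the percentage-point difference between each Grok-2 variant and Grok-1.5. The scores are copied from the table; the names `SCORES` and `gains_over_grok_1_5` are illustrative and not part of any x.ai tooling.

```python
# Scores (percent) copied from the table above, in the order:
# (Grok-1.5, Grok-2 mini, Grok-2).
SCORES = {
    "GPQA":      (35.9, 51.0, 56.0),
    "MMLU":      (81.3, 86.2, 87.5),
    "MMLU-Pro":  (51.0, 72.0, 75.5),
    "MATH":      (50.6, 73.0, 76.1),
    "HumanEval": (74.1, 85.7, 88.4),
    "MMMU":      (53.6, 63.2, 66.1),
    "MathVista": (52.8, 68.1, 69.0),
    "DocVQA":    (85.6, 93.2, 93.6),
}


def gains_over_grok_1_5(scores):
    """Return {benchmark: (mini_gain, grok2_gain)} in percentage points."""
    return {
        name: (round(mini - base, 1), round(full - base, 1))
        for name, (base, mini, full) in scores.items()
    }


if __name__ == "__main__":
    for name, (mini_gain, full_gain) in gains_over_grok_1_5(SCORES).items():
        print(f"{name:<10} Grok-2 mini +{mini_gain:.1f} pp   Grok-2 +{full_gain:.1f} pp")
```

On these figures, the largest gain for Grok-2 is on MATH (25.5 points) and the smallest is on MMLU (6.2 points).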
[Figure: Grok-2 Factuality Preference]

Both models are currently in beta. Grok-2 mini is available now on x.com, and both models will be released through the enterprise API in the coming weeks.

Performance Benchmarks

Grok-2 has been evaluated on academic benchmarks covering reasoning, reading comprehension, math, science, and coding. It improves on Grok-1.5 on every benchmark listed and is competitive with other leading AI models.

Grok-2 is particularly strong on vision-based tasks. Among the models compared below, it posts the highest MathVista score (69.0%) for visual math reasoning and the second-highest DocVQA score (93.6%) for document-based question answering.

[Figure: Grok-2 Win Rate]

The Grok-2 models are evaluated across a series of academic benchmarks, including reasoning, reading comprehension, math, science, and coding. Both Grok-2 and Grok-2 mini exhibit significant improvements over the previous Grok-1.5 model. These models achieve performance levels that are competitive with other frontier models in areas such as graduate-level science knowledge (GPQA), general knowledge (MMLU, MMLU-Pro), and math competition problems (MATH). Furthermore, Grok-2 excels in vision-based tasks, delivering state-of-the-art performance in visual math reasoning (MathVista) and document-based question answering (DocVQA).

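| Benchmark | Grok-2 | Gemini Pro 1.5 | Llama 3 405B | GPT-4o | Claude 3.5 Sonnet |
| --- | --- | --- | --- | --- | --- |
| GPQA | 56.0% | 46.2% | 51.1% | 53.6% | 59.6% |
| MMLU | 87.5% | 85.9% | 88.6% | 88.7% | 88.3% |
| MMLU-Pro | 75.5% | 69.0% | 73.3% | 72.6% | 76.1% |
| MATH§ | 76.1% | 67.7% | 73.8% | 76.6% | 71.1% |
| HumanEval¶ | 88.4% | 71.9% | 89.0% | 90.2% | 92.0% |
| MMMU | 66.1% | 62.2% | 64.5% | 69.1% | 68.3% |
| MathVista | 69.0% | 63.9% | — | 63.8% | 67.7% |
| DocVQA | 93.6% | 93.1% | 92.2% | 92.8% | 95.2% |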
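To see where Grok-2 stands against each model in the comparison above, a small Python sketch can pick the leader per benchmark and compute Grok-2's rank. The scores are copied from the comparison; `MODELS`, `TABLE`, `leader`, and `rank_of` are illustrative names, and a missing score is represented as `None`.

```python
MODELS = ["Grok-2", "Gemini Pro 1.5", "Llama 3 405B", "GPT-4o", "Claude 3.5 Sonnet"]

# Scores (percent) per benchmark, in the same order as MODELS.
# None marks the score that is not reported in the comparison.
TABLE = {
    "GPQA":      [56.0, 46.2, 51.1, 53.6, 59.6],
    "MMLU":      [87.5, 85.9, 88.6, 88.7, 88.3],
    "MMLU-Pro":  [75.5, 69.0, 73.3, 72.6, 76.1],
    "MATH":      [76.1, 67.7, 73.8, 76.6, 71.1],
    "HumanEval": [88.4, 71.9, 89.0, 90.2, 92.0],
    "MMMU":      [66.1, 62.2, 64.5, 69.1, 68.3],
    "MathVista": [69.0, 63.9, None, 63.8, 67.7],
    "DocVQA":    [93.6, 93.1, 92.2, 92.8, 95.2],
}


def _scored(models, row):
    """Pair each model with its score, skipping models with no reported score."""
    return [(model, score) for model, score in zip(models, row) if score is not None]


def leader(models, row):
    """Name of the highest-scoring model on one benchmark row."""
    return max(_scored(models, row), key=lambda pair: pair[1])[0]


def rank_of(model, models, row):
    """1-based rank of `model` on one benchmark row (1 = best)."""
    ordered = sorted(_scored(models, row), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ordered].index(model) + 1


if __name__ == "__main__":
    for bench, row in TABLE.items():
        print(f"{bench:<10} leader: {leader(MODELS, row):<18} "
              f"Grok-2 rank: {rank_of('Grok-2', MODELS, row)}")
```

On these numbers, Grok-2 leads on MathVista and ranks second on GPQA, MMLU-Pro, MATH, and DocVQA.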

Real-Time Information Integration

Grok-2 integrates real-time information from the 𝕏 platform, enhancing its ability to provide accurate and timely responses. This feature is particularly beneficial for users seeking up-to-date insights and solutions.

Enterprise API Access

Later this month, Grok-2 and Grok-2 mini will be accessible through x.ai's enterprise API, offering developers a robust platform for integrating advanced AI capabilities into their applications. The API promises low-latency access and enhanced security features, making it a valuable tool for businesses worldwide.

Future Developments

x.ai is committed to continuous improvement and innovation. The Grok-2 release marks a pivotal moment in AI development, with plans to expand its capabilities further. Users can expect ongoing enhancements and new features that will push the boundaries of what AI can achieve.

Grok-2 is a testament to x.ai's dedication to advancing AI technology, providing users with a powerful tool for a wide range of applications, from everyday tasks to complex problem-solving scenarios.


† GPT-4-Turbo and GPT-4o scores are from the May 2024 release.

†† Claude 3 Opus and Claude 3.5 Sonnet scores are from the June 2024 release.

‡ Grok-2 MMLU, MMLU-Pro, MMMU and MathVista were evaluated using 0-shot CoT.

§ For MATH, we present maj@1 results.

¶ For HumanEval, we report pass@1 benchmark scores.
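The § and ¶ notes refer to the maj@k and pass@k metrics. The source does not show how the scores were computed, so the following is only a minimal Python sketch under the standard definitions (an assumption here): pass@k is estimated per problem with the unbiased estimator from the HumanEval paper, and maj@k takes the most common final answer among k samples. With k = 1, both reduce to single-sample accuracy. The function names are illustrative.

```python
from collections import Counter
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated, c: number of samples that pass the tests,
    k: number of samples the metric is allowed to draw.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k includes a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def maj_at_k(answers):
    """Majority-vote answer over k sampled final answers (first seen wins ties)."""
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # With a single sample, pass@1 is 1.0 if it passes and 0.0 otherwise.
    print(pass_at_k(n=1, c=1, k=1), pass_at_k(n=1, c=0, k=1))
    # With a single sample, maj@1 simply returns that sample's answer.
    print(maj_at_k(["42"]))
```

A benchmark-level score is then the mean of the per-problem values across all problems.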
