𝕏 Grok-2 Beta Release

by Stephen M. Walker II, Co-Founder / CEO

Top tip

Grok-2 fine-tuning is now available, enabling Grok-2 customization for your specific use cases.

What is Grok-2?

Grok-2 represents a significant advancement in x.ai's language model offerings, building on the capabilities of its predecessor, Grok-1.5. Released on the 𝕏 platform, Grok-2 and its smaller counterpart, Grok-2 mini, power intelligent chat assistants and are designed to improve on Grok-1.5 in reasoning, chat, and coding.

Benchmark     Grok-1.5   Grok-2 mini‡   Grok-2‡
GPQA          35.9%      51.0%          56.0%
MMLU          81.3%      86.2%          87.5%
MMLU-Pro      51.0%      72.0%          75.5%
MATH§         50.6%      73.0%          76.1%
HumanEval¶    74.1%      85.7%          88.4%
MMMU          53.6%      63.2%          66.1%
MathVista     52.8%      68.1%          69.0%
DocVQA        85.6%      93.2%          93.6%
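To make the size of the generational jump concrete, the short Python sketch below recomputes the percentage-point gains over Grok-1.5 from the table above. The scores are copied from the table; the dictionary layout and the `point_gains` helper are illustrative names, not part of any x.ai tooling.

```python
# Scores (in %) copied from the Grok-1.5 / Grok-2 mini / Grok-2 table above.
SCORES = {
    "GPQA":      {"Grok-1.5": 35.9, "Grok-2 mini": 51.0, "Grok-2": 56.0},
    "MMLU":      {"Grok-1.5": 81.3, "Grok-2 mini": 86.2, "Grok-2": 87.5},
    "MMLU-Pro":  {"Grok-1.5": 51.0, "Grok-2 mini": 72.0, "Grok-2": 75.5},
    "MATH":      {"Grok-1.5": 50.6, "Grok-2 mini": 73.0, "Grok-2": 76.1},
    "HumanEval": {"Grok-1.5": 74.1, "Grok-2 mini": 85.7, "Grok-2": 88.4},
    "MMMU":      {"Grok-1.5": 53.6, "Grok-2 mini": 63.2, "Grok-2": 66.1},
    "MathVista": {"Grok-1.5": 52.8, "Grok-2 mini": 68.1, "Grok-2": 69.0},
    "DocVQA":    {"Grok-1.5": 85.6, "Grok-2 mini": 93.2, "Grok-2": 93.6},
}


def point_gains(model: str, baseline: str = "Grok-1.5") -> dict[str, float]:
    """Return the absolute gain of `model` over `baseline`, in percentage points."""
    return {
        bench: round(row[model] - row[baseline], 1)  # round away float noise
        for bench, row in SCORES.items()
    }


if __name__ == "__main__":
    for bench, gain in point_gains("Grok-2").items():
        print(f"{bench:<10} +{gain:.1f} pts")
```

Running it shows the largest gain for Grok-2 on MATH (+25.5 points) and the smallest on MMLU (+6.2 points), where Grok-1.5 already scored above 80%.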
[Figure: Grok-2 Factuality Preference]

Both models are currently in beta. Grok-2 mini is available now on x.com, and both models will be released to the Enterprise API in the coming weeks.

Performance Benchmarks

Grok-2 has been evaluated on benchmarks spanning reasoning, reading comprehension, math, science, coding, and vision. It outperforms Grok-1.5 on every benchmark reported above and is competitive with other leading AI models, with its strongest relative results on vision-based tasks.

[Figure: Grok-2 Win Rate]

The Grok-2 models were evaluated on a series of academic benchmarks covering reasoning, reading comprehension, math, science, and coding. Both Grok-2 and Grok-2 mini show significant improvements over Grok-1.5, and they are competitive with other frontier models on graduate-level science knowledge (GPQA), general knowledge (MMLU, MMLU-Pro), and math competition problems (MATH). Grok-2 is also strong on vision-based tasks: it posts the highest visual math reasoning score (MathVista, 69.0%) among the models compared below, and its document-based question answering score (DocVQA, 93.6%) trails only Claude 3.5 Sonnet's 95.2%.

Benchmark     Grok-2    Gemini Pro 1.5   Llama 3 405B   GPT-4o†   Claude 3.5 Sonnet††
GPQA          56.0%     46.2%            51.1%          53.6%     59.6%
MMLU          87.5%     85.9%            88.6%          88.7%     88.3%
MMLU-Pro      75.5%     69.0%            73.3%          72.6%     76.1%
MATH§         76.1%     67.7%            73.8%          76.6%     71.1%
HumanEval¶    88.4%     71.9%            89.0%          90.2%     92.0%
MMMU          66.1%     62.2%            64.5%          69.1%     68.3%
MathVista     69.0%     63.9%            —              63.8%     67.7%
DocVQA        93.6%     93.1%            92.2%          92.8%     95.2%
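To check the "competitive" claim against the numbers, this Python sketch finds the top reported score per benchmark in the comparison table above and how far Grok-2 trails it. Scores are copied from the table; the data layout and the `leader` / `grok2_gap` helpers are illustrative, and the missing Llama 3 405B MathVista entry is modeled as `None`.

```python
from typing import Optional

# Column order of the comparison table above.
MODELS = ["Grok-2", "Gemini Pro 1.5", "Llama 3 405B", "GPT-4o", "Claude 3.5 Sonnet"]

# Scores in %, copied from the table; None marks a missing entry.
SCORES: dict[str, list[Optional[float]]] = {
    "GPQA":      [56.0, 46.2, 51.1, 53.6, 59.6],
    "MMLU":      [87.5, 85.9, 88.6, 88.7, 88.3],
    "MMLU-Pro":  [75.5, 69.0, 73.3, 72.6, 76.1],
    "MATH":      [76.1, 67.7, 73.8, 76.6, 71.1],
    "HumanEval": [88.4, 71.9, 89.0, 90.2, 92.0],
    "MMMU":      [66.1, 62.2, 64.5, 69.1, 68.3],
    "MathVista": [69.0, 63.9, None, 63.8, 67.7],
    "DocVQA":    [93.6, 93.1, 92.2, 92.8, 95.2],
}


def leader(bench: str) -> tuple[str, float]:
    """Return (model, score) for the highest reported score on one benchmark."""
    reported = [(m, s) for m, s in zip(MODELS, SCORES[bench]) if s is not None]
    return max(reported, key=lambda pair: pair[1])


def grok2_gap(bench: str) -> float:
    """Percentage points by which Grok-2 trails the leader (0.0 if it leads)."""
    _, best = leader(bench)
    grok2 = SCORES[bench][MODELS.index("Grok-2")]
    return round(best - grok2, 1)  # round away float noise


if __name__ == "__main__":
    for bench in SCORES:
        model, score = leader(bench)
        print(f"{bench:<10} leader: {model} ({score}%), Grok-2 gap: {grok2_gap(bench)} pts")
```

On these figures Grok-2 leads only on MathVista and trails the top score by 0.5 to 3.6 points on every other benchmark listed.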

Real-Time Information Integration

[Figure: Grok-2 Real-Time Information Integration]

Grok-2 integrates real-time information from the 𝕏 platform, so it can give timely answers about recent events instead of relying only on its training data. This is especially useful for users who need up-to-date information.

Enterprise API Access

Later this month, Grok-2 and Grok-2 mini will be accessible through x.ai's enterprise API, letting developers integrate the models into their own applications. The API is slated to offer low-latency access and enhanced security features aimed at business use.

Future Developments

x.ai is committed to continuous improvement and innovation. The Grok-2 release marks a pivotal moment in AI development, with plans to expand its capabilities further. Users can expect ongoing enhancements and new features that will push the boundaries of what AI can achieve.

Grok-2 is a testament to x.ai's dedication to advancing AI technology, providing users with a powerful tool for a wide range of applications, from everyday tasks to complex problem-solving scenarios.


† GPT-4-Turbo and GPT-4o scores are from the May 2024 release.

†† Claude 3 Opus and Claude 3.5 Sonnet scores are from the June 2024 release.

‡ Grok-2 MMLU, MMLU-Pro, MMMU and MathVista were evaluated using 0-shot CoT.

§ For MATH, we present maj@1 results.

¶ For HumanEval, we report pass@1 benchmark scores.
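The footnotes above rely on two sampling-based metrics. maj@k scores a problem as correct when the most common final answer among k sampled solutions matches the reference, so maj@1 reduces to scoring a single sample. pass@k, used for code, counts a problem as solved if at least one of k samples passes its unit tests; the sketch below uses the unbiased estimator from the Codex paper (Chen et al., 2021), 1 - C(n-c, k) / C(n, k) for n generated samples of which c pass, which reduces to c / n when k = 1. The function names and inputs are illustrative assumptions, not x.ai's evaluation harness.

```python
from collections import Counter
from math import comb


def maj_at_k(sampled_answers: list[str], reference: str) -> bool:
    """maj@k for one problem: True if the most frequent of the k sampled
    final answers equals the reference. With k = 1 this is plain accuracy.
    Ties go to the answer encountered first (Counter.most_common ordering)."""
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    most_common, _ = Counter(sampled_answers).most_common(1)[0]
    return most_common == reference


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem, given n generated samples
    of which c pass the unit tests (Chen et al., 2021)."""
    if n - c < k:
        # Every size-k subset of the samples contains a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    print(maj_at_k(["42", "41", "42"], "42"))  # majority vote over k = 3 samples
    print(pass_at_k(n=10, c=3, k=1))           # for k = 1 this reduces to c / n
```

A benchmark score is then the mean of these per-problem values over the whole test set.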
