𝕏 Grok-2 Beta Release

Stephen M. Walker II · Co-Founder / CEO

Top tip

Grok-2 fine-tuning is now available, enabling Grok-2 customization for your specific use cases.

What is Grok-2?

Grok-2 is xAI's successor to Grok-1.5. Released on the 𝕏 platform, Grok-2 and its smaller counterpart, Grok-2 mini, are chat assistants with stronger reasoning, chat, and coding capabilities than their predecessor.

Benchmark	Grok-1.5	Grok-2 mini‡	Grok-2‡
GPQA	35.9%	51.0%	56.0%
MMLU	81.3%	86.2%	87.5%
MMLU-Pro	51.0%	72.0%	75.5%
MATH§	50.6%	73.0%	76.1%
HumanEval¶	74.1%	85.7%	88.4%
MMMU	53.6%	63.2%	66.1%
MathVista	52.8%	68.1%	69.0%
DocVQA	85.6%	93.2%	93.6%
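
To make the size of the improvement concrete, the table can be reduced to percentage-point gains over Grok-1.5. The following is a minimal sketch: the scores are copied from the table above, while the names `SCORES` and `gains_over_grok_1_5` are illustrative and not part of any xAI tooling.

```python
# Scores copied from the table above, in percent:
# (Grok-1.5, Grok-2 mini, Grok-2)
SCORES = {
    "GPQA": (35.9, 51.0, 56.0),
    "MMLU": (81.3, 86.2, 87.5),
    "MMLU-Pro": (51.0, 72.0, 75.5),
    "MATH": (50.6, 73.0, 76.1),
    "HumanEval": (74.1, 85.7, 88.4),
    "MMMU": (53.6, 63.2, 66.1),
    "MathVista": (52.8, 68.1, 69.0),
    "DocVQA": (85.6, 93.2, 93.6),
}


def gains_over_grok_1_5(scores):
    """Return {benchmark: (mini_gain, grok2_gain)} in percentage points."""
    return {
        name: (round(mini - base, 1), round(full - base, 1))
        for name, (base, mini, full) in scores.items()
    }


GAINS = gains_over_grok_1_5(SCORES)

if __name__ == "__main__":
    for name, (mini_gain, full_gain) in GAINS.items():
        print(f"{name:<10} Grok-2 mini: +{mini_gain:.1f} pts   Grok-2: +{full_gain:.1f} pts")
```

By this measure, Grok-2's largest gain is on MATH (25.5 points) and its smallest is on MMLU (6.2 points), where Grok-1.5 already scored above 80%.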

These models are in beta. Grok-2 mini is available now on x.com, and both models will be released through the enterprise API in the coming weeks.

Performance Benchmarks

Grok-2 and Grok-2 mini were evaluated on a series of academic benchmarks covering reasoning, reading comprehension, math, science, and coding. Both models improve substantially on Grok-1.5 and are competitive with other frontier models on graduate-level science knowledge (GPQA), general knowledge (MMLU, MMLU-Pro), and math competition problems (MATH).

On vision-based tasks, Grok-2 posts the highest visual math reasoning score (MathVista, 69.0%) among the models listed below, and its document-based question answering score (DocVQA, 93.6%) trails only Claude 3.5 Sonnet (95.2%).

Benchmark	Grok-2‡	Gemini Pro 1.5	Llama 3 405B	GPT-4o†	Claude 3.5 Sonnet††
GPQA	56.0%	46.2%	51.1%	53.6%	59.6%
MMLU	87.5%	85.9%	88.6%	88.7%	88.3%
MMLU-Pro	75.5%	69.0%	73.3%	72.6%	76.1%
MATH§	76.1%	67.7%	73.8%	76.6%	71.1%
HumanEval¶	88.4%	71.9%	89.0%	90.2%	92.0%
MMMU	66.1%	62.2%	64.5%	69.1%	68.3%
MathVista	69.0%	63.9%	—	63.8%	67.7%
DocVQA	93.6%	93.1%	92.2%	92.8%	95.2%
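
One way to read the comparison is to rank Grok-2 against the other listed models on each benchmark. The sketch below does this with the scores from the table above; the helper name `rank_of` is hypothetical, and a missing score (the "—" entry) is excluded from the ranking rather than treated as zero.

```python
MODELS = ["Grok-2", "Gemini Pro 1.5", "Llama 3 405B", "GPT-4o", "Claude 3.5 Sonnet"]

# Scores copied from the table above, in percent; None marks an unreported score.
TABLE = {
    "GPQA": [56.0, 46.2, 51.1, 53.6, 59.6],
    "MMLU": [87.5, 85.9, 88.6, 88.7, 88.3],
    "MMLU-Pro": [75.5, 69.0, 73.3, 72.6, 76.1],
    "MATH": [76.1, 67.7, 73.8, 76.6, 71.1],
    "HumanEval": [88.4, 71.9, 89.0, 90.2, 92.0],
    "MMMU": [66.1, 62.2, 64.5, 69.1, 68.3],
    "MathVista": [69.0, 63.9, None, 63.8, 67.7],
    "DocVQA": [93.6, 93.1, 92.2, 92.8, 95.2],
}


def rank_of(model, table=TABLE, models=MODELS):
    """Return {benchmark: rank}, where rank 1 is the highest reported score."""
    idx = models.index(model)
    ranks = {}
    for bench, row in table.items():
        own = row[idx]
        if own is None:
            continue  # the model has no reported score on this benchmark
        reported = [s for s in row if s is not None]
        # Rank is one plus the number of strictly higher reported scores.
        ranks[bench] = 1 + sum(s > own for s in reported)
    return ranks


RANKS = rank_of("Grok-2")

if __name__ == "__main__":
    for bench, rank in RANKS.items():
        print(f"{bench:<10} rank {rank} of the models with a reported score")
```

On these numbers, Grok-2 ranks first on MathVista, second on GPQA, MMLU-Pro, MATH, and DocVQA, third on MMMU, and fourth on MMLU and HumanEval.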

Real-Time Information Integration

Grok-2 integrates real-time information from the 𝕏 platform, enhancing its ability to provide accurate and timely responses. This feature is particularly beneficial for users seeking up-to-date insights and solutions.

Enterprise API Access

Later this month, Grok-2 and Grok-2 mini will be accessible through xAI's enterprise API, giving developers a way to integrate the models into their own applications. The API promises low-latency access and enhanced security features.

Future Developments

xAI plans to keep expanding Grok-2's capabilities after this release, and users can expect ongoing enhancements and new features.

Grok-2 is intended as a general-purpose tool for a wide range of applications, from everyday tasks to complex problem-solving.

† GPT-4-Turbo and GPT-4o scores are from the May 2024 release.

†† Claude 3 Opus and Claude 3.5 Sonnet scores are from the June 2024 release.

‡ Grok-2 MMLU, MMLU-Pro, MMMU and MathVista were evaluated using 0-shot CoT.

§ For MATH, we present maj@1 results.

¶ For HumanEval, we report pass@1 benchmark scores.
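
The maj@1 and pass@1 metrics in the footnotes above can be illustrated in a few lines. This is a sketch under assumptions: the source does not say how the scores were computed, so the code uses the commonly used definitions, namely the unbiased pass@k estimator popularized by the HumanEval benchmark and majority voting over k sampled answers. The function names are illustrative.

```python
from collections import Counter
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated, c: number that pass the tests, k: budget.
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def maj_at_k(answers, reference):
    """Return 1 if the most common of the k sampled answers matches the reference."""
    most_common, _ = Counter(answers).most_common(1)[0]
    return int(most_common == reference)


if __name__ == "__main__":
    # With k = 1, both metrics reduce to single-sample accuracy.
    print(pass_at_k(n=10, c=3, k=1))       # fraction of samples that pass
    print(maj_at_k(["42"], "42"))          # one sample, compared directly
    print(maj_at_k(["4", "5", "4"], "4"))  # majority vote over three samples
```

With k = 1, both metrics measure how often a single generated answer is correct. The benchmark score is then the average of the per-problem values.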
