Anthropic Claude 3.5 Sonnet

by Stephen M. Walker II, Co-Founder / CEO

Top tip

If you're already using Claude for AI tasks, you'll find Claude 3.5 Sonnet to be a significant upgrade, offering superior performance, cost efficiency, and a wide range of capabilities that make it suitable for various complex tasks.

Anthropic has unveiled Claude 3.5 Sonnet, a groundbreaking AI model that sets new industry standards across expert knowledge, reasoning capabilities, and coding proficiency. Amazon Bedrock, Claude.ai, and Google Cloud's Vertex AI now offer access to this advanced model.

  • Performance and Cost-Efficiency: Claude 3.5 Sonnet operates twice as fast as its predecessor, Claude 3 Opus, while costing 80% less. It delivers superior intelligence at a fraction of the price, enabling businesses to leverage high-performance AI solutions without straining their budgets.

  • Benchmark Performance: Claude 3.5 Sonnet outperforms leading AI models like OpenAI's GPT-4o and Google's Gemini 1.5 Pro in most benchmark categories. It excels in undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), and coding proficiency (HumanEval), setting new standards for AI capabilities.

  • Multimodal Capabilities: The model significantly improves visual processing and understanding. It accurately interprets charts and graphs and effectively transcribes text from imperfect images. These capabilities particularly benefit industries like retail, logistics, and financial services, where visual data processing plays a crucial role.

  • Writing and Content Generation: Claude 3.5 Sonnet demonstrates an enhanced understanding of nuance and humor. It produces high-quality written content with a natural, human-like tone. The model excels in creative writing, generating engaging and compelling content across various genres and styles.

  • Customer Support and Workflow Management: The model efficiently handles intricate customer inquiries and effectively orchestrates multi-step workflows. It improves customer satisfaction, reduces response times, and enhances overall support processes. Claude 3.5 Sonnet automates and streamlines customer interactions, providing seamless end-user experiences.

  • Coding and Software Development: Claude 3.5 Sonnet independently writes, edits, and executes code with sophisticated reasoning and troubleshooting capabilities. It streamlines developer workflows, accelerates coding tasks, and significantly reduces manual effort in software development processes.

  • Data Science and Analytics: The model augments human expertise in data science by effectively navigating unstructured data. It generates high-quality statistical visualizations and actionable predictions. Claude 3.5 Sonnet simplifies data analysis workflows and drives data-driven decision-making across organizations.

Klu Anthropic Claude 3.5 Sonnet Model

Claude 3.5 Sonnet marks a significant advancement in AI technology. It offers superior performance, cost efficiency, and a wide range of capabilities suitable for various complex tasks. This introduction propels the AI landscape forward, equipping businesses and developers with powerful tools to enhance operations and drive innovation effectively.

Long Context Performance

To further assess Claude 3.5 Sonnet's capabilities in handling long contexts, a "needle in a haystack" evaluation was conducted. This test involves inserting a small, crucial piece of information (the "needle") within a large amount of irrelevant text (the "haystack") and evaluating the model's ability to locate and utilize this information accurately.

The Claude 3.5 Sonnet model's long context accuracy varies across different context lengths and depths, with performance peaking at mid-range depths (36%-68%) for most lengths. Scores range from 1 to 10, with higher consistency and stability in performance observed at context lengths up to 126,500, while variability increases significantly at lengths beyond 151,000.

Claude 3.5 Sonnet Needle in a Haystack Evaluation

Availability

Claude 3.5 Sonnet is accessible for free on Claude.ai and the Claude iOS app, with higher rate limits for Claude Pro and Team plan subscribers. It is also available via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.

Klu Claude AI Logic

Accessible in 159 countries via claude.ai and its API, the Claude 3 models, including Opus and Sonnet, have already begun to impact the market, with plans to introduce Haiku soon. Anthropic claims models stand out in performing live customer chats, auto-completions, and data extraction tasks, demonstrating superior capabilities in benchmark tests when compared to competitors like OpenAI's GPT-4. Early tests leave some room for skepticism.

Frontier Intelligence at 2x the Speed

Claude 3.5 Sonnet pushes industry standards by setting new benchmarks in GPQA (graduate-level reasoning), MMLU (undergraduate-level knowledge), and HumanEval (coding proficiency). The model demonstrates significant advancements in comprehending nuance, humor, and complex instructions. It also produces high-quality content with a natural, relatable tone that resonates with readers.

Operating at twice the speed of its predecessor, Claude 3 Opus, Claude 3.5 Sonnet delivers a substantial performance boost. Its combination of cost-effectiveness and speed makes it the ideal choice for tackling complex tasks such as context-sensitive customer support and intricate workflow orchestration.

Our internal coding evaluation revealed Claude 3.5 Sonnet's impressive capabilities. The model successfully solved 64% of problems, significantly outperforming Claude 3 Opus, which solved 38%. This evaluation assesses a model's ability to enhance an open-source codebase based on natural language descriptions. Claude 3.5 Sonnet excels in independently writing, editing, and running code, making it an invaluable tool for updating legacy applications and migrating codebases.

Klu Claude 3 Vision

In comparison to leading models from OpenAI and Google, Claude 3 Opus sets a new standard for conversational AI. It demonstrates superior performance in undergraduate and graduate knowledge, as well as grade school math. The Claude 3 Sonnet model further impresses with its ability to interpret scientific diagrams, highlighting its potential for enterprise operations and data analysis.

TaskClaude 3.5 SonnetGPT-4oGemini 1.5 Pro
Graduate level reasoning
GPQA, Diamond 1
59.4%* 0-shot CoT53.6% 0-shot CoT-
Undergraduate level knowledge
MMLU 2
88.7%** 5-shot88.7% 0-shot CoT85.9% 5-shot
Code
HumanEval
92.0% 0-shot90.2% 0-shot84.1% 0-shot
Multilingual math
MGSM
91.6% 0-shot CoT90.5% 0-shot CoT87.5% 8-shot
Reasoning over text
DROP, F1 score
87.1% 3-shot83.4% 3-shot74.9% Variable shots
Mixed evaluations
BIG-Bench-Hard
93.1% 3-shot CoT-89.2% 3-shot CoT
Math problem-solving
MATH
71.1% 0-shot CoT76.6% 0-shot CoT67.7% 4-shot
Grade school math
GSM8K
96.4% 0-shot CoT-90.8% 11-shot

Claude 3.5 Sonnet is Anthropic's most advanced vision model, outperforming Claude 3 Opus in visual benchmarks. It excels in visual reasoning tasks like interpreting charts and graphs and can accurately transcribe text from imperfect images. This makes it ideal for industries such as retail, logistics, and financial services. Anthropic prioritizes AI safety, working to reduce bias and ensure neutrality, making Claude 3.5 Sonnet a top choice for both enterprise and consumer applications.

Klu Claude 3 Vision
TaskClaude 3.5 SonnetGPT-4oGemini 1.5 Pro
Visual math reasoning
MathVista (testmini)
67.7% 0-shot CoT63.8% 0-shot CoT63.9% 0-shot CoT
Science diagrams
AI2D, test
94.7% 0-shot94.2% 0-shot94.4% 0-shot
Visual question answering
MMMU (val)
68.3% 0-shot CoT69.1% 0-shot CoT62.2% 0-shot CoT
Chart Q&A Relaxed accuracy (test)90.8% 0-shot CoT85.7% 0-shot CoT87.2% 0-shot CoT
Document visual Q&A ANLS score, test95.2% 0-shot92.8% 0-shot93.1% 0-shot

Claude 3 Model Series

The Claude 3 models, especially Opus, demonstrate exceptional performance in key AI evaluation benchmarks such as MMLU, GPQA, and GSM8K. These models exhibit near-human levels of comprehension and fluency when tackling complex tasks. They also show significant improvements in critical areas including analysis, forecasting, content creation, code generation, and multilingual communication.

Anthropic has touted the economic potential of Claude 3, particularly the Opus model, highlighting its capabilities as an economic analyst. This suggests its potential utility in specialized professional domains. However, our own testing and comparisons against GPT-4 Turbo have not yielded substantial evidence to support these claims. Despite this, the Claude 3 models are poised to accelerate the adoption of generative AI applications. Over 10,000 organizations already utilize Amazon Bedrock for such applications, and the introduction of Claude 3 models is expected to further drive this trend.

Accuracy

Klu Claude 3 Hard Questions

Claude 3 significantly improves on accuracy, addressing previous issues with excessive creativity. Opus demonstrates a twofold increase in correct answers for complex, factual questions. Anthropic plans to introduce citations to support answer verification, further enhancing reliability.

Context length up to 1 million tokens

Klu Claude 3 Retrieval Recall

Claude 3 models initially offer a 200K context window, with plans to expand to over 1 million tokens for select customers. Opus achieves near-perfect recall in the 'Needle In A Haystack' evaluation, showcasing robust information retrieval capabilities.

Speed

The Claude 3 series delivers impressive speed improvements:

  • Haiku: Fastest and most cost-effective for its intelligence level
  • Sonnet: Doubles the speed of predecessors while increasing intelligence
  • Opus: Matches previous speeds but with significantly enhanced intelligence

These models excel in live chats, auto-completions, and data extraction tasks.

Vision

Claude 3 models feature advanced vision capabilities, processing various visual formats. Anthropic offers this functionality to enterprise customers with visually encoded knowledge bases.

Instruction Following, Reduced Refusals & AI Safety

Claude 3 models excel at following complex instructions and producing structured outputs like JSON. They significantly reduce unnecessary refusals compared to Claude 2, demonstrating improved understanding of prompts.

Klu Claude 3 Refusals vs Claude 2

Anthropic prioritizes responsible AI development in Claude 3:

  • Currently operates at AI Safety Level 2
  • Emphasizes neutrality and bias mitigation
  • Achieves lower bias rates compared to previous versions
  • Handles sensitive prompts more effectively

Planned updates

Anthropic aims to expand Claude 3's capabilities:

  • Extend context window up to 1 million tokens for select customers
  • Introduce Tool Use (function calling)
  • Implement interactive coding (REPL)
  • Develop advanced agentic capabilities

The company plans frequent updates to enhance functionalities, particularly for enterprise applications and large-scale deployments.

Claude 3 models integrate seamlessly with major platforms like Amazon Bedrock and Google Cloud's Vertex AI, facilitating widespread adoption across industries.

Pricing

ModelDescriptionInput Price (per million tokens)Output Price (per million tokens)
OpusMost intelligent, ideal for complex tasks$15$75
SonnetBalances intelligence and speed for enterprise workloads$3$15
HaikuFastest response for simple queries$0.25$1.25

All three Claude 3 models are now available. Sonnet 3.5 is currently accessible, while Opus 3.5 and Haiku 3.5 are scheduled for release later this year. Sonnet powers the free experience on claude.ai, and the current Opus version is available to Claude Pro subscribers. These models can also be accessed through Amazon Bedrock and Google Cloud's Vertex AI Model Garden.

Anthropic commits to frequent updates for the Claude 3 family, focusing on enterprise features and large-scale deployment capabilities. The company actively seeks feedback to enhance Claude's utility while aligning AI development with positive societal outcomes.

Footnotes

  1. Claude 3.5 Sonnet scores 67.2% on 5-shot CoT GPQA with maj@32

  2. Claude 3.5 Sonnet scores 90.4% on MMLU with 5-shot CoT prompting

More terms

Statistical Classification

Statistical classification is a method of machine learning that is used to predict the probability of a given data point belonging to a particular class. It is a supervised learning technique, which means that it requires a training dataset of known labels in order to learn the mapping between data points and class labels. Once the model has been trained, it can then be used to make predictions on new data points.

Read more

What is the theory of computation?

The theory of computation is a fundamental branch of computer science and mathematics. It investigates the limits of computation and problem-solving capabilities through algorithms. This theory utilizes computational models such as Turing machines, recursive functions, and finite-state automata to comprehend these boundaries and opportunities.

Read more

It's time to build

Collaborate with your team on reliable Generative AI features.
Want expert guidance? Book a 1:1 onboarding session from your dashboard.

Start for free