Google Gemini Pro 1.5

by Stephen M. Walker II, Co-Founder / CEO

Google Gemini Pro 1.5: A Quantum Leap in AI Capabilities

Google's Gemini Pro 1.5, along with its specialized variant Gemini 1.5 Flash, represents a significant advancement in the field of artificial intelligence. Through major releases in February and May 2024, this next-generation model has demonstrated remarkable improvements in performance, efficiency, and applicability across various domains.

Key Advancements and Features

Google Gemini Pro 1.5 introduces substantial advances across model architecture, multimodal processing, and ethical safeguards, with reported results that significantly outperform its predecessors and rival models.

Long Context Performance

To further assess Gemini 1.5 Flash's capabilities in handling long contexts, a "needle in a haystack" evaluation was conducted. This test involves inserting a small, crucial piece of information (the "needle") within a large amount of irrelevant text (the "haystack") and evaluating the model's ability to locate and utilize this information accurately.

To evaluate Gemini 1.5 Flash's long-context capabilities, we created a specialized needle-haystack dataset. We used a recently published book, "Nuclear War" by Annie Jacobsen, replacing all proper nouns with science fiction alternatives generated by an LLM. The haystack consisted of random 10kb text chunks, with critical 4-sentence "needles" inserted at various points.
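
A minimal sketch of that construction in Python follows; the chunking, depth parameter, and function names are illustrative assumptions, since the actual evaluation harness is not published:

```python
import random

def build_haystack(book_text: str, needle: str, total_chars: int, depth_pct: float) -> str:
    """Assemble a haystack from random 10 KB chunks of the source text,
    then insert the needle at depth_pct percent of the way through.
    All names and parameters here are illustrative assumptions."""
    chunk_size = 10_000  # the 10 KB chunks described above
    chunks, filled = [], 0
    while filled < total_chars:
        start = random.randrange(0, len(book_text) - chunk_size)
        chunks.append(book_text[start:start + chunk_size])
        filled += chunk_size
    haystack = "".join(chunks)[:total_chars]
    cut = int(len(haystack) * depth_pct / 100)
    return haystack[:cut] + "\n" + needle + "\n" + haystack[cut:]
```

Sweeping total_chars and depth_pct over a grid, then asking the model about the needle, yields the score maps discussed below.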

Gemini 1.5 Flash Needle in Haystack Evaluation (1M tokens)

The Gemini 1.5 Flash model's performance with a 1M-token context window varies significantly across context lengths and depths. High scores (10) appear frequently at many depths, especially at shorter context lengths (20,000 to 265,000 tokens) and again at the longest length (1,000,000 tokens). However, there are notable performance drops at certain depths within specific context lengths. Across repeated runs we observed both low and high scores, making it difficult to draw a firm conclusion. Next, we look at the first 20k of context.

Gemini Needle in a Haystack Evaluation for 20k Tokens

The Gemini 1.5 Flash model exhibits unexpected performance inconsistencies at relatively short context lengths, between 0 and 20,000 tokens. This is noteworthy because shorter contexts are typically easier to handle, yet the model's accuracy fluctuates considerably across depths within this range. These inconsistencies could affect the model's reliability on shorter documents and information snippets, and warrant further investigation into its behavior with brief inputs.

10-Question Q&A

Our tests revealed that while the model performed well in many scenarios, it exhibited significant performance drops at certain context lengths and depths.

Notably, we observed catastrophic forgetting at a 120k context length with multiple inserted facts, highlighting potential limitations in the model's long-context processing abilities.
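
To illustrate the shape of this probe, here is a hedged sketch using the google-generativeai Python SDK; the fact/question/answer triples and the substring-based scoring are stand-ins for our actual test set:

```python
import google.generativeai as genai

# Hedged sketch of the multi-fact probe using the google-generativeai SDK.
# The fact/question/answer triples are stand-ins for the real test set.
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

facts = [
    ("The courier's name is Veyra Oslin.", "What is the courier's name?", "Veyra Oslin"),
    ("The launch code is 7741.", "What is the launch code?", "7741"),
    # ...eight more fact/question/answer triples in the full test
]

def run_probe(haystack: str) -> int:
    """Insert each fact at an evenly spaced depth, ask all questions at
    once, and count how many answers appear in the reply (0-10)."""
    step = len(haystack) // (len(facts) + 1)
    for i, (fact, _, _) in enumerate(facts, start=1):
        pos = i * step
        haystack = haystack[:pos] + " " + fact + " " + haystack[pos:]
    questions = "\n".join(q for _, q, _ in facts)
    reply = model.generate_content(haystack + "\n\n" + questions).text
    return sum(answer in reply for _, _, answer in facts)
```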

Enhanced Model Architecture and Performance

Gemini Pro 1.5 introduces a more sophisticated transformer-based architecture, incorporating improvements in attention mechanisms and model scaling techniques. This has resulted in substantial performance gains:

  • 15% increase in GLUE benchmark scores for natural language understanding
  • 22% improvement in GSM8K benchmark for mathematical reasoning
  • 18% increase in HumanEval pass rate for code generation
  • 25% improvement in OK-VQA benchmark for multimodal tasks
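
Google has not published the architectural details behind these gains, so the specific attention changes are unknown. For orientation, the scaled dot-product attention that such transformer improvements build on can be written as a short textbook sketch (this is standard background, not Gemini's actual implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Textbook attention: softmax(Q K^T / sqrt(d_k)) V, with Q and K of
    shape (seq_len, d_k). Gemini's production variants are not public."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```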

Breakthrough in Long-Context Understanding

One of the most notable advancements is the expansion of the context window to 1 million tokens, with successful tests up to 10 million tokens. This enables the model to process, analyze, and summarize vast amounts of content in a single prompt (a minimal API sketch follows the list), including:

  • Up to 1 hour of video
  • 11 hours of audio
  • Codebases with over 30,000 lines of code
  • Over 700,000 words of text
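
As a concrete illustration, a long document can be passed to the model in one call. This sketch assumes the google-generativeai Python SDK and File API as shown in Google's quickstarts; the file name is hypothetical:

```python
import google.generativeai as genai

# Minimal long-context sketch using the google-generativeai SDK; the model
# name follows Google's quickstarts and the file name is hypothetical.
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Upload a large file once via the File API, then reference it in the prompt.
dump = genai.upload_file("full_codebase_dump.txt")
response = model.generate_content(
    [dump, "Summarize the architecture and list the main modules."]
)
print(response.text)
```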

While these advancements are promising, we maintain a cautious perspective on Google Gemini 1.5's real-world deployment and performance until further independent verification and widespread adoption provide more conclusive evidence of its capabilities.

Advanced Multimodal Capabilities

Gemini Pro 1.5 excels in seamlessly analyzing and reasoning across different modalities, including text, video, audio, and code. It can effectively reason about conversations, events, and details found in extensive documents or analyze complex multimedia content.
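
For multimedia input the pattern is similar: upload the file, wait for server-side processing, then prompt against it. Again a sketch assuming the same SDK, with a hypothetical file name:

```python
import time
import google.generativeai as genai

# Multimodal sketch assuming the same SDK: video is uploaded, processed
# asynchronously on the server, then referenced next to a text prompt.
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

video = genai.upload_file("lecture_recording.mp4")
while video.state.name == "PROCESSING":   # poll until the upload is ready
    time.sleep(5)
    video = genai.get_file(video.name)

response = model.generate_content(
    [video, "List each diagram shown and summarize what the speaker says about it."]
)
print(response.text)
```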

Specialized Variants and Domain-Specific Optimization

The May 2024 release introduced domain-specific versions of Gemini Pro 1.5, optimized for fields such as healthcare, finance, and scientific research. This allows for more targeted and efficient application in specialized domains.

Ethical AI and Multilingual Improvements

Enhanced safeguards against biases and improved alignment with human values have been implemented. Additionally, the model now offers expanded support for low-resource languages and improved translation quality across language pairs.

Gemini 1.5 Flash: Speed and Efficiency

Alongside the main release, Google introduced Gemini 1.5 Flash, a specialized variant designed for the following (a streaming call is sketched after the list):

  • Ultra-fast inference, reducing latency in real-time applications
  • Optimized performance on less powerful hardware, suitable for edge computing and mobile devices
  • Seamless API integration with existing software ecosystems
  • More flexible fine-tuning options for specific use cases
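
For instance, a latency-sensitive caller can stream tokens from Gemini 1.5 Flash as they are generated, rather than waiting for the complete response. A sketch assuming the google-generativeai SDK; exact behavior may vary across versions:

```python
import google.generativeai as genai

# Latency-oriented sketch: stream tokens from Gemini 1.5 Flash as they are
# generated instead of waiting for the full response.
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

for chunk in model.generate_content("Draft a one-line status update.", stream=True):
    print(chunk.text, end="", flush=True)
```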

Availability and Future Developments

Gemini Pro 1.5 was initially available in private preview to developers and enterprise customers through AI Studio and Vertex AI, with pricing tiers that start at the standard 128,000-token context window and scale up to 1 million tokens.

Gemini 1.5 Pro Pricing

Gemini 1.5 Pro is now available with tiered pricing based on context window size, from 128,000 to 1 million tokens. Input costs range from $3.50 to $7.00 per million tokens, and output from $10.50 to $21.00 per million tokens; context caching incurs additional fees. Free-tier usage is limited to 2 requests per minute, 32,000 tokens per minute, and 50 requests per day, while the paid tier allows 360 requests per minute, 2 million tokens per minute, and 10,000 requests per day. Regional restrictions apply to free-tier usage in the EEA, UK, and Switzerland.

Free tier vs. pay-as-you-go (prices in USD):

  • Rate limits: free tier, 2 RPM (requests per minute), 32,000 TPM (tokens per minute), 50 RPD (requests per day); pay-as-you-go, 360 RPM, 2 million TPM, 10,000 RPD
  • Price (input): free of charge on the free tier; pay-as-you-go, $3.50 per 1 million tokens for prompts up to 128K tokens and $7.00 per 1 million tokens for longer prompts
  • Context caching: not applicable on the free tier; pay-as-you-go, $0.875 per 1 million tokens (prompts up to 128K), $1.75 per 1 million tokens (longer prompts), plus $4.50 per 1 million tokens per hour for storage
  • Price (output): free of charge on the free tier; pay-as-you-go, $10.50 per 1 million tokens (prompts up to 128K) and $21.00 per 1 million tokens (longer prompts)
  • Prompts and responses used to improve Google's products: yes on the free tier; no on pay-as-you-go

Gemini 1.5 Flash Pricing

Gemini 1.5 Flash is now generally available with competitive pricing and usage limits. The free tier offers 15 requests per minute (RPM), 1 million tokens per minute (TPM), and 1,500 requests per day (RPD). Pay-as-you-go users get increased limits of 1,000 RPM and 2 million TPM. Input pricing starts at $0.35 per million tokens for prompts up to 128K tokens, rising to $0.70 for longer prompts. Output costs $1.05 per million tokens (up to 128K) and $2.10 for longer prompts. Context caching costs $0.0875 per million tokens for prompts up to 128K ($0.175 beyond), plus $1.00 per million tokens per hour for storage.

Free tier vs. pay-as-you-go (prices in USD):

  • Rate limits: free tier, 15 RPM (requests per minute), 1 million TPM (tokens per minute), 1,500 RPD (requests per day); pay-as-you-go, 1,000 RPM and 2 million TPM
  • Price (input): free of charge on the free tier; pay-as-you-go, $0.35 per 1 million tokens for prompts up to 128K tokens and $0.70 per 1 million tokens for longer prompts
  • Context caching: not applicable on the free tier; pay-as-you-go, $0.0875 per 1 million tokens (prompts up to 128K), $0.175 per 1 million tokens (longer prompts), plus $1.00 per 1 million tokens per hour for storage
  • Price (output): free of charge on the free tier; pay-as-you-go, $1.05 per 1 million tokens (prompts up to 128K) and $2.10 per 1 million tokens (longer prompts)
  • Prompts and responses used to improve Google's products: yes on the free tier; no on pay-as-you-go
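
To make the tiered pricing concrete, here is a small worked example that prices a single request for either model using the rates transcribed above; verify against Google's current price list before relying on them:

```python
# Worked example of the tiered pricing above (USD per 1 million tokens).
# Rates are transcribed from the tables in this article.
RATES = {
    "gemini-1.5-pro":   {"input": (3.50, 7.00), "output": (10.50, 21.00)},
    "gemini-1.5-flash": {"input": (0.35, 0.70), "output": (1.05, 2.10)},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price one request; the higher tier applies when the prompt exceeds 128K tokens."""
    tier = 1 if input_tokens > 128_000 else 0
    r = RATES[model]
    return (input_tokens * r["input"][tier] + output_tokens * r["output"][tier]) / 1_000_000

# A 1M-token prompt with a 2K-token reply:
print(round(request_cost("gemini-1.5-pro", 1_000_000, 2_000), 2))    # 7.04
print(round(request_cost("gemini-1.5-flash", 1_000_000, 2_000), 2))  # 0.7
```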

Conclusion

Google Gemini Pro 1.5 and Gemini 1.5 Flash mark significant advances in AI, processing up to 1 million tokens (tested to 10 million) in a single prompt. Both models handle multimodal input across text, audio, video, and code, and Gemini 1.5 Flash is further optimized for edge computing and mobile applications. These capabilities enable large-scale document analysis, complex code review, and sophisticated multimedia interpretation. As Google refines these models and their domain-specific variants, they are well positioned for adoption in fields such as healthcare, finance, and scientific research, reshaping how AI is used in professional and everyday contexts.
