Google released Gemini 1.0 on 6 December 2023 in three variants: Gemini Ultra, Gemini Pro, and Gemini Nano.

Gemini Ultra outperforms GPT-4 on 30 of 32 standard benchmarks, leads coding benchmarks such as HumanEval and Natural2Code, and is the first model to score above human experts on MMLU. It also accepts audio and video input in addition to images and text.

Here you will find accurate information about all the Gemini models, their capabilities, and how they compare with other AIs such as GPT-4, ChatGPT 3.5, and Claude, with official sources.

Capabilities

Note: Although this is Google’s official demo, it was edited to run faster, and some pre-prompts are not shown. This post has the full prompts.

Google Gemini is a powerful and versatile AI model with a range of capabilities.

  1. Multimodal understanding: Gemini can understand, operate across, and combine different types of information, including text, code, audio, images, and video (a minimal API sketch follows this list).
  2. Advanced reasoning: Gemini can distinguish between relevant and irrelevant information in large datasets, such as scientific papers.
  3. Improved code generation: Gemini generates code more efficiently and effectively; Google’s AlphaCode 2 system, built on Gemini, performs better than an estimated 85% of competition participants, up from roughly 50% for the original AlphaCode.
  4. Powering various AI services: Gemini Pro powers Google’s chatbot Bard, while Gemini Nano is designed for specific tasks and mobile devices.
  5. Enhancing user experience: Gemini will be integrated into Google Search, Chrome, Duet AI, and Ads. Early tests show Gemini reducing Search Generative Experience (SGE) latency by 40%.
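
To make the multimodal point in item 1 concrete, here is a minimal sketch of sending an image plus a text instruction to Gemini through the `google-generativeai` Python SDK. The package, the `gemini-pro-vision` model name, and the placeholder API key and file name are assumptions based on the launch-time developer API, not something shown in Google’s demo.

```python
# Minimal sketch (assumption: the launch-time google-generativeai SDK and the
# "gemini-pro-vision" model id). Illustrates a mixed image + text prompt.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # hypothetical key from Google AI Studio

model = genai.GenerativeModel("gemini-pro-vision")  # multimodal (image + text) variant
image = Image.open("chart.png")                     # hypothetical local image file

# The prompt is a list that freely mixes modalities: an image and a text instruction.
response = model.generate_content([image, "Summarize what this chart shows."])
print(response.text)
```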

Gemini vs GPT-4

[Chart: Gemini Ultra vs GPT-4 benchmark comparison]
  • Gemini Ultra has surpassed GPT-4 in reasoning, math, and code-related text-based benchmarks.
  • In the MMLU benchmark, Gemini Ultra achieved a score of 90.0%. This not only surpasses GPT-4’s score of 86.4% but also marks the first time a model has exceeded human expert performance in this benchmark.
  • MMLU tests a combination of subjects to assess world knowledge and problem-solving abilities.
  • In image, video, and audio tests, Gemini Ultra also leads GPT-4, achieving state-of-the-art results on few-shot video captioning and zero-shot video question answering tasks (see the table below).
  • Gemini Ultra performed well without relying on external OCR systems to process images, indicating strong native multimodal capabilities.
| Task | Gemini Ultra | Gemini Pro | Few-shot state of the art |
| --- | --- | --- | --- |
| VATEX (test): English video captioning | 62.7 | 57.4 | 56.0 (DeepMind Flamingo, 4-shot) |
| VATEX ZH (test): Chinese video captioning | 51.3 | 50.0 | — |
| YouCook2 (val): English cooking video captioning | 135.4 | 123.2 | 74.5 (DeepMind Flamingo, 4-shot) |
| NextQA (test): Video question answering | 29.9 | 28.0 | 26.7 (DeepMind Flamingo, 0-shot) |
| ActivityNet-QA (test): Video question answering | 52.2 | 49.8 | 45.3 (Video-LLaVA, 0-shot) |
| Perception Test MCQA (test): Video question answering | 54.7 | 51.1 | 46.3 (SeViLA, Yu et al. 2023, 0-shot) |

Gemini vs ChatGPT 3.5

Bard uses Gemini Pro, which offers more advanced reasoning, planning, and writing than GPT-3.5.

“In blind evaluations with our third-party raters, Bard is now the most preferred free chatbot compared to leading alternatives.”

Source: Google
  • “Bard with Gemini Pro” is a tuned version of Gemini Pro with enhanced reasoning, planning, and writing capabilities.
  • It exceeds GPT-3.5 in six out of eight benchmarks, including MMLU and GSM8K.
  • Bard with Gemini Pro is claimed to have made the single biggest quality improvement since Bard’s launch.

Gemini vs PaLM 2

The instruction-tuned Gemini Pro models have shown significant advancements across various capabilities when compared to the PaLM 2 model API.

  • In creative writing tasks, Gemini Pro outperformed PaLM 2 65.0% of the time.
  • For following instructions, Gemini Pro’s win-rate was 59.2%.
  • Notably, Gemini Pro achieved a 68.5% win-rate for providing safer responses.

Gemini Benchmarks vs Popular LLMs

| Task | Gemini Ultra | Gemini Pro | GPT-4 | GPT-3.5 | PaLM 2-L | Claude 2 | Inflection-2 | Grok 1 | LLaMA-2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MMLU (Multiple-choice questions in 57 subjects) | 90.04% | 79.13% | 87.29% | 70% | 78.4% | 78.5% | 79.6% | 73.0% | 68.0% |
| GSM8K (Grade-school math) | 94.4% | 86.5% | 92.0% | 57.1% | 80.0% | 88.0% | 81.4% | 62.9% | 56.8% |
| MATH (Math problems across 5 difficulty levels & 7 subdisciplines) | 53.2% | 32.6% | 52.9% | 50.3% | 34.1% | 34.4% | 34.8% | 23.9% | 13.5% |
| BIG-Bench-Hard (Subset of hard BIG-bench tasks) | 83.6% | 75.0% | 83.1% | 66.6% | 77.7% | — | — | — | 51.2% |
| HumanEval (Python coding tasks) | 74.4% | 67.7% | 67.0% | 48.1% | — | 70.0% | 44.5% | 63.2% | 29.9% |
| Natural2Code (Python code generation) | 74.9% | 69.6% | 73.9% | 62.3% | — | — | — | — | — |
| DROP (Reading comprehension & arithmetic; F1 score) | 82.4 | 74.1 | 80.9 | 64.1 | 82.0 | — | — | — | — |
| HellaSwag (validation set) | 87.8% | 84.7% | 95.3% | 85.5% | 86.8% | — | 89.0% | — | 80.0% |
| WMT23 (Machine translation; BLEURT score) | 74.4 | 71.7 | 73.8 | — | 72.7 | — | — | — | — |
Source: Gemini technical report

Gemini Pro vs Gemini Ultra vs Gemini Nano

| Feature | Gemini Pro | Gemini Ultra | Gemini Nano |
| --- | --- | --- | --- |
| Description | Best for scaling across a wide range of tasks requiring multimodality | Largest and most capable model, for highly complex tasks requiring advanced reasoning | Most efficient model, for on-device tasks |
| Multimodal capabilities | Yes | Yes | Yes |
| Availability | Available now | Coming early next year | Available for Android developers |
| Use case | Wide range of multimodal tasks | Highly complex tasks | On-device tasks |
| Benchmark performance | Surpasses GPT-4 in some areas | Exceeds the capability of all existing AI models | Not specified |

How to Access Gemini?

  1. Bard with Gemini Pro is becoming available in English in more than 170 countries and territories. The UK and Europe will get access soon.
  2. Bard Advanced is a new service that will offer early access to the most sophisticated Gemini models, including Gemini Ultra.
  3. Access for Developers and Businesses: Starting December 13, developers and business customers can use Gemini Pro through the Gemini API on two Google platforms: Google AI Studio and Google Cloud Vertex AI (a minimal code sketch follows this list).
  4. Release of Gemini Ultra: Gemini Ultra, the most advanced version, will be available to developers and business customers early next year.
  5. Gemini Nano for Android Developers: Android developers can use Gemini Nano, which is tailored for specific tasks and mobile devices.
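
For item 3, here is a minimal sketch of calling Gemini Pro through the Gemini API as exposed in Google AI Studio. The `google-generativeai` package, the `gemini-pro` model id, and the method names reflect the launch-time Python SDK and are assumptions rather than material from the article; Vertex AI offers a similar generative-model interface through the Google Cloud SDK.

```python
# Minimal sketch (assumption: the launch-time google-generativeai SDK and the
# "gemini-pro" model id). Sends a plain text prompt to Gemini Pro.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # hypothetical key created in AI Studio

model = genai.GenerativeModel("gemini-pro")  # text-focused Gemini Pro endpoint
response = model.generate_content(
    "In two sentences, explain what the MMLU benchmark measures."
)
print(response.text)
```

The same SDK also exposes a multi-turn chat interface (`start_chat` / `send_message`), which is closer to how a Bard-style conversation would be wired up.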

Closing Thoughts

With Gemini, Google has finally introduced a worthy rival to OpenAI’s GPT-4.

Gemini Ultra consistently outperforms other models in various benchmarks, particularly in tasks involving language understanding, video captioning, and question answering. This showcases its advanced capabilities in handling multimodal data and complex reasoning tasks.

Gemini Pro, while slightly trailing behind Gemini Ultra, also demonstrates robust performance, often surpassing Few-shot State-of-the-Art (SoTA) models. Its strengths are particularly evident in video-related tasks, underscoring Google’s strides in multimodal AI research.