Hallucination Rates for Major AI Models

Vectara tested major language models on 1,000 short documents and released the results. The benchmark measures how often an LLM introduces hallucinations when summarizing a document.

Another reason to hire a writer and fact-checker for your AI output and avoid embarrassment like this, this, this, or this.

Updated 11/1/23

Model                 Accuracy   Hallucination Rate   Answer Rate
GPT 4                 97.0 %     3.0 %                100.0 %
GPT 4 Turbo           97.0 %     3.0 %                100.0 %
GPT 3.5 Turbo         96.5 %     3.5 %                99.6 %
Llama 2 70B           94.9 %     5.1 %                99.9 %
Llama 2 7B            94.4 %     5.6 %                99.6 %
Llama 2 13B           94.1 %     5.9 %                99.8 %
Cohere-Chat           92.5 %     7.5 %                98.0 %
Cohere                91.5 %     8.5 %                99.8 %
Anthropic Claude 2    91.5 %     8.5 %                99.3 %
Mistral 7B            90.6 %     9.4 %                98.7 %
Google Palm           87.9 %     12.1 %               92.4 %
Google Palm-Chat      72.8 %     27.2 %               88.8 %
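Reading the table: accuracy and hallucination rate sum to 100 %, and answer rate is the share of documents the model actually produced a summary for (some models refuse). A minimal sketch of how these three numbers relate, under those assumed definitions (`leaderboard_metrics` is a hypothetical helper, not Vectara's code):

```python
def leaderboard_metrics(n_docs, n_refused, n_hallucinated):
    """Return (accuracy, hallucination_rate, answer_rate) in percent.

    n_docs          -- total documents the model was asked to summarize
    n_refused       -- documents the model declined to summarize
    n_hallucinated  -- produced summaries judged factually inconsistent
    """
    answered = n_docs - n_refused
    answer_rate = 100.0 * answered / n_docs
    # Hallucination rate is measured over summaries that were produced.
    hallucination_rate = 100.0 * n_hallucinated / answered
    # Accuracy and hallucination rate are complements of each other.
    accuracy = 100.0 - hallucination_rate
    return accuracy, hallucination_rate, answer_rate

# Example matching the GPT-4 row: 1,000 docs, none refused, 30 hallucinated
acc, hall, ans = leaderboard_metrics(1000, 0, 30)
print(acc, hall, ans)  # 97.0 3.0 100.0
```

This also explains why a model can look more accurate than it is in practice: refusing hard documents lowers the answer rate without hurting accuracy, since the rate is computed only over answered documents.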

H/T The Rundown AI