
Misinformation & Hallucinations

OWASP LLM TOP 10

Dr. Fatemeh Kazemeyni

5/31/2026 · 3 min read

When engineers design traditional database backends, application errors typically present as explicit, loud stack traces: a 404 Not Found, a 500 Internal Server Error, or a syntax crash.

Large Language Models do not fail this way. Because they are probabilistic, autoregressive systems trained to predict the next plausible token, an LLM pushed beyond the limits of its knowledge will rarely stop and admit uncertainty. Instead, it produces fluent, authoritative-sounding, yet entirely fabricated assertions.

In the OWASP Top 10 for LLM Applications (2025 edition), this failure pattern is catalogued as LLM09: Misinformation, with hallucination named as a primary cause. Unlike other critical risks that require an active adversarial threat actor, LLM09 is a structural hazard of how these models generate text. Left unmanaged, it can lead to severe compliance liabilities, financial penalties, and downstream software supply chain compromises.

What are Hallucinations and Misinformation?

LLM09: Misinformation occurs when an application outputs false, inaccurate, or deeply misleading content presented as authoritative fact. The primary engine behind misinformation is Hallucination, the phenomenon where a model fills data gaps or training blind spots using learned statistical patterns rather than verified truth.

This risk is heavily multiplied by a psychological phenomenon called Overreliance. Because LLM completions are highly fluent, perfectly structured, and convey a confident tone, human operators and downstream applications frequently accept AI outputs without verification, integrating corrupted data directly into critical operational decisions.

The Vectors of LLM09 Failure

Misinformation generally manifests in production systems across three major vectors:

1. Factual and Expert Fabrications

The model invents authoritative-sounding citations, legal precedents, or medical studies. In documented legal cases, lawyers have been sanctioned by courts after filing briefs that cited non-existent case law fabricated entirely by a conversational AI tool.

2. Customer-Facing Compliance Failures

When public-facing customer service bots misrepresent corporate rules or pricing models, organizations face immediate regulatory and legal liabilities. In a widely cited 2024 case against Air Canada, a passenger won damages before a British Columbia tribunal after the airline's customer support chatbot hallucinated bereavement refund terms that contradicted the airline's official web documentation.

3. The Package Hallucination Attack (The DevSecOps Threat)

This is an actively exploited supply-chain attack vector. When developers use AI coding assistants to write software, the LLM frequently hallucinates non-existent code libraries, import packages, or third-party wrappers.

[The Package Hallucination Exploit Pipeline]
1. LLM Assistant tells Developer: "To solve this, run 'pip install langchain-core-auth-utils'"
2. Threat Actor analyzes LLM telemetry or runs targeted probing to identify this common hallucination.
3. Threat Actor registers 'langchain-core-auth-utils' on public PyPI / npm registries.
4. Threat Actor seeds the repo with malicious backdoor payloads.
5. Developer runs the command blindly -> Build infrastructure compromised.

How to Prevent and Mitigate LLM09

Eliminating hallucinations entirely from a foundation model is not achievable with current transformer architectures. Mitigation therefore means moving beyond prompt engineering patches (like adding "Be factual" to your system prompt) and relying on hard programmatic validation layers.

1. Anchor Models Using Retrieval-Augmented Generation (RAG)

One of the most effective ways to reduce hallucination rates is to limit the model's creative freedom. Use a RAG architecture to fetch verified text chunks from a private database or vector store based on the user's query. Explicitly instruct the model to answer only from the provided context, and to output an explicit, safe fallback phrase (such as "The requested information is unavailable in verified company records") if the context lacks the answer.
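The grounding instruction and fallback described above can be sketched roughly as follows. This is a minimal illustration and not a reference implementation: the prompt wording, the function names, and the `llm` callable (any function that takes a prompt string and returns text) are assumptions made for the example. Only the fallback phrase comes from the text above.

```python
from typing import Callable, List

# Fallback phrase taken from the article; the prompt wording itself is an assumption.
FALLBACK = "The requested information is unavailable in verified company records"


def build_grounded_prompt(question: str, chunks: List[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved chunks."""
    # Number each chunk so the model (and later the UI) can cite it.
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using ONLY the numbered context below and cite the chunk numbers you used.\n"
        f'If the context does not contain the answer, reply exactly: "{FALLBACK}"\n\n'
        f"Context:\n{context if context else '(no documents retrieved)'}\n\n"
        f"Question: {question}"
    )


def answer(question: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Skip the model call entirely when retrieval returns nothing."""
    if not chunks:
        return FALLBACK
    return llm(build_grounded_prompt(question, chunks))
```

Returning the fallback in code when retrieval comes back empty means the safe path does not depend on the model obeying the instruction.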

2. Implement Automated Output Cross-Verification

Deploy automated verification utilities directly inside your orchestration layer (e.g., using a LangChain or LlamaIndex pipeline) to check high-value tokens before they exit the system boundary.

  • For Code Generation: Run output strings through regex scripts or package registry lookups to verify that every recommended npm or pip package exists in a verified allowlist.

  • For Factual Extraction: Deploy cross-referencing validator nodes to check generated names, numbers, or SKUs against deterministic SQL tables.
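Both checks above can be approximated with the standard library alone. The sketch below is illustrative: the allowlist contents, the `SKU-1234` identifier format, and the `products` table name are hypothetical. The install-command regex is deliberately naive and only strips pip-style version specifiers, so it may flag ordinary words that follow an install command. For a security gate, that fail-closed behaviour is usually acceptable.

```python
import re
import sqlite3
from typing import Iterable, List, Set

# Hypothetical allowlist; in practice this might come from an internal registry mirror.
APPROVED_PACKAGES: Set[str] = {"requests", "numpy", "langchain-core"}

# Captures the arguments of "pip install ..." and "npm install ..." commands.
INSTALL_CMD = re.compile(r"\b(?:pip3?|npm)\s+install\s+([^\n`'\"]+)")

# Hypothetical SKU format used only for this example.
SKU_PATTERN = re.compile(r"\bSKU-\d+\b")


def unapproved_packages(llm_output: str, allowlist: Iterable[str]) -> List[str]:
    """Return packages the model recommended that are not on the allowlist."""
    allowed = {name.lower() for name in allowlist}
    flagged: List[str] = []
    for args in INSTALL_CMD.findall(llm_output):
        for token in args.split():
            if token.startswith("-"):  # skip flags such as --upgrade
                continue
            # Strip pip-style version specifiers and extras, e.g. "==1.2" or "[extra]".
            name = re.split(r"[=<>!~\[]", token, maxsplit=1)[0].lower()
            if name and name not in allowed and name not in flagged:
                flagged.append(name)
    return flagged


def unknown_skus(llm_output: str, conn: sqlite3.Connection) -> List[str]:
    """Return SKUs mentioned in the output that are missing from the products table."""
    missing: List[str] = []
    for sku in sorted(set(SKU_PATTERN.findall(llm_output))):
        row = conn.execute("SELECT 1 FROM products WHERE sku = ?", (sku,)).fetchone()
        if row is None:
            missing.append(sku)
    return missing
```

A non-empty result from either function is a signal to block or rewrite the response before it leaves the system boundary. Note that a package merely existing on a public registry is not evidence of safety, because an attacker may already have registered the hallucinated name. That is why the check compares against an allowlist.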

3. Redesign the User Interface to Combat Overreliance

Don't style your AI outputs to look like an omniscient oracle.

  • Force the application UI to explicitly cite the exact source documents fetched during the RAG retrieval cycle, enabling rapid human-in-the-loop auditability.

  • Include dynamic visual indicators showing model confidence or flag responses where semantic alignment scores drop below a strict threshold.
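One way to support both UI behaviours is to have the backend return citations and a review flag alongside the answer text. The sketch below assumes the retriever exposes a similarity score between 0 and 1 for each chunk. The class names, field names, and the 0.75 threshold are illustrative and not taken from any particular framework.

```python
from dataclasses import dataclass, field
from typing import List

LOW_CONFIDENCE_THRESHOLD = 0.75  # illustrative value; tune against labelled examples


@dataclass
class SourceChunk:
    title: str
    url: str
    similarity: float  # retrieval score in [0, 1], assumed to come from the vector store


@dataclass
class UiResponse:
    answer: str
    citations: List[str] = field(default_factory=list)
    needs_review: bool = False


def build_ui_response(answer: str, sources: List[SourceChunk]) -> UiResponse:
    """Attach citations and flag answers backed by weak or missing evidence."""
    citations = [f"{s.title} ({s.url})" for s in sources]
    # With no sources at all, the best score defaults to 0.0 and the answer is flagged.
    best = max((s.similarity for s in sources), default=0.0)
    return UiResponse(answer, citations, needs_review=best < LOW_CONFIDENCE_THRESHOLD)
```

The front end can then render `citations` as links and show a warning banner whenever `needs_review` is true.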

Automated Testing with Open-Source Tools

Evaluating your application for output reliability requires test frameworks that check semantic correctness, not just exact string matches.

1. Calculating RAG Faithfulness via Ragas

You can quantify how much your model is hallucinating by integrating the open-source Ragas framework into your testing pipeline. It uses an evaluation LLM to calculate Faithfulness (checking whether the output is derived strictly from the retrieved context) and Answer Relevancy.

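# A conceptual view of a Ragas automated validation step.
# Assumes `dataset` was built earlier from question / answer / retrieved-context
# records and that an evaluator LLM is configured (column names vary by Ragas version).
import sys

from ragas import evaluate
from ragas.metrics import faithfulness

FAITHFULNESS_THRESHOLD = 0.9  # illustrative value; tune per application

results = evaluate(dataset, metrics=[faithfulness])
score = results.to_pandas()["faithfulness"].mean()
print(f"Model Faithfulness Score: {score:.2f}")

# If the model begins inventing details not present in your technical documentation,
# the score drops below the threshold and the non-zero exit fails the CI/CD build.
if score < FAITHFULNESS_THRESHOLD:
    sys.exit(1)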
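As the Ragas documentation describes it, the faithfulness score is the fraction of claims in the answer that the retrieved context supports. The evaluation LLM does the hard part, extracting the claims and judging each one. The arithmetic that remains is small enough to sketch by hand. The function below is not part of Ragas, and the verdict list is written by hand purely for illustration.

```python
from typing import List


def faithfulness_ratio(claim_supported: List[bool]) -> float:
    """Fraction of answer claims that the retrieved context supports."""
    if not claim_supported:
        raise ValueError("at least one claim is required")
    return sum(claim_supported) / len(claim_supported)


# Four claims extracted from an answer; the third is not backed by the context.
print(faithfulness_ratio([True, True, False, True]))  # 0.75
```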

2. Behavioral Testing with Promptfoo

Configure Promptfoo to pass highly complex, edge-case questions, or questions with false premises (e.g., "How do I configure a feature that doesn't exist?") to your staging API. By running these automated assertions, you can ensure your model gracefully declines to fabricate answers under stress.
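The same false-premise idea can be prototyped in plain Python before committing to a tool. The sketch below is a stand-in for illustration, not Promptfoo configuration. The `ask` callable (a function that sends a question to your staging API and returns the reply), the refusal markers, and the sample questions are all hypothetical.

```python
from typing import Callable, List

# Hypothetical refusal markers; align them with your application's real fallback wording.
REFUSAL_MARKERS = (
    "unavailable in verified company records",
    "does not exist",
    "no such feature",
)

FALSE_PREMISE_QUESTIONS = [
    "How do I enable the quantum-sync mode in the billing API?",  # invented feature
    "Which clause of the 2019 refund policy covers time-travel delays?",  # invented policy
]


def fabricated_answers(ask: Callable[[str], str], questions: List[str]) -> List[str]:
    """Return the questions the model answered instead of declining."""
    failures: List[str] = []
    for question in questions:
        reply = ask(question).lower()
        if not any(marker in reply for marker in REFUSAL_MARKERS):
            failures.append(question)
    return failures
```

Substring matching is a crude refusal detector. A model-graded assertion, which evaluation tools typically offer, would catch paraphrased refusals that this check misses.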

© 2026 AISecIntel Group.
