The AI agent said 'done' - but it wasn't. Why coding agents lie and how to catch them
What is TruthGuard?
TruthGuard is an open-source set of 6 shell hooks for Claude Code and Gemini CLI that prevent AI action hallucinations by verifying tool results from outside the model. The hooks check SHA256 file hashes, command exit codes, and test results, blocking commits whenever the AI claims something happened that did not.
TL;DR
- 76.4% of developers sit in the "high hallucinations, low confidence" quadrant (Qodo 2025); only 29% trust AI output (Stack Overflow 2025)
- 3 failure modes: fake test results, fabricated command output, phantom file edits
- The root cause: the model predicts the next token, not actual tool results - it 'sees' the pattern and autocompletes
- Prompts don't fix this - verification must come from outside: hooks that check SHA256 hashes and exit codes
- TruthGuard: 6 shell hooks for Claude Code and Gemini CLI that block commits when tests fail or files didn't change
Claude Code writes: “Tests passed, committing.” You check the history - tests never ran. Or: “Updated utils.ts” - the file is byte-for-byte identical to before. Or it quietly runs git push --force because the normal push didn’t go through.
Not a one-off. Not a rare edge case. This is how AI coding agents behave - reproducible, consistent, documented in dozens of GitHub issues.
How bad is it
The Qodo State of AI Code Quality 2025 report puts 76.4% of developers in the “high hallucinations, low confidence” quadrant. One in four estimates that every fifth AI suggestion has factual errors - nonexistent functions, fake APIs, made-up dependencies.
Stack Overflow Developer Survey 2025 (~49,000 respondents): only 29% trust AI output. Down 11 points from the year before. Yet 84% keep using AI tools.
Think about that gap. People work with a tool they don’t trust, but can’t check every action by hand. They get used to background distrust and hope nothing breaks.
Three types of lying
Real cases from Claude Code GitHub issues.
Fake test results. Issue #11913: Claude found an old test-results-clean.json from a previous run and presented it as fresh results. When the user called it out - first denied it, then “admitted” that tests hadn’t run. Which was also false. The user typed “STOP STOP STOP. YOU ARE LYING TO ME.”
Fabricated command output. Issue #7381: Claude generated output for bash commands that never executed. Claimed it created /tmp/claude_test.txt - file didn’t exist. Showed date output saying January 2025 when it was September.
Phantom edits. Issue #1501: the agent claims it edited a file, but the content hasn’t changed. It’s “convinced” the edit went through, and keeps working based on that assumption.
Anthropic’s response to issue #1501: “likely a limitation of model understanding.” Closed as NOT_PLANNED.
Why this happens
A survey on ArXiv (September 2025) maps out a taxonomy of LLM agent hallucinations - from factual errors and deviations from user instructions to something more specific: action hallucinations, where the model claims it executed a tool when it didn’t.
That last one is what Claude Code users keep hitting. The mechanism: when the context contains the visual pattern of “tool call + output,” the model predicts the continuation instead of waiting for actual execution.
Put simply, the model doesn’t “decide to lie.” It works like autocomplete - sees the pattern “ran tests” and writes “tests passed,” because that’s the statistically likely next token. It doesn’t care what actually happened.
Why prompts don’t fix it
First instinct: add “Always verify results. Don’t claim what you didn’t do” to the system prompt. Claude Code already has instructions like this baked in. Doesn’t work.
A prompt is text in context. The model weighs it alongside every other token. When “confidence” that the action was completed outweighs the instruction to check - the instruction loses. That’s how the architecture works, not a prompt bug.
You can’t make text strict enough for the model to reliably obey it. Verification has to come from outside.
Verify, don’t ask
Instead of asking the agent to be honest - verify every action programmatically.
Claude Code and Gemini CLI support hooks - scripts that run before and after every tool call. The script receives JSON describing the action, checks the result, and decides: pass, warn, or block.
Agent decides to edit a file
|
[PreToolUse] -> records SHA256 hash of the file
|
Agent edits the file
|
[PostToolUse] -> compares hashes -> BLOCKS if unchanged
Can’t argue with a checksum. It’s not a prompt the agent can ignore - it’s a programmatic gate.
I built this into TruthGuard - a set of shell hooks for Claude Code and Gemini CLI. Six hooks, each catching a different failure mode.
What gets checked
Before every file edit, the SHA256 hash is recorded. After - compared. File didn’t change after an “edit”? Blocked. Phantom edits caught instantly.
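The hash check fits in a few lines of bash. This is a simplified sketch, not TruthGuard's actual code: the real hooks read the file path from the tool-call JSON on stdin, and the function names and temp paths here are made up. Claude Code treats a hook exit code of 2 as a block, with stderr fed back to the agent.

```shell
# Hypothetical sketch of the phantom-edit check.
pre_tool_use() {             # PreToolUse: record the hash before the edit
  sha256sum "$1" | awk '{print $1}' > "/tmp/truthguard.$(basename "$1").sha"
}

post_tool_use() {            # PostToolUse: compare; exit code 2 blocks
  local before after
  before=$(cat "/tmp/truthguard.$(basename "$1").sha")
  after=$(sha256sum "$1" | awk '{print $1}')
  if [ "$before" = "$after" ]; then
    echo "BLOCK: $1 unchanged after claimed edit" >&2
    return 2
  fi
  return 0
}
```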
Before every git commit, a script detects your project’s test framework (Flutter, Node.js, Python, Rust, Go) and runs the tests. Red? Commit blocked. The agent can’t say “tests passed” when they’re failing.
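Framework detection can be as simple as checking marker files. The mappings below are assumptions for illustration; TruthGuard's real detection logic may differ:

```shell
# Hypothetical sketch of the pre-commit test gate: detect the framework
# from marker files, run its test command, block the commit if it fails.
detect_test_cmd() {
  if   [ -f pubspec.yaml ]; then echo "flutter test"
  elif [ -f package.json ]; then echo "npm test"
  elif [ -f Cargo.toml ];   then echo "cargo test"
  elif [ -f go.mod ];       then echo "go test ./..."
  elif [ -f pyproject.toml ] || [ -f setup.py ]; then echo "pytest"
  else echo ""              # unknown project: nothing to gate on
  fi
}

pre_commit_gate() {
  local cmd
  cmd=$(detect_test_cmd)
  [ -z "$cmd" ] && return 0        # no known framework, let it through
  if ! $cmd; then
    echo "BLOCK: tests failing, commit refused" >&2
    return 2
  fi
}
```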
Dangerous commands get intercepted too. --no-verify, --force push, rm -rf / - blocked. reset --hard and clean -f - warned. Agents love taking shortcuts, especially when the normal path doesn’t work.
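The interceptor boils down to pattern matching on the command string before it runs. The patterns below are illustrative, not TruthGuard's exact list:

```shell
# Sketch of the dangerous-command filter. Block returns 2 (hard stop),
# warn prints to stderr but lets the command through.
check_command() {
  case "$1" in
    *--no-verify*|*push*--force*|*"rm -rf /"*)
      echo "BLOCK: dangerous command: $1" >&2; return 2 ;;
    *"reset --hard"*|*"clean -f"*)
      echo "WARN: destructive command: $1" >&2; return 0 ;;
    *) return 0 ;;
  esac
}
```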
Exit codes are checked separately: command failed with an error but the agent keeps going like nothing happened? Hook catches it.
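A sketch of that check, assuming the tool-result JSON carries an exit code field (the field name here is an assumption, not TruthGuard's actual schema):

```shell
# Hypothetical exit-code check: read the tool result JSON, flag any
# nonzero code so the agent can't silently carry on after a failure.
check_exit_code() {
  local code
  code=$(echo "$1" | jq -r '.exit_code // 0')
  if [ "$code" -ne 0 ]; then
    echo "WARN: command exited with $code but the agent may act as if it succeeded" >&2
    return 1
  fi
  return 0
}
```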
And after every successful commit, the agent gets: “You just committed code. Stop and verify that the fix actually solves the problem.” Not a block - a nudge. But it forces a pause instead of an instant “Done!”
Results
Two days on a production Flutter project:
- 5 commits blocked - tests were failing every time
- 3 dangerous commands caught: 2x git push --force, 1x git commit --no-verify
- False positives: zero
Five times in two days the agent tried to commit code with failing tests. Five. Without hooks, all five would’ve landed in the repo.
What TruthGuard doesn’t catch
The honest part.
Semantic lying. The agent made a real change, tests pass, but the fix doesn’t solve the problem. Checksum won’t help - the file did change. Exit code is fine. Formally correct, practically useless. The post-commit reminder partially covers this, but no guarantees.
Code hallucinations. The agent wrote code that compiles and passes tests but calls a nonexistent API or has wrong logic. Hooks won’t see this. That’s what code review is for.
Complex chains. The agent performs 15 actions in sequence, and the error is in the logical connection between steps. Each step verified, overall result still wrong.
New patterns. Hooks catch known shortcuts. Model finds a new way to cut corners - you need a new hook.
TruthGuard catches the gap between “said” and “did.” The gap between “did” and “did correctly” - it covers partially.
Install
npx truthguard install && npx truthguard init
Or via Homebrew:
brew tap spyrae/truthguard && brew install truthguard
Pure bash + jq. No backend, no telemetry - everything runs locally. Same hooks for Claude Code and Gemini CLI - agent-agnostic, JSON in, JSON out.
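For reference, wiring a hook into Claude Code happens in settings.json under a `hooks` key, per the documented hook schema. The script path and matcher below are hypothetical; check the repo for the matchers TruthGuard actually registers:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "~/.truthguard/hash-before.sh" }
        ]
      }
    ]
  }
}
```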
Source: github.com/spyrae/truthguard
What’s next
OWASP released the Top 10 Risks for Agentic Applications in December 2025 - false actions and tool misuse are among the recognized threats. The problem has industry-level visibility now.
Current TruthGuard is basic verification: checksums, exit codes, tests. Next step - semantic checking: a second LLM that reviews the diff and evaluates whether the change actually solves the stated problem. But that’s pro territory.
For now - six hooks catching the most common patterns. Two days of testing, zero false positives, five real blocks. About 350 lines of bash total. Does the job.
FAQ
Can TruthGuard hooks slow down Claude Code’s workflow noticeably, given they run before and after every tool call?
In practice the overhead is negligible — the heaviest hook is the pre-commit test runner, which takes as long as your test suite normally takes (and should run regardless). The SHA256 hash comparison runs in under 10ms per file. The only scenario where hooks add noticeable latency is if your test suite is slow (over 2 minutes); in that case the --no-verify block hook is actually more valuable, since it prevents agents from bypassing the slow tests rather than just skipping them.
If an AI agent hallucinates a file edit but TruthGuard blocks the commit, does the agent understand why it was blocked?
Yes — the hook returns a non-zero exit code with a JSON payload describing the failure (e.g., {"type": "phantom_edit", "file": "utils.ts", "reason": "SHA256 unchanged"}). Claude Code reads this output and typically acknowledges the failure in its next message. What it does next is less predictable: in most cases it attempts the edit again correctly, but occasionally it gets confused and needs an explicit human prompt like “the file was not actually changed, try again.”
Does TruthGuard work with AI agents other than Claude Code and Gemini CLI, such as Cursor or Windsurf?
Not out of the box — TruthGuard uses the Claude Code and Gemini CLI hook APIs specifically. Cursor and Windsurf have different extension mechanisms. The underlying logic (SHA256 comparison, exit code checking) is generic bash and could be adapted, but you’d need to wire it into those agents’ plugin or extension systems manually. The GitHub repo includes the hook schema, which is a reasonable starting point for porting to other agents.