Unless you're sending identical requests, you can't expect the same token count for any given number of bytes, or that a slightly longer (but different) message will lead to more tokens than a slightly shorter one, or vice versa.
I'm pretty sure the tester checked. If the request format is the same (which it is, since it uses Anthropic's stable public API) and the prompt/messages are the same, then bytes will correlate pretty well with tokens.
Claude Code caches a big chunk of context (all messages of the current session). While a lot of data goes over the network, in ccaudit itself 98% of the context is served from cache.
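For reference, this is roughly how prompt caching surfaces in Anthropic's Messages API: a stable prefix (system prompt, earlier messages) gets a `cache_control` breakpoint, and later responses report cached vs. fresh input separately in `usage`. A minimal sketch, with invented token numbers (only the field names come from the public API):

```python
# Shape of a Messages API request that opts a long stable prefix into caching.
request_body = {
    "model": "claude-sonnet-4-20250514",  # assumption: any current model
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "...multi-thousand-token system prompt...",
            "cache_control": {"type": "ephemeral"},  # cache everything up to here
        }
    ],
    "messages": [{"role": "user", "content": "Hello"}],
}

# A later response's usage block then looks roughly like this (numbers invented):
usage = {
    "input_tokens": 312,                # fresh, uncached input
    "cache_read_input_tokens": 15_480,  # served from the prompt cache
    "cache_creation_input_tokens": 0,   # written to the cache this turn
}

def cached_fraction(u: dict) -> float:
    """Fraction of the input context that was read from cache."""
    total = (u["input_tokens"]
             + u["cache_read_input_tokens"]
             + u["cache_creation_input_tokens"])
    return u["cache_read_input_tokens"] / total if total else 0.0

print(f"{cached_fraction(usage):.0%} of context from cache")  # → 98% of context from cache
```

That ratio is the kind of number a tool like ccaudit can compute per request just by reading the `usage` block.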
Granted, to view the actual system prompt used by Claude, one can only inspect the network requests. Otherwise, the best guess is the token usage in the first exchange with Claude.
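Once you've captured a request body with any MITM proxy, pulling the system prompt out is trivial, since it's just the `system` field of the JSON payload (either a plain string or a list of text blocks). A sketch with a hypothetical, heavily truncated capture:

```python
import json

# Hypothetical captured request body, as dumped by a MITM proxy; the real
# system prompt is far longer -- this only illustrates the shape.
captured = json.dumps({
    "model": "claude-sonnet-4-20250514",
    "system": [
        {"type": "text", "text": "You are Claude Code, ..."},
        {"type": "text", "text": "# Project instructions\n..."},
    ],
    "messages": [{"role": "user", "content": "Hello"}],
})

def extract_system_prompt(raw_body: str) -> str:
    """Join the text blocks of the `system` field; it may also be a plain string."""
    system = json.loads(raw_body).get("system", "")
    if isinstance(system, str):
        return system
    return "\n\n".join(block.get("text", "") for block in system)

print(extract_system_prompt(captured)[:30])  # → You are Claude Code, ...
```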
https://github.com/simple10/agent-super-spy - LLM proxy + HTTP MITM proxy + LLMetry + other goodies
https://github.com/simple10/agents-observe - a fancier Claude hooks dashboard
It started from a need to keep an eye on OpenClaw, but it's incredibly useful for really understanding any agent harness at the raw LLM request level.
and there's a missing cache marker that makes skills & the project CLAUDE.md cache-miss every time, too: https://github.com/anthropics/claude-code/issues/47098
TL;DR: for now, launch with `CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1 claude "Hello"`
https://news.ycombinator.com/item?id=47754795