This is methodologically flawed, as bytes only weakly correlate with tokens.

Unless you're sending identical requests, you can't expect the same token count for any given number of bytes, nor that a slightly longer (but different) message will produce more tokens than a slightly shorter one, or vice versa.
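
A toy illustration of the point, as a rough sketch only: the vocabulary and the greedy longest-match tokenizer below are made up for the example and are not any real model's tokenizer. Real BPE tokenizers differ in detail, but the same effect shows up: text that matches common vocabulary entries packs many bytes into few tokens, while unusual text does not.

```python
# Hypothetical vocabulary, invented for this example.
VOCAB = {"the", " quick", " brown", " fox", "token", "ization"}


def count_tokens(text: str) -> int:
    """Count tokens using greedy longest-match against VOCAB."""
    max_len = max(len(v) for v in VOCAB)
    i = 0
    count = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in VOCAB:
                i += length
                break
        else:
            # No vocabulary match: fall back to a single-character token.
            i += 1
        count += 1
    return count


longer = "the quick brown fox"  # 19 bytes, all common "words"
shorter = "xq7#zk!9"            # 8 bytes, nothing in the vocabulary

for msg in (longer, shorter):
    print(len(msg.encode("utf-8")), "bytes ->", count_tokens(msg), "tokens")
```

Here the 19-byte message comes out at 4 tokens and the 8-byte message at 8, so the shorter message costs more tokens.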

Bolwin 3 hours ago | parent | on: 47765052
> The numbers came from the same project and the same prompt across versions.

I'm pretty sure the tester checked. If the request format is the same (which it is, given that it uses the same format as Anthropic's stable public API) and the prompt/messages are the same, then bytes will correlate pretty well.

marginalia_nu 3 hours ago | parent | on: 47771625
The prompt may be the same, but the project context would surely have changed. The user prompt itself is unlikely to be ~200KB.