Description
Spotted here for gemini-2.5-pro-preview-05-06: https://gist.github.com/simonw/87a59e7f5c12274d65e2ac053b0eacdb#token-usage
264 input, 104 output, {"promptTokensDetails": [{"modality": "TEXT", "tokenCount": 6}, {"modality": "IMAGE", "tokenCount": 258}], "thoughtsTokenCount": 989}
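To make the arithmetic explicit, here is a small illustrative check (the dict is reconstructed from the numbers above; mapping the 264/104 figures onto `promptTokenCount` and `candidatesTokenCount` is my assumption, this is not the project's code):

```python
# Illustrative reconstruction of the usage metadata quoted above.
usage = {
    "promptTokenCount": 264,
    "candidatesTokenCount": 104,
    "thoughtsTokenCount": 989,
    "promptTokensDetails": [
        {"modality": "TEXT", "tokenCount": 6},
        {"modality": "IMAGE", "tokenCount": 258},
    ],
}

# The prompt details add up to the input count: 6 + 258 = 264.
assert sum(d["tokenCount"] for d in usage["promptTokensDetails"]) == usage["promptTokenCount"]

# 104 output tokens cannot possibly contain the 989 thinking tokens, so in this
# response the thoughts were reported outside candidatesTokenCount.
assert usage["thoughtsTokenCount"] > usage["candidatesTokenCount"]
```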
And then Aider wrote about the same problem: https://aider.chat/2025/05/07/gemini-cost.html
An investigation determined that the primary cause was the litellm package (used by aider for its LLM API connections) not properly including reasoning tokens in the token counts it reported.
Here's where LiteLLM fixed it: BerriAI/litellm@a7db0df
This note is very important:
> Note that the Gemini API returns different usage metadata than Vertex AI. With the Gemini API, `candidatesTokenCount` includes thinking tokens, but on Vertex AI, `candidatesTokenCount` does not include thinking tokens.
But that doesn't fit what I'm seeing here, because I did NOT use Vertex but still got that response where `thoughtsTokenCount` did not get included in the `candidatesTokenCount`.
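One defensive option, whichever API behaviour applies, would be to track the thinking tokens as their own bucket rather than assuming they are folded into the output count. A minimal sketch (the helper name and return shape are assumptions for illustration, not the plugin's actual code):

```python
def extract_usage(usage_metadata: dict) -> dict:
    """Pull token counts out of a Gemini usageMetadata dict, keeping the
    thinking tokens as a separate figure so they are never silently dropped."""
    return {
        "input": usage_metadata.get("promptTokenCount", 0),
        "output": usage_metadata.get("candidatesTokenCount", 0),
        # Reported separately: in the response above these 989 tokens were
        # clearly not part of candidatesTokenCount, whatever the docs say.
        "thoughts": usage_metadata.get("thoughtsTokenCount", 0),
    }


print(extract_usage({
    "promptTokenCount": 264,
    "candidatesTokenCount": 104,
    "thoughtsTokenCount": 989,
}))
# {'input': 264, 'output': 104, 'thoughts': 989}
```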
For reference, here's our current code for that (lines 365 to 379 at 902519b).