bennetbo (Member) commented on Oct 10, 2025

Previously, we were guessing the context window size here:

fn get_max_tokens(name: &str) -> u64 {

This is inaccurate and must be updated manually whenever new models ship. This PR ensures that we extract the context window size via the same API request that the Ollama CLI makes when running ollama show <model-name> (relevant code is here).
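As a hypothetical illustration (the model name is just an example, and a local Ollama server on the default port 11434 is assumed), you can inspect this metadata yourself with: curl http://localhost:11434/api/show -d '{"model": "llama3.2"}'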

The format looks like this:

{
  "model_info": {
    "general.architecture": "llama",
    "llama.context_length": 132000
  }
}
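For illustration, here is a minimal sketch of deriving the context window from that response shape using serde_json. This is not the actual implementation in this PR, and the function name is hypothetical; the idea is that the lookup key is "<architecture>.context_length", where the architecture itself comes from "general.architecture".

use serde_json::Value;

// Sketch: derive the context window from an Ollama show response.
// Returns None if the fields are missing (e.g. on older Ollama versions).
fn context_length_from_show(response: &Value) -> Option<u64> {
    let model_info = response.get("model_info")?;
    let architecture = model_info.get("general.architecture")?.as_str()?;
    model_info
        .get(format!("{architecture}.context_length").as_str())
        .and_then(Value::as_u64)
}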

Once this PR is merged, we could technically remove the old code:

fn get_max_tokens(name: &str) -> u64 {

I decided to keep it for now, as it is unclear whether the necessary fields are available via the API on older Ollama versions; a sketch of that fallback shape follows.
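In other words, the old table acts as a fallback. A hedged sketch of that shape (the wrapper function name is hypothetical; get_max_tokens is the existing hardcoded lookup):

fn effective_max_tokens(name: &str, api_context_length: Option<u64>) -> u64 {
    // Prefer the value reported by the Ollama API; fall back to the
    // hardcoded per-model guess when the API does not provide one.
    api_context_length.unwrap_or_else(|| get_max_tokens(name))
}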

Release Notes:

  • Fixed an issue where Ollama models would use the wrong context window size
@cla-bot added the cla-signed label (the user has signed the Contributor License Agreement) on Oct 10, 2025
@bennetbo enabled auto-merge (squash) on Oct 10, 2025 at 12:47
@bennetbo merged commit 3d5ddcc into main on Oct 10, 2025 (25 checks passed)
@bennetbo deleted the ollama-context-length-via-api branch on Oct 10, 2025 at 12:59