Fallback trim history by configured context window#572

Open
Y1fe1Zh0u wants to merge 3 commits into
dataelement:mainfrom
Y1fe1Zh0u:feat/history-token-budget-fallback-80

Conversation

@Y1fe1Zh0u

Collaborator

Summary

  • add backend model context_window_tokens config for deriving an 80% history token budget
  • trim old API-safe history blocks by token budget while preserving tool-call/tool-result atomicity
  • apply the fallback in websocket and Feishu LLM request paths, with message-count truncation retained when no model window is configured
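The budget rule in the summary can be sketched as follows. This is a minimal illustration, not the code from this PR: the function names `history_token_budget` and `choose_truncation` are hypothetical, and only the `context_window_tokens` field and the 80% ratio come from the description above. Integer arithmetic is used so the result never depends on float rounding.

```python
from typing import Optional


def history_token_budget(context_window_tokens: Optional[int]) -> Optional[int]:
    """Return 80% of the configured window, or None when no window is set.

    Hypothetical helper: the real implementation may round differently.
    """
    if context_window_tokens is None or context_window_tokens <= 0:
        return None
    # 80% expressed as 4/5 with floor division to stay in integers.
    return context_window_tokens * 4 // 5


def choose_truncation(context_window_tokens: Optional[int]) -> str:
    """Pick the truncation strategy: token budget if configured, else message count."""
    budget = history_token_budget(context_window_tokens)
    if budget is None:
        return "message_count"
    return f"token_budget:{budget}"
```

With an 8192-token window this yields a 6553-token history budget; with the field unset, the existing message-count truncation stays in effect.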

Notes

Tests

  • python -m pytest tests/test_history_window.py
  • python -m py_compile app/services/history_window.py app/models/llm.py app/schemas/schemas.py app/api/websocket.py app/api/feishu.py app/api/enterprise.py
  • git diff --check
  • python -m alembic heads
TatsuKo-Tsukimi and others added 3 commits May 12, 2026 16:08
Single-agent chat history is currently sliced by message count, which can split assistant tool_calls from their required tool result messages. This refreshes PR dataelement#487 on current upstream/main by introducing a pair-aware history window helper and routing web chat plus Feishu history truncation through it.

Constraint: Provider APIs require every role=tool message to match a preceding assistant tool_call in the same request.

Rejected: Keep only the existing head-pop guard | it only fixes orphan tool messages at the first position and misses cuts inside multi-message tool-call blocks.

Confidence: high

Scope-risk: narrow

Directive: Future token-budget truncation must reuse the pair-aware walker instead of slicing conversation lists directly.

Tested: /Users/zhou/Code/clawith/backend/.venv/bin/python -m pytest tests/test_history_window.py from /tmp/clawith-pr487-work/backend, 16 passed

Not-tested: Full backend suite and live web/Feishu chat flows
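To illustrate why a head-pop guard is not enough, here is a rough sketch of a pair-aware message-count window. It assumes OpenAI-style message dicts (`role`, `tool_calls`, `tool_call_id`); the names `group_into_blocks` and `window_by_message_count` are hypothetical and are not taken from `app/services/history_window.py`.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def group_into_blocks(messages: List[Message]) -> List[List[Message]]:
    """Group each assistant tool_calls message with the tool results that follow it."""
    blocks: List[List[Message]] = []
    for msg in messages:
        prev_head = blocks[-1][0] if blocks else None
        follows_tool_call = (
            prev_head is not None
            and prev_head.get("role") == "assistant"
            and bool(prev_head.get("tool_calls"))
        )
        if msg.get("role") == "tool" and follows_tool_call:
            blocks[-1].append(msg)  # result belongs to the open tool-call block
        else:
            blocks.append([msg])  # plain message, or an orphan tool result
    return blocks


def window_by_message_count(messages: List[Message], max_messages: int) -> List[Message]:
    """Keep the newest whole blocks that fit in max_messages; never split a block."""
    kept: List[List[Message]] = []
    count = 0
    for block in reversed(group_into_blocks(messages)):
        if block[0].get("role") == "tool":
            continue  # orphan tool result with no matching assistant call
        if count + len(block) > max_messages:
            break  # the whole block does not fit, so stop instead of cutting it
        kept.append(block)
        count += len(block)
    return [m for block in reversed(kept) for m in block]
```

For a history of user, assistant (two tool calls), tool, tool, assistant, user with a limit of four, a plain `messages[-4:]` slice starts with two tool results; popping only the first still leaves one orphan. The block walker returns just the final assistant and user messages.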
History truncation must not only avoid orphan tool results; it also has to reject assistant tool-call blocks whose declared results are incomplete. This keeps the message-count window aligned with provider API invariants without inventing synthetic tool outputs for older history.

Constraint: OpenAI-compatible APIs require assistant tool calls and tool results to remain paired.

Rejected: Inject synthetic missing tool results | safe for API repair in some runtimes, but can pollute weak-model context during old-history truncation.

Confidence: high

Scope-risk: narrow

Directive: Keep truncation drop-only by removing whole invalid tool blocks; only add synthetic tool results in a separate active-tool-recovery path.

Tested: /Users/zhou/Code/clawith/backend/.venv/bin/python -m pytest tests/test_history_window.py

Tested: git diff --check

Tested: py_compile app/services/history_window.py tests/test_history_window.py
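The completeness rule described in this commit could look roughly like the sketch below. It assumes history is already grouped into blocks (an assistant message followed by its tool results) and uses the OpenAI-compatible `tool_calls[].id` and `tool_call_id` fields; `is_complete_tool_block` and `drop_invalid_blocks` are hypothetical names, not the PR's actual helpers.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def is_complete_tool_block(block: List[Message]) -> bool:
    """True when every declared tool call has a result and no result is unexpected."""
    head = block[0]
    calls = head.get("tool_calls") or []
    if head.get("role") != "assistant" or not calls:
        # A plain message is fine; a block headed by a tool result is an orphan.
        return head.get("role") != "tool"
    declared = {call["id"] for call in calls}
    returned = {m.get("tool_call_id") for m in block[1:] if m.get("role") == "tool"}
    return declared == returned


def drop_invalid_blocks(blocks: List[List[Message]]) -> List[List[Message]]:
    """Drop whole invalid blocks instead of inventing synthetic tool results."""
    return [block for block in blocks if is_complete_tool_block(block)]
```

Dropping the entire block loses some old context, but it keeps the request valid without adding fabricated tool output to the prompt.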
Weak or self-hosted models can have large advertised context windows while still failing on overlong raw history, so the fallback trims old history blocks once a model-level input window is configured. The runtime derives an 80% budget from llm_models.context_window_tokens and preserves the existing message-count truncation when the field is absent.

Constraint: Context window must come from backend model configuration instead of provider or model-name hardcoding

Rejected: Hardcode provider/model context windows | custom and self-hosted models drift too often and would make the fallback misleading

Confidence: high

Scope-risk: moderate

Directive: Keep this as a drop-only fallback; do not add synthetic tool results or summary compaction here

Tested: python -m pytest tests/test_history_window.py

Tested: python -m py_compile app/services/history_window.py app/models/llm.py app/schemas/schemas.py app/api/websocket.py app/api/feishu.py app/api/enterprise.py

Tested: git diff --check

Tested: python -m alembic heads
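As a rough illustration of the drop-only token fallback, the sketch below walks pre-grouped history blocks from newest to oldest and stops at the first block that would exceed the budget. The names `estimate_tokens` and `trim_blocks_to_budget` are hypothetical, and the four-characters-per-token estimate is an assumption made for this example; the PR may count tokens differently.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def estimate_tokens(message: Message) -> int:
    """Crude size estimate (assumption): about 4 characters per token."""
    return len(json.dumps(message, ensure_ascii=False)) // 4 + 1


def trim_blocks_to_budget(
    blocks: List[List[Message]],
    budget: int,
    estimate: Callable[[Message], int] = estimate_tokens,
) -> List[Message]:
    """Keep the newest whole blocks whose combined estimate fits in budget."""
    kept: List[List[Message]] = []
    used = 0
    for block in reversed(blocks):
        cost = sum(estimate(m) for m in block)
        if used + cost > budget:
            break  # drop this block and everything older; never split a block
        kept.append(block)
        used += cost
    kept.reverse()  # restore chronological order
    return [m for block in kept for m in block]
```

Note that this sketch returns an empty list when even the newest block exceeds the budget. Whether the real fallback always keeps the latest turn is not stated in this PR.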
