[Bug]: Interrupted/duplicate agent turns persist orphan tool-results, bricking the conversation for all providers on replay #99

Description

@Mariomarquezt

Summary

An agent conversation can reach a state where tool-result messages are persisted without their preceding assistant tool_calls message ("orphan" tool results). Once that happens, replaying the conversation history fails for every provider: the chat request is rejected with HTTP 400 because tool calls and tool results no longer pair up. The conversation then stays unusable regardless of which model is selected.

This is not provider-specific: the built-in OpenAI driver hits it too, and the persistence path (server/ai/runtime/runner.ts) is unchanged core code.

Steps to reproduce

I could not pin down an exact deterministic sequence. It surfaced during heavy agent use (building a full site), while switching models mid-conversation, and with some interrupted or duplicate sends. The minimal conditions I believe trigger it:

  1. Run the site agent through a long, tool-heavy turn (many insertHtml/token tools).
  2. Interrupt or re-submit a prompt while a tool turn is mid-flight (or switch the model mid-turn).
  3. Continue the same conversation.

The trigger is not reliably reproducible, but the resulting persisted state is unambiguous and is the real evidence (see the logs below): the conversation ends up with tool-result rows that have no matching assistant tool_calls row before them.

Expected behavior

A turn that errors, is interrupted, or is double-submitted should never leave the conversation in a state that bricks all future turns. Replaying history should always send well-formed tool-call/result pairing to the provider.

Actual behavior

ai_messages ends up with orphan role:'tool' rows (a tool result whose tool_call_id has no preceding assistant tool_calls message). Every subsequent turn replays that malformed history and the provider rejects it with a tool-pairing error — across providers. Switching models does not help, because the corruption is in the stored history, not the model.

Version or commit

CoreBunch/Instatic@a125a4a (main), exercised through a local branch that adds an OpenAI-compatible provider — but runner.ts/persistence is untouched core code and the built-in OpenAI driver reproduces the same error, so this is not specific to that branch.

Deployment mode

Local dev with Bun

Logs or screenshots

# Provider errors on replay (all the same underlying tool-pairing problem):
OpenAI (400):   No tool call found for function call output with call_id call_37ac.
DeepSeek (400): An assistant message with 'tool_calls' must be followed by tool
                messages responding to each 'tool_call_id'. (insufficient tool
                messages following tool_calls message)
MiniMax (400):  invalid params, tool call result does not follow tool call (2013)

# Persisted ai_messages for the bricked conversation (pos | role | tool_call_id | tool_name | content):
88 | user | -         | -             | "Build a single-page website..."
89 | user | -         | -             | (duplicate of 88, double-submit)
90 | tool | call_37ac | read_document | toolResult ok   <-- ORPHAN (no assistant tool_calls before it)
91 | tool | call_dc2b | insertHtml    | toolResult ok   <-- ORPHAN
92 | tool | call_a9df | insertHtml    | toolResult ok   <-- ORPHAN
93 | user | -         | -             | "you only did the hero section"

# Count mismatch confirms it: 35 tool-result rows vs 32 assistant tool_call rows = 3 orphans.
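
For anyone who wants to check their own conversations, the orphan rows above can be found with a single ordered pass over the persisted messages. This is a minimal sketch, not code from the repository: the MessageRow shape and the findOrphanToolResults name are hypothetical, and it assumes each assistant tool call is stored as its own row carrying the tool_call_id (the real ai_messages schema may differ).

```typescript
// Hypothetical row shape; the real ai_messages schema may differ.
interface MessageRow {
  pos: number;
  role: "system" | "user" | "assistant" | "tool";
  toolCallId: string | null; // assumed: set on assistant tool_call rows and on tool-result rows
}

// Returns every tool-result row whose tool_call_id was not declared by an
// earlier assistant tool_call row in the same conversation.
function findOrphanToolResults(rows: MessageRow[]): MessageRow[] {
  const declared = new Set<string>();
  const orphans: MessageRow[] = [];
  // Sort a copy by position so the caller's array is left untouched.
  const ordered = [...rows].sort((a, b) => a.pos - b.pos);
  for (const row of ordered) {
    if (row.role === "assistant" && row.toolCallId !== null) {
      declared.add(row.toolCallId);
    } else if (row.role === "tool") {
      if (row.toolCallId === null || !declared.has(row.toolCallId)) {
        orphans.push(row);
      }
    }
  }
  return orphans;
}
```

Run against the six rows shown above (pos 88 to 93), this would report rows 90, 91, and 92.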

Likely cause (hypothesis)

In server/ai/runtime/runner.ts, a turn persists the assistant tool_calls (appendToolCall) and then the tool results (appendToolResult) as separate events. Orphan results imply a turn persisted results while its assistant tool_calls rows were lost — consistent with an interrupted / concurrent / double-submitted turn racing on one conversation (the tail of the corrupted conversation showed duplicate prompts and repeated sends). I did not fully isolate the exact race, so treat this as a hypothesis backed by the corrupted end-state, not a confirmed root cause.
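
If the race hypothesis holds, serializing turns per conversation would rule it out, and could also serve as a quick experiment to confirm or refute it. The following is a sketch under that assumption only: runExclusive and the in-memory tails map are hypothetical names, nothing like them is known to exist in runner.ts, and an in-process queue like this would not help across multiple server processes.

```typescript
// Hypothetical helper: queues async work per conversation id so that two
// turns on the same conversation never interleave their persistence calls.
const tails = new Map<string, Promise<unknown>>();

function runExclusive<T>(conversationId: string, task: () => Promise<T>): Promise<T> {
  // The stored tail never rejects (see below), so chaining with then() is safe.
  const previous = tails.get(conversationId) ?? Promise.resolve();
  const next = previous.then(() => task());
  // Swallow errors on the tail so one failed turn does not block later turns.
  const tail = next.catch(() => undefined);
  tails.set(conversationId, tail);
  // Drop the map entry once this conversation has no queued work left.
  void tail.then(() => {
    if (tails.get(conversationId) === tail) tails.delete(conversationId);
  });
  return next;
}
```

A turn handler would then wrap its whole body, including the appendToolCall and appendToolResult writes, in runExclusive(conversationId, ...), so a double-submitted prompt waits for the first turn to finish instead of running alongside it.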

Suggested mitigation

Make history replay resilient to malformed pairing when building the provider request from persisted history: drop orphan tool-result messages (a result with no matching prior assistant tool_call) and assistant tool_calls with no following result. That way a single interrupted/duplicated turn can't permanently brick a conversation, and it also hardens against cross-provider replay differences. (Optionally, also persist a turn's assistant tool_calls + results atomically and/or guard against concurrent turns on the same conversation, to prevent the orphans from being written in the first place.)
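
The replay-side mitigation could look roughly like the sketch below. The ChatMessage and ToolCall types and the sanitizeHistory name are assumptions for illustration, not the project's actual types. It only checks that each result has a matching earlier call and each call has a later result; it does not enforce that results sit immediately after their assistant message, which some providers may additionally require.

```typescript
// Hypothetical message shapes, loosely modeled on chat-completions style APIs.
interface ToolCall { id: string; name: string; arguments: string }
type ChatMessage =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

// Drops tool results with no earlier matching call, and tool calls with no
// later matching result, so the replayed history is always pairable.
function sanitizeHistory(history: ChatMessage[]): ChatMessage[] {
  // Index of the last result seen for each tool_call_id.
  const lastResultIndex = new Map<string, number>();
  history.forEach((m, i) => {
    if (m.role === "tool") lastResultIndex.set(m.toolCallId, i);
  });

  const pending = new Set<string>(); // calls declared but not yet answered
  const out: ChatMessage[] = [];
  history.forEach((m, i) => {
    if (m.role === "assistant") {
      // Keep only calls that are answered somewhere after this message.
      const calls = (m.toolCalls ?? []).filter(
        (c) => (lastResultIndex.get(c.id) ?? -1) > i,
      );
      for (const c of calls) pending.add(c.id);
      if (calls.length > 0) {
        out.push({ ...m, toolCalls: calls });
      } else if (m.content) {
        // No answered calls left: keep the text, drop the tool_calls field.
        out.push({ role: "assistant", content: m.content });
      }
      // An assistant message with neither text nor answered calls is dropped.
    } else if (m.role === "tool") {
      // Keep a result only if it answers a pending call; this also drops
      // duplicate results for the same id.
      if (pending.delete(m.toolCallId)) out.push(m);
    } else {
      out.push(m);
    }
  });
  return out;
}
```

Applied to the corrupted tail shown in the logs, this would leave only the three user messages, which any provider should accept. The stored rows are left untouched, so the original evidence stays available for debugging the write path.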


Disclosure: surfaced while testing a local OpenAI-compatible provider against a reasoning-model gateway; the root-cause analysis was AI-assisted (Claude Code). Reproduced the symptom on the built-in OpenAI driver, so it is not provider-specific.
