[Bug]: Interrupted/duplicate agent turns persist orphan tool-results, bricking the conversation for all providers on replay

Summary

An agent conversation can reach a state where tool-result messages are persisted without their preceding assistant tool_calls message ("orphan" tool results). Once that happens, replaying the conversation history fails for every provider — the chat request 400s on tool-call/result pairing — and the conversation becomes permanently unusable regardless of which model is selected.

This is not provider-specific: the built-in OpenAI driver hits it too, and the persistence path (server/ai/runtime/runner.ts) is unchanged core code.

Steps to reproduce

I was not able to pin down an exact deterministic sequence. It surfaced during heavy agent use (building a full site) while switching models mid-conversation, with some interrupted or duplicate sends. The smallest set of conditions I believe triggers it:

1. Run the site agent through a long, tool-heavy turn (many insertHtml/token tools).
2. Interrupt or re-submit a prompt while a tool turn is mid-flight (or switch the model mid-turn).
3. Continue the same conversation.

The resulting persisted state is deterministic and is the real evidence (below): the conversation ends up with tool-result rows that have no matching assistant tool_calls row before them.

Expected behavior

A turn that errors, is interrupted, or is double-submitted should never leave the conversation in a state that bricks all future turns. Replaying history should always send well-formed tool-call/result pairing to the provider.

Actual behavior

ai_messages ends up with orphan role:'tool' rows (a tool result whose tool_call_id has no preceding assistant tool_calls message). Every subsequent turn replays that malformed history and the provider rejects it with a tool-pairing error — across providers. Switching models does not help, because the corruption is in the stored history, not the model.

Version or commit

CoreBunch/Instatic@a125a4a (main), exercised through a local branch that adds an OpenAI-compatible provider — but runner.ts/persistence is untouched core code and the built-in OpenAI driver reproduces the same error, so this is not specific to that branch.

Deployment mode

Local dev with Bun

Logs or screenshots

# Provider errors on replay (all the same underlying tool-pairing problem):
OpenAI (400):   No tool call found for function call output with call_id call_37ac.
DeepSeek (400): An assistant message with 'tool_calls' must be followed by tool
                messages responding to each 'tool_call_id'. (insufficient tool
                messages following tool_calls message)
MiniMax (400):  invalid params, tool call result does not follow tool call (2013)

# Persisted ai_messages for the bricked conversation (pos | role | tool_call_id | tool_name | content):
88 | user | -         | -             | "Build a single-page website..."
89 | user | -         | -             | (duplicate of 88, double-submit)
90 | tool | call_37ac | read_document | toolResult ok   <-- ORPHAN (no assistant tool_calls before it)
91 | tool | call_dc2b | insertHtml    | toolResult ok   <-- ORPHAN
92 | tool | call_a9df | insertHtml    | toolResult ok   <-- ORPHAN
93 | user | -         | -             | "you only did the hero section"

# Count mismatch confirms it: 35 tool-result rows vs 32 assistant tool_call rows = 3 orphans.
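
For anyone who wants to check their own conversations for the same corruption, the orphan check can be sketched roughly as below. This is a minimal sketch, not the project's code: the MessageRow shape and the findOrphanToolResults helper are hypothetical, and the real ai_messages schema may store tool calls differently (for example one row per call). The sample rows are the tail of the bricked conversation shown above.

```typescript
// Hypothetical row shape for illustration; the real ai_messages schema may differ.
interface MessageRow {
  pos: number;
  role: "user" | "assistant" | "tool";
  // Assistant rows: ids of the tool calls that message issued (assumed field).
  toolCallIds?: string[];
  // Tool rows: id of the call this result answers (assumed field).
  toolCallId?: string;
}

// Returns tool-result rows whose tool_call_id was not issued by any
// assistant row earlier in the conversation.
function findOrphanToolResults(rows: MessageRow[]): MessageRow[] {
  const issued = new Set<string>();
  const orphans: MessageRow[] = [];
  const ordered = [...rows].sort((a, b) => a.pos - b.pos);
  for (const row of ordered) {
    if (row.role === "assistant") {
      for (const id of row.toolCallIds ?? []) issued.add(id);
    } else if (row.role === "tool") {
      if (row.toolCallId === undefined || !issued.has(row.toolCallId)) {
        orphans.push(row);
      }
    }
  }
  return orphans;
}

// Tail of the bricked conversation, positions 88 to 93.
const tail: MessageRow[] = [
  { pos: 88, role: "user" },
  { pos: 89, role: "user" },
  { pos: 90, role: "tool", toolCallId: "call_37ac" },
  { pos: 91, role: "tool", toolCallId: "call_dc2b" },
  { pos: 92, role: "tool", toolCallId: "call_a9df" },
  { pos: 93, role: "user" },
];

// orphanPositions is [90, 91, 92] for the rows above.
const orphanPositions = findOrphanToolResults(tail).map((r) => r.pos);
console.log(orphanPositions);
```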

Likely cause (hypothesis)

In server/ai/runtime/runner.ts, a turn persists the assistant tool_calls (appendToolCall) and then the tool results (appendToolResult) as separate events. Orphan results imply a turn persisted results while its assistant tool_calls rows were lost — consistent with an interrupted / concurrent / double-submitted turn racing on one conversation (the tail of the corrupted conversation showed duplicate prompts and repeated sends). I did not fully isolate the exact race, so treat this as a hypothesis backed by the corrupted end-state, not a confirmed root cause.

Suggested mitigation

Make history replay resilient to malformed pairing when building the provider request from persisted history: drop orphan tool-result messages (a result with no matching prior assistant tool_call) and assistant tool_calls with no following result. That way a single interrupted/duplicated turn can't permanently brick a conversation, and it also hardens against cross-provider replay differences. (Optionally, also persist a turn's assistant tool_calls + results atomically and/or guard against concurrent turns on the same conversation, to prevent the orphans from being written in the first place.)
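
A rough sketch of the replay-side filter, to make the proposal concrete. Everything here is an assumption for illustration: the ChatMessage and ToolCall shapes, the sanitizeToolPairing name, and the ids call_1 and call_2 are hypothetical and not taken from runner.ts. It assumes the strict pairing rule the provider errors describe, where results must directly follow the assistant message that issued the calls.

```typescript
// Hypothetical provider-neutral message shapes; not the project's actual types.
interface ToolCall { id: string; name: string; arguments: string }
type ChatMessage =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

// Drops tool results that do not directly follow a matching assistant
// tool call, and drops assistant tool calls that never received a result.
function sanitizeToolPairing(history: ChatMessage[]): ChatMessage[] {
  const out: ChatMessage[] = [];
  let i = 0;
  while (i < history.length) {
    const msg = history[i];
    if (msg.role === "tool") {
      // Reached outside an assistant's result run: orphan, skip it.
      i += 1;
      continue;
    }
    const calls = msg.role === "assistant" ? msg.toolCalls ?? [] : [];
    if (msg.role !== "assistant" || calls.length === 0) {
      out.push(msg);
      i += 1;
      continue;
    }
    // Collect the contiguous run of results right after this assistant message.
    const callIds = new Set(calls.map((c) => c.id));
    const results = new Map<string, ChatMessage>();
    let j = i + 1;
    while (j < history.length && history[j].role === "tool") {
      const result = history[j];
      if (result.role === "tool" && callIds.has(result.toolCallId) && !results.has(result.toolCallId)) {
        results.set(result.toolCallId, result);
      }
      j += 1;
    }
    const answered = calls.filter((c) => results.has(c.id));
    if (answered.length > 0) {
      out.push({ ...msg, toolCalls: answered });
      for (const call of answered) out.push(results.get(call.id)!);
    } else if (msg.content.trim() !== "") {
      // Keep any assistant text, but without the unanswered calls.
      out.push({ role: "assistant", content: msg.content });
    }
    i = j;
  }
  return out;
}

// One orphan result, one well-formed pair, one unanswered call.
const replay = sanitizeToolPairing([
  { role: "user", content: "Build a single-page website..." },
  { role: "tool", toolCallId: "call_37ac", content: "ok" },
  { role: "user", content: "you only did the hero section" },
  { role: "assistant", content: "", toolCalls: [{ id: "call_1", name: "insertHtml", arguments: "{}" }] },
  { role: "tool", toolCallId: "call_1", content: "ok" },
  { role: "assistant", content: "", toolCalls: [{ id: "call_2", name: "insertHtml", arguments: "{}" }] },
]);
console.log(replay.map((m) => m.role)); // user, user, assistant, tool
```

Filtering at request-build time leaves the stored rows untouched, so already-corrupted conversations would become usable again without a data migration. The trade-off is that the model loses the dropped results from its context.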

Disclosure: surfaced while testing a local OpenAI-compatible provider against a reasoning-model gateway; the root-cause analysis was AI-assisted (Claude Code). Reproduced the symptom on the built-in OpenAI driver, so it is not provider-specific.
