Skip to content

GetSmallAI/SmallHarness

Repository files navigation

Small Harness

A small, terminal-first coding harness. Bring your own model, your own key, or your own MCP server.

Install · Run it · First session · Backends · Tools & commands · Cost & credentials · Going further · Configuration · Troubleshooting

CI Rust Version Backends Apple Silicon License MIT


What it is

A coding agent that lives in your terminal. Bring your own API key (OpenAI or OpenRouter) or point it at a local model (Ollama, LM Studio, MLX, llama.cpp) — same tools, same commands, same session log either way. It ships with the usual tool kit — read, edit, grep, shell, run tests — plus a few that aren't usual:

  • Local or cloud, one TUI. Switch providers mid-session with /backend <name> — the tools, commands, and session log don't change.
  • Per-turn cost on the status line. $0.003 this turn · $0.41 session when pricing is known or reported by the provider. Local turns just show tokens.
  • OpenRouter Fusion, one command away. /fusion on switches to the openrouter/fusion alias for deliberative work; /fusion tool attaches Fusion to a chosen OpenRouter coding model for hard reviews, architecture tradeoffs, and high-stakes debugging.
  • Multi-model routing. /route select <task> asks a configured selector model to pick low/medium/high orchestrator and coder tiers, plus play vs production review and security review, then switches the active coding model.
  • Real undo. /undo reverts the last agent turn's file mutations, including files the agent created or files that weren't tracked when the turn started.
  • Session paths. /path fork branches the conversation and workspace so you can try two fixes, diff them, and /path pick the winner — no worktree required.
  • Plan, then grade the work. /plan expands a one-line intent into a spec; /iterate runs a generate→evaluate loop where a separate critic agent scores each pass against a rubric and feeds back until it clears the bar — the generator never grades itself.
  • Reset over compaction. /reset writes a handoff artifact and starts a clean session seeded with it — better coherence on long tasks than summarizing in place.
  • MCP-native. Drop servers into mcpServers in your config; their tools show up as mcp__<server>__<tool> to the model on next launch.
  • /auth instead of .env. Paste API keys once into a 0600 file under ~/.config/small-harness/. Env vars still win when set.
  • Approval gates you can live with. Every mutating call shows you the diff first, with allow once / allow session / always allow caching.

Install

Homebrew (macOS):

brew install getsmallai/tap/small-harness

From source (Rust 1.75+):

git clone https://github.com/GetSmallAI/SmallHarness.git
cd SmallHarness
cargo build --release    # binary at target/release/small-harness

Run it

Launch the interactive session:

small-harness

From a source checkout without installing, use cargo run --release instead.

The first launch runs a short setup wizard (it writes agent.config.json — backend, model, approval policy). Skip it with SMALL_HARNESS_NO_WIZARD=true. Every launch after that opens straight into a session.

Small Harness talks to one backend at a time — pick the path that fits.

Path A — Cloud API key

Fastest to start, frontier-model quality, nothing to install locally.

  1. Set your key — OpenAI or OpenRouter:

    export OPENAI_API_KEY=sk-...
    # or
    export OPENROUTER_API_KEY=sk-or-...
  2. Launch, then select the provider in the first-run wizard (or any time with /backend openai):

    small-harness

Prefer not to put the key in your environment? Launch first, then run /auth set openai inside the app and paste it once — it's stored in a 0600 file under ~/.config/small-harness/. Cost per turn and per session shows live on the status line.

Path A2 — ChatGPT / Codex subscription login

If you want to use a ChatGPT/Codex subscription instead of OpenAI API billing, log in with OAuth inside the TUI:

/login openai-codex
/backend openai-codex

This is intentionally separate from /auth set openai: openai uses an OPENAI_API_KEY and the public OpenAI API, while openai-codex stores a refreshable ChatGPT OAuth token in auth.json and talks to the Codex Responses backend.

Path B — Local model

Private, free, offline — runs entirely on your machine.

  1. Install Ollama, start it, and pull a coding model:

    brew install ollama
    brew services start ollama
    ollama pull qwen2.5-coder:7b
  2. Launch — Ollama is the default backend, so there's nothing else to set:

    small-harness

LM Studio, MLX, and llama.cpp work the same way — see Backends for their ports and start commands.

Tip: switch backends mid-session with /backend <name>, and run /doctor if a backend won't connect.


First session

> what files are in src/?

  Listed src/  (24 files)

src/ has 24 Rust files: main.rs is the entry point (input loop, banner,
warmup); agent.rs runs the chat-completions loop; backends.rs handles the
providers; tools/ contains the tool implementations…

  1.2k in · 87 out · $0.0003 this turn · $0.0003 session

> add a function in src/util.rs that lowercases a string and trims it

  Read src/util.rs
  Edited src/util.rs

  --- src/util.rs
  +++ src/util.rs
  @@ ...
  +pub fn normalize(input: &str) -> String {
  +    input.trim().to_lowercase()
  +}

  Apply? [y/n/a]: y
  checkpoint saved (1 file) — /undo to revert
  3.4k in · 412 out · $0.001 this turn · $0.0013 session

A handful of moves worth knowing right away:

  • /mode explore | edit | ship | review toggles tool + approval + step-budget presets.
  • /undo reverts the last turn's file mutations.
  • /path fork branches the session to try an alternate approach; /path switch, /path diff, and /path pick compare and merge paths.
  • /shipcheck summarizes git state; /handoff drafts a commit message, changelog bullets, and a release post from local context.
  • /ship turns that into a last-mile preflight, local commit, and push path: readiness verdict, blockers, commit-message draft, guarded git commit, and guarded git push; /ship pr opens a draft pull request through GitHub CLI when available, and /ship status summarizes open PR checks/review state.
  • /scorecard shows global quality PRs shipped; /ship pr closes a PR unit with readiness/test evidence. /scorecard close <label> scores manual closes from shipcheck (not the separate /play score fixture report).
  • /plan <intent> drafts a spec; /iterate <goal> runs a generate→evaluate loop where a separate critic grades each pass against a rubric.
  • /play fix-failing-test runs a bundled demo in an isolated sandbox so you can try a real agent loop without touching your repo.
  • Ctrl-J for newline; Enter submits.
  • small-harness --continue resumes the most recent session in the cwd.

Backends

Backend Default URL Notes
ollama http://localhost:11434/v1 Easiest setup; mature tool-call templates
lm-studio http://localhost:1234/v1 GUI model browser; explicit load / unload
mlx http://localhost:8080/v1 Fastest inference on Apple Silicon (via mlx_lm.server)
llamacpp http://localhost:8080/v1 Direct GGUF serving (via llama-server)
openrouter https://openrouter.ai/api/v1 Cloud A/B with /compare; access to frontier models and Fusion
openai https://api.openai.com/v1 Direct provider access with your own key
openai-codex https://chatgpt.com/backend-api/codex/responses ChatGPT/Codex subscription OAuth via /login openai-codex

Switch at runtime with /backend <name>. Endpoint overrides: OLLAMA_BASE_URL, LM_STUDIO_BASE_URL, MLX_BASE_URL, LLAMACPP_BASE_URL, OPENAI_BASE_URL, OPENAI_CODEX_BASE_URL. API backends require an API key (set via /auth or env var); openai-codex requires /login openai-codex.

Default model per backend

Each backend has one sensible default; local backends default to a 7B coder that runs on modest hardware. Override any time with /model, AGENT_MODEL, or modelOverride in your config.

Backend Default model
ollama qwen2.5-coder:7b
lm-studio qwen2.5-coder-7b-instruct
mlx mlx-community/Qwen2.5-Coder-7B-Instruct-4bit
llamacpp gpt-3.5-turbo
openrouter qwen/qwen-2.5-coder-32b-instruct
openai gpt-4o-mini
openai-codex gpt-5.5

Recommend the right model for your box

All model-tuning lives under /doctor:

/doctor recommend       rank installed + default + cached models for your hardware
/doctor autotune apply  switch to the top-scoring cached local model
/doctor --deep          probe streaming, usage chunks, tool calls, fallbacks
/doctor bench           measure warmup, first-token, and total latency
/doctor models          show cached per-model capability + benchmark records

Tools and commands

Tools

Class Tools
Read file_read, grep, list_dir, glob, repo_search
Mutate (approval-gated) file_write, file_edit, apply_patch, batch_edit, shell
Workflow run_tests, ship_status, web_fetch, update_plan, task, critique
MCP anything an MCP server exposes, surfaced as mcp__<server>__<tool>

The default toolSelection: "auto" keeps the full working pool available for any real request (so "build me a site" writes files instead of dumping code into the chat) and sends no tools only for plain greetings. fixed always sends the pool. Set the pool with /tools file_read,grep,list_dir, or persistently in agent.config.json.

Approval policies

Policy Behavior
always (default) Every mutating call prompts you, with a diff preview
dangerous-only Only shell calls matching rm, sudo, chmod, dd, mkfs, etc. prompt
never No prompts — use only when you trust the model

At each prompt: [y]es, [n]o, [a]lways for this tool, or [s]ession-allow this exact call. The session cache resets on /new.

Slash commands

Session and config

/help                  list commands
/new                   start a fresh conversation
/setup                 rerun the setup wizard
/config                show resolved configuration
/session [title <…>]   show / rename the current session
/sessions              list saved sessions
/resume latest|<id>    resume a saved session
/export current|<id>   export transcript to markdown or json
/undo                  revert the last agent turn's file mutations
/path                  fork, switch, diff, pick, or drop parallel session paths
/paths                 list saved session paths

Operator modes and workflow

/mode explore|edit|ship|review   switch operator preset
/plan <intent>                   expand a short intent into a spec (.small-harness/spec.md)
/plan validate                   check the spec's Done Criteria against the working diff
/shipcheck                       summarize git + test readiness
/ship [--tests]                  preview last-mile ship readiness and commit message
/ship commit --all|--staged-only guarded local git commit with ship record
/ship push                       guarded git push, setting upstream when needed
/ship pr [--base main]           create a draft GitHub PR via gh, or print the command
/ship status                     summarize open PR checks and review state
/scorecard                       show global quality PRs shipped
/scorecard current               show tracked tokens on the current repo/branch
/scorecard prs [limit]            list recent closed PRs (numbered)
/scorecard pr <n>                 drill into PR quality, sessions, and trace audit
/scorecard verify <n>|--all       append GitHub PR checks/review/merge verification
/scorecard close <label> [--url <url>] [--tests]  close branch with shipcheck quality score
/scorecard doctor                inspect the local scorecard ledger for malformed JSONL
/scorecard export [path]          copy the raw scorecard ledger before repair or sharing
/handoff                         draft commit, changelog, release copy
/test discover|run|smart         discover or run tests
/fix                             fix-until-green loop
/iterate <goal>                  generate→evaluate→improve loop (rubric-scored)
/auto <goal> | --spec            autonomous overnight run (iterate + auto-reset, budget/deadline)
/batch / /refactor               coordinated multi-file edits
/play fix-failing-test           bundled demo in an isolated sandbox

Backend, model, tools

/backend <name>        switch backend
/model [id]            list / pick a model (shows context + cost when known)
/tools auto|fixed|<…>  show or set the active tool pool
/auth                  manage API keys and OAuth credentials
/login openai-codex    sign in with ChatGPT/Codex subscription OAuth
/logout openai-codex   clear the stored ChatGPT/Codex login
/image <path>          attach an image to the next user turn
/reasoning on|off      toggle the streaming reasoning panel
/verbose on|off        show every tool call with its full args + result
/trace on|off          show nested subagent/critic tool calls (indented)
/compare [model]       re-send the last prompt against OpenRouter for A/B
/fusion on|tool|off    use OpenRouter Fusion alias or attach Fusion to a model
/route select|apply    select or apply a configured multi-model stack route

Memory, capabilities, context

/index                 build / refresh project memory
/map [query]           print a repo map or focused hits
/remember <text>       save a durable project note
/forget <id|all>       remove notes
/context               show prompt budget, model limit, auto-guard status
/compact               summarize older turns (auto-runs at threshold)
/reset                 write a continuation handoff and start a fresh session
/doctor [--deep]       probe backend, tools, streaming, capabilities
/doctor models         show cached per-model capability + benchmark records
/doctor autotune       pick the best cached local model (add `apply` to switch)
/doctor recommend      rank models for your hardware
/doctor bench          measure warmup + first-token + total latency
/checkpoints           toggle per-turn snapshots

Run /help in the harness for the full list with descriptions.


Cost and credentials

Credentials with /auth and /login

API-key cloud backends authenticate with API keys. Paste them once and Small Harness stores them at ~/.config/small-harness/auth.json (mode 0600). Environment variables always win at lookup time, so CI and scripted users see no change in behavior.

/auth                    show what's configured (keys are masked)
/auth set openai         paste your OpenAI key, save to file + this session
/auth set openrouter     paste your OpenRouter key
/auth clear openai       remove from the file (env stays for this session)
/login openai-codex      browser/device-code login with ChatGPT/Codex
/logout openai-codex     remove the stored OAuth credential

openai-codex is not an OPENAI_API_KEY replacement. It uses browser/device OAuth, stores {access, refresh, expires, accountId} in the same auth.json, refreshes the access token before use, and sends model traffic to the Codex Responses backend.

Per-turn and session cost

When you're on a cloud backend with known pricing or provider-reported usage cost, every turn prints its own cost plus the running session total:

  2.1k in · 845 out · $0.013 this turn · $0.094 session

Switch to Ollama mid-session and the line shows $0.00 this turn but keeps the running total honest. OpenRouter returns usage.cost for many requests, including dynamic routers like Fusion; Small Harness uses that reported value when present. If a cloud model does not expose cost, the turn shows $? and prefixes the session total with to signal it is a lower bound, not a fiction.

The /model picker shows the same data while you choose:

   1) gpt-4o-mini            128k ctx · $0.15/$0.60 per Mtoken
   2) gpt-4o                 128k ctx · $2.50/$10.00 per Mtoken
   3) o1-mini                128k ctx · $3.00/$12.00 per Mtoken

Quality PR scorecard

/scorecard tracks whether Small Harness-assisted PRs are shipping with good local quality evidence at close time — not post-merge CI on GitHub. Each successful interactive turn still records input + output tokens under the current repo and branch, but tokens are context rather than the score. /ship pr closes that branch as a PR unit automatically and attaches a quality snapshot from local ship readiness: blockers, warnings, whether tests passed, and whether the GitHub PR command succeeded.

If you open a PR outside the built-in flow, run /scorecard close <label> to close it with the same shipcheck-based score. Add --url <github-pr-url> when you have a PR link and --tests to run tests before scoring.

The default view shows quality PR count, quality rate, average quality score, clean ships, PRs needing follow-up, tokens per quality PR, the open branch total, and a GitHub-style daily grid. /scorecard prs lists numbered recent closes with session and ship-record hints; /scorecard pr <n> shows the full audit captured at close time — quality rubric, per-session turn-trace summaries (turns, steps, tool calls, timing), paths to session event logs, and explicit reasons when a scored PR did not count as quality-shipped.

For PRs with GitHub URLs, /scorecard verify <n> refreshes the remote outcome through gh pr view: PR state, review decision, mergeability, and check-rollup status. /scorecard verify --all appends verification events for all recent verifiable PRs. This does not rewrite the local close-time score; it adds later remote evidence that /scorecard pr <n> renders next to the original audit.

After enough turns on a feature branch, the turn footer nudges you to close via /ship pr. Audit snapshots come from local event logs at close time; export raw traces with /export <session> events.

A PR counts as quality-shipped when its local score meets scorecard.qualityThreshold (default 80), tests passed, readiness was not blocked, and either the PR creation command succeeded or a PR URL was captured with --url. Configure via scorecard in agent.config.json or disable with scorecard.enabled: false. Data is stored locally under the Small Harness data directory; /scorecard path prints the exact JSONL file. Use /scorecard doctor if the ledger looks wrong; malformed JSONL lines are skipped rather than allowed to break the scorecard, and /scorecard export [path] copies the raw ledger before manual repair. /scorecard reset --yes now saves a timestamped backup before removing the active store.

Note: /play score shows playground fixture results — unrelated to this global quality PR scorecard.


Going further

Plan a feature first

/plan <intent> expands a one- or two-sentence intent into an ambitious spec — goal, user outcomes, scope, done criteria, open questions — and writes it to .small-harness/spec.md. It deliberately stays at the level of what and why, not implementation, so an early spec doesn't lock in the wrong details. /plan show prints the saved spec; --export <path> writes elsewhere.

/plan validate closes the loop: it reads the spec's Done Criteria and checks each one against the current working-tree diff (the same done-check /auto runs each round), printing a met/unmet checklist so you can ask "am I actually done?" by hand. Like /iterate, it sends the diff to the model, so it runs on a local backend unless you set rubric.allowCloud.

Generate, evaluate, iterate

/iterate <goal> runs a generate→evaluate→improve loop. After each attempt a separate, read-only critic agent (critique) scores the work 0–10 against a weighted rubric and hands back actionable feedback; the loop repeats — refining or pivoting — until the score clears the threshold or it runs out of rounds (--max N, default 6, capped at 15; --threshold X). The harness, not the model, computes the weighted total and pass/fail, so a critic that over-rates can't wave weak work through.

The rubric defaults to quality / originality / craft / functionality and penalizes generic "AI slop"; override it with a .small-harness/rubric.md using ## Name (weight: N) sections. Set iterate.evaluatorModel to grade with a different model than the generator — the cleanest version of the generator/evaluator split. Turn on rubric.liveVerify and the critic runs your test suite (via a fixed-surface verify tool — no arbitrary shell) before scoring functionality. The critique tool is also available on its own for a one-off, independent grade.

Workspace context is never sent to a cloud backend for grading unless you set rubric.allowCloud.

Reset over compaction

On a long task, /reset writes a structured handoff artifact — done, in progress, key decisions, next steps, key files — to .small-harness/continue.md, then starts a fresh session seeded with only that artifact. Unlike /compact, which summarizes in place, this is a clean context window carrying just what's needed to continue, which holds coherence better over long runs. /reset --dry-run writes the artifact without clearing; cloud backends require --cloud, since drafting the note sends the conversation to the model.

Run it overnight with /auto

/auto is the unattended version of the loop above: it runs /iterate's generate→evaluate round repeatedly and, when the context window fills, fires /reset automatically — drafting a handoff and continuing in a fresh session — so a run can go for hours without blowing its budget. The goal and the latest feedback carry across each reset.

/auto "add retry logic to web_fetch" --budget 2.00 --deadline 6h
/auto --spec --max 20 --yolo        # drive the spec.md from /plan to done

Give it an inline goal, or --spec to read the goal and Done Criteria from .small-harness/spec.md (written by /plan). With criteria present, each round also checks them against the working-tree diff, and "done" means the rubric threshold and every criterion is met — a lightweight spec-validator folded in.

Flag Meaning
--spec Read goal + Done Criteria from .small-harness/spec.md
--max N Round ceiling (default 12, hard cap 40)
--threshold X Per-round rubric pass bar (default rubric.passThreshold)
--budget $ Stop after this much generator spend
--deadline 6h Wall-clock cap (h/m/s)
--reset-at 0.75 Context-fill ratio that triggers an auto-reset (0.50–0.95)
--yolo Auto-approve mutations for the whole run
--cloud Allow sending workspace context to a cloud backend

The run is always finitely bounded (a --max ceiling applies even with no other flag) and stops early on a stall — no score gain and no diff change for three rounds. However it ends — goal met, budget/deadline/rounds exhausted, stall, error, or Ctrl-C — it leaves a morning report at .small-harness/auto-report.md with the verdict, per-round scores, the Done Criteria checklist, cost, elapsed time, and reset count. Same guards as /iterate: it runs on a local backend unless you pass --cloud, needs rubric.enabled, and won't run inside a /play session. Defaults live in the auto config block. /undo reaches back only to the last reset boundary, so keep checkpoints.enabled on for an unattended run.

Project-specific system prompt

Drop a markdown file at .small-harness/prompt.md in your repo and Small Harness prepends it to the system prompt every turn. Use it for project conventions ("snake_case everywhere", "ship via make release", "never edit vendor/"). Auto-truncated at 8 KB.

MCP servers

Add an mcpServers block to agent.config.json:

{
  "mcpServers": {
    "fs": {
      "command": "/usr/local/bin/some-mcp-server",
      "args": ["--root", "/tmp"],
      "env": { "TOKEN": "abc" }
    }
  }
}

Small Harness spawns each server at startup, lists its tools, and exposes them through the same approval-gated tool layer with names like mcp__fs__read_file. JSON-RPC over stdio; no extra dependencies.

Image input

/image <path> attaches an image to your next prompt. Small Harness encodes it as a data:image/...;base64,... URL and sends it as a multi-part user message. The catalog tracks which models accept images; you get a warning if your current model isn't vision-capable.

Web fetch

web_fetch (off by default, approval-gated) lets the agent pull a URL, strip HTML to text, and read the result. Useful for docs and RFCs the model needs to consult mid-task. Enable per session with /tools auto file_read,grep,list_dir,web_fetch or persistently in your config.

Project memory

/index builds a safe local repo map at .sessions/project-memory/. It stores metadata only — paths, language, symbols, headings, capped keyword terms — never file bodies. It honors .gitignore and skips .git, .sessions, target, node_modules, binaries, oversized files, and common secret/env files. /map prints a compact view; /remember <text> saves a durable project note.

Compare clouds with /compare

/compare re-sends your last prompt against any OpenRouter model so you can A/B a local response against a frontier one without leaving the session. Requires OPENROUTER_API_KEY.

Use OpenRouter Fusion

Fusion is useful when a normal coding model is not enough: design reviews, multi-file architecture tradeoffs, incident debugging, dependency choices, or questions where a bad answer is more expensive than a few extra completions.

/fusion on

Switches the active backend to OpenRouter and the model to openrouter/fusion. Use it for deliberative turns, then run /fusion off to return to the normal OpenRouter default model.

/fusion tool anthropic/claude-sonnet-4.5
/fusion tool anthropic/claude-sonnet-4.5 panel=~openai/gpt-latest,deepseek/deepseek-v3.2 judge=~anthropic/claude-opus-latest max-tools=4

Tool mode keeps a chosen OpenRouter coding model as the outer agent and adds OpenRouter's Fusion plugin so the model can invoke multi-model deliberation when the turn warrants it. The same Small Harness tools, approvals, session log, token counts, and reported OpenRouter costs stay visible.

Route tasks across a model system

/route lets you describe a model stack that blends local and frontier models: separate orchestrators for low/medium/high planning, coders for low/medium/high implementation, play and production review models, a security review model, and one selector model that chooses the route for a task.

/route template
/route status
/route select add OAuth login with token refresh and tests
/route select --dry-run redesign the settings page
/route apply coder high
/route apply review production
/route apply security

/route select sends the task plus the configured stack to modelSystem.selector, expects a JSON decision, prints the selected orchestrator/coder/reviewer/security path, and switches the live session to the chosen coding model unless --dry-run is passed. The selector can also return coderEffort, reviewEffort, and securityEffort (none, minimal, low, medium, high, xhigh, or max). The chosen coder effort becomes the active session effort, appears in /session and the turn footer, and is sent to OpenRouter as reasoning.effort; local backends ignore unsupported request fields while still showing the selected effort.


Configuration

Resolution order (later overrides earlier):

  1. Built-in defaults
  2. agent.config.json in the working directory
  3. .env, then .env.local
  4. Process environment variables
  5. Slash command overrides at runtime

Environment variables (the useful ones)

BACKEND=ollama                                          # ollama|lm-studio|mlx|llamacpp|openrouter|openai|openai-codex
AGENT_MODEL=qwen2.5-coder:14b                           # overrides the backend default model

OPENAI_API_KEY=sk-...                                   # required for openai
OPENROUTER_API_KEY=sk-or-...                            # required for openrouter / /compare
OPENAI_BASE_URL=https://api.openai.com/v1               # point at a compatible proxy if needed
OPENAI_CODEX_BASE_URL=https://chatgpt.com/backend-api    # override Codex backend base if needed

APPROVAL_POLICY=always                                  # always | dangerous-only | never
AGENT_TOOLS=file_read,grep,list_dir,file_edit,file_write,shell,update_plan,task
AGENT_TOOL_SELECTION=auto                               # auto | fixed

WARMUP=true                                             # pre-warm prompt cache at startup
SMALL_HARNESS_NO_WIZARD=false                           # skip first-run setup
SMALL_HARNESS_NO_UPDATE_CHECK=false                     # skip the GitHub release check

Full list with comments in .env.example.

agent.config.json

For project-level defaults, run /setup or drop a JSON file at the repo root. Common shape:

{
  "backend": "ollama",
  "modelOverride": "qwen2.5-coder:14b",
  "approvalPolicy": "dangerous-only",
  "tools": ["file_read", "grep", "list_dir", "file_edit", "file_write", "shell", "update_plan", "task"],
  "toolSelection": "auto",
  "maxSteps": 20,
  "display": {
    "toolDisplay": "grouped",
    "eventLog": { "enabled": true }
  },
  "scorecard": {
    "enabled": true,
    "qualityThreshold": 80,
    "nudgeMinTurns": 3
  },
  "workspaceRoot": "/path/to/project",
  "outsideWorkspace": "prompt",
  "context": {
    "maxMessages": 40,
    "modelContextTokens": 8192,
    "autoCompact": true,
    "compactThreshold": 0.85,
    "reserveRatio": 0.25
  },
  "projectMemory": {
    "enabled": true,
    "autoInject": true,
    "allowCloudContext": false
  },
  "checkpoints": { "enabled": true, "maxTurns": 10 },
  "rubric": { "enabled": true, "passThreshold": 7.0, "allowCloud": false, "liveVerify": false },
  "iterate": { "maxIters": 6, "evaluatorModel": null },
  "auto": { "maxRounds": 12, "budgetUsd": null, "resetRatio": 0.75, "deadline": null },
  "paths": {
    "enabled": true,
    "maxPaths": 5,
    "maxSnapshotBytes": 52428800,
    "maxFileBytes": 1048576
  },
  "openrouter": {
    "fusion": {
      "enabled": false,
      "analysisModels": [],
      "judgeModel": null,
      "maxToolCalls": null
    }
  },
  "modelSystem": {
    "enabled": true,
    "selector": {
      "backend": "openrouter",
      "model": "openrouter/fusion",
      "effort": "high",
      "thinkingDepth": "deep"
    },
    "orchestrators": {
      "low": { "backend": "ollama", "model": "qwen2.5-coder:7b" },
      "medium": { "backend": "openrouter", "model": "qwen/qwen-2.5-coder-32b-instruct" },
      "high": { "backend": "openrouter", "model": "anthropic/claude-sonnet-4.5" }
    },
    "coders": {
      "low": { "backend": "ollama", "model": "qwen2.5-coder:7b" },
      "medium": {
        "backend": "openrouter",
        "model": "qwen/qwen-2.5-coder-32b-instruct",
        "effort": "medium"
      },
      "high": {
        "backend": "openrouter",
        "model": "anthropic/claude-sonnet-4.5",
        "effort": "high"
      }
    },
    "reviewers": {
      "play": { "backend": "ollama", "model": "qwen2.5-coder:7b" },
      "production": { "backend": "openrouter", "model": "openrouter/fusion" }
    },
    "securityReviewer": { "backend": "openrouter", "model": "openrouter/fusion" }
  },
  "mcpServers": {
    "fs": { "command": "/usr/local/bin/some-mcp-server", "args": [] }
  }
}

Anything in the config can be overridden by env or slash commands at runtime.


Quality of life

  • small-harness --continue resumes the most recent session in cwd without picking from a list.
  • small-harness completions bash|zsh|fish prints a completion script you can source.
  • /reasoning on|off toggles the streaming reasoning panel — adds a dim "thinking…" block above the answer for o-series and similar models.
  • /verbose on|off switches to a debug tool view: every tool call is printed with its full arguments and a large result preview, so you can see exactly what the agent is doing. /verbose off restores the normal view.
  • /trace on|off shows nested subagent and critic tool activity as indented lines in the TUI (without flooding the parent context). Every turn is also logged to a sidecar at .sessions/<session-id>.events.jsonl with tool calls, approvals, compaction, warmup, and timing — enabled by default via display.eventLog.enabled in agent.config.json.
  • Turn footer timing. After each turn the status line includes step count and a breakdown when available: TTFT, model, tools, approval, and total seconds alongside the existing token and cost stats.
  • Slash-command completion. Type / and a menu of matching commands (with descriptions) appears beneath the prompt; the best match also shows as dim ghost text. ↑/↓ select, Tab accepts (with a trailing space), accepts inline, Esc dismisses. It narrows live as you type.
  • Update check. Once a day, Small Harness checks GitHub for a newer release and shows a one-line notice in the banner if there is one. Background, cached, opt-out with SMALL_HARNESS_NO_UPDATE_CHECK=true.
  • Crash log. If the harness panics, it writes a redacted log (API keys scrubbed) to .sessions/crashes/<timestamp>.log and prints the path so you have something to attach to an issue.
  • One-shot modesmall-harness --print "summarize this repo" or printf '…\n' | small-harness for scripts and CI. Approval-gated tools are denied by default; pass --allow-tools to allow them.
  • Agent eval CLIsmall-harness --eval fix-failing-test [--model M] [--json] runs a bundled eval fixture and exits 0/1 (for CI scripts). --eval can also point at a data-only fixture JSON file; its workspace is resolved relative to that file and rejected if it escapes the fixture root.
  • Warmup. Small Harness sends a 1-token request with the full system prompt + tools at startup so llama.cpp-derived engines have a hot prompt-eval cache before your first prompt. Disable with WARMUP=false.

Troubleshooting

Backend not reachable: Connection error

  • Ollamabrew services start ollama or run ollama serve. Default port 11434.
  • LM Studio — open the app, go to Local Server, click Start. Default port 1234.
  • MLX — start mlx_lm.server --port 8080 against an MLX-format model.
  • llama.cppllama-server -m /path/to/model.gguf --host 127.0.0.1 --port 8080 --jinja (the --jinja flag enables native tool calls).
  • OpenRouter — set OPENROUTER_API_KEY (or use /auth set openrouter).
  • OpenAI — set OPENAI_API_KEY (or use /auth set openai). Use OPENAI_BASE_URL for a compatible proxy.
  • OpenAI Codex — run /login openai-codex, then /backend openai-codex.

Run /doctor --deep for a fuller capability probe (streaming, usage chunks, native tool calls, inline JSON fallback). Reports land under .sessions/doctor/.

First prompt is slow even with warmup

The cache becomes stale when you change /backend, /model, or /tools. The next prompt re-evaluates the new system prompt and tools. One-time per change.

Model returns tool calls as text JSON

Some small-model templates emit tool calls as plain content ({"name": "shell", "arguments": {…}}) instead of populating the tool_calls field. Small Harness detects and synthesizes a real tool call. If a particular model still misbehaves, llama3.1:8b has well-tested tool-call templates.

Model responds in another language unexpectedly

Some bilingual models (notably qwen) drift into Chinese on short greetings. The system prompt has an explicit language directive; if it's still happening, strengthen it by editing SYSTEM_PROMPT in src/config.rs.

cargo: command not found

Install Rust via rustup: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh.


Architecture

                +-------------------------+
                |        main.rs          |
                |  banner / input loop /  |
                |  warmup / approval      |
                +------------+------------+
                             |
                             v
+-----------+    +-------------------------+    +-------------------+
| config.rs |--->|        agent.rs         |<-->|    tools/*.rs     |
|  + auth/  |    |  chat/completions loop  |    | + mcp__ adapters  |
+-----------+    +-------------+-----------+    +-------------------+
                               |
                               v
                +-------------------------+
                |       backends.rs       |
                |  Ollama / LM Studio /   |
                |  MLX / llama.cpp /      |
                |  OpenRouter / OpenAI    |
                +-------------------------+

Source layout in src/agent.rs runs the loop, backends.rs holds the backend providers, tools/ holds tool implementations, mcp.rs is the stdio MCP client, catalog.rs has the per-model context + pricing table, auth.rs manages the credential file, session.rs writes the JSONL log. cargo doc --open for module-level docs.


Contributing

cargo check                # type-check without producing a binary
cargo run --release        # optimized build + run
cargo build --release      # target/release/small-harness
cargo fmt --check
cargo clippy --all-targets -- -D warnings
cargo test

Guidelines:

  • Mutating tools implement require_approval on the Tool trait (return true, or compute from args — see shell.rs).
  • New backends usually need an OpenAI-compatible /v1/chat/completions endpoint and a default model in backends.rs; non-compatible transports should add an adapter like codex_responses.rs.
  • Before opening a PR, run the full check suite: cargo fmt --check, cargo clippy --all-targets -- -D warnings, and cargo test.

Release tags use a leading v (v0.4.0). The release workflow at .github/workflows/release.yml builds notarized macOS binaries when Apple Developer secrets are present.


License

MIT.

About

A harness for small llms

Topics

Resources

License

Stars

Watchers

Forks

Sponsor this project

 

Packages

 
 
 

Contributors