druide67/README.md

Same model. Same Mac. 30 vs 71 tok/s. That's why I built asiai.

🦞 I'm Jean-Marc (druide67) — I build tools for local LLM inference on Apple Silicon.

asiai: Benchmark, monitor, and compare 6 inference engines (Ollama, LM Studio, mlx-lm, llama.cpp, vllm-mlx, Exo). One CLI. Real numbers.

Built because my AI agents needed to monitor their own inference. So I gave them asiai's API. They started monitoring themselves.

Bench your claw!

Recent discoveries

  • MLX is 2.3x faster than llama.cpp for MoE architectures on Apple Silicon
  • DeltaNet KV cache stays flat from 64k to 256k context (same VRAM!)
  • Same model, same Mac: 30 tok/s on one engine, 71 tok/s on another

claude-whisper: Your Claude Code instances can now talk to each other. 240 lines of bash, no daemon. The filesystem is the message bus.
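
For readers unfamiliar with the "filesystem as message bus" idea, here is a minimal sketch of the general pattern in Python. It is not claude-whisper's code (which is bash), and the `send` / `receive` helpers, the `.msg` file naming, and the one-directory-per-recipient layout are all assumptions made for illustration. Each message is a file dropped into the recipient's inbox directory; the reader lists the directory, reads each file, and deletes it.

```python
import os
import tempfile
import time
import uuid
from pathlib import Path


def send(inbox: Path, sender: str, text: str) -> Path:
    """Drop one message file into the recipient's inbox directory (hypothetical layout)."""
    inbox.mkdir(parents=True, exist_ok=True)
    # Timestamp prefix so a sort by name is roughly chronological; uuid avoids collisions.
    final = inbox / f"{time.time_ns():020d}-{uuid.uuid4().hex}.msg"
    # Write to a temporary file first, then rename it into place.
    # os.replace is atomic on the same filesystem, so readers never see a half-written message.
    fd, tmp = tempfile.mkstemp(dir=inbox, prefix=".tmp-")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(f"{sender}\n{text}")
    os.replace(tmp, final)
    return final


def receive(inbox: Path) -> list[tuple[str, str]]:
    """Read and delete every pending message; returns (sender, text) pairs."""
    messages = []
    for path in sorted(inbox.glob("*.msg")):
        sender, _, text = path.read_text(encoding="utf-8").partition("\n")
        messages.append((sender, text))
        path.unlink()  # consuming a message means removing its file
    return messages
```

No process has to stay resident between a `send` and a `receive`: the directory itself holds the queue, which is what makes a daemon-free design possible.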

OpenClaw: contributor to this multi-agent AI assistant.

Strasbourg, France | asiai.dev | @jmn67 on X | LinkedIn

Pinned

  1. asiai (public): Multi-engine LLM benchmark & monitoring CLI for Apple Silicon. Python · 10 stars · 1 fork.

  2. college-of-ai-rchitects (public): PRISM, a multi-LLM peer review framework for architectural governance. Triangulate AI Architecture Decisions. 2 stars.