High-speed Large Language Model Serving for Local Deployment
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
Modern desktop application (Rust + Tauri v2 + Svelte 5 + Candle (HF)) for communicating with AI models that run completely locally on your computer. No subscriptions, no data sent to the internet; just you and your personal AI assistant.
On-device AI for iOS & Android
Notolog Markdown Editor
Tool for testing different large language models without writing code.
Study Buddy is a desktop application that provides AI tutoring without requiring internet access or accounts.
LLM chatbot example using OpenVINO with RAG (Retrieval Augmented Generation).
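As a sketch of how the generation step of such a chatbot might look with the openvino_genai package (the model path is a stand-in for any LLM exported to OpenVINO IR, and the retrieved context is stubbed rather than fetched from a real vector store):

    import openvino_genai

    # Hypothetical path to an LLM converted to OpenVINO IR format.
    pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-ov", "CPU")

    # In a RAG flow, passages retrieved from a vector store are prepended here.
    context = "Local inference runs models on your own hardware."
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: What is local inference?"
    print(pipe.generate(prompt, max_new_tokens=128))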
Lightweight 6GB VRAM Gradio web app with auto-installer for running AuraFlow locally — no cloud, no clutter.
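For reference, a minimal sketch of running AuraFlow locally with the diffusers library; fp16 weights plus CPU offload are what make a roughly 6 GB VRAM budget plausible (the model ID and settings are assumptions, not this app's exact code):

    import torch
    from diffusers import AuraFlowPipeline

    # fp16 weights plus CPU offload keep peak VRAM low on consumer GPUs.
    pipe = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=torch.float16)
    pipe.enable_model_cpu_offload()

    image = pipe("a lighthouse at dawn, oil painting", num_inference_steps=25).images[0]
    image.save("out.png")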
An overfitted Stable Diffusion prompt engine with severe "aesthetic snobbery," forcibly transforming mundane ideas into professional-grade physical rendering instructions.
Verify claims using AI agents that debate using scraped evidence and local language models.
Test app for function calling and agentic frameworks.
MCP server that runs local LLMs (with full access to MCP tools included). Callable from Python to chain MCP tools with local intelligence.
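A minimal sketch of such a server using the official MCP Python SDK's FastMCP helper; the tool body here is a stub standing in for a real local-model call:

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("local-llm")

    @mcp.tool()
    def ask_local_llm(prompt: str) -> str:
        """Answer a prompt with a locally hosted model (stubbed here)."""
        # A real server would call llama.cpp, Ollama, or similar at this point.
        return f"[local model reply to: {prompt}]"

    if __name__ == "__main__":
        mcp.run()  # serves over stdio by default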
Audit local LLM function calling and agentic reliability. Visual tool-use benchmarking for quantized models on YOUR hardware.
A lightweight Python implementation of Microsoft's Phi-3 model running locally on CPU.
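A minimal CPU-only sketch with Hugging Face transformers, which ships native Phi-3 support in recent versions; this is an illustration, not the repository's own implementation:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/Phi-3-mini-4k-instruct"
    tok = AutoTokenizer.from_pretrained(model_id)
    # float32 avoids half-precision ops that are slow or unsupported on CPU.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

    inputs = tok("Explain local inference in one sentence.", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))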
Nexa SDK is a comprehensive toolkit for running ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech (TTS).
Search through thousands of your photos using natural language, locally on your PC.
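One common way to build this kind of search is a joint image-text embedding model such as CLIP; a sketch with sentence-transformers (the folder path and model choice are assumptions):

    from pathlib import Path
    from PIL import Image
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("clip-ViT-B-32")  # shared image/text embedding space

    photos = sorted(Path("photos").glob("*.jpg"))  # hypothetical photo folder
    img_emb = model.encode([Image.open(p) for p in photos])

    # Rank photos by similarity to a free-text query.
    hits = util.semantic_search(model.encode("a dog playing on the beach"), img_emb, top_k=5)[0]
    for hit in hits:
        print(photos[hit["corpus_id"]], round(hit["score"], 3))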
Script that performs RAG and uses a local LLM for Q&A.
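A compact sketch of such a script: embed documents, retrieve the best match, then prompt a local GGUF model via llama-cpp-python (the document texts and model path are placeholders):

    from llama_cpp import Llama
    from sentence_transformers import SentenceTransformer, util

    docs = ["Local inference keeps data on your own machine.",
            "RAG retrieves relevant passages and feeds them to the model."]

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_emb = embedder.encode(docs)

    question = "How does RAG work?"
    best = util.semantic_search(embedder.encode(question), doc_emb, top_k=1)[0][0]

    llm = Llama(model_path="model.gguf", verbose=False)  # placeholder GGUF path
    out = llm(f"Context: {docs[best['corpus_id']]}\nQuestion: {question}\nAnswer:", max_tokens=64)
    print(out["choices"][0]["text"])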
The Streamlit AI Content Generator + Editor is a fully interactive, web-based tool designed to assist content creators, marketers, and developers in generating high-quality blog posts, articles, and marketing copy.