Index of voice typing, dictation, and speech-to-text applications and utilities.
Three parallel tracks, each with its own use case:
- Real-time streaming at cursor: speak and see text appear as you go, for chat, IDEs, and quick input. Covered by VoiceType (hybrid local + cloud by design, currently cloud-only) and Parakeet Type Ubuntu (local-only proof of concept).
- Long-form note dictation: speak a full note, get back polished, formatted text in one pass. Covered by AI Typer V2.
- Android voice-to-text reformatter: hold-to-talk, single-pass transcription + reformatting into a chosen preset (email, prompt, to-do, Hebrew). Covered by Voxcast.
The flexible hybrid track, which aims to blend local and cloud STT so the user picks the tradeoff (latency, cost, privacy) per session. Currently cloud-only via Deepgram Nova-3 streaming with keyterm prompting; local inference is planned. Python + PyQt6, single-process, and no root required (text is typed via evdev/uinput, with device access granted through membership in the input group). System tray, hotkeys, push-to-talk, VAD, and an in-app cost tracking dialog. Ships as a .deb package.
The local-only track: a focused proof of concept for running NVIDIA Parakeet / NeMo ASR models on AMD CPUs via sherpa-onnx, with no cloud and no GPU required. Built-in punctuation, multiple model profiles, system tray, and configurable hotkeys.
The long-form dictation track — single-pass multimodal audio understanding (Gemini via OpenRouter) where the model transcribes and formats in one call. Smart format detection (email / list / notes), VAD + AGC preprocessing, optional second-pass coherence review, custom dictionary with CSV import/export, streaming live-text preview, global F13–F24 hotkeys, append mode, and type-at-cursor that works in terminals as well as GUI apps.
The Android mobile track — a hold-to-talk voice-to-text app (Expo / React Native) that transcribes and reformats in a single OpenRouter call (Gemini 3.1 Flash Lite) into one of eight serious presets: business email, AI prompt, dev prompt, basic cleanup, to-do list, note to self, casual Hebrew, and Hebrew email. Email modes return separate subject + body for two-tap copy. One preset active at a time, no layering. Sibling project to Crazy-Keyboard but reframed as a productivity tool.
Kept for reference — superseded by the active projects above.
Two-stage process for creating notes from dictated speech — transcription via Whisper API followed by light text formatting. Exports to markdown. Predecessor to AI Typer V2.
Early Whisper-based voice typing iteration.
Early voice keyboard prototype.
File-upload-based multimodal transcription tool using Gemini via OpenRouter.
Gemini-powered transcription notepad with cleanup.
Transcription notepad for Gemini ASR.
Workflow workspace for importing recordings from a DVR and using AI for transcription.
Audio cleanup and transcription tool.
Local transcription app with audio multimodal design.
ASR transcription pipeline.
MCP server for Gemini multimodal audio transcription with built-in post-processing.
MCP for using various cloud ASR models for speech-to-text and transcription.
MCP for local AI transcription.
WIP MCP for local STT with cleanup on AMD GPU machines.
OpenRouter-based audio transcription MCP server.
Comparing Whisper fine-tunes versus stock Whisper on local inference.
Quick eval to answer: how much does speaking pace affect WER/accuracy in ASR?
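A pace-versus-accuracy eval like this needs two numbers per clip: word error rate, defined as (substitutions + deletions + insertions) divided by the number of reference words, and speaking pace in words per minute. A minimal sketch of both, assuming lowercase whitespace tokenization with no punctuation normalization; the eval's actual text normalization is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, via word-level Levenshtein distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of row i, prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Speaking pace of a clip, from its reference transcript and duration."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return len(transcript.split()) / (duration_seconds / 60.0)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` is one deletion over six reference words, about 0.167. Grouping clips by `words_per_minute` and averaging WER per group would then show whether faster speech costs accuracy.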
Evaluating various cloud audio understanding models on the transcribe-and-cleanup workflow.
Test samples for various microphones with an STT accuracy evaluation.
Quick evaluation to find the best STT model in Speech Note (Ubuntu) for local hardware.
Whisper words-per-minute testing.
Evaluation of Gemini 3.1 Lite on audio understanding tasks.
Testing system-prompt variations for cleaning up raw audio transcripts, and comparing multimodal ASR against the STT + LLM approach.
Whisper fine-tuning iteration.
Validated script for fine-tuning Whisper on Modal GPUs with a preformatted audio dataset.
Whisper fine-tuning dataset.
Whisper fine-tuning iteration.
Whisper base model via FUTO.
Local STT fine-tuning tests.
Fine-tuned STT data formats.
Whisper-Wayland with ROCm GPU acceleration — Docker setup for AMD GPUs.
whisper.cpp ROCm setup scripts.
Notes on local Whisper usage.
GUI to facilitate gathering training data for ASR/STT apps in organised datasets, with audio capture, text capture, and JSONL metadata construction. Supports both LLM-generated and user-provided text.
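JSONL metadata construction for an ASR dataset usually means appending one JSON object per recorded clip to a manifest file. A minimal sketch, assuming PCM WAV recordings; the field names (`audio_filepath`, `text`, `duration`, `source`) are hypothetical, loosely modeled on NeMo-style manifests, and are not the GUI's documented schema.

```python
import json
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Read clip length from a PCM WAV header using the stdlib wave module."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def append_manifest_entry(manifest: Path, audio: Path, text: str, source: str) -> dict:
    """Append one clip's metadata as a single JSON line (hypothetical schema)."""
    if source not in ("llm", "user"):
        raise ValueError("source must be 'llm' or 'user'")
    entry = {
        "audio_filepath": str(audio),
        "text": text,
        "duration": round(wav_duration_seconds(audio), 3),
        "source": source,  # whether the read-aloud text was LLM-generated or user-provided
    }
    # ensure_ascii=False keeps non-Latin text (e.g. Hebrew) readable in the file.
    with manifest.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry
```

Appending line by line means an interrupted recording session leaves every earlier entry intact, and the file can be streamed without loading the whole dataset.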
GUI template for ASR training data collection.
Breaks up texts by approximate reading duration for ASR training.
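Splitting text by approximate reading duration comes down to estimating read-aloud time from word count at an assumed speaking rate, then packing sentences up to a target length. A minimal sketch, assuming a fixed 150 words per minute, a 15-second target, and sentence splitting on terminal punctuation; all three are illustrative defaults, not the tool's documented behavior.

```python
import re


def estimate_seconds(text: str, wpm: float = 150.0) -> float:
    """Approximate read-aloud time at an assumed words-per-minute rate."""
    return len(text.split()) * 60.0 / wpm


def chunk_by_duration(text: str, target_seconds: float = 15.0, wpm: float = 150.0) -> list[str]:
    """Greedily pack whole sentences into chunks of roughly target_seconds each.

    A single sentence longer than the target becomes its own chunk rather
    than being cut mid-sentence.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and estimate_seconds(candidate, wpm) > target_seconds:
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Keeping sentence boundaries intact matters for ASR training data: a prompt that stops mid-sentence tends to be read with unnatural intonation.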
GUI for recording voice notes on Ubuntu/Linux.
Voice agent implementation for readiness checklists.
Model for classifying voice notes.
Voice note classifier model.
Frontend for an open-source voice note dataset used in an annotation/classification project.
Test pipeline: voice context data to Ragie.
Audio processing cleanup script.
Macropad configuration for transcription workflows.
Plan/key allocation for a macropad optimised for heavy daily dictation workflows.
Planning notes for a macropad for STT users.
Voice typer hardware notes.
Voice headset design notes.
Dictation microphone notes and comparisons.
Speech Note Linux app with text fixes — note taking, reading and translating with offline STT, TTS, and machine translation.
Testing Whisper with Hebrew-English mixed speech.
Concept for a speech tech solution — specced out by Claude.
Planning and research for real-time voice typing on Linux (Deepgram, Gemini, Parakeet).
Claude-assisted technical research into live voice typing implementation approaches — streaming inference patterns, partial-result handling, turn detection, and UX tradeoffs for at-cursor dictation.
Planning notes for a Linux voice typing tool.
Notes on STT processing chain for future voice projects.
Point-in-time pricing snapshots for ASR services.
Prompts and Claude outputs on STT, ASR, and fine-tuning.
List of resources for voice technology with Linux support, encompassing STT, ASR, and dev frameworks.
Claude-enhanced research for voice control platforms with Linux support.
Collection of voice typing / STT GitHub repos for testing on Linux.
Useful speech-to-text tools that use Whisper under the hood (API/local).
Voiceflow planning notes.
STT and TTS training notes.