Stars
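「あすらんす」 is a tool for comparing and evaluating speech recognition performance. Given an audio file path and its reference transcript at run time, it outputs recognition accuracy (micro CER), processing time, and CPU usage.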
Official implementation of the Interspeech 2026 paper: UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction
🎙️ Voice translation in any direction. Locally on Apple Silicon. Tap Caps Lock, speak your language, get any other. Whisper.cpp, no cloud, no GPU rental.
VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency and Speaking rate Control
Self-hosted AI video dubbing with ASR, translation, voice cloning, subtitles, and local GPU inference.
Augmentations usage examples for albumentations library
A high-performance image processing library designed to optimize and extend the Albumentations library with specialized functions for advanced image transformations. Perfect for developers working …
Next-generation Albumentations: dual-licensed for open-source and commercial use
Practical, Colab-friendly notebooks for fine-tuning and running audio AI models
The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"
The open-source ElevenLabs alternative for local voice cloning, design, create, dubbing and dictation Desktop App
Automate the process of making money online.
Fully local voice interface for Claude Code on Apple Silicon. Parakeet STT + Kokoro TTS + SmartTurn EOU + dual VAD.
AI agents running research on single-GPU nanochat training automatically
Reimplementation of Bandit for "Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support"
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
A multilingual text-to-speech synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek.
MiMo-Audio: Audio Language Models are Few-Shot Learners
small ass language [extendable with tools] model that follows instructions
Renderer for the harmony response format to be used with gpt-oss
FluentBird is a userChrome.css theme for Mozilla Thunderbird that implements Windows 11 Fluent Design and Mica transparency materials.






