
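「あすらんす」 is a tool for comparing and evaluating speech recognition performance. Given an audio file path and the reference transcript at run time, it outputs the recognition accuracy (micro CER), the processing time, and the CPU usage.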

Python 4 1 Updated Jun 22, 2026

Official implementation of the Interspeech 2026 paper: UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction

Python 8 1 Updated Jun 17, 2026

🎙️ Voice translation in any direction. Locally on Apple Silicon. Tap Caps Lock, speak your language, get any other. Whisper.cpp, no cloud, no GPU rental.

Python 8 1 Updated May 2, 2026

VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency and Speaking rate Control

Python 241 30 Updated May 30, 2026

Self-hosted AI video dubbing with ASR, translation, voice cloning, subtitles, and local GPU inference.

Python 31 5 Updated Jun 22, 2026
Python 84 Updated May 21, 2026

Augmentations usage examples for albumentations library

Python 540 102 Updated Jun 12, 2026

A high-performance image processing library designed to optimize and extend the Albumentations library with specialized functions for advanced image transformations. Perfect for developers working …

Python 121 11 Updated Jun 15, 2026

Next-generation Albumentations: dual-licensed for open-source and commercial use

Python 496 31 Updated Jun 29, 2026

Practical, Colab-friendly notebooks for fine-tuning and running audio AI models

Jupyter Notebook 418 29 Updated May 19, 2026

The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"

Python 199 17 Updated Sep 24, 2025

The open-source ElevenLabs alternative: a desktop app for local voice cloning, design, creation, dubbing, and dictation

Python 7,785 1,217 Updated Jul 1, 2026
Python 1,918 179 Updated Jun 20, 2026

a featureful union filesystem

C++ 5,715 218 Updated May 27, 2026

Automate the process of making money online.

Python 31,123 3,357 Updated Jun 14, 2026

Fully local voice interface for Claude Code on Apple Silicon. Parakeet STT + Kokoro TTS + SmartTurn EOU + dual VAD.

Python 30 2 Updated Mar 24, 2026

AI agents running research on single-GPU nanochat training automatically

Python 89,385 12,918 Updated Mar 26, 2026

Reimplementation of Bandit for "Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support"

Python 61 5 Updated Jul 29, 2025

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,544 267 Updated Jun 17, 2026

A multilingual text-to-speech synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek.

Python 84 9 Updated Aug 21, 2023

SOTA Open Source TTS

Python 31,066 2,655 Updated Jun 9, 2026

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 1,056 104 Updated Jun 17, 2026

small ass language [extendable with tools] model that follows instructions

Python 2 Updated Oct 16, 2025

PyTorch native post-training library

Python 5,780 732 Updated Jul 1, 2026

A simple text normalizer for use before speech synthesis

Python 47 4 Updated May 13, 2024

Renderer for the harmony response format to be used with gpt-oss

Rust 4,430 287 Updated Apr 8, 2026

FluentBird is a userChrome.css theme for Mozilla Thunderbird that implements Windows 11 Fluent Design and Mica transparency materials.

CSS 634 10 Updated Jun 11, 2026

A fast multimodal LLM for real-time voice

Python 4,467 380 Updated Dec 12, 2025

Nano vLLM

Python 14,257 2,271 Updated Apr 26, 2026