High-performance C/C++ inference engine for pyannote-audio speaker diarization, built on ggml.
pyannote.cpp is a pure C++ implementation of the pyannote-audio speaker diarization pipeline, optimized for CPU inference and requiring no Python at runtime. It answers the question "who spoke when?" in three stages:
- Voice Activity Detection (VAD): Detects speech segments using PyanNet (SincNet + LSTM)
- Speaker Embedding: Extracts speaker features using Mel-filterbanks + ResNet34
- Clustering: Assigns speaker IDs via greedy cosine-distance clustering
Key features:
- Pure C++ Inference: No Python dependency at runtime
- GGML-based: Leverages the efficient ggml tensor library
- GGUF Format: Models stored in the portable GGUF format
- Cross-platform: Supports x86_64, ARM, Apple Silicon
- CPU Optimized: Efficient CPU inference (GPU support planned)
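To make the clustering stage mentioned above more concrete, here is a minimal Python sketch of greedy cosine-distance clustering. It is an illustration only, not the code in this repository: the `threshold` value, the running-mean centroid update, and the function names are all assumptions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # a zero vector carries no direction; treat it as dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def greedy_cluster(embeddings, threshold=0.5):
    """Assign each embedding to the closest centroid, or open a new cluster.

    `threshold` is a hypothetical cut-off, not a value taken from pyannote.cpp.
    """
    centroids = []  # running mean embedding per speaker
    counts = []     # number of embeddings merged into each centroid
    labels = []
    for emb in embeddings:
        best, best_dist = -1, threshold
        for idx, centroid in enumerate(centroids):
            dist = cosine_distance(emb, centroid)
            if dist < best_dist:
                best, best_dist = idx, dist
        if best < 0:
            # No centroid is close enough: this embedding starts a new speaker.
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Fold the embedding into the running mean of the matched speaker.
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
        labels.append(best)
    return labels


embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
labels = greedy_cluster(embeddings)
print([f"SPEAKER_{label:02d}" for label in labels])
```

Because the assignment is greedy, the result depends on the order in which embeddings arrive, and the threshold directly controls how many speakers are found.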
Python packages for model conversion:
pip install torch numpy gguf asteroid-filterbanks
Build requirements:
- CMake 3.10+
- GCC/Clang or MSVC
- Git
Download the PyTorch models from HuggingFace:
- Segmentation model: pyannote/speaker-diarization-3.1 (subfolder segmentation/)
- Embedding model: pyannote/speaker-diarization-3.1 (subfolder embedding/)
Or use the automated download script:
python download_models.py
Convert to GGUF format:
# Convert segmentation model
python examples/python/convert_pyannote_to_ggml.py \
models/segmentation/pytorch_model.bin \
models/pyannote-segmentation.gguf
# Convert embedding model
python examples/python/convert_embedding_to_ggml.py \
models/embedding/pytorch_model.bin \
models/pyannote-embedding.gguf
Build the project:
mkdir build && cd build
cmake ..
make -j$(nproc)
On macOS:
cmake ..
make -j$(sysctl -n hw.ncpu)
Note: Currently, only the CPU backend is supported. GPU acceleration (CUDA, Metal, Vulkan) is planned for future releases.
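Before running the pipeline, it can be useful to confirm that a converted model file is a GGUF file at all. The stdlib-only sketch below reads the fixed-size GGUF header. It assumes a little-endian file in GGUF version 2 or later, where both counts are 64-bit. The helper name `read_gguf_header` is hypothetical and not part of this project.

```python
import struct
import sys

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # Little-endian layout after the magic:
    # uint32 version, uint64 tensor count, uint64 metadata key/value count.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count


if __name__ == "__main__":
    # Usage: python check_gguf.py models/pyannote-segmentation.gguf ...
    for path in sys.argv[1:]:
        version, tensors, kvs = read_gguf_header(path)
        print(f"{path}: GGUF v{version}, {tensors} tensors, {kvs} metadata entries")
```

This only checks the header. It does not verify that the tensors inside match what the inference code expects.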
Run the diarization pipeline:
./build/bin/pyannote-diarization \
models/pyannote-segmentation.gguf \
models/pyannote-embedding.gguf \
audio.wav
Note: Input audio must be a 16 kHz WAV file.
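Since the input must be sampled at 16 kHz, a quick pre-flight check can save a failed run. The sketch below uses Python's standard `wave` module. The helper name `check_wav` is hypothetical, and it checks only the sample rate: the channel count is reported but not enforced, because this README does not state a channel requirement.

```python
import sys
import wave


def check_wav(path, expected_rate=16000):
    """Return (ok, message) saying whether `path` is a PCM WAV at `expected_rate` Hz."""
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
            frames = wav.getnframes()
    except (wave.Error, EOFError) as exc:
        return False, f"not a readable PCM WAV file: {exc}"
    if rate != expected_rate:
        return False, f"sample rate is {rate} Hz, expected {expected_rate} Hz"
    duration = frames / rate
    return True, f"{rate} Hz, {channels} channel(s), {duration:.2f} s"


if __name__ == "__main__":
    # Usage: python check_wav.py audio.wav
    for path in sys.argv[1:]:
        ok, message = check_wav(path)
        print(f"{path}: {'OK' if ok else 'FAIL'} ({message})")
```

A file at another sample rate would need resampling with an external audio tool before it is passed to the pipeline.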
Example output:
[00:00.500 --> 00:04.250] SPEAKER_00
[00:04.500 --> 00:08.750] SPEAKER_01
[00:09.000 --> 00:12.500] SPEAKER_00
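Output in this form is easy to post-process. The sketch below parses the lines shown above and sums the speaking time per speaker. It assumes timestamps always look like `MM:SS.mmm`, which is an inference from the sample alone; longer recordings may be formatted differently. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as "[00:00.500 --> 00:04.250] SPEAKER_00".
SEGMENT_RE = re.compile(r"\[(\d+):(\d+\.\d+) --> (\d+):(\d+\.\d+)\] (\S+)")


def parse_segments(text):
    """Return a list of (start_seconds, end_seconds, speaker) tuples."""
    segments = []
    for line in text.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything that is not a segment
        m1, s1, m2, s2, speaker = match.groups()
        start = int(m1) * 60 + float(s1)
        end = int(m2) * 60 + float(s2)
        segments.append((start, end, speaker))
    return segments


def talk_time(segments):
    """Return total speaking time in seconds for each speaker."""
    totals = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)


sample = """\
[00:00.500 --> 00:04.250] SPEAKER_00
[00:04.500 --> 00:08.750] SPEAKER_01
[00:09.000 --> 00:12.500] SPEAKER_00
"""
segments = parse_segments(sample)
print(talk_time(segments))  # {'SPEAKER_00': 7.25, 'SPEAKER_01': 4.25}
```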
Project structure:
pyannote.cpp/
├── src/ # ggml core library
├── include/ # ggml headers
├── examples/ # pyannote implementation
│ ├── main.cpp # main diarization pipeline
│ ├── vbx.cpp # clustering algorithm
│ ├── python/ # model conversion scripts
│ └── CMakeLists.txt
├── models/ # converted GGUF models (you create this)
└── CMakeLists.txt
For real-time processing:
./build/bin/pyannote-diarization-streaming \
models/pyannote-segmentation.gguf \
models/pyannote-embedding.gguf \
audio.wav
This project is based on ggml and is released under the MIT License. See LICENSE for details.
Acknowledgments:
- pyannote-audio: Original PyTorch implementation
- ggml: Efficient tensor library for machine learning
Contributions are welcome! Please feel free to submit issues and pull requests.