High-performance C/C++ pyannote inference engine built on GGML.

lhpqaq/pyannote.cpp

pyannote.cpp


High-performance C/C++ inference engine for pyannote-audio speaker diarization, built on ggml.

What is pyannote.cpp?

pyannote.cpp is a pure C++ implementation of the pyannote-audio speaker diarization pipeline, optimized for CPU inference and requiring no Python at runtime. It answers the question "who spoke when?" in three stages:

  1. Voice Activity Detection (VAD): Detecting speech segments using PyanNet (SincNet + LSTM)
  2. Speaker Embedding: Extracting speaker features using Mel-filterbanks + ResNet34
  3. Clustering: Assigning speaker IDs via greedy cosine-distance clustering
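
The clustering stage can be illustrated with a minimal Python sketch. It is not the code in pyannote.cpp: the function names, the 0.5 distance threshold, and the running-mean centroid update are all assumptions chosen for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def greedy_cluster(embeddings, threshold=0.5):
    """Assign a speaker index to each embedding, in input order.

    threshold is an illustrative value, not one taken from pyannote.cpp.
    """
    centroids = []  # running mean embedding per speaker
    counts = []     # number of embeddings merged into each centroid
    labels = []
    for emb in embeddings:
        best, best_dist = None, threshold
        for idx, centroid in enumerate(centroids):
            dist = cosine_distance(emb, centroid)
            if dist < best_dist:
                best, best_dist = idx, dist
        if best is None:
            # No existing speaker is close enough: open a new cluster.
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Fold the embedding into the matched centroid (running mean).
            n = counts[best]
            centroids[best] = [
                (c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)
            ]
            counts[best] = n + 1
        labels.append(best)
    return labels
```

Each returned index can be rendered as a label in the style of the tool's output with f"SPEAKER_{label:02d}". Because the pass is greedy, the result depends on the order in which embeddings arrive.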

Features

  • Pure C++ Inference: No Python dependency at runtime
  • GGML-based: Leverages the efficient ggml tensor library
  • GGUF Format: Models stored in the portable GGUF format
  • Cross-platform: Supports x86_64, ARM, Apple Silicon
  • CPU Optimized: Efficient CPU inference (GPU support planned)

Prerequisites

For Model Conversion (Python required)

pip install torch numpy gguf asteroid-filterbanks

For Building (C++ toolchain)

  • CMake 3.10+
  • GCC/Clang or MSVC
  • Git

Quick Start

1. Download and Convert Models

Download the PyTorch models from HuggingFace:

  • Segmentation model: pyannote/speaker-diarization-3.1 (subfolder segmentation/)
  • Embedding model: pyannote/speaker-diarization-3.1 (subfolder embedding/)

Or use the automated download script:

python download_models.py

Convert to GGUF format:

# Convert segmentation model
python examples/python/convert_pyannote_to_ggml.py \
  models/segmentation/pytorch_model.bin \
  models/pyannote-segmentation.gguf

# Convert embedding model
python examples/python/convert_embedding_to_ggml.py \
  models/embedding/pytorch_model.bin \
  models/pyannote-embedding.gguf
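
After conversion, a quick sanity check is to read the fixed-size GGUF header. The sketch below is a hypothetical helper, not part of this repository. It assumes a little-endian GGUF file of version 2 or later, where the 4-byte magic is followed by a 32-bit version and two 64-bit counts.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # Little-endian: uint32 version, uint64 tensor count, uint64 KV count.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

Calling read_gguf_header("models/pyannote-segmentation.gguf") should return a non-zero tensor count if the conversion wrote any weights. A ValueError means the file is truncated or is not GGUF at all.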

2. Build

mkdir build && cd build
cmake ..
make -j$(nproc)

On macOS, nproc is not available by default, so from inside the build directory run:

cmake ..
make -j$(sysctl -n hw.ncpu)

Note: Only the CPU backend is currently supported. GPU acceleration (CUDA, Metal, Vulkan) is planned for future releases.

3. Run Inference

./build/bin/pyannote-diarization \
  models/pyannote-segmentation.gguf \
  models/pyannote-embedding.gguf \
  audio.wav

Note: Input audio must be a 16 kHz WAV file.
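
To confirm that a file meets the sample-rate requirement before running inference, a small check with Python's standard wave module is enough. The helper name check_wav is hypothetical. The wave module reads only PCM WAV files, so other encodings raise wave.Error here. The channel count is reported only for information, since the requirement above covers the sample rate alone.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return (ok, sample_rate, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return rate == expected_rate, rate, channels
```

If the first element of the result is False, resample the audio to 16 kHz with a tool of your choice before passing it to pyannote-diarization.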

Output Format

[00:00.500 --> 00:04.250] SPEAKER_00
[00:04.500 --> 00:08.750] SPEAKER_01
[00:09.000 --> 00:12.500] SPEAKER_00
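
For downstream processing, the lines above can be parsed into numeric segments. This is a minimal sketch that assumes every timestamp has the MM:SS.mmm shape shown in the sample. The function names are hypothetical, and lines that do not match are skipped.

```python
import re

# Matches lines such as "[00:00.500 --> 00:04.250] SPEAKER_00".
SEGMENT_RE = re.compile(r"\[(\d+):(\d+\.\d+) --> (\d+):(\d+\.\d+)\]\s+(\S+)")


def parse_segments(text):
    """Return a list of (start_seconds, end_seconds, speaker) tuples."""
    segments = []
    for line in text.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if match is None:
            continue  # skip log lines or anything else that is not a segment
        m1, s1, m2, s2, speaker = match.groups()
        start = int(m1) * 60 + float(s1)
        end = int(m2) * 60 + float(s2)
        segments.append((start, end, speaker))
    return segments


def speaking_time(segments):
    """Sum the duration of each speaker's segments, in seconds."""
    totals = {}
    for start, end, speaker in segments:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    return totals
```

Applied to the three sample lines, speaking_time reports 7.25 seconds for SPEAKER_00 and 4.25 seconds for SPEAKER_01.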

Project Structure

pyannote.cpp/
├── src/               # ggml core library
├── include/           # ggml headers
├── examples/          # pyannote implementation
│   ├── main.cpp       # main diarization pipeline
│   ├── vbx.cpp        # clustering algorithm
│   ├── python/        # model conversion scripts
│   └── CMakeLists.txt
├── models/            # converted GGUF models (you create this)
└── CMakeLists.txt

Advanced Usage

Streaming Mode

For real-time processing:

./build/bin/pyannote-diarization-streaming \
  models/pyannote-segmentation.gguf \
  models/pyannote-embedding.gguf \
  audio.wav
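
To consume streaming results from another program, one option is to read the tool's standard output line by line. The sketch below is a generic wrapper, not an API of pyannote.cpp. It assumes the streaming binary writes its segments to stdout as they are produced, which this README does not confirm, and STREAMING_COMMAND simply repeats the invocation above.

```python
import subprocess

# The invocation shown above, split into an argument list.
STREAMING_COMMAND = [
    "./build/bin/pyannote-diarization-streaming",
    "models/pyannote-segmentation.gguf",
    "models/pyannote-embedding.gguf",
    "audio.wav",
]


def stream_lines(command):
    """Run command and yield each stdout line as soon as it is available."""
    with subprocess.Popen(
        command, stdout=subprocess.PIPE, text=True, bufsize=1
    ) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the with block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
```

Iterating over stream_lines(STREAMING_COMMAND) then yields one output line at a time. How promptly lines arrive depends on how the binary buffers its output.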

License

This project is based on ggml and is released under the MIT License. See LICENSE for details.

Acknowledgments

  • pyannote-audio: Original PyTorch implementation
  • ggml: Efficient tensor library for machine learning

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.


