pyannote.cpp

English

High-performance C/C++ inference engine for pyannote-audio speaker diarization, built on ggml.

What is pyannote.cpp?

pyannote.cpp is a pure C++ implementation of the pyannote-audio speaker diarization pipeline, optimized for CPU inference and requiring no Python at runtime. It answers the question "who spoke when?" in three stages:

Voice Activity Detection (VAD): Detecting speech segments using PyanNet (SincNet + LSTM)
Speaker Embedding: Extracting speaker features using Mel-filterbanks + ResNet34
Clustering: Assigning speaker IDs via greedy cosine-distance clustering
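
To make the clustering stage concrete, here is a minimal Python sketch of greedy cosine-distance clustering. It is an illustration only, not the code in this repository: the function names, the running-mean centroid update, and the 0.5 threshold are all assumptions chosen for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


def greedy_cluster(embeddings, threshold=0.5):
    """Assign each embedding to the closest centroid, or open a new cluster.

    `threshold` is a hypothetical value picked for illustration.
    """
    centroids = []  # running mean embedding per speaker
    counts = []     # number of embeddings merged into each centroid
    labels = []
    for emb in embeddings:
        best_id, best_dist = -1, float("inf")
        for i, centroid in enumerate(centroids):
            dist = cosine_distance(emb, centroid)
            if dist < best_dist:
                best_id, best_dist = i, dist
        if best_id >= 0 and best_dist < threshold:
            # Close enough to an existing speaker: update its running mean.
            n = counts[best_id]
            centroids[best_id] = [
                (c * n + x) / (n + 1) for c, x in zip(centroids[best_id], emb)
            ]
            counts[best_id] = n + 1
            labels.append(best_id)
        else:
            # Too far from every known speaker: start a new one.
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


# Toy 2-D "embeddings"; real speaker embeddings have many more dimensions.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
labels = greedy_cluster(embeddings)
print([f"SPEAKER_{label:02d}" for label in labels])
```

The greedy approach processes embeddings in a single pass, so the result depends on input order and on the chosen threshold.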

Features

Pure C++ Inference: No Python dependency at runtime
GGML-based: Leverages the efficient ggml tensor library
GGUF Format: Models stored in the portable GGUF format
Cross-platform: Supports x86_64, ARM, Apple Silicon
CPU Optimized: Efficient CPU inference (GPU support planned)

Prerequisites

For Model Conversion (Python required)

pip install torch numpy gguf asteroid-filterbanks

For Building (C++ toolchain)

CMake 3.10+
GCC/Clang or MSVC
Git

Quick Start

1. Download and Convert Models

Download the PyTorch models from HuggingFace:

Segmentation model: pyannote/speaker-diarization-3.1 (subfolder segmentation/)
Embedding model: pyannote/speaker-diarization-3.1 (subfolder embedding/)

Or use the automated download script:

python download_models.py

Convert to GGUF format:

# Convert segmentation model
python examples/python/convert_pyannote_to_ggml.py \
  models/segmentation/pytorch_model.bin \
  models/pyannote-segmentation.gguf

# Convert embedding model
python examples/python/convert_embedding_to_ggml.py \
  models/embedding/pytorch_model.bin \
  models/pyannote-embedding.gguf
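
After conversion, a quick sanity check can confirm that the outputs are GGUF files. The sketch below reads only the fixed file header, assuming the little-endian GGUF version 2 or 3 layout (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key/value count). The helper name `read_gguf_header` is hypothetical and not part of this repository.

```python
import struct
import sys

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header.

    Assumes the little-endian layout used by GGUF versions 2 and 3.
    """
    with open(path, "rb") as f:
        header = f.read(24)  # 4 (magic) + 4 (version) + 8 + 8 (counts)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count


if __name__ == "__main__":
    # Usage: python check_gguf.py models/pyannote-segmentation.gguf ...
    for model_path in sys.argv[1:]:
        version, tensors, kvs = read_gguf_header(model_path)
        print(f"{model_path}: GGUF v{version}, {tensors} tensors, {kvs} metadata keys")
```

A tensor count of zero, or a `ValueError`, would suggest the conversion did not complete as expected.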

2. Build

mkdir build && cd build
cmake ..
make -j$(nproc)

On macOS, nproc is not available by default, so use sysctl to get the core count:

mkdir build && cd build
cmake ..
make -j$(sysctl -n hw.ncpu)

Note: Only the CPU backend is currently supported. GPU acceleration (CUDA, Metal, Vulkan) is planned for future releases.

3. Run Inference

Run from the repository root (cd .. first if you are still inside build/):

./build/bin/pyannote-diarization \
  models/pyannote-segmentation.gguf \
  models/pyannote-embedding.gguf \
  audio.wav

Note: Input audio must be a 16 kHz WAV file.
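
Because the sample rate requirement is easy to miss, it can help to check a file before running inference. The following sketch uses Python's standard `wave` module, which reads PCM WAV files; the function name `check_wav` is hypothetical and not part of this project.

```python
import sys
import wave


def check_wav(path, expected_rate=16000):
    """Return (ok, sample_rate, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return rate == expected_rate, rate, channels


if __name__ == "__main__":
    # Usage: python check_wav.py audio.wav
    for wav_path in sys.argv[1:]:
        ok, rate, channels = check_wav(wav_path)
        status = "OK" if ok else "needs resampling to 16000 Hz"
        print(f"{wav_path}: {rate} Hz, {channels} channel(s): {status}")
```

If the rate differs, an external tool such as ffmpeg can resample the file, for example `ffmpeg -i input.wav -ar 16000 audio.wav`.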

Output Format

[00:00.500 --> 00:04.250] SPEAKER_00
[00:04.500 --> 00:08.750] SPEAKER_01
[00:09.000 --> 00:12.500] SPEAKER_00
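
For downstream processing, the output lines can be parsed into structured records. The sketch below assumes that every line follows the `[MM:SS.mmm --> MM:SS.mmm] LABEL` pattern shown above; how the tool formats audio longer than an hour is not covered here. The names `Turn`, `parse_line`, and `speaking_time` are hypothetical.

```python
import re
from typing import NamedTuple


class Turn(NamedTuple):
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    speaker: str


# Matches e.g. "[00:00.500 --> 00:04.250] SPEAKER_00"
LINE_RE = re.compile(r"\[(\d+):(\d+\.\d+) --> (\d+):(\d+\.\d+)\] (\S+)")


def parse_line(line):
    """Parse one output line into a Turn, or return None if it does not match."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        return None
    start = int(match.group(1)) * 60 + float(match.group(2))
    end = int(match.group(3)) * 60 + float(match.group(4))
    return Turn(start, end, match.group(5))


def speaking_time(turns):
    """Sum the duration of all turns per speaker label, in seconds."""
    totals = {}
    for turn in turns:
        totals[turn.speaker] = totals.get(turn.speaker, 0.0) + (turn.end - turn.start)
    return totals


sample = """\
[00:00.500 --> 00:04.250] SPEAKER_00
[00:04.500 --> 00:08.750] SPEAKER_01
[00:09.000 --> 00:12.500] SPEAKER_00
"""
turns = [t for t in map(parse_line, sample.splitlines()) if t is not None]
print(speaking_time(turns))
```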

Project Structure

pyannote.cpp/
├── src/               # ggml core library
├── include/           # ggml headers
├── examples/          # pyannote implementation
│   ├── main.cpp       # main diarization pipeline
│   ├── vbx.cpp        # clustering algorithm
│   ├── python/        # model conversion scripts
│   └── CMakeLists.txt
├── models/            # converted GGUF models (you create this)
└── CMakeLists.txt

Advanced Usage

Streaming Mode

For real-time processing:

./build/bin/pyannote-diarization-streaming \
  models/pyannote-segmentation.gguf \
  models/pyannote-embedding.gguf \
  audio.wav

License

This project is based on ggml and is released under the MIT License. See LICENSE for details.

Acknowledgments

pyannote-audio: Original PyTorch implementation
ggml: Efficient tensor library for machine learning

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

