| Project Page | Documentation | Datasets | English | 简体中文 |
Latest News 🔥
- [2025.08.28] 🎉 Released UltraRAG 2.0! UltraRAG 2.0 is fully upgraded: build high-performance RAG with just a few dozen lines of code, empowering researchers to focus on ideas and innovation!
Retrieval-Augmented Generation (RAG) systems are evolving from early-stage simple concatenations of “retrieval + generation” to complex knowledge systems integrating adaptive knowledge organization, multi-turn reasoning, and dynamic retrieval (typical examples include DeepResearch and Search-o1). However, this increase in complexity imposes high engineering costs on researchers when it comes to method reproduction and rapid iteration of new ideas.
To address this challenge, THUNLP, NEUIR, OpenBMB, and AI9stars jointly launched UltraRAG 2.0 (UR-2.0) — the first RAG framework based on the Model Context Protocol (MCP) architecture design. This design allows researchers to declare complex logic such as sequential, loop, and conditional branching simply by writing YAML files, enabling rapid implementation of multi-stage reasoning systems with minimal code.
Its core ideas are:
- Modular encapsulation: Encapsulate RAG core components as standardized independent MCP Servers;
- Flexible invocation and extension: Provide function-level Tool interfaces to support flexible function calls and extensions;
- Lightweight workflow orchestration: Use the MCP Client to orchestrate a streamlined, top-down control flow.
Compared with traditional frameworks, UltraRAG 2.0 significantly lowers the technical barrier and learning cost of building complex RAG systems, allowing researchers to focus on experimental design and algorithm innovation rather than lengthy engineering implementation.
- 🚀 Low-Code Construction of Complex Pipelines
  Natively supports sequential, loop, conditional-branching, and other inference control structures. Developers only need to write YAML files to build iterative RAG workflows (e.g., Search-o1) in dozens of lines of code.
- ⚡ Rapid Reproduction and Functional Extension
  Based on the MCP architecture, all modules are encapsulated as independent, reusable Servers.
  - Users can customize Servers as needed or directly reuse existing modules;
  - Each Server's functions are registered as function-level Tools, so a new capability can be integrated into the complete workflow by adding a single function;
  - External MCP Servers can also be called, easily extending pipeline capabilities and application scenarios.
- 📊 Unified Evaluation and Comparison
  Built-in standardized evaluation workflows and metric management, with out-of-the-box support for 17 mainstream scientific benchmarks.
  - Continuously integrates the latest baselines;
  - Provides leaderboard results;
  - Facilitates systematic comparison and optimization experiments for researchers.
Across different RAG systems, core capabilities such as retrieval and generation are highly similar in function, but because developers implement them in diverse ways, modules often lack unified interfaces, making cross-project reuse difficult. The Model Context Protocol (MCP) is an open protocol that standardizes how context is provided to large language models (LLMs). It adopts a Client–Server architecture, enabling MCP-compliant Server components to be seamlessly reused across different systems.
Inspired by this, UltraRAG 2.0 is based on the MCP architecture, abstracting and encapsulating core functions such as retrieval, generation, and evaluation in RAG systems into independent MCP Servers, and invoking them through standardized function-level Tool interfaces. This design ensures flexible module function extension and allows new modules to be “hot-plugged” without invasive modifications to global code. In scientific research scenarios, this architecture enables researchers to quickly adapt new models or algorithms with minimal code while maintaining overall system stability and consistency.
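As a rough sketch of this design, the fragment below shows how a Pipeline YAML can declare Servers and invoke their function-level Tools. The Server paths and Tool names are illustrative assumptions rather than the shipped configuration; see examples/rag.yaml in the repository for a working example.

```yaml
# Hypothetical Pipeline YAML fragment. The `servers` paths and Tool
# names are illustrative assumptions, not the shipped configuration.
servers:
  retriever: servers/retriever    # each entry points to an MCP Server
  generation: servers/generation

pipeline:                         # each step invokes one function-level Tool
  - retriever.retriever_search    # Tool registered by the retriever Server
  - generation.generate           # Tool registered by the generation Server
```

Swapping in a custom retriever then only requires pointing the corresponding `servers` entry at your own implementation; the rest of the pipeline is untouched.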
Developing complex RAG inference frameworks is significantly challenging. UltraRAG 2.0 can support such systems under low-code conditions because of its native support for multi-structured pipeline control: sequential, loop, and conditional-branching logic can all be defined and orchestrated at the YAML level, covering the workflow expression forms that complex inference tasks require. At runtime, workflow scheduling is executed by the built-in Client, whose logic is fully described by the user-written external Pipeline YAML script, decoupling it from the underlying implementation. Developers can use instructions such as loop and step as if they were programming-language keywords, quickly constructing multi-stage inference workflows in a declarative manner.
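Below is a minimal sketch of what such declarative control flow might look like, assuming a `loop` construct with an iteration bound and a step list; the exact keyword schema is defined in the tutorial documentation rather than here.

```yaml
# Hypothetical iterative pipeline in the spirit of Search-o1: retrieve
# once, then alternate generation and re-retrieval a fixed number of
# times. All keywords and parameter names here are assumptions.
pipeline:
  - retriever.retriever_search        # initial retrieval
  - loop:
      times: 3                        # iteration bound (assumed parameter name)
      steps:
        - generation.generate         # reason over the retrieved context
        - retriever.retriever_search  # retrieve again with the new output
```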
By deeply integrating the MCP architecture with native workflow control, UltraRAG 2.0 makes building complex RAG systems as natural and efficient as “orchestrating workflows.” Additionally, the framework includes 17 mainstream benchmark tasks and multiple high-quality baselines, combined with a unified evaluation system and knowledge base support, further enhancing system development efficiency and experiment reproducibility.
Create a virtual environment using Conda:
```bash
conda create -n ultrarag python=3.11
conda activate ultrarag
```

Clone the project locally or onto a server via git:
```bash
git clone https://github.com/OpenBMB/UltraRAG.git
cd UltraRAG
```

We recommend uv for package management; it provides faster and more reliable Python dependency management:
```bash
pip install uv
uv pip install -e .
```

If you prefer pip, you can run directly:
```bash
pip install -e .
```

[Optional] UR-2.0 supports a rich set of Server components; developers can flexibly install optional dependencies according to their actual tasks:
```bash
# To use faiss for vector indexing, manually install the CPU or GPU
# build of FAISS that matches your hardware environment.
# CPU version:
uv pip install faiss-cpu
# GPU version (example: CUDA 12.x):
uv pip install faiss-gpu-cu12
# For other CUDA versions, install the corresponding package (e.g., faiss-gpu-cu11 for CUDA 11.x).

# To use infinity_emb for corpus encoding and indexing:
uv pip install -e ."[infinity_emb]"
# To use the lancedb vector database:
uv pip install -e ."[lancedb]"
# To deploy models with a vLLM service:
uv pip install -e ."[vllm]"
# To use corpus document parsing:
uv pip install -e ."[corpus]"

# ====== Install all dependencies (except faiss) ======
uv pip install -e ."[all]"
```

Run the following command to verify a successful installation:
```bash
# If the installation was successful, you should see the welcome message 'Hello, UltraRAG 2.0!'
ultrarag run examples/sayhello.yaml
```

We provide a complete set of tutorials, from beginner to advanced. Visit the tutorial documentation to get started with UltraRAG 2.0 quickly!
Read the Quick Start guide to learn the UltraRAG workflow, which consists of three steps: (1) compile the Pipeline file to generate the parameter configuration, (2) modify the parameter file, and (3) run the Pipeline file.
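To make these steps concrete, the annotated sketch below shows what the parameter file generated in step (1) might contain. Every key and value here is an illustrative assumption; only the `ultrarag run` entry point (used in the installation check above) is taken from this repository.

```yaml
# Step 1: compiling a Pipeline file (e.g., examples/rag.yaml) generates
# a parameter configuration. The keys below are illustrative assumptions.
retriever:
  model_path: /path/to/your/embedding-model  # point at your own model
  top_k: 5                                   # number of passages to retrieve
generation:
  base_url: http://localhost:8000/v1         # e.g., a locally served LLM endpoint
# Step 2: edit these values to match your setup.
# Step 3: run the Pipeline file:
#   ultrarag run examples/rag.yaml
```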
In addition, we have prepared a directory of commonly used research functions, where you can directly jump to the desired module:
- Corpus Embedding and Indexing with Retriever
- Deploying Retriever
- Deploying LLM
- Baseline Reproduction
- Case Study of Experimental Results
- Debugging Guide
UltraRAG 2.0 is ready to use out of the box, natively supporting the public evaluation datasets, large-scale corpora, and typical baseline methods most commonly used in current RAG research, facilitating rapid reproduction and extension of experiments. You can also refer to the Data Format Specification to flexibly customize and add your own datasets or corpora. The full datasets are available for access and download through this link.
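For orientation, an evaluation sample generally needs at least a question and its gold answers. The entry below is a hypothetical sketch (rendered as YAML for readability); the authoritative file format and field names are defined in the Data Format Specification.

```yaml
# Hypothetical dataset entry; field names are assumptions, so consult
# the Data Format Specification for the authoritative schema.
- id: sample_0
  question: "Who wrote The Art of Computer Programming?"
  golden_answers:
    - "Donald Knuth"
```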
| Task Type | Dataset Name | Original Data Size | Evaluation Sample Size |
|---|---|---|---|
| QA | NQ | 3,610 | 1,000 |
| QA | TriviaQA | 11,313 | 1,000 |
| QA | PopQA | 14,267 | 1,000 |
| QA | AmbigQA | 2,002 | 1,000 |
| QA | MarcoQA | 55,636 | 1,000 |
| QA | WebQuestions | 2,032 | 1,000 |
| Multi-hop QA | HotpotQA | 7,405 | 1,000 |
| Multi-hop QA | 2WikiMultiHopQA | 12,576 | 1,000 |
| Multi-hop QA | Musique | 2,417 | 1,000 |
| Multi-hop QA | Bamboogle | 125 | 125 |
| Multi-hop QA | StrategyQA | 2,290 | 1,000 |
| Multiple-choice | ARC | 3,548 | 1,000 |
| Multiple-choice | MMLU | 14,042 | 1,000 |
| Long-form QA | ASQA | 948 | 948 |
| Fact-verification | FEVER | 13,332 | 1,000 |
| Dialogue | WoW | 3,054 | 1,000 |
| Slot-filling | T-REx | 5,000 | 1,000 |
| Corpus Name | Document Count |
|---|---|
| wiki-2018 | 21,015,324 |
| wiki-2024 | Under preparation, coming soon |
| Baseline Name | Script |
|---|---|
| Vanilla LLM | examples/vanilla.yaml |
| Vanilla RAG | examples/rag.yaml |
| IRCoT | examples/IRCoT.yaml |
| IterRetGen | examples/IterRetGen.yaml |
| RankCoT | examples/RankCoT.yaml |
| R1-searcher | examples/r1_searcher.yaml |
| Search-o1 | examples/search_o1.yaml |
| Search-r1 | examples/search_r1.yaml |
| WebNote | examples/webnote.yaml |
Thanks to the following contributors for their code submissions and testing. We also welcome new members to join us in collectively building a comprehensive RAG ecosystem!
You can contribute by following the standard process: fork this repository, submit issues, and create pull requests (PRs).
If you find this repository helpful for your research, please consider giving us a ⭐ to show your support.
- For technical issues and feature requests, please use GitHub Issues.
- For questions about usage, feedback, or any discussions related to RAG technologies, you are welcome to join our WeChat group, Feishu group, and Discord to exchange ideas with us.