UltraRAG

Less Code, Lower Barrier, Faster Deployment

| Project Page | Documentation | Datasets | English | 简体中文 |


Latest News 🔥

  • [2025.08.28] 🎉 Released UltraRAG 2.0! UltraRAG 2.0 is fully upgraded: build high-performance RAG with just a few dozen lines of code, empowering researchers to focus on ideas and innovation!

UltraRAG 2.0: Accelerating RAG for Scientific Research

Retrieval-Augmented Generation (RAG) systems are evolving from simple “retrieval + generation” pipelines into complex knowledge systems that integrate adaptive knowledge organization, multi-turn reasoning, and dynamic retrieval (typical examples include DeepResearch and Search-o1). This growing complexity, however, imposes high engineering costs on researchers who want to reproduce existing methods or rapidly iterate on new ideas.

To address this challenge, THUNLP, NEUIR, OpenBMB, and AI9stars jointly launched UltraRAG 2.0 (UR-2.0) — the first RAG framework based on the Model Context Protocol (MCP) architecture design. This design allows researchers to declare complex logic such as sequential, loop, and conditional branching simply by writing YAML files, enabling rapid implementation of multi-stage reasoning systems with minimal code.

Its core ideas are:

  • Modular encapsulation: package the core RAG components as standardized, independent MCP Servers;
  • Flexible invocation and extension: expose function-level Tool interfaces so functions can be called and extended flexibly;
  • Lightweight workflow orchestration: use an MCP Client to drive the Servers through a simple top-down workflow.
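The three ideas above can be illustrated with a toy sketch: independent modules expose plain functions as named tools, and a lightweight client invokes them by qualified name without knowing their implementation. This is not UltraRAG's actual API, just a minimal illustration of the pattern in plain Python:

```python
# Toy sketch of the MCP-style pattern: each "Server" registers function-level
# Tools, and a Client orchestrates them top-down by name. Class and method
# names here are illustrative, not UltraRAG's real interfaces.

class ToolServer:
    """A module (e.g. retriever, generator) exposing function-level tools."""

    def __init__(self, name):
        self.name = name
        self.tools = {}

    def tool(self, fn):
        """Register a function as a callable tool (used as a decorator)."""
        self.tools[fn.__name__] = fn
        return fn


class Client:
    """Top-down orchestrator: resolves 'server.tool' names and calls them."""

    def __init__(self, servers):
        self.servers = {s.name: s for s in servers}

    def call(self, qualified_name, **kwargs):
        server_name, tool_name = qualified_name.split(".")
        return self.servers[server_name].tools[tool_name](**kwargs)


# Two independent "servers"; swapping either one requires no client changes.
retriever = ToolServer("retriever")
generator = ToolServer("generator")

@retriever.tool
def search(query):
    # Stub knowledge base standing in for a real retrieval backend.
    corpus = {"capital of France": "Paris is the capital of France."}
    return [doc for key, doc in corpus.items() if key in query]

@generator.tool
def generate(docs):
    # Stub LLM call: echo the top retrieved document as the "answer".
    return docs[0] if docs else "I don't know."


client = Client([retriever, generator])
docs = client.call("retriever.search", query="What is the capital of France?")
answer = client.call("generator.generate", docs=docs)
print(answer)  # Paris is the capital of France.
```

Because the client only knows tool names, a new module can be "hot-plugged" by registering another server, which is the decoupling the MCP architecture provides.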

Compared with traditional frameworks, UltraRAG 2.0 significantly lowers the technical threshold and learning cost of complex RAG systems, allowing researchers to focus more on experimental design and algorithm innovation rather than lengthy engineering implementations.

🌟 Key Highlights

  • 🚀 Low-Code Construction of Complex Pipelines
    Natively supports sequential, loop, conditional-branching, and other inference control structures. Developers only need to write a YAML file to build an iterative RAG workflow (e.g., Search-o1) in a few dozen lines of code.

  • Rapid Reproduction and Functional Extension
    Based on the MCP architecture, all modules are encapsulated as independent, reusable Servers.

    • Users can customize Servers as needed or directly reuse existing modules;
    • Each Server’s functions are registered as function-level Tools, so a new capability can be integrated into the complete workflow by adding a single function;
    • It also supports calling external MCP Servers, easily extending pipeline capabilities and application scenarios.
  • 📊 Unified Evaluation and Comparison
    Built-in standardized evaluation workflows and metric management, out-of-the-box support for 17 mainstream scientific benchmarks.

    • Continuously integrate the latest baselines;
    • Provide leaderboard results;
    • Facilitate systematic comparison and optimization experiments for researchers.
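The unified evaluation described above typically reports standard QA metrics such as Exact Match (EM) and token-level F1 on benchmarks like NQ and TriviaQA. The sketch below shows how these metrics are conventionally computed; UltraRAG's built-in implementation may use different normalization details:

```python
# Conventional EM and token-level F1 for QA evaluation (a sketch; UltraRAG's
# built-in metric code may normalize answers differently).
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
print(round(token_f1("Paris, France", "Paris"), 2))     # 0.67
```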

The Secret Sauce: MCP Architecture and Native Pipeline Control

In different RAG systems, core capabilities such as retrieval and generation are highly similar in function, but because developers implement them in diverse ways, the modules often lack unified interfaces, making cross-project reuse difficult. The Model Context Protocol (MCP) is an open protocol that standardizes how context is provided to large language models (LLMs). It adopts a Client–Server architecture, so MCP-compliant Server components can be seamlessly reused across different systems.

Inspired by this, UltraRAG 2.0 is based on the MCP architecture, abstracting and encapsulating core functions such as retrieval, generation, and evaluation in RAG systems into independent MCP Servers, and invoking them through standardized function-level Tool interfaces. This design ensures flexible module function extension and allows new modules to be “hot-plugged” without invasive modifications to global code. In scientific research scenarios, this architecture enables researchers to quickly adapt new models or algorithms with minimal code while maintaining overall system stability and consistency.


Developing a complex RAG inference framework is a significant challenge. UltraRAG 2.0 can support such systems under low-code conditions because of its native support for multi-structured pipeline control: sequential, loop, and conditional-branching logic can all be defined and orchestrated at the YAML level, covering the workflow forms that complex inference tasks require. At runtime, workflow scheduling is executed by the built-in Client, whose behavior is fully described by the user-written external Pipeline YAML script, decoupling it from the underlying implementation. Developers can use instructions such as loop and step as if they were programming-language keywords, declaratively constructing multi-stage inference workflows.
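To make the declarative style concrete, a pipeline with an iterative retrieve-and-reason loop might be sketched roughly as follows. Note that the field names here (servers, pipeline, step, loop, times) are illustrative assumptions, not UltraRAG's real schema; consult the documentation and the shipped examples (e.g., examples/search_o1.yaml) for the actual format:

```yaml
# Hypothetical pipeline fragment -- field names are illustrative only.
servers:
  retriever: servers/retriever
  generation: servers/generation

pipeline:
  - step: retriever.search        # initial retrieval
  - loop:                         # iterative reason-then-retrieve rounds
      times: 3
      steps:
        - step: generation.reason
        - step: retriever.search
  - step: generation.answer       # final answer synthesis
```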

By deeply integrating the MCP architecture with native workflow control, UltraRAG 2.0 makes building complex RAG systems as natural and efficient as “orchestrating workflows.” Additionally, the framework includes 17 mainstream benchmark tasks and multiple high-quality baselines, combined with a unified evaluation system and knowledge base support, further enhancing system development efficiency and experiment reproducibility.

Installation

Create a virtual environment using Conda:

conda create -n ultrarag python=3.11
conda activate ultrarag

Clone the project locally or on a server via git:

git clone https://github.com/OpenBMB/UltraRAG.git
cd UltraRAG

We recommend using uv for package management, providing faster and more reliable Python dependency management:

pip install uv
uv pip install -e .

If you prefer pip, you can directly run:

pip install -e .

[Optional] UR-2.0 supports rich Server components; developers can flexibly install dependencies according to actual tasks:

# If you want to use faiss for vector indexing:
# You need to manually compile and install the CPU or GPU version of FAISS depending on your hardware environment:
# CPU version:
uv pip install faiss-cpu
# GPU version (example: CUDA 12.x)
uv pip install faiss-gpu-cu12
# For other CUDA versions, install the corresponding package (e.g., faiss-gpu-cu11 for CUDA 11.x).

# If you want to use infinity_emb for corpus encoding and indexing:
uv pip install -e ".[infinity_emb]"

# If you want to use the lancedb vector database:
uv pip install -e ".[lancedb]"

# If you want to deploy models as a vLLM service:
uv pip install -e ".[vllm]"

# If you want to use corpus document parsing functionality:
uv pip install -e ".[corpus]"

# ====== Install all dependencies (except faiss) ======
uv pip install -e ".[all]"

Run the following command to verify a successful installation:

# If the installation was successful, you should see the welcome message 'Hello, UltraRAG 2.0!'
ultrarag run examples/sayhello.yaml

Quick Start

We provide a complete set of tutorials ranging from beginner to advanced. Visit the tutorial documentation to quickly get started with UltraRAG 2.0!

Read the Quick Start guide to learn the UltraRAG workflow, which consists of three steps: (1) compile the Pipeline file to generate the parameter configuration, (2) modify the parameter file, and (3) run the Pipeline file.

In addition, we have prepared a directory of commonly used research functions, from which you can jump directly to the module you need.

Support

UltraRAG 2.0 is ready to use out of the box, natively supporting the public evaluation datasets, large-scale corpora, and typical baseline methods most commonly used in the current RAG field, facilitating rapid reproduction and extension of experiments. You can also follow the Data Format Specification to flexibly customize and add any dataset or corpus. The full datasets are available for access and download through this link.

1. Supported Datasets

| Task Type | Dataset Name | Original Data Size | Evaluation Sample Size |
| --- | --- | --- | --- |
| QA | NQ | 3,610 | 1,000 |
| QA | TriviaQA | 11,313 | 1,000 |
| QA | PopQA | 14,267 | 1,000 |
| QA | AmbigQA | 2,002 | 1,000 |
| QA | MarcoQA | 55,636 | 1,000 |
| QA | WebQuestions | 2,032 | 1,000 |
| Multi-hop QA | HotpotQA | 7,405 | 1,000 |
| Multi-hop QA | 2WikiMultiHopQA | 12,576 | 1,000 |
| Multi-hop QA | Musique | 2,417 | 1,000 |
| Multi-hop QA | Bamboogle | 125 | 125 |
| Multi-hop QA | StrategyQA | 2,290 | 1,000 |
| Multiple-choice | ARC | 3,548 | 1,000 |
| Multiple-choice | MMLU | 14,042 | 1,000 |
| Long-form QA | ASQA | 948 | 948 |
| Fact-verification | FEVER | 13,332 | 1,000 |
| Dialogue | WoW | 3,054 | 1,000 |
| Slot-filling | T-REx | 5,000 | 1,000 |

2. Supported Corpus

| Corpus Name | Document Count |
| --- | --- |
| wiki-2018 | 21,015,324 |
| wiki-2024 | Under preparation, coming soon |

3. Supported Baseline Methods (Continuously Updated)

| Baseline Name | Script |
| --- | --- |
| Vanilla LLM | examples/vanilla.yaml |
| Vanilla RAG | examples/rag.yaml |
| IRCoT | examples/IRCoT.yaml |
| IterRetGen | examples/IterRetGen.yaml |
| RankCoT | examples/RankCoT.yaml |
| R1-searcher | examples/r1_searcher.yaml |
| Search-o1 | examples/search_o1.yaml |
| Search-r1 | examples/search_r1.yaml |
| WebNote | examples/webnote.yaml |

Contributing

Thanks to all contributors for their code submissions and testing. We also welcome new members to join us in building a comprehensive RAG ecosystem together!

You can contribute by following the standard process: fork this repository, submit issues, and create pull requests (PRs).

Support Us

If you find this repository helpful for your research, please consider giving us a ⭐ to show your support.

Contact Us

  • For technical issues and feature requests, please use GitHub Issues.
  • For questions about usage, feedback, or any discussions related to RAG technologies, you are welcome to join our WeChat group, Feishu group, and Discord to exchange ideas with us.
