UltraRAG 3.0: No More Black Boxes, Full Transparency in Reasoning

· 12 min read
Sen Mei
TsinghuaNLP

"Validating an algorithm prototype takes a week, but building a usable system takes months." This wry quip captures the real predicament that every algorithm engineer faces.

Today, Tsinghua University's THUNLP Lab, Northeastern University's NEUIR Lab, OpenBMB, ModelBest, and AI9Stars jointly release UltraRAG 3.0, a developer-centric framework that addresses this pain point with three core advantages:

  • One-click leap from logic to prototype, letting algorithm engineers focus on "algorithms": Provides a "what you see is what you get" Pipeline builder that automatically handles tedious interface encapsulation. Just focus on logic orchestration, and static code instantly becomes an interactive demo system.

  • Full-chain white-box transparency, "pixel-level" visualization of reasoning traces: Creates a "transparent" reasoning verification window, presenting in real-time every loop, branch, and decision detail of the model during complex long-chain tasks.

  • Built-in intelligent development assistant, your "interactive development guide": Embeds an AI assistant that understands the framework, assisting in generating Pipeline configurations and optimizing Prompts through natural language interaction, greatly lowering the barrier to entry.

Logic as Application — A "Zero-Distance" Experience from Orchestration to Interaction

The endpoint of an algorithm should no longer be a cold console log. UltraRAG 3.0 automatically handles tedious interface encapsulation and parameter integration, so that the moment logic orchestration is complete, an interactive demo interface is generated alongside it:

  • Configuration as Application: Simply define the Pipeline's YAML configuration file, and the framework automatically parses and transforms it into a standard interactive Demo.
  • Dual-Mode Builder: To balance ease of use and flexibility, we've built a construction engine with real-time synchronization between visual and code modes:
    • Canvas Mode: Intuitively assemble complex logic like Loop and Branch through UI components, like building blocks.
    • Code Mode: Directly edit YAML configuration files with the canvas view rendering updates in real-time, meeting developers' needs for precise parameter fine-tuning.
  • One-Click Build & Verify: After construction, click the "Build" button at the top. The system automatically performs logic self-checks and syntax validation, dynamically generating a parameter configuration panel. Once the parameters are filled in, the static algorithm logic turns into an interactive system, truly achieving "what you write is what you get, what you get is what you use."
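To make "Configuration as Application" concrete, here is a minimal sketch of what such a pipeline file might look like. The key names (`servers`, `pipeline`, the step identifiers) are illustrative assumptions, not UltraRAG's actual schema:

```yaml
# Hypothetical sketch only; key names do not reflect UltraRAG's real schema.
servers:
  retriever: servers/retriever      # retrieval MCP server
  generation: servers/generation    # generation MCP server

pipeline:
  - retriever.search:               # fetch top-k passages for the query
      top_k: 5
  - generation.answer:              # answer grounded in retrieved passages
      template: prompts/qa.jinja
```

Once a file in this spirit is defined, the framework parses it and renders the interactive demo described above.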

Reject "Black Boxes" — Making Complex RAG Reasoning Traces Clearly Visible

As RAG technology evolves from simple single-round retrieval toward multi-round dynamic decision-making, reasoning chains often stretch to hundreds of steps. Without monitoring of intermediate states, debugging is like groping in fog, and error localization relies entirely on guesswork.

UltraRAG 3.0 redefines the Chat interface — it's not just the user interaction entry point, but also a logic verification window. We deeply understand that for developers, knowing "what the result is" is far from enough; seeing "how the result came about" is the key to optimization.

Through the "Show Thinking" panel, we provide pixel-level real-time visualization of the system's entire "thinking" process — from complex loop branches to specific tool calls, all intermediate states are presented in a structured streaming format. Even for complex long-process tasks like DeepResearch, developers can track execution progress in real-time, making the process no longer a dark wait. When Bad Cases appear, developers no longer need to dig through backend logs; they can directly compare retrieval slices with final answers on the interface, quickly determining whether the problem lies in "data layer noise" or "model layer hallucination," greatly shortening the optimization iteration cycle.
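Conceptually, the panel consumes a stream of structured events, one per intermediate step. The sketch below is our own minimal illustration of such an event stream, not UltraRAG's internal format:

```python
import json

def reasoning_trace(query):
    """Yield one structured event per intermediate step, similar in
    spirit to a 'Show Thinking' stream (illustrative format only)."""
    yield {"step": "retrieve", "query": query, "hits": 3}
    yield {"step": "branch", "condition": "needs_more_context", "taken": False}
    yield {"step": "generate", "tokens": 42}

# Each event can be serialized and rendered incrementally as a JSON line.
events = [json.dumps(e) for e in reasoning_trace("what is RAG?")]
```

Rendering each step as its own structured record is what lets the UI show loops and branches as they happen, instead of one opaque final answer.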

Here we select two typical scenarios from the AgentCPM-Report workflow to demonstrate the practical effect of "white-box" debugging.

Breaking Free from "Framework Shackles" in Custom Development

Trying out a new algorithm often means diving deep into a framework's internals and rewriting large amounts of subclass code: to achieve the 10% that is core algorithmic innovation, one has to bear the 90% that is framework learning cost.

UltraRAG 3.0 embeds the entire development documentation and best practices into a built-in intelligent assistant. It may not write an entire project for you the way Cursor does, but it is the assistive tool that understands UltraRAG best. Through natural language interaction, it bridges the cognitive gap between "reading documentation" and "writing configurations":

  • Configuration Generation: Just describe your requirements (e.g., "I want a pipeline with multi-way recall and reranking"), and the assistant automatically generates a standard Pipeline structure draft that can be used directly with minor adjustments.
  • Prompt Tuning: The assistant provides targeted Prompt optimization suggestions based on the current task context, quickly adapting to specific business scenarios.
  • Understanding Assistance: Can't understand a parameter or logic? No need to open a browser and browse through documentation. Just ask and get development suggestions and code examples, keeping the coding process uninterrupted.
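For instance, asked for "multi-way recall and reranking," the assistant would produce a pipeline draft along these lines. The step names below are hypothetical, for illustration only:

```yaml
# Illustrative draft; step names are hypothetical, not the assistant's
# actual output format.
pipeline:
  - retriever.dense_search:    # recall path 1: dense vectors
      top_k: 20
  - retriever.bm25_search:     # recall path 2: sparse / BM25
      top_k: 20
  - reranker.rerank:           # merge both candidate lists and rerank
      top_k: 5
  - generation.answer: {}
```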

Practical Demo: What It Can Do For You

We demonstrate four real interaction scenarios here, showing how it transforms natural language into "executable logic":

1. Structural Adjustment: Modify Pipeline with One Sentence

User: "Please help me modify the current Pipeline to add a Citation module for fact-checking the generated content."

2. Scenario Adaptation: Targeted Prompt Optimization

User: "I need to optimize the current Prompt for the legal domain. Please adjust the prompt so that the generated answers are more professional and accurate in terms of terminology and logical reasoning in this field."

3. Configuration Adjustment: Easily Modify Underlying Parameters

User: "I want to switch the generation backend configuration. Please change the generation model backend to OpenAI, change the model name to qwen3-32b, and the API service is deployed on port 65503."
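The resulting configuration change might look like the fragment below. The field names are our own illustrative assumptions; only the model name and port come from the request above:

```yaml
# Hypothetical config fragment; field names are illustrative.
generation:
  backend: openai              # switched from the previous backend
  model: qwen3-32b
  base_url: http://localhost:65503/v1
  api_key: EMPTY               # local deployment, no real key required
```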

4. Free Tuning: Shortcut from Concept to Implementation

User: "I want to reference this paper: https://arxiv.org/pdf/2410.08821 (DeepNote), to redesign my RAG pipeline. Please analyze the core ideas in the article and help me build a similar Pipeline architecture."

UltraRAG 2.1: Deep Knowledge Integration, Cross-Modal Support

· 10 min read
Sen Mei
TsinghuaNLP

In the process of building knowledge bases, setting up experimental systems, and evaluating results, researchers always encounter similar challenges: How to achieve multimodal retrieval and generation within a unified framework? How to efficiently integrate multi-source knowledge? And how to make complex RAG experiments easier to build and reproduce?

UltraRAG 2.1 addresses these research challenges with comprehensive upgrades focused on practical needs. This update brings core enhancements in three directions: native multimodal support, automated knowledge integration and corpus construction, and unified build-and-evaluate RAG workflows:

  • Native Multimodal Support: Unified Retriever, Generation, and Evaluation modules with full multimodal retrieval and generation support; new VisRAG Pipeline enabling a complete closed-loop from local PDF indexing to multimodal retrieval and generation.
  • Automated Knowledge Integration & Corpus Construction: Supports multi-format document parsing and chunked indexing, seamlessly integrating MinerU for easy construction of personalized knowledge bases.
  • Unified Build & Evaluate RAG Workflows: Compatible with multiple retrieval and generation inference engines, providing a standardized evaluation system with full-chain visual analysis, achieving a unified process from model invocation to result verification.

Native Multimodal Support

Previously, multimodal RAG often relied on multiple independent tools: text tasks and visual tasks belonged to different workflows, requiring researchers to switch between feature extraction, retrieval, generation, and evaluation tools, with inconsistent interfaces and difficult reproducibility.

UltraRAG 2.1 systematically integrates the multimodal RAG pipeline. All core Servers — Retriever, Generation, and Evaluation — now natively support multimodal tasks and can flexibly connect to various visual, text, or cross-modal models. Researchers can freely orchestrate their own multimodal pipelines within the unified framework — whether for document QA, image-text retrieval, or cross-modal generation — all achievable with minimal effort for end-to-end integration. Additionally, the framework's built-in Benchmarks cover various tasks including visual QA, with a unified evaluation system for researchers to quickly conduct and compare multimodal experiments.

Building on this, UltraRAG 2.1 introduces the VisRAG Pipeline, enabling a complete closed-loop from local PDF indexing to multimodal retrieval and generation. This feature is based on the research in "VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents," which proposes a vision-enhanced retrieval-augmented generation framework for multimodal documents. By jointly modeling document image information (such as charts, formulas, layout structures) with text content, it significantly improves content understanding and QA capabilities for complex scientific documents. UltraRAG integrates this approach, enabling researchers to reproduce VisRAG experiments directly on real PDF document scenarios and further extend multimodal retrieval-generation research and applications.

Automated Knowledge Integration & Corpus Construction

During RAG development, developers need to repeatedly parse, clean, and chunk materials from different sources. As a result, the RAG construction process is often slowed by trivial engineering details, compressing the space for research innovation.

UltraRAG 2.1's Corpus Server makes all of this simple. Users can import corpora from different sources in one go without writing complex scripts — whether Word documents, e-books, or web archives — all automatically parsed into a unified text format. For PDF parsing, UltraRAG seamlessly integrates MinerU, accurately recognizing complex layouts and multi-column structures for high-fidelity text restoration. For mixed image-text files, it also supports converting PDFs page-by-page to images, making visual layouts part of the knowledge. For chunking strategies, Corpus Server offers multi-granularity options: supporting token-level, sentence-level, and custom rules, enabling fine-grained control of semantic boundaries while naturally adapting to structured text like Markdown.
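Token-level and sentence-level chunking are generic techniques; a minimal stdlib-only sketch (our own code, not Corpus Server's implementation) shows the two granularities:

```python
import re

def sentence_chunks(text, max_sentences=2):
    """Sentence-level chunking: split on sentence boundaries,
    then group a fixed number of sentences per chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]

def token_chunks(text, max_tokens=8, overlap=2):
    """Token-level chunking with a sliding window and overlap,
    using whitespace tokens as a stand-in for real tokenization."""
    tokens = text.split()
    step = max_tokens - overlap
    return [" ".join(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), step)]

text = "RAG retrieves passages. It grounds generation. It reduces hallucination."
print(sentence_chunks(text))  # groups of up to 2 sentences
print(token_chunks(text))     # overlapping windows of up to 8 tokens
```

Sentence-level chunks preserve semantic boundaries, while token-level windows with overlap trade boundary fidelity for uniform chunk sizes.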

UltraRAG 2.1 Figure 1

Through this automated pipeline, Corpus Server modularizes the corpus import, parsing, and chunking process, reducing manual scripting and format adaptation work, enabling knowledge base construction to be directly integrated into the standardized RAG pipeline workflow.

Unified Build & Evaluate RAG Workflows

"Chunking, indexing, retrieval, generation, evaluation — each step requires different scripts, too cumbersome!" "Every time I change a parameter or switch a model, do I need to rebuild the entire pipeline?" "After the experiment finally runs, how do I keep evaluation results consistent and comparable?"

These questions are frustrations that almost every RAG researcher has experienced. Existing frameworks often provide fragmented and incompatible support for retrieval, model integration, and evaluation, forcing researchers to repeatedly switch between different tools, with every modification potentially triggering a rebuild of the entire experimental chain. UltraRAG 2.1's goal is to make complex workflows clear and unified again.

At the retrieval level, the framework supports sparse, dense, hybrid, and multimodal retrieval, compatible with multiple backend engines including Infinity, Sentence-Transformers, and OpenAI. Researchers can freely combine retrieval strategies and models for flexible pipeline design. For model generation, UltraRAG 2.1 simultaneously supports vLLM offline inference and Hugging Face local debugging, while maintaining full compatibility with the OpenAI interface, so that switching or deploying models requires no code changes. For evaluation, UltraRAG builds a unified Evaluation Server that can compute metrics like ACC and ROUGE for generated results, and supports TREC evaluation and significance analysis for retrieval results. Combined with the visual Case Study UI, researchers can intuitively compare the performance of different models and strategies, turning "debugging" into genuine "understanding."
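As an illustration of what a unified evaluation step computes, here is a minimal exact-match accuracy and retrieval recall sketch. This is our own generic code, not the Evaluation Server's implementation:

```python
def exact_match_acc(predictions, references):
    """ACC: fraction of predictions that exactly match the reference
    after simple normalization (strip + lowercase)."""
    norm = lambda s: s.strip().lower()
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)

def recall_at_k(retrieved_ids, relevant_ids, k=5):
    """Retrieval recall@k: share of the relevant documents that appear
    in the top-k retrieved list."""
    found = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return found / len(relevant_ids)

acc = exact_match_acc(["Paris", "berlin"], ["paris", "Rome"])  # 0.5
rec = recall_at_k(["d3", "d1", "d9"], ["d1", "d2"], k=3)       # 0.5
```

Keeping generation metrics and retrieval metrics behind one interface is what makes results consistent and comparable across pipeline variants.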

Furthermore, UltraRAG achieves full-chain integration from data import to retrieval, generation, and evaluation through a YAML configuration-driven workflow mechanism. Researchers only need to write minimal configuration files to quickly define and reproduce experimental workflows.

UltraRAG 2.1 Figure 2

UltraRAG 2.0: Minimal Code, Maximum Innovation

· 10 min read
Sen Mei
TsinghuaNLP
Chunyi Peng
NEUIR

Retrieval-Augmented Generation (RAG) systems are evolving from the early simple "retrieval + generation" concatenation toward complex knowledge systems integrating adaptive knowledge organization, multi-round reasoning, and dynamic retrieval (typical examples include DeepResearch and Search-o1). However, this increase in complexity creates high engineering implementation costs for developers when reproducing methods and rapidly iterating on new ideas.

To address this pain point, Tsinghua University's THUNLP Lab, Northeastern University's NEUIR Lab, OpenBMB, and AI9Stars jointly launch UltraRAG 2.0 (UR-2.0) — the first RAG framework designed with Model Context Protocol (MCP) architecture. This design allows researchers to declare complex logic such as serial execution, loops, and conditional branches directly by writing YAML files, enabling rapid implementation of multi-stage reasoning systems with minimal code.

UltraRAG 2.0 highlights at a glance:

  • 🧩 Component-based Encapsulation: Encapsulates core RAG components as standardized independent MCP Servers;

  • 🔌 Flexible Invocation & Extension: Provides function-level Tool interfaces supporting flexible invocation and extension of capabilities;

  • 🪄 Lightweight Pipeline Orchestration: Leverages MCP Client to establish streamlined top-down pipeline construction.

Compared to traditional frameworks, UltraRAG 2.0 significantly lowers the technical threshold and learning cost of complex RAG systems, allowing researchers to invest more energy in experimental design and algorithm innovation rather than getting bogged down in lengthy engineering implementation.

Simplifying Complexity — Only 5% Code for Low-Barrier Reproduction

The value of "simplicity" is especially tangible in practice. Take the classic method IRCoT (https://arxiv.org/abs/2212.10509) as an example: it relies on model-generated CoT to drive multiple rounds of retrieval until a final answer is produced, making the overall process quite complex.

In the official implementation, the Pipeline portion alone requires nearly 900 lines of handwritten logic; even using other RAG frameworks still requires over 110 lines of code. In contrast, UltraRAG 2.0 achieves equivalent functionality with only about 50 lines of code. More notably, approximately half of that is YAML pseudo-code for orchestration, dramatically lowering the development threshold and implementation cost.
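IRCoT's interleaving of chain-of-thought and retrieval can be sketched generically: generate a reasoning step, retrieve with it, and repeat until the model commits to an answer. The toy below uses stub retrieval and generation functions to show the control flow only; it is neither UltraRAG's nor the official implementation:

```python
def ircot(question, retrieve, generate, max_rounds=4):
    """Interleave CoT generation with retrieval: each new CoT
    sentence becomes the next retrieval query."""
    passages, thoughts = [], []
    for _ in range(max_rounds):
        thought = generate(question, passages, thoughts)
        thoughts.append(thought)
        if thought.startswith("So the answer is"):
            return thought            # stop once an answer is committed
        passages += retrieve(thought)  # retrieve with the latest thought
    return thoughts[-1]

# Stub retriever/generator, just to exercise the loop.
retrieve = lambda q: [f"doc about {q.split()[0]}"]
generate = (lambda question, passages, thoughts:
            "So the answer is 42." if passages else "First find the key fact.")
answer = ircot("ultimate question?", retrieve, generate)
```

The loop-with-exit-condition structure is exactly what UltraRAG 2.0 lets you declare at the YAML level instead of hand-writing in framework code.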

Simple Yet Extraordinary — Dozens of Lines of Code for High-Performance RAG Systems

For UltraRAG 2.0, "simplicity" does not mean limited functionality. Leveraging the MCP architecture and flexible YAML pipeline definitions, UltraRAG 2.0 provides researchers with a high-performance, extensible experimental platform. Researchers can build multi-stage reasoning systems similar to DeepResearch in a very short time, supporting advanced capabilities like dynamic retrieval, conditional judgment, and multi-round interaction.

In the example, we chain Retriever, Generation, Router, and other modules through YAML to build a reasoning pipeline with both loops and conditional branches, implementing key steps such as Plan Generation → Knowledge Organization → Sub-question Generation, all in under 100 lines of code.
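The loop-and-branch structure described above might be declared along these lines; the keys (`loop`, `branch`, the step names) are illustrative assumptions, not UltraRAG's exact schema:

```yaml
# Hypothetical sketch of a multi-stage reasoning pipeline; key names
# are illustrative, not UltraRAG's real configuration syntax.
pipeline:
  - generation.plan:              # Plan Generation
  - loop:
      times: 5
      steps:
        - retriever.search:
        - generation.organize:    # Knowledge Organization
        - router.route:           # decide: answer now, or dig deeper?
        - branch:
            answer: [generation.final_answer]
            continue: [generation.subquestion]   # Sub-question Generation
```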

UltraRAG 2.0 Figure 1

In terms of performance, this system achieves a ~12% performance improvement over Vanilla RAG on complex multi-hop questions, fully validating UltraRAG 2.0's potential in rapidly building complex reasoning systems.

UltraRAG 2.0 Figure 2

UltraRAG 2.0 makes building complex reasoning systems truly low-code, high-performance, and production-ready. Users can not only achieve performance improvements in research tasks but also quickly deploy in industry applications such as intelligent customer service, educational tutoring, and medical QA, delivering more reliable knowledge-enhanced answers.

MCP Architecture and Native Pipeline Control

Across different RAG systems, core capabilities like retrieval and generation are functionally highly similar, but due to varying developer implementation strategies, modules often lack unified interfaces and are difficult to reuse across projects. Model Context Protocol (MCP), as an open protocol, standardizes the way context is provided for Large Language Models (LLMs) and adopts a Client-Server architecture, enabling Server components developed following this protocol to be seamlessly reused across different systems.

Inspired by this, UltraRAG 2.0 abstracts and encapsulates core RAG functions such as retrieval, generation, and evaluation as mutually independent MCP Servers based on the MCP architecture, with invocation through standardized function-level Tool interfaces. This design ensures flexibility in module capability extension while allowing new modules to be integrated in a "hot-pluggable" manner without invasive modifications to the overall codebase. In research scenarios, this architecture enables researchers to rapidly adapt to new models or algorithms with minimal code while maintaining the stability and consistency of the overall system.

UltraRAG 2.0 Figure 3

Developing complex RAG reasoning frameworks is genuinely challenging; UltraRAG 2.0 can support complex system construction with so little code because it natively supports multi-structure Pipeline flow control. Whether serial, loop, or conditional branch, all control logic can be defined and scheduled at the YAML level, covering the process patterns that complex reasoning tasks require. At execution time, reasoning is scheduled by the built-in Client, whose behavior is entirely described by the user-written Pipeline YAML script, decoupling orchestration from the underlying implementation. Developers can invoke instructions like loop and step as if they were programming-language keywords, rapidly building multi-stage reasoning pipelines in a declarative manner.

By deeply integrating MCP architecture with native process control, UltraRAG 2.0 makes building complex RAG systems as natural and efficient as "orchestrating workflows." Additionally, the framework includes 17 mainstream benchmark tasks and multiple high-quality baselines, combined with a unified evaluation system and knowledge base support, further improving system development efficiency and experimental reproducibility.