
3 posts tagged with "UltraRAG"

UltraRAG 3.0: No More Black Boxes, Full Transparency in Reasoning

Sen Mei (TsinghuaNLP) · 12 min read

"Validating an algorithm prototype takes a week, but building a usable system takes months." This quip, only half in jest, describes a real predicament every algorithm engineer faces.

Today, Tsinghua University's THUNLP Lab, Northeastern University's NEUIR Lab, OpenBMB, ModelBest, and AI9Stars jointly release UltraRAG 3.0, addressing these pain points with a developer-centric technical framework built around three core advantages:

  • One-click leap from logic to prototype, letting algorithm engineers focus on "algorithms": Provides a "what you see is what you get" Pipeline builder that automatically handles tedious interface encapsulation. Just focus on logic orchestration, and static code instantly becomes an interactive demo system.

  • Full-chain white-box transparency, "pixel-level" visualization of reasoning traces: Creates a "transparent" reasoning verification window, presenting in real-time every loop, branch, and decision detail of the model during complex long-chain tasks.

  • Built-in intelligent development assistant, your "interactive development guide": Embeds an AI assistant that understands the framework, assisting in generating Pipeline configurations and optimizing Prompts through natural language interaction, greatly lowering the barrier to entry.

UltraRAG 2.1: Deep Knowledge Integration, Cross-Modal Support

Sen Mei (TsinghuaNLP) · 10 min read

In the process of building knowledge bases, setting up experimental systems, and evaluating results, researchers always encounter similar challenges: How to achieve multimodal retrieval and generation within a unified framework? How to efficiently integrate multi-source knowledge? And how to make complex RAG experiments easier to build and reproduce?

UltraRAG 2.1 addresses these research challenges with comprehensive upgrades focused on practical needs. This update brings core enhancements in three directions: native multimodal support, automated knowledge integration and corpus construction, and unified build-and-evaluate RAG workflows:

  • Native Multimodal Support: Unified Retriever, Generation, and Evaluation modules with full multimodal retrieval and generation support; new VisRAG Pipeline enabling a complete closed-loop from local PDF indexing to multimodal retrieval and generation.
  • Automated Knowledge Integration & Corpus Construction: Supports multi-format document parsing and chunked indexing, seamlessly integrating MinerU for easy construction of personalized knowledge bases.
  • Unified Build & Evaluate RAG Workflows: Compatible with multiple retrieval and generation inference engines, providing a standardized evaluation system with full-chain visual analysis, achieving a unified process from model invocation to result verification.

Native Multimodal Support

Previously, multimodal RAG often relied on multiple independent tools: text tasks and visual tasks belonged to different workflows, requiring researchers to switch between feature extraction, retrieval, generation, and evaluation tools, with inconsistent interfaces and difficult reproducibility.

UltraRAG 2.1 systematically integrates the multimodal RAG pipeline. All core Servers — Retriever, Generation, and Evaluation — now natively support multimodal tasks and can flexibly connect to various visual, text, or cross-modal models. Researchers can freely orchestrate their own multimodal pipelines within the unified framework — whether for document QA, image-text retrieval, or cross-modal generation — all achievable with minimal effort for end-to-end integration. Additionally, the framework's built-in Benchmarks cover various tasks including visual QA, with a unified evaluation system for researchers to quickly conduct and compare multimodal experiments.

Building on this, UltraRAG 2.1 introduces the VisRAG Pipeline, enabling a complete closed-loop from local PDF indexing to multimodal retrieval and generation. This feature is based on the research in "VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents," which proposes a vision-enhanced retrieval-augmented generation framework for multimodal documents. By jointly modeling document image information (such as charts, formulas, layout structures) with text content, it significantly improves content understanding and QA capabilities for complex scientific documents. UltraRAG integrates this approach, enabling researchers to reproduce VisRAG experiments directly on real PDF document scenarios and further extend multimodal retrieval-generation research and applications.

Automated Knowledge Integration & Corpus Construction

During RAG development, developers need to repeatedly parse, clean, and chunk materials from different sources. As a result, the RAG construction process is often slowed by trivial engineering details, leaving less room for research innovation.

UltraRAG 2.1's Corpus Server makes all of this simple. Users can import corpora from different sources in one go without writing complex scripts — whether Word documents, e-books, or web archives — all automatically parsed into a unified text format. For PDF parsing, UltraRAG seamlessly integrates MinerU, accurately recognizing complex layouts and multi-column structures for high-fidelity text restoration. For mixed image-text files, it also supports converting PDFs page-by-page to images, making visual layouts part of the knowledge. For chunking strategies, Corpus Server offers multi-granularity options: supporting token-level, sentence-level, and custom rules, enabling fine-grained control of semantic boundaries while naturally adapting to structured text like Markdown.
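The two chunking granularities mentioned above can be illustrated with a small sketch. These functions are not the Corpus Server's actual interface; whitespace splitting stands in for a real tokenizer, and the option names are invented for the example.

```python
import re

# Illustrative chunkers only; the Corpus Server exposes its own
# strategies, whose exact names and options are not reproduced here.

def chunk_by_tokens(text: str, max_tokens: int = 8) -> list[str]:
    # Whitespace "tokens" stand in for a real tokenizer.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

def chunk_by_sentences(text: str, max_sents: int = 2) -> list[str]:
    # Split after sentence-ending punctuation, then group sentences.
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sents[i:i + max_sents])
            for i in range(0, len(sents), max_sents)]

doc = "RAG needs clean chunks. Boundaries matter. Markdown headers help too."
```

Token-level chunks give uniform sizes for embedding budgets, while sentence-level chunks preserve semantic boundaries, which matters for structured text like Markdown.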

UltraRAG 2.0: Minimal Code, Maximum Innovation

Sen Mei (TsinghuaNLP), Chunyi Peng (NEUIR) · 10 min read

Retrieval-Augmented Generation (RAG) systems are evolving from the early simple "retrieval + generation" concatenation toward complex knowledge systems integrating adaptive knowledge organization, multi-round reasoning, and dynamic retrieval (typical examples include DeepResearch and Search-o1). However, this increase in complexity creates high engineering implementation costs for developers when reproducing methods and rapidly iterating on new ideas.

To address this pain point, Tsinghua University's THUNLP Lab, Northeastern University's NEUIR Lab, OpenBMB, and AI9Stars jointly launch UltraRAG 2.0 (UR-2.0) — the first RAG framework designed with Model Context Protocol (MCP) architecture. This design allows researchers to declare complex logic such as serial execution, loops, and conditional branches directly by writing YAML files, enabling rapid implementation of multi-stage reasoning systems with minimal code.
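To make the declarative idea concrete, here is a toy interpreter over a spec with serial steps, a loop, and a conditional branch. The schema (`step`/`loop`/`branch` keys) is invented for illustration and is not UltraRAG's actual YAML format; a Python dict stands in for the parsed YAML file.

```python
# Hypothetical pipeline spec in the spirit of a declarative YAML file;
# the real UltraRAG schema may differ.
pipeline = [
    {"step": "retrieve"},
    {"loop": {"until": "answered", "body": [{"step": "reason"},
                                            {"step": "retrieve"}]}},
    {"branch": {"if": "answered", "then": [{"step": "finalize"}]}},
]

def run(spec, state):
    trace = []
    for node in spec:
        if "step" in node:
            trace.append(node["step"])
            if node["step"] == "reason":
                # Toy condition: declare the question answered after
                # two reasoning rounds.
                state["rounds"] += 1
                state["answered"] = state["rounds"] >= 2
        elif "loop" in node:
            while not state[node["loop"]["until"]]:
                trace += run(node["loop"]["body"], state)
        elif "branch" in node:
            if state[node["branch"]["if"]]:
                trace += run(node["branch"]["then"], state)
    return trace

trace = run(pipeline, {"answered": False, "rounds": 0})
```

The payoff of this style is that control flow lives in the spec, not in handwritten glue code: changing the loop condition or adding a branch is a one-line edit to the declaration.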

UltraRAG 2.0 highlights at a glance:

  • 🧩 Component-based Encapsulation: Encapsulates core RAG components as standardized independent MCP Servers;

  • 🔌 Flexible Invocation & Extension: Provides function-level Tool interfaces supporting flexible invocation and extension of capabilities;

  • 🪄 Lightweight Pipeline Orchestration: Leverages MCP Client to establish streamlined top-down pipeline construction.

Compared to traditional frameworks, UltraRAG 2.0 significantly lowers the technical threshold and learning cost of complex RAG systems, allowing researchers to invest more energy in experimental design and algorithm innovation rather than getting bogged down in lengthy engineering implementation.
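The "component as server, capability as tool" pattern behind the first two points can be sketched with a plain registry. This is not the real MCP SDK or UltraRAG's server code; the `Server` class, decorator, and stub corpus are all invented for illustration.

```python
# Minimal sketch of function-level Tool interfaces on a named server.
class Server:
    def __init__(self, name: str):
        self.name = name
        self.tools = {}

    def tool(self, fn):
        # Decorator: expose a function as a named, callable tool.
        self.tools[fn.__name__] = fn
        return fn

    def call(self, tool_name: str, **kwargs):
        return self.tools[tool_name](**kwargs)

retriever = Server("retriever")

@retriever.tool
def search(query: str, top_k: int = 3) -> list[str]:
    # Stub corpus; a real server would query an index.
    corpus = ["doc about MCP", "doc about RAG", "doc about YAML"]
    return [d for d in corpus if query.lower() in d.lower()][:top_k]

hits = retriever.call("search", query="rag")
```

Because every capability is reached through the same `call(tool_name, **kwargs)` surface, a pipeline orchestrator can invoke or swap components without knowing their internals, which is what enables top-down YAML construction.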

Simplifying Complexity — Only 5% Code for Low-Barrier Reproduction

The value of "simplicity" is particularly intuitive in practice. Take IRCoT (https://arxiv.org/abs/2212.10509), a classic method, as an example: it relies on model-generated chain-of-thought (CoT) to drive multiple rounds of retrieval until the final answer is produced, making the overall process quite complex.

In the official implementation, the Pipeline portion alone requires nearly 900 lines of handwritten logic; even using other RAG frameworks still requires over 110 lines of code. In contrast, UltraRAG 2.0 achieves equivalent functionality with only about 50 lines of code. More notably, approximately half of that is declarative YAML orchestration, dramatically lowering the development threshold and implementation cost.
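The core IRCoT loop that those lines implement can be sketched schematically. The control flow (interleave one retrieval with one CoT step until the model emits an answer) follows the paper's description; the stub `retrieve` and `generate_step` functions below are deterministic placeholders, not the official implementation.

```python
# Schematic IRCoT-style loop: retrieval and chain-of-thought interleave
# until the model signals a final answer.
def ircot(question: str, retrieve, generate_step, max_rounds: int = 5) -> str:
    context, thoughts = [], []
    for _ in range(max_rounds):
        context += retrieve(question, thoughts)        # retrieve with CoT so far
        step = generate_step(question, context, thoughts)
        thoughts.append(step)
        if step.startswith("So the answer is"):        # answer sentinel
            break
    return thoughts[-1]

# Deterministic stubs so the loop terminates after two rounds.
def retrieve(q, thoughts):
    return [f"evidence-{len(thoughts)}"]

def generate_step(q, context, thoughts):
    if len(context) < 2:
        return "I need more evidence."
    return "So the answer is: 42."

answer = ircot("toy question", retrieve, generate_step)
```

In a declarative framework, the `for`/`if` skeleton above is exactly what moves into the YAML loop and branch declarations, which is why so little imperative code remains.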

Simple Yet Extraordinary — Dozens of Lines of Code for High-Performance RAG Systems

For UltraRAG 2.0, "simplicity" does not mean limited functionality. Leveraging the MCP architecture and flexible YAML pipeline definitions, UltraRAG 2.0 provides researchers with a high-performance, extensible experimental platform. Researchers can build multi-stage reasoning systems similar to DeepResearch in a very short time, supporting advanced capabilities like dynamic retrieval, conditional judgment, and multi-round interaction.

In the example, we concatenate Retriever, Generation, Router, and other modules through YAML to build a reasoning pipeline with both loops and conditional branches, implementing key steps like Plan Generation → Knowledge Organization → Sub-question Generation, all in under 100 lines of code.
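The stage sequence can be rendered as a toy loop. The stage names follow the text; the two-sub-question planner, the note format, and the router-style stop condition are all invented for illustration and do not reflect the actual example pipeline.

```python
# Toy rendering of Plan Generation -> Knowledge Organization ->
# Sub-question Generation with a loop and a stop condition.
def plan(question: str) -> list[str]:
    # Plan / Sub-question Generation: decompose into sub-questions.
    return [f"{question} :: part-{i}" for i in range(2)]

def organize(sub_q: str) -> str:
    # Knowledge Organization: condense retrieved evidence into notes.
    return f"notes({sub_q})"

def answer_with(notes: list[str]) -> str:
    # Final synthesis over all accumulated notes.
    return f"final answer from {len(notes)} note sets"

def deep_pipeline(question: str, max_rounds: int = 3) -> str:
    notes = []
    for _ in range(max_rounds):
        sub_qs = plan(question)                  # Plan Generation
        notes += [organize(q) for q in sub_qs]   # Knowledge Organization
        if len(notes) >= 4:                      # Router-style stop branch
            break
    return answer_with(notes)

result = deep_pipeline("why do glaciers retreat")
```

Each named function maps onto one module in the YAML pipeline, and the `for`/`if` pair corresponds to the declared loop and conditional branch.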