RAG-Fusion: Enhancing RAG using multiple retrieval strategies

RAG-Fusion: The Future of Information Retrieval

Retrieval-Augmented Generation (RAG) is a powerful technique that enhances the performance of language models by integrating document retrieval into text generation. However, traditional RAG models rely on a single retrieval step, limiting their effectiveness when dealing with diverse or ambiguous queries. RAG-Fusion improves upon this by incorporating multiple retrieval strategies and fusing their results, leading to more comprehensive and accurate responses.

In this blog post, we will explore RAG-Fusion, its advantages over traditional RAG, and how to implement it using Python. We will use the code we’ve developed as a working example throughout the post.


Why RAG-Fusion?

Traditional RAG models work as follows (sketched in code after the list):

  1. Convert a query into an embedding.
  2. Retrieve the top N documents from a knowledge base.
  3. Feed the retrieved documents into a language model to generate an answer.
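
In code, this single-pass pipeline looks roughly like the sketch below. The embed, vector_index.search, and llm.generate calls are hypothetical placeholders standing in for whatever embedding model, vector store, and language model a given pipeline uses:

def traditional_rag(query, vector_index, llm, embed, top_n=5):
    # 1. Convert the query into an embedding (hypothetical embed() helper).
    query_vector = embed(query)

    # 2. Retrieve the top N documents from the knowledge base (single retrieval step).
    documents = vector_index.search(query_vector, top_n)

    # 3. Feed the retrieved documents into a language model to generate an answer.
    context = "\n\n".join(documents)
    return llm.generate(f"Context:\n{context}\n\nQuestion: {query}")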

While effective, this approach has limitations:

  • Single retrieval method bias: If the retrieval method fails, the generation suffers.
  • Lack of multi-perspective fusion: Queries often require diverse perspectives for better answers.
  • Fixed retrieval ranking: The system may prioritize suboptimal results.

How RAG-Fusion Solves These Issues

RAG-Fusion overcomes these limitations by:

  • Using multiple retrievers (e.g., BM25, FAISS, TF-IDF) for diverse document selection.
  • Employing Reciprocal Rank Fusion (RRF) to merge rankings from different retrieval strategies (see the sketch after this list).
  • Optimizing retrieval results for multi-stage fusion, ensuring a robust final ranking.
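
Reciprocal Rank Fusion itself is easy to express. The sketch below is a minimal, generic implementation of the standard RRF formula, where each document's fused score is the sum of 1 / (k + rank) over every ranking it appears in (k = 60 is the commonly used constant); it is not tied to any particular retriever:

from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document's fused score is the sum of 1 / (k + rank) over every
    list in which it appears (ranks start at 1).
    """
    fused_scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            fused_scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused_scores, key=fused_scores.get, reverse=True)

For example, reciprocal_rank_fusion([["d1", "d2", "d3"], ["d2", "d4", "d1"]]) ranks "d2" and "d1" first, because both retrievers place them near the top of their lists.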

Implementation Overview

We have designed a modular implementation of RAG-Fusion. The core components include:

  1. Document Extraction & Splitting → Extracts and segments text from PDFs.
  2. Retrievers → Retrieve relevant documents using BM25, TF-IDF, and FAISS.
  3. Multi-Stage Fusion → Combines results from multiple retrievers.
  4. LLM-Based Answer Generation → Uses a language model to generate responses based on retrieved documents.

We will walk through each component with relevant code snippets.


Step 1: Extracting Text from Documents

Before we can retrieve information, we must first extract text from the source PDFs. We have implemented a configurable document extractor:

from abc import ABC, abstractmethod

class TextExtractor(ABC):
    @abstractmethod
    def extract(self, pdf_path):
        pass

    @staticmethod
    def create_extractor(mode):
        extractors = {
            "full": FullPDFExtractor(),
            "page": PagePDFExtractor(),
            "hybrid": HybridPDFExtractor()
        }
        return extractors.get(mode, PagePDFExtractor())  # Default to page processing

This allows us to process entire documents, per-page extraction, or a hybrid approach based on configuration settings.
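
The concrete extractors themselves are not shown here, but to make the idea tangible, here is a minimal sketch of what a page-level extractor might look like, assuming the pypdf library; the project's actual PagePDFExtractor may be implemented differently:

from pypdf import PdfReader

class PagePDFExtractor(TextExtractor):
    """Illustrative page-level extractor: returns one text block per page."""

    def extract(self, pdf_path):
        reader = PdfReader(pdf_path)
        # Keep the page number alongside the text so downstream splitters
        # can attach a page_id to every chunk.
        return [(page_number, page.extract_text() or "")
                for page_number, page in enumerate(reader.pages, start=1)]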


Step 2: Splitting Text for Efficient Retrieval

Once extracted, we split the text into meaningful segments using different strategies:

class TextSplitter(ABC):
    @abstractmethod
    def split(self, page_id, text):
        pass

    @staticmethod
    def create_splitter(method):
        splitters = {
            "sliding_window": SlidingWindowSplitter(),
            "hierarchical": HierarchicalSplitter(),
            "semantic": SemanticSplitter(),
            "paragraph": ParagraphSplitter()
        }
        return splitters.get(method, SlidingWindowSplitter())  # Default to sliding window

With this setup, users can choose sliding window, hierarchical, semantic, or paragraph-based splitting to optimize retrieval accuracy.
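
As an illustration, a sliding-window splitter can be written in a few lines. The sketch below splits on words with a fixed window size and overlap; the real SlidingWindowSplitter may use different defaults or token-level windows:

class SlidingWindowSplitter(TextSplitter):
    """Illustrative splitter: fixed-size word windows with overlap."""

    def __init__(self, window_size=200, overlap=50):
        self.window_size = window_size
        self.overlap = overlap

    def split(self, page_id, text):
        words = text.split()
        step = self.window_size - self.overlap
        chunks = []
        for start in range(0, max(len(words), 1), step):
            chunk = " ".join(words[start:start + self.window_size])
            if chunk:
                chunks.append({"page_id": page_id, "text": chunk})
        return chunks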


Step 3: Implementing Multiple Retrievers

A critical aspect of RAG-Fusion is the use of multiple retrievers to maximize information diversity. We implement the following:

class Retriever(ABC):
    @abstractmethod
    def retrieve(self, query, corpus):
        pass

    @staticmethod
    def create_retriever(method, vector_store=None):
        retriever_methods = {
            "bm25": BM25Retriever(),
            "tfidf": TFIDFRetriever(),
            "faiss": FAISSRetriever(vector_store) if vector_store else None
        }
        # Fall back to BM25 if the method is unknown or FAISS is requested without a vector store
        return retriever_methods.get(method) or BM25Retriever()
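
For reference, a lexical retriever along these lines can be built on top of the rank_bm25 package. The following is a sketch under that assumption, not necessarily how BM25Retriever is implemented in the project:

from rank_bm25 import BM25Okapi

class BM25Retriever(Retriever):
    """Illustrative BM25 retriever using the rank_bm25 package."""

    def retrieve(self, query, corpus, top_k=5):
        # Simple whitespace tokenization; a real system may use a proper tokenizer.
        tokenized_corpus = [doc.lower().split() for doc in corpus]
        bm25 = BM25Okapi(tokenized_corpus)
        return bm25.get_top_n(query.lower().split(), corpus, n=top_k)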

Each retriever works differently:

  • BM25 → Lexical matching.
  • TF-IDF → Weighted term frequency.
  • FAISS → Dense vector search.

RAG-Fusion allows dynamic selection via configuration.
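
Putting the pieces together, the multi-stage fusion step can be sketched as follows. This assumes the Retriever factory above and the reciprocal_rank_fusion helper sketched earlier; the exact wiring in the project may differ:

def fused_retrieve(query, corpus, methods=("bm25", "tfidf"), top_k=5, vector_store=None):
    # Run each configured retriever independently to obtain diverse rankings.
    ranked_lists = [
        Retriever.create_retriever(method, vector_store).retrieve(query, corpus)
        for method in methods
    ]
    # Merge the individual rankings with Reciprocal Rank Fusion and keep the top results.
    return reciprocal_rank_fusion(ranked_lists)[:top_k]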


Step 4: Evaluating RAG vs. RAG-Fusion

To measure the performance of RAG-Fusion against traditional RAG, we use several evaluation metrics:

  1. Retrieval Effectiveness

    • Recall@K → Measures how often the correct document appears in the top K results (see the sketch after this list).
    • Precision@K → Measures how many retrieved documents are actually relevant.
  2. Generation Quality

    • BLEU Score → Measures similarity between generated and reference responses.
    • ROUGE Score → Measures overlap between generated and reference responses.
    • BERTScore → Uses embeddings to measure semantic similarity.
  3. Latency & Efficiency

    • Query Latency → Time taken to retrieve + generate an answer.
    • Memory Usage → RAM consumption for different retrieval methods.
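
As a concrete illustration of the retrieval metrics above, here is a minimal sketch of Recall@K and Precision@K, assuming each query comes with a known set of relevant document IDs:

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k retrieved list."""
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant) if relevant else 0.0

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    relevant_set = set(relevant)
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant_set)
    return hits / k if k else 0.0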

Sample Evaluation Code

import numpy as np
from rouge_score import rouge_scorer

def evaluate_results(generated_answers, reference_answers):
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

    # RougeScorer.score expects (target, prediction), i.e. the reference comes first.
    scores = [scorer.score(ref, gen) for gen, ref in zip(generated_answers, reference_answers)]
    avg_scores = {
        "rouge1": np.mean([s["rouge1"].fmeasure for s in scores]),
        "rouge2": np.mean([s["rouge2"].fmeasure for s in scores]),
        "rougeL": np.mean([s["rougeL"].fmeasure for s in scores])
    }
    return avg_scores

This evaluation framework helps quantify improvements when moving from traditional RAG to RAG-Fusion.
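
For example, calling it on a pair of toy answer lists returns the average F-measure for each ROUGE variant:

generated = ["RAG-Fusion merges rankings from several retrievers."]
references = ["RAG-Fusion combines rankings from multiple retrievers."]
print(evaluate_results(generated, references))
# e.g. {"rouge1": ..., "rouge2": ..., "rougeL": ...} (values depend on the texts)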


Conclusion: Why RAG-Fusion Matters

By leveraging multiple retrieval methods, reciprocal rank fusion, and adaptive text segmentation, RAG-Fusion significantly improves upon traditional RAG pipelines. It ensures:

  • Better retrieval diversity
  • Higher-quality responses
  • Configurable retrieval & splitting strategies