RAG-Fusion: Enhancing RAG using multiple retrieval strategies
Retrieval-Augmented Generation (RAG) is a powerful technique that enhances the performance of language models by integrating document retrieval into text generation. However, traditional RAG models rely on a single retrieval step, limiting their effectiveness when dealing with diverse or ambiguous queries. RAG-Fusion improves upon this by incorporating multiple retrieval strategies and fusing their results, leading to more comprehensive and accurate responses.
In this blog post, we will explore RAG-Fusion, its advantages over traditional RAG, and how to implement it using Python. We will use the code we’ve developed as a working example throughout the post.
Why RAG-Fusion?
Traditional RAG models work as follows (sketched in code below):
- Convert a query into an embedding.
- Retrieve the top N documents from a knowledge base.
- Feed the retrieved documents into a language model to generate an answer.
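In code, this single-pass flow looks roughly like the sketch below; `embed`, `vector_store.search`, and `llm.generate` are hypothetical stand-ins, not functions from this post's codebase:

def naive_rag(query, vector_store, llm, k=5):
    query_vec = embed(query)                    # 1. embed the query (hypothetical helper)
    docs = vector_store.search(query_vec, k=k)  # 2. one retrieval pass for the top-k documents
    context = "\n\n".join(doc.text for doc in docs)
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return llm.generate(prompt)                 # 3. generate an answer from the retrieved context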
While effective, this approach has limitations:
- Single retrieval method bias: If the retrieval method fails, the generation suffers.
- Lack of multi-perspective fusion: Queries often require diverse perspectives for better answers.
- Fixed retrieval ranking: The system may prioritize suboptimal results.
How RAG-Fusion Solves These Issues
RAG-Fusion overcomes these limitations by:
- Using multiple retrievers (e.g., BM25, FAISS, TF-IDF) for diverse document selection.
- Employing Reciprocal Rank Fusion (RRF) to merge rankings from different retrieval strategies (see the sketch after this list).
- Optimizing retrieval results for multi-stage fusion, ensuring a robust final ranking.
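Reciprocal Rank Fusion is easy to state: a document's fused score is score(d) = Σ_r 1/(k + rank_r(d)), summing over every retriever r that returned d, where rank_r(d) is the document's 1-based position in retriever r's list and k is a smoothing constant (60 in the original RRF paper). A minimal sketch, assuming each retriever hands back an ordered list of document IDs:

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    rankings: list of lists, each ordered best-first by one retriever.
    k: smoothing constant; 60 is the value used in the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked highly by several retrievers accumulate the most score
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

For example, fusing [["a", "b", "c"], ["b", "a", "d"]] ranks "a" and "b" ahead of "c" and "d", because both retrievers surface them near the top of their lists.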
Implementation Overview
We have designed a modular implementation of RAG-Fusion. The core components include:
- Document Extraction & Splitting → Extracts and segments text from PDFs.
- Retrievers → Retrieves relevant documents using BM25, TF-IDF, and FAISS.
- Multi-Stage Fusion → Combines results from multiple retrievers.
- LLM-Based Answer Generation → Uses a language model to generate responses based on retrieved documents.
We will walk through each component with relevant code snippets.
Step 1: Extracting Text from Documents
Before we can retrieve information, we must first extract the raw text from our source PDFs. We have implemented a configurable document extractor:
from abc import ABC, abstractmethod

class TextExtractor(ABC):
    @abstractmethod
    def extract(self, pdf_path):
        pass

    @staticmethod
    def create_extractor(mode):
        extractors = {
            "full": FullPDFExtractor(),     # whole document as one text block
            "page": PagePDFExtractor(),     # one text block per page
            "hybrid": HybridPDFExtractor()  # mix of both
        }
        return extractors.get(mode, PagePDFExtractor())  # Default to page processing
This allows us to process entire documents, per-page extraction, or a hybrid approach based on configuration settings.
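To make this concrete, here is what a page-level extractor might look like; this is an illustrative sketch that assumes the pypdf library, not necessarily the backend the full project uses:

from pypdf import PdfReader

class PagePDFExtractor(TextExtractor):
    def extract(self, pdf_path):
        # Return one (page_id, text) pair per page, so downstream
        # splitters can preserve page provenance
        reader = PdfReader(pdf_path)
        return [(i, page.extract_text() or "") for i, page in enumerate(reader.pages)]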
Step 2: Splitting Text for Efficient Retrieval
Once extracted, we split the text into meaningful segments using different strategies:
from abc import ABC, abstractmethod

class TextSplitter(ABC):
    @abstractmethod
    def split(self, page_id, text):
        pass

    @staticmethod
    def create_splitter(method):
        splitters = {
            "sliding_window": SlidingWindowSplitter(),
            "hierarchical": HierarchicalSplitter(),
            "semantic": SemanticSplitter(),
            "paragraph": ParagraphSplitter()
        }
        return splitters.get(method, SlidingWindowSplitter())  # Default to sliding window
With this setup, users can choose sliding window, hierarchical, semantic, or paragraph-based splitting to optimize retrieval accuracy.
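As an example of one strategy, a sliding-window splitter can be as small as the sketch below; the window and stride sizes are illustrative defaults, not the project's actual configuration:

class SlidingWindowSplitter(TextSplitter):
    def __init__(self, window_size=200, stride=100):
        self.window_size = window_size  # words per chunk
        self.stride = stride            # step between consecutive windows

    def split(self, page_id, text):
        words = text.split()
        chunks = []
        for start in range(0, len(words), self.stride):
            chunk = words[start:start + self.window_size]
            chunks.append((page_id, " ".join(chunk)))
            if start + self.window_size >= len(words):
                break  # the last window already reached the end of the text
        return chunks

The overlap between consecutive windows is the point: a sentence that straddles a chunk boundary stays retrievable from at least one chunk.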
Step 3: Implementing Multiple Retrievers
A critical aspect of RAG-Fusion is the use of multiple retrievers to maximize information diversity. We implement the following:
from abc import ABC, abstractmethod

class Retriever(ABC):
    @abstractmethod
    def retrieve(self, query, corpus):
        pass

    @staticmethod
    def create_retriever(method, vector_store=None):
        retriever_methods = {
            "bm25": BM25Retriever(),
            "tfidf": TFIDFRetriever()
        }
        if vector_store is not None:
            retriever_methods["faiss"] = FAISSRetriever(vector_store)
        # Unknown methods (or "faiss" without a vector store) fall back to BM25
        return retriever_methods.get(method, BM25Retriever())
Each retriever works differently:
- BM25 → Lexical matching.
- TF-IDF → Weighted term frequency.
- FAISS → Dense vector search.
RAG-Fusion allows dynamic selection via configuration.
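As a concrete illustration, a lexical retriever can be built on the rank_bm25 package; this is a sketch of the idea rather than the project's exact implementation, and the top-k default is illustrative:

from rank_bm25 import BM25Okapi

class BM25Retriever(Retriever):
    def retrieve(self, query, corpus, k=5):
        # Whitespace tokenization keeps the sketch short; real pipelines
        # usually lowercase and strip punctuation first
        bm25 = BM25Okapi([doc.split() for doc in corpus])
        return bm25.get_top_n(query.split(), corpus, n=k)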
Step 4: Evaluating RAG vs. RAG-Fusion
To measure the performance of RAG-Fusion against traditional RAG, we use several evaluation metrics:
- Retrieval Effectiveness
  - Recall@K → Measures how often the correct document appears in the top K results.
  - Precision@K → Measures how many retrieved documents are actually relevant.
- Generation Quality
  - BLEU Score → Measures similarity between generated and reference responses.
  - ROUGE Score → Measures overlap between generated and reference responses.
  - BERTScore → Uses embeddings to measure semantic similarity.
- Latency & Efficiency
  - Query Latency → Time taken to retrieve + generate an answer.
  - Memory Usage → RAM consumption for different retrieval methods.
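The retrieval metrics are simple to compute directly; a minimal sketch, assuming each query comes with a set of ground-truth relevant document IDs:

def recall_precision_at_k(retrieved_ids, relevant_ids, k):
    # retrieved_ids: ranked list of document IDs returned by the system
    # relevant_ids: set of ground-truth relevant IDs for this query
    top_k = retrieved_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    precision = hits / k
    return recall, precision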
Sample Evaluation Code
import numpy as np
from rouge_score import rouge_scorer

def evaluate_results(generated_answers, reference_answers):
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    # RougeScorer.score takes (target, prediction), so the reference goes first
    scores = [scorer.score(ref, gen) for gen, ref in zip(generated_answers, reference_answers)]
    avg_scores = {
        "rouge1": np.mean([s["rouge1"].fmeasure for s in scores]),
        "rouge2": np.mean([s["rouge2"].fmeasure for s in scores]),
        "rougeL": np.mean([s["rougeL"].fmeasure for s in scores])
    }
    return avg_scores
This evaluation framework helps quantify improvements when moving from traditional RAG to RAG-Fusion.
Conclusion: Why RAG-Fusion Matters
By leveraging multiple retrieval methods, reciprocal rank fusion, and adaptive text segmentation, RAG-Fusion significantly improves upon traditional RAG pipelines. It ensures:
- Better retrieval diversity
- Higher-quality responses
- Configurable retrieval & splitting strategies