Efficient Similarity Search with FAISS and SQLite in Python

Summary

This is another component in SmartAnswer and enhanced LLM interface.

In this blog post, we introduce a wrapper class, FaissDB, which integrates FAISS with SQLite or any database to manage document embeddings and enable efficient similarity search. This approach combines FAISS’s vector search capabilities with the storage and querying power of a database, making it ideal for applications such as Retrieval-Augmented Generation (RAG) and recommendation systems.

It builds up this tool PaperSearch.

Automating Paper Retrieval and Processing with PaperSearch

Summary

This is part on in a series of blog post working towards SmartAnswer a comprehensive improvement to how Large Language Models LLMs answer questions.

This tool will be the source of data for SmartAnswer and allow it to find and research better data when generating answers.

I want this tool to be included in that solution but I dot want all the code from this tool distracting from the SmartAnswer solution. Hence this post.

SQLite: the small database that packs a big punch

Summary

SQLite is one of the most widely used database engines in the world, powering everything from mobile applications (Android, iOS) to browsers (Google Chrome, Mozilla Firefox), IoT devices, and even gaming consoles. Unlike traditional client-server databases (e.g., MySQL, PostgreSQL), SQLite is an embedded, serverless database that stores data in a single file, making it easy to manage and deploy.

Python developers frequently choose SQLite for its inherent simplicity and portability, leveraging the built-in sqlite3 module for effortless database integration.

RAFT: Reward rAnked FineTuning - A New Approach to Generative Model Alignment

Summary

This post is an explanation of this paper:RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment.

Generative foundation models, such as Large Language Models (LLMs) and diffusion models, have revolutionized AI by achieving human-like content generation. However, they often suffer from

  1. Biases – Models can learn and reinforce societal biases present in the training data (e.g., gender, racial, or cultural stereotypes).
  2. Ethical Concerns – AI-generated content can be misused for misinformation, deepfakes, or spreading harmful narratives.
  3. Alignment Issues – The model’s behavior may not match human intent, leading to unintended or harmful outputs despite good intentions.

Traditionally, Reinforcement Learning from Human Feedback (RLHF) has been used to align these models, but RLHF comes with stability and efficiency challenges. To address these limitations, RAFT (Reward rAnked FineTuning) was introduced as a more stable and scalable alternative. RAFT fine-tunes models using a ranking-based approach to filter high-reward samples, allowing generative models to improve without complex reinforcement learning setups.

Faiss: A Fast, Efficient Similarity Search Library

Summary

Searching through massive datasets efficiently is a challenge, whether in image retrieval, recommendation systems, or semantic search. Faiss (Facebook AI Similarity Search) is a powerful open-source library developed by Meta to handle high-dimensional similarity search at scale.

It’s well-suited for tasks like:

  • Image search: Finding visually similar images in a large database.
  • Recommendation systems: Recommending items (products, movies, etc.) to users based on their preferences.
  • Semantic search: Finding documents or text passages that are semantically similar to a given query.
  • Clustering: Grouping similar vectors together.

In many of the upcoming projects in this blog I will be using it. It is a good local developer solution.

K-Means Clustering

Summary

Imagine you have a dataset of customer profiles. How can you group similar customers together to tailor marketing campaigns? This is where K-Means clustering comes into play.

K-Means is a popular unsupervised learning algorithm used for clustering data points into distinct groups based on their similarities. It is widely used in various domains such as customer segmentation, image compression, and anomaly detection.

In this blog post, we’ll cover how K-Means works and demonstrate its implementation in Python using scikit-learn.

AI: The Future Interface to Technology

Summary

Imagine a world where you simply think of a task, and invisible devices seamlessly execute it. In fact most of what used to be your daily tasks you won’t even think about they will be automatically executed. Sounds like science fiction? This I believe is the future of human technology interaction. The technology disappears behind an AI driven interface.

Do we currently have Artificial Intelligence

Artificial intelligence refers to computer programs designed to mimic human cognitive abilities, 
such as understanding natural language, recognizing patterns, learning from data, and solving complex problems.
While AGI aims to replicate general human intelligence, 
narrow AI focuses on excelling at specific tasks within predefined parameters.

A common debate in AI discourse revolves around whether large language models (LLMs) truly qualify as artificial intelligence or if they are merely sophisticated algorithms mimicking human-like behavior. While discussions about Artificial General Intelligence (AGI) a theoretical form of AI capable of replicating human cognition across all domains are intriguing, they distract from the practical applications of AI that already exist today. AGI may never materialize, not because it’s unachievable, but because it lacks practical utility. A godlike AI with unrestricted capabilities offers little tangible benefit compared to specialized narrow AI systems. Instead, what we have now is narrow AI, which excels at specific tasks and operates within defined parameters. This AI can get broader through the use of Agents and can automatically self improve and learn as I have shown in previous blog posts.

Self-Learning LLMs for Stock Forecasting: A Python Implementation with Direct Preference Optimization

Summary

Forecasting future events is a critical task in fields like finance, politics, and technology. However, improving the forecasting abilities of large language models (LLMs) often requires extensive human supervision. In this post, we explore a novel approach from the paper LLMs Can Teach Themselves to Better Predict the Future that enables LLMs to teach themselves better forecasting skills using self-play and Direct Preference Optimization (DPO). We’ll walk through a Python implementation of this method, step by step.

Using Quantization to speed up and slim down your LLM

Summary

Large Language Models (LLMs) are powerful, but their size can lead to slow inference speeds and high memory consumption, hindering real-world deployment. Quantization, a technique that reduces the precision of model weights, offers a powerful solution. This post will explore how to use quantization techniques like bitsandbytes, AutoGPTQ, and AutoRound to dramatically improve LLM inference performance.

What is Quantization?

Quantization reduces the computational and storage demands of a model by representing its weights with lower-precision data types. Lets imagine data is water and we hold that water in buckets, most of the time we don’t need massive floating point buckets to hold data that can be represented by integers. Quantization is using smaller buckets to hold the same amount of water – you save space and can move the containers more quickly. Quantization trades a tiny amount of precision for significant gains in speed and memory efficiency.

Mastering LLM Fine-Tuning: A Practical Guide with LLaMA-Factory and LoRA

Summary

Large Language Models (LLMs) offer immense potential, but realizing that potential often requires fine-tuning them on task-specific data. This guide provides a comprehensive overview of LLM fine-tuning, focusing on practical implementation with LLaMA-Factory and the powerful LoRA technique.

What is Fine-Tuning?

Fine-tuning adapts a pre-trained model to a new, specific task or dataset. It leverages the general knowledge already learned by the model from a massive dataset (source domain) and refines it with a smaller, more specialized dataset (target domain). This approach saves time, resources, and data while often achieving superior performance.