Debugging Jupyter Notebooks in VS Code

Summary

Visual Studio Code is one of the most popular editors for software development.

Jupyter Notebooks are one of the most widely used ways to share, demonstrate, and develop code in modern AI development.

Debugging is not just for when you have a bug. After you have written any substantial piece of code, I suggest stepping through it in the debugger if possible. Doing so can improve both your understanding of the code and the quality of what you have written.

DeepResearch Part 3: Getting the best web data for your research

Summary

This post details building a robust web data pipeline using SmolAgents. We’ll create tools to retrieve content from various web endpoints, convert it to a consistent format (Markdown), store it efficiently, and then evaluate its relevance and quality using Large Language Models (LLMs). This pipeline is crucial for building a knowledge base for LLM applications.

Web Data Converter (MarkdownConverter)

We leverage the MarkdownConverter class, inspired by the one in autogen, to handle the diverse formats encountered on the web. This ensures consistency for downstream processing.
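
To give a feel for what the converter does, here is a minimal, hedged sketch using the requests and markdownify packages. This is an illustration of the idea only, not the MarkdownConverter class itself, which handles many more formats than plain HTML.

# Minimal sketch of fetching a web page and converting it to Markdown.
# Illustration only; assumes: pip install requests markdownify
import requests
from markdownify import markdownify as to_markdown

def fetch_as_markdown(url: str) -> str:
    # Download the raw HTML for the page.
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    # Convert the HTML body to Markdown text.
    return to_markdown(response.text)

print(fetch_as_markdown("https://example.com")[:500])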

DeepResearch Part 2: Building a RAG Tool for arXiv PDFs

Summary

In this post, we’ll build a Retrieval Augmented Generation (RAG) tool to process the PDF files downloaded from arXiv in the previous post DeepResearch Part 1. This RAG tool will be capable of loading, processing, and semantically searching the document content. It’s a versatile tool applicable to various text sources, including web pages.

Building the RAG Tool

Following up on our arXiv downloader, we now need a tool to process the downloaded PDFs. This post details the creation of such a tool.
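
Before diving into the full tool, here is a minimal, hedged sketch of the core idea: extract text from a PDF, split it into chunks, embed the chunks, and retrieve the most relevant ones for a query. It assumes the pypdf and sentence-transformers packages; the function names (load_pdf_text, chunk, search) are illustrative, not the tool’s actual interface.

# Minimal RAG-style retrieval sketch (illustrative names, not the post's final tool).
# Assumes: pip install pypdf sentence-transformers numpy
import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model

def load_pdf_text(path: str) -> str:
    # Concatenate the extracted text of every page.
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def chunk(text: str, size: int = 1000) -> list[str]:
    # Naive fixed-size character chunks; a real tool would split on sentences or sections.
    return [text[i:i + size] for i in range(0, len(text), size)]

def search(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    # Embed chunks and query, then rank chunks by cosine similarity.
    chunk_vecs = model.encode(chunks, normalize_embeddings=True)
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ query_vec
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]

chunks = chunk(load_pdf_text("paper.pdf"))
for passage in search("What problem does the paper address?", chunks):
    print(passage[:200], "...")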

DeepResearch Part 1: Building an arXiv Search Tool with SmolAgents

Summary

This post kicks off a series of three where we’ll build, extend, and use the open-source DeepResearch application inspired by the Hugging Face blog post. In this first part, we’ll focus on creating an arXiv search tool that can be used with SmolAgents.

DeepResearch aims to empower research by providing tools that automate and streamline the process of discovering and managing academic papers. This series will demonstrate how to build such tools, starting with a powerful arXiv search tool.
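
As a preview, here is a hedged sketch of what such a tool can look like when exposed through the SmolAgents @tool decorator, using the arxiv Python package. The tool built in the post may differ in its details.

# Sketch of an arXiv search tool for SmolAgents (illustrative, not the post's final code).
# Assumes: pip install smolagents arxiv
import arxiv
from smolagents import tool

@tool
def search_arxiv(query: str, max_results: int = 5) -> str:
    """Search arXiv and return the titles, IDs and abstracts of the top matches.

    Args:
        query: Free-text search query, e.g. "poisson disk sampling".
        max_results: Maximum number of papers to return.
    """
    client = arxiv.Client()
    search = arxiv.Search(query=query, max_results=max_results,
                          sort_by=arxiv.SortCriterion.Relevance)
    lines = []
    for result in client.results(search):
        lines.append(f"{result.entry_id}\n{result.title}\n{result.summary}\n")
    return "\n".join(lines)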

FFmpeg: A Practical Guide to Essential Command-Line Options

Introduction

FFmpeg is an incredibly versatile command-line tool for manipulating audio and video files. This post provides a practical collection of useful FFmpeg commands for common tasks.

FFmpeg Command Structure

The general structure of an FFmpeg command is:

ffmpeg [global_options] {[input_file_options] -i input_url} ... {[output_file_options] output_url} ...

Merging Video and Audio

Merging video and audio, with audio re-encoding

ffmpeg -i video.mp4 -i audio.wav -c:v copy -c:a aac output.mp4

Copying the audio without re-encoding (the output is MKV because the MP4 container does not reliably support raw WAV/PCM audio)

ffmpeg -i video.mp4 -i audio.wav -c copy output.mkv

Why copy audio?

Copying a stream with -c copy places it in the new container without re-encoding, which is much faster and introduces no additional quality loss. Re-encode the audio (for example to AAC, as above) only when the target container or playback device does not support the original codec.

Writing Neural Networks with PyTorch

Summary

This post provides a practical guide to building common neural network architectures using PyTorch. We’ll explore feedforward networks, convolutional neural networks (CNNs), recurrent neural networks (RNNs), LSTMs, transformers, autoencoders, and GANs, along with code examples and explanations.


1️⃣ Understanding PyTorch’s Neural Network Module

PyTorch provides the torch.nn module for building neural networks. It offers classes for defining layers, activation functions, and loss functions, making it easy to create and manage complex network architectures in a structured way.
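
For example, a small feedforward classifier can be assembled from torch.nn building blocks by subclassing nn.Module (the layer sizes below are arbitrary placeholders):

# Minimal feedforward network built from torch.nn building blocks.
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, in_features: int = 784, hidden: int = 128, classes: int = 10):
        super().__init__()
        # Two linear layers with a ReLU non-linearity between them.
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = FeedForward()
loss_fn = nn.CrossEntropyLoss()                      # loss function from torch.nn
logits = model(torch.randn(32, 784))                 # a batch of 32 dummy inputs
loss = loss_fn(logits, torch.randint(0, 10, (32,)))  # dummy targets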

Mastering Prompt Engineering: A Practical Guide

Summary

This post provides a comprehensive guide to prompt engineering, the art of crafting effective inputs for Large Language Models (LLMs). Mastering prompt engineering is crucial for maximizing the potential of LLMs and achieving desired results.

Effective prompting is the easiest way to enhance your experience with Large Language Models (LLMs).

The prompts we write are our interface to LLMs: they are how we communicate with the models, which is why it is important to understand how to do it well.

Harnessing the Power of Stable Diffusion WebUI

Summary

In this blog I aim to build with open-source tools where possible. The benefits are price, control, knowledge and, eventually, quality, although in the shorter term quality will trail the paid offerings. My belief is that we can construct AI applications that are self-correcting, rather like the way your camera autofocuses for you. Because this process involves a lot of computation, relying on a paid service could become costly, which for me is the key reason to choose solutions built on free tools.

Creating AI-Powered Paper Videos: From Research to YouTube

Summary

This post demonstrates how to automatically transform a scientific paper (or any text/audio content) into a YouTube video using AI. We’ll leverage several powerful tools, including large language models (LLMs), Whisper for transcription, Stable Diffusion for image generation, and FFmpeg for video assembly. This process can streamline content creation and make research more accessible.

Overview

Our pipeline involves these steps:

  1. Audio Generation (Optional): If starting from a text document, we’ll use a text-to-speech service (like NotebookLM, or others) to create an audio narration.
  2. Transcription: We’ll use Whisper to transcribe the audio into text, including timestamps for each segment (a short sketch of this step follows the list).
  3. Database Storage: The transcribed text, timestamps, and metadata will be stored in an SQLite database for easy management.
  4. Text Chunking: We’ll divide the transcript into logical chunks (e.g., by sentence or time duration).
  5. Concept Summarization: An LLM will summarize the core concept of each chunk.
  6. Image Prompt Generation: Another LLM will create a detailed image prompt based on the summary.
  7. Image Generation: Stable Diffusion (or a similar tool) will generate images from the prompts.
  8. Video Assembly: FFmpeg will combine the images and audio into a final video.
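
As a taste of steps 2 and 3, here is a hedged sketch that transcribes the audio with the whisper-timestamped package and stores each segment in SQLite. The table and column names are illustrative, not the schema used later in the post.

# Sketch of transcription (step 2) and storage (step 3); names are illustrative.
# Assumes: pip install whisper-timestamped
import sqlite3
import whisper_timestamped as whisper

model = whisper.load_model("base")
audio = whisper.load_audio("audio.wav")
result = whisper.transcribe(model, audio)  # returns segments with start/end times

conn = sqlite3.connect("transcript.db")
conn.execute("CREATE TABLE IF NOT EXISTS segments (start_time REAL, end_time REAL, text TEXT)")
for seg in result["segments"]:
    conn.execute("INSERT INTO segments VALUES (?, ?, ?)",
                 (seg["start"], seg["end"], seg["text"]))
conn.commit()
conn.close()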

Prerequisites

  • Hugging Face CLI: Install it to download the Whisper model: pip install huggingface_hub
  • Whisper: Install the whisper-timestamped package, or your preferred Whisper implementation.
  • Ollama: You’ll need a running instance of Ollama to access the LLMs.
  • Stable Diffusion WebUI (or similar): For image generation.
  • FFmpeg: For video and audio processing. Ensure it’s in your system’s PATH.
  • Python Libraries: Install the necessary Python packages: pip install pydub requests Pillow (plus any others as needed; sqlite3 ships with Python’s standard library, so it does not need to be installed).

1️⃣ Audio Generation (Optional)

If you’re starting with a text document, you’ll need to convert it to audio. Several cloud services and libraries can do this. For this example, we’ll assume you have an audio file (audio.wav).
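
If you just want something quick to experiment with, a hedged sketch using the gTTS package (one option among many, and not necessarily what the rest of the pipeline assumes) looks like this:

# Sketch of text-to-speech with gTTS (one option among many; illustrative only).
# Assumes: pip install gTTS
from gtts import gTTS

with open("paper_summary.txt") as f:   # hypothetical input text file
    narration = f.read()

tts = gTTS(narration)
tts.save("audio.mp3")                  # gTTS writes MP3
# Convert to WAV for the rest of the pipeline:
#   ffmpeg -i audio.mp3 audio.wav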

Fast Poisson Disk Sampling in Arbitrary Dimensions

Summary

In this post I explore Robert Bridson’s paper, Fast Poisson Disk Sampling in Arbitrary Dimensions, and provide an example Python implementation. Additionally, I introduce an alternative method using Cellular Automata to generate Poisson disk distributions.

Poisson disk sampling is a widely used technique in computer graphics, particularly for applications like rendering, texture generation, and particle simulation. Its appeal lies in producing sample distributions with “blue noise” characteristics—random yet evenly spaced, avoiding clustering.
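
As a preview of the implementation, here is a hedged, minimal 2D sketch of Bridson’s algorithm: a background grid for fast neighbour lookups plus dart throwing in an annulus around active samples. The full post walks through a more complete implementation.

# Minimal 2D sketch of Bridson's fast Poisson disk sampling (illustrative only).
import math
import random

def poisson_disk_2d(width, height, radius, k=30):
    cell = radius / math.sqrt(2)                 # cell size so each cell holds at most one sample
    cols, rows = int(width / cell) + 1, int(height / cell) + 1
    grid = [[None] * cols for _ in range(rows)]  # background grid for neighbour lookups
    samples, active = [], []

    def grid_pos(p):
        return int(p[1] / cell), int(p[0] / cell)

    def fits(p):
        # Reject candidates outside the domain or closer than radius to an existing sample.
        if not (0 <= p[0] < width and 0 <= p[1] < height):
            return False
        r, c = grid_pos(p)
        for i in range(max(r - 2, 0), min(r + 3, rows)):
            for j in range(max(c - 2, 0), min(c + 3, cols)):
                q = grid[i][j]
                if q is not None and math.dist(p, q) < radius:
                    return False
        return True

    def add(p):
        samples.append(p)
        active.append(p)
        r, c = grid_pos(p)
        grid[r][c] = p

    add((random.uniform(0, width), random.uniform(0, height)))  # initial sample
    while active:
        idx = random.randrange(len(active))
        base = active[idx]
        for _ in range(k):                       # try up to k candidates in the annulus [r, 2r]
            ang = random.uniform(0, 2 * math.pi)
            d = random.uniform(radius, 2 * radius)
            cand = (base[0] + d * math.cos(ang), base[1] + d * math.sin(ang))
            if fits(cand):
                add(cand)
                break
        else:                                    # no candidate fitted: retire this active sample
            active.pop(idx)
    return samples

points = poisson_disk_2d(100, 100, 4)
print(len(points), "samples")

The cell size of r/√2 guarantees each grid cell can hold at most one sample, so the rejection test only has to inspect a small, constant number of neighbouring cells.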