Image Generation

Harnessing the Power of Stable Diffusion WebUI

Summary In this blog I aim to try building using open source tools where possible. The benefits are price, control, knowledge and eventually quality. In the shorter term though the quality will trail the paid versions. My belief is we can construct AI applications to be self correcting sort of like how your camera auto focuses for you. This process will involve a lot of computation so using a paid service could be costly.

Creating AI-Powered Paper Videos: From Research to YouTube

Summary This post demonstrates how to automatically transform a scientific paper (or any text/audio content) into a YouTube video using AI. We’ll leverage several powerful tools, including large language models (LLMs), Whisper for transcription, Stable Diffusion for image generation, and FFmpeg for video assembly. This process can streamline content creation and make research more accessible. Overview Our pipeline involves these steps: Audio Generation (Optional): If starting from a text document, we’ll use a text-to-speech service (like NotebookLM, or others) to create an audio narration.