Deep Learning

Using Quantization to Speed Up and Slim Down Your LLM

Summary: Large Language Models (LLMs) are powerful, but their size leads to slow inference and high memory consumption, which hinders real-world deployment. Quantization, a technique that reduces the numerical precision of model weights, addresses both problems. This post explores how to use quantization tools such as bitsandbytes, AutoGPTQ, and AutoRound to substantially improve LLM inference performance.
What is Quantization?
Quantization reduces the computational and storage demands of a model by representing its weights with lower-precision data types.
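To make the idea of "lower-precision data types" concrete, here is a minimal sketch of symmetric, per-tensor absmax quantization to 8-bit integers in plain Python. It is one of the simplest possible schemes and is not how bitsandbytes, AutoGPTQ, or AutoRound work internally. Those tools use finer-grained scales and more elaborate algorithms. The function names are hypothetical, and the memory helper counts weight storage only (no activations, KV cache, or scale overhead).

```python
def quantize_absmax_int8(weights):
    """Map floats to signed 8-bit integers using one shared scale (hypothetical helper)."""
    max_abs = max(abs(w) for w in weights)
    # The largest-magnitude weight maps to +/-127; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clip to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


def weight_memory_gb(num_params, bits):
    """Storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_absmax_int8(weights)
restored = dequantize(q, scale)
print(q, scale)
print([round(w - r, 4) for w, r in zip(weights, restored)])  # per-weight rounding error

# A hypothetical 7-billion-parameter model: 16-bit vs 4-bit weights.
print(weight_memory_gb(7e9, 16))  # 14.0
print(weight_memory_gb(7e9, 4))   # 3.5
```

Each weight now needs one byte instead of two or four, at the cost of a rounding error of at most half a quantization step (scale / 2) per weight.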

Writing Neural Networks with PyTorch

Summary: This post is a practical guide to building common neural network architectures with PyTorch. We’ll cover feedforward networks, convolutional neural networks (CNNs), recurrent neural networks (RNNs), LSTMs, transformers, autoencoders, and GANs, with code examples and explanations.
1. Understanding PyTorch’s Neural Network Module
PyTorch provides the torch.nn module for building neural networks. It includes classes for layers, activation functions, and loss functions, which makes it straightforward to define and manage complex architectures in a structured way.
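As a minimal illustration of how the pieces of torch.nn fit together, the sketch below defines a small feedforward classifier as an nn.Module subclass, using nn.Linear layers, an nn.ReLU activation, and nn.CrossEntropyLoss. The class name FeedForward, the layer sizes, and the random input data are illustrative assumptions, not taken from the post.

```python
import torch
from torch import nn


class FeedForward(nn.Module):
    """A small two-layer classifier; sizes are illustrative assumptions."""

    def __init__(self, in_features: int = 4, hidden: int = 16, num_classes: int = 3):
        super().__init__()
        # nn.Sequential chains layers so forward() can call them in one step.
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),  # fully connected layer
            nn.ReLU(),                       # activation function
            nn.Linear(hidden, num_classes),  # output layer producing class logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


torch.manual_seed(0)
model = FeedForward()
loss_fn = nn.CrossEntropyLoss()  # expects raw logits and integer class labels

x = torch.randn(8, 4)             # batch of 8 random examples, 4 features each
y = torch.randint(0, 3, (8,))     # random target classes in {0, 1, 2}

logits = model(x)
loss = loss_fn(logits, y)
loss.backward()                   # autograd fills .grad on every parameter

print(logits.shape, loss.item())
```

Subclassing nn.Module registers every layer assigned as an attribute, so model.parameters() finds all weights automatically when you hand them to an optimizer.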