Harnessing LLMs in Mixture of Experts for Smarter Trading Strategies

Introduction

Financial markets are complex and highly volatile, demanding robust and adaptive trading strategies. Traditional methods, such as statistical modeling and machine learning, often fall short in handling dynamic market conditions. To address these limitations, researchers have proposed a Mixture of Experts (MoE) approach, which distributes decision-making across multiple specialized models.

The paper "LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading" introduces LLMoE, a novel framework that integrates Large Language Models (LLMs) as routers within an MoE architecture. In this blog post, we walk through an implementation of that framework.


Understanding Mixture of Experts (MoE)

MoE is a machine learning technique that partitions complex decision tasks among multiple expert models. A router assigns data instances to the most relevant expert, improving both efficiency and accuracy. While MoE has seen success in deep learning and NLP, its adoption in financial markets has been limited due to static routing mechanisms.
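
For reference, a classical MoE layer can be sketched in a few lines of PyTorch: a small gating network scores the experts for each input, and the output is the weighted combination of the expert outputs. The sizes and names below are illustrative, not taken from the paper.

import torch
import torch.nn as nn

class SimpleMoE(nn.Module):
    """Minimal soft-gated Mixture of Experts for illustration."""
    def __init__(self, input_size=16, hidden_size=32, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(input_size, hidden_size), nn.ReLU(),
                           nn.Linear(hidden_size, 1))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(input_size, num_experts)  # static neural router

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)                 # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)   # (batch, 1, num_experts)
        return (outputs * weights.unsqueeze(1)).sum(dim=-1)           # weighted expert mix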

Limitations of Traditional MoE in Trading

  • Static Routing: Conventional MoE uses fixed neural networks as routers, which struggle with shifting market dynamics.
  • Unimodal Data Processing: Most MoE-based trading models rely only on numerical stock data, ignoring valuable textual insights from news articles.
  • Lack of Interpretability: Neural network-based routers operate as black boxes, making it difficult to understand their decision-making.

How LLMoE Overcomes These Challenges

LLMoE leverages pre-trained LLMs as dynamic routers, allowing them to:

  1. Integrate Multimodal Data: The router processes both historical stock prices and textual news data.
  2. Provide Context-Aware Expert Selection: The LLM router dynamically classifies market sentiment as Optimistic or Pessimistic and assigns expert models accordingly.
  3. Improve Interpretability: LLM-generated natural language explanations enhance transparency in decision-making.

LLMoE Framework: Key Components

LLMoE consists of three key stages:

1️⃣ LLM-Based Router

The LLM router processes a rolling window of stock market data combined with news headlines. Given an input sequence:

\[ X(t-4:t) = \{ x_{t-4}, x_{t-3}, x_{t-2}, x_{t-1}, x_t \} \]

where each \( x_i \) is a combination of numerical stock features and corresponding news text, the router predicts Optimistic or Pessimistic market sentiment:

\[ y_t = \arg\max (LLM\_Prediction(X(t-4:t))) \]

Additionally, it generates natural language explanations for its classification, improving interpretability.
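
To make this concrete, here is a minimal sketch of how the router's input might be assembled, assuming a pandas DataFrame with date, open/high/low/close columns, and a headline column. The column names and prompt wording are illustrative assumptions, not the paper's exact template.

import pandas as pd

def build_router_prompt(df: pd.DataFrame, t: int, window: int = 5) -> str:
    """Format the last `window` days of prices and headlines as a single text prompt."""
    rows = df.iloc[t - window + 1 : t + 1]  # x_{t-4}, ..., x_t
    lines = []
    for _, row in rows.iterrows():
        headline = row["headline"] if pd.notna(row["headline"]) else "no headline"
        lines.append(
            f"{row['date']}: open={row['open']:.2f}, high={row['high']:.2f}, "
            f"low={row['low']:.2f}, close={row['close']:.2f}; news: {headline}"
        )
    return (
        "Given the last five trading days below, classify the market outlook "
        "as Optimistic or Pessimistic and explain why.\n" + "\n".join(lines)
    )

The resulting string is what the router implemented in Step 1 below receives as its text input.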

2️⃣ Expert Prediction

Based on the router's classification, data is sent to either an Optimistic Expert or a Pessimistic Expert. Each expert is a feedforward neural network (FNN) trained to analyze stock trends under a specific market condition. Experts process numerical stock indicators such as the following (a feature-engineering sketch appears after the list):

  • Price Ratios: Open, High, Low, Close
  • Daily Price Changes: Percentage changes in closing prices
  • Rolling Deviations: Deviations of the closing price from its moving averages over different time periods
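
These indicators can be derived from raw OHLC data. Below is a minimal pandas sketch; the exact feature set and window lengths used in the paper may differ.

import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive price ratios, daily changes, and rolling deviations from OHLC data."""
    feats = pd.DataFrame(index=df.index)
    # Price ratios of open/high/low relative to the close
    for col in ["open", "high", "low"]:
        feats[f"{col}_close_ratio"] = df[col] / df["close"]
    # Daily percentage change in the closing price
    feats["close_pct_change"] = df["close"].pct_change()
    # Deviation of the close from its rolling mean over several horizons
    for w in (5, 10, 20):
        feats[f"dev_ma_{w}"] = df["close"] / df["close"].rolling(w).mean() - 1
    return feats.dropna()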

3️⃣ Trading Strategy Generation

LLMoE employs an “All-in All-out” strategy:

  • Invest all capital if the expert predicts a price increase.
  • Liquidate holdings if the expert predicts a price decline.

This keeps capital fully deployed when the expert expects a price increase and fully in cash when it expects a decline.
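
A minimal sketch of this rule applied to a daily returns series, assuming one signal per trading day and ignoring transaction costs (the variable names are illustrative):

import numpy as np

def all_in_all_out_equity(daily_returns, signals, initial_capital=1.0):
    """Fully invested on BUY days, fully in cash otherwise."""
    capital = initial_capital
    equity_curve = []
    for r, signal in zip(daily_returns, signals):
        if signal == "BUY":          # all capital exposed to the day's return
            capital *= (1.0 + r)
        # on SELL days capital sits in cash and is unchanged
        equity_curve.append(capital)
    return np.array(equity_curve)

# Example on simulated returns with alternating signals
returns = np.random.normal(0.0005, 0.01, size=10)
signals = ["BUY", "SELL"] * 5
print(all_in_all_out_equity(returns, signals))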


Python Implementation of LLMoE

Let’s break down a Python implementation using PyTorch and Hugging Face Transformers.

Step 1: LLM-Based Router Implementation

We use a Llama 3 family model (or an equivalent LLM) to classify market sentiment based on numerical and textual data.

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

class LLMRouter:
    def __init__(self, model_name="meta-llama/Llama-3-8B"):
        # The checkpoint name is illustrative; substitute any LLM you have access to
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        # Llama tokenizers ship without a pad token, so reuse EOS to enable padding
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        # num_labels=2 adds a binary (Pessimistic/Optimistic) head; this head is
        # randomly initialized and must be fine-tuned before its outputs are meaningful
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name, num_labels=2
        )
        self.model.config.pad_token_id = self.tokenizer.pad_token_id
        self.model.eval()

    def classify_sentiment(self, text):
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, padding=True)
        with torch.no_grad():
            outputs = self.model(**inputs)
        sentiment = torch.argmax(outputs.logits).item()
        return "Optimistic" if sentiment == 1 else "Pessimistic"

# Example usage
router = LLMRouter()
print(router.classify_sentiment("Stock prices are rising despite market uncertainty."))
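
Because the sequence-classification head starts from random weights, it needs a supervised fine-tuning pass on labeled window texts before the router is useful. Here is a minimal sketch, assuming labels of 1 for Optimistic and 0 for Pessimistic days; full fine-tuning of an 8B model is memory-intensive, so a smaller backbone or a parameter-efficient method such as LoRA is a reasonable substitute, and the original authors' training setup may differ.

import torch
import torch.nn.functional as F

def finetune_router(router, texts, labels, epochs=1, lr=1e-5):
    """One-sample-at-a-time supervised fine-tuning of the router's classification head."""
    router.model.train()
    optimizer = torch.optim.AdamW(router.model.parameters(), lr=lr)
    for _ in range(epochs):
        for text, label in zip(texts, labels):
            inputs = router.tokenizer(text, return_tensors="pt",
                                      truncation=True, padding=True)
            logits = router.model(**inputs).logits          # shape (1, 2)
            loss = F.cross_entropy(logits, torch.tensor([label]))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    router.model.eval()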

Step 2: Expert Models for Market Prediction

We train separate Optimistic and Pessimistic experts using PyTorch.

import torch
import torch.nn as nn

class ExpertModel(nn.Module):
    """Feedforward expert that outputs the probability of a next-day price increase."""

    def __init__(self, input_size=55):
        super(ExpertModel, self).__init__()
        self.fc1 = nn.Linear(input_size, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.dropout(self.relu(self.fc2(x)))
        x = self.dropout(self.relu(self.fc3(x)))
        return torch.sigmoid(self.fc4(x))  # probability in [0, 1]

# One expert per market regime identified by the router
optimistic_expert = ExpertModel()
pessimistic_expert = ExpertModel()
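
The experts above start untrained; here is a minimal training sketch with binary cross-entropy, assuming each expert only sees the samples the router assigned to its regime (the tensors are simulated placeholders):

def train_expert(expert, features, labels, epochs=50, lr=1e-3):
    """Train one expert on next-day up/down labels for its market regime."""
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(expert.parameters(), lr=lr)
    expert.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        predictions = expert(features)        # (n_samples, 1), values in [0, 1]
        loss = criterion(predictions, labels)
        loss.backward()
        optimizer.step()
    expert.eval()

# Example with simulated data: 200 samples routed to the optimistic expert
features = torch.rand(200, 55)
labels = torch.randint(0, 2, (200, 1)).float()  # 1 = next-day price increase
train_expert(optimistic_expert, features, labels)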

Step 3: Decision-Making for Trading

We use the expert models to generate trading signals.

def generate_trade_signal(router_output, stock_features):
    # Route to the expert that matches the LLM router's sentiment label
    expert = optimistic_expert if router_output == "Optimistic" else pessimistic_expert
    with torch.no_grad():
        prediction = expert(stock_features)

    return "BUY" if prediction.item() > 0.5 else "SELL"

# Example usage
sample_stock_data = torch.rand(1, 55)  # Simulated stock features
decision = generate_trade_signal("Optimistic", sample_stock_data)
print(f"Trading Decision: {decision}")

Performance and Evaluation

Datasets Used

  • MSFT (Microsoft) Dataset: Includes 2,503 trading days, some of which lack news headlines.
  • AAPL (Apple) Dataset: 2,482 trading days with more complete news coverage.

Key Metrics

  • Total Return (TR)
  • Sharpe Ratio (SR)
  • Maximum Drawdown (MDD)
  • Calmar Ratio (CR)
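
All four can be computed directly from a strategy's daily returns. A minimal sketch, assuming 252 trading days per year and a zero risk-free rate (annualization conventions may differ from those in the paper):

import numpy as np

def evaluate_strategy(daily_returns, periods_per_year=252):
    """Total return, annualized Sharpe ratio, maximum drawdown, and Calmar ratio."""
    equity = np.cumprod(1.0 + daily_returns)
    total_return = equity[-1] - 1.0
    sharpe = np.sqrt(periods_per_year) * daily_returns.mean() / daily_returns.std()
    drawdown = equity / np.maximum.accumulate(equity) - 1.0
    max_drawdown = drawdown.min()  # most negative peak-to-trough dip
    annual_return = equity[-1] ** (periods_per_year / len(daily_returns)) - 1.0
    calmar = annual_return / abs(max_drawdown) if max_drawdown != 0 else np.nan
    return {"TR": total_return, "SR": sharpe, "MDD": max_drawdown, "CR": calmar}

# Example on simulated returns covering roughly two years
print(evaluate_strategy(np.random.normal(0.0006, 0.01, size=504)))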

Experimental Results

LLMoE significantly outperforms traditional MoE and neural network models, achieving:

  • 65.44% Total Return on MSFT (vs. 33.92% for MLP)
  • Sharpe Ratio of 2.14 (vs. 1.39 for LSTM)
  • Lower Maximum Drawdown, improving risk management

Conclusion

LLMoE represents a paradigm shift in financial AI, demonstrating that LLMs can serve as intelligent routers in Mixture of Experts architectures. By dynamically integrating numerical and textual data, LLMoE enhances predictive accuracy and interpretability in stock trading.

For Python developers, implementing LLMoE presents exciting opportunities:

  • Fine-tuning LLMs for routing tasks
  • Building multimodal trading models
  • Optimizing expert selection strategies

As AI-driven trading continues to evolve, LLMoE lays the foundation for future advancements in intelligent, context-aware financial decision-making.