Optimizing Prompt Generation with MARS and DSPy

🕒 TL;DR

  • We explore MARS, a multi-agent prompt optimizer using Socratic dialogue.
  • We implement it using DSPy + Fin-R1 + EDGAR, giving us an end-to-end financial reasoning pipeline.
  • We deploy the whole thing to Hugging Face Spaces with a Gradio UI.

🌟 Introduction

Prompt engineering has become a defining skill of the Large Language Model (LLM) era: a delicate balance between science and art. Crafting the perfect prompt often feels like an exercise in intuition, trial, and error. But what if we could take the guesswork out of the process? What if prompts could optimize themselves?

Enter MARS (Multi-Agent framework incorpoRating Socratic guidance), a system that automates prompt optimization through autonomous planning and dialogue-based reasoning. By simulating a classroom-like interaction between multiple agents (Planner, Teacher, Critic, and Student), MARS refines prompts iteratively, aiming to make them effective, interpretable, and adaptable.

In this post, we’ll explore:

  • 🧠 How MARS works: Breaking down its multi-agent architecture and the power of Socratic dialogue.
  • 🥇 Why it stands out: Comparing MARS to prior methods like APE, ProTeGi, and OPRO.
  • 🛠 Hands-on implementation: A complete Python example using open tools: DSPy, Fin-R1, and EDGAR.

By the end, you’ll have a clear understanding of how MARS can transform financial reasoning tasks and why it’s a game changer for anyone working with LLMs.

🧰 Tools we’re using and why

In this post, we’re tackling a key challenge in applied LLMs: how do we ask better questions to get better answers from the model? Rather than relying on human prompt-writing intuition, we’ll build an automated, interpretable reasoning system that learns to craft better prompts through agent interaction and feedback.

To make that possible, we combine:

  • 🧠 MARS: A multi-agent framework that uses Socratic dialogue between agents (Planner, Teacher, Critic, Student) to refine and improve prompts.
  • ⚙️ DSPy: A declarative Python framework that makes it easy to compose, trace, and optimize LLM workflows like MARS.
  • 📊 SUFE-AIFLM-Lab/Fin-R1: A financial reasoning model fine-tuned for tasks like earnings analysis, trend detection, and signal generation.
  • 🧾 EDGAR + edgartools: A toolset for extracting real-world financial statements (like 10-Q reports) from public company filings.
  • 💻 Ollama / Hugging Face Spaces: Let us run the app either locally (with full control) or in the cloud with a GUI for testing and sharing.

Together, these tools help us build an automated financial analysis agent that not only gives an answer but shows how and why it got there.

This post builds on the previous post, Fin-R1: a Financial Reasoning LLM with Reinforcement Learning and CoT.


❓ Why MARS?

Two common issues plague Automated Prompt Optimization (APO):

  1. Rigid Templates: existing approaches often use fixed structures that can’t adapt to diverse tasks.
  2. Inefficient Search: methods like beam search only scratch the surface of prompt space, often leading to suboptimal results.

MARS solves this with two core innovations:

  • Multi-Agent Architecture: each agent has a specific role in planning and optimizing prompts.
  • Socratic Guidance: A Teacher-Critic-Student dialogue improves prompts iteratively.

How Well Does MARS Work?

MARS was evaluated on:

  • 12 general tasks from BBH and MMLU
  • 5 domain-specific tasks (Chinese, Law, Math)

Results: MARS beat all prior state-of-the-art APO methods by a wide margin, including:

  • APE
  • ProTeGi
  • OPRO
  • PE2

It also scored highest on Prompt Efficiency (PE), a novel metric that combines accuracy and token usage.


Why It Matters

MARS pushes the boundary of LLM performance without touching the model weights. Instead, it tunes the inputs, which is a low-cost, high-impact optimization technique.

It offers:

  • Better generalization across tasks
  • Superior understanding via Socratic dialogue
  • Efficient use of LLM tokens and calls

Key point: In most of our interactions with models we won’t have access to the model’s internals. Prompts are the easiest and most reliable interface for tuning model results.


How MARS Works

MARS uses seven LLM agents working in a pipeline:

  1. Manager: Oversees everything.
  2. UserProxy: Accepts task input and initial prompt.
  3. Planner: Breaks down prompt optimization into substeps.
  4. Teacher: Asks Socratic questions to guide the Student.
  5. Critic: Ensures Teacher’s questions follow Socratic style.
  6. Student: Iteratively improves the prompt.
  7. Target: Evaluates prompt quality using a task model.

This process continues in iterations, where prompts evolve step-by-step.
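The iteration described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the algorithm from the MARS paper: `refine` and `score` are hypothetical stand-ins for the Student and Target agents, and the toy implementations exist only so the loop runs without an LLM.

```python
from typing import Callable


def optimize_prompt(
    initial_prompt: str,
    steps: list[str],
    refine: Callable[[str, str], str],
    score: Callable[[str], float],
) -> tuple[str, float]:
    """Refine a prompt once per planned step and keep the best-scoring version."""
    best_prompt, best_score = initial_prompt, score(initial_prompt)
    current = initial_prompt
    for step in steps:
        current = refine(current, step)     # Student proposes a revision
        current_score = score(current)      # Target evaluates the revision
        if current_score > best_score:
            best_prompt, best_score = current, current_score
    return best_prompt, best_score


# Toy stand-ins (assumptions for illustration only).
def toy_refine(prompt: str, step: str) -> str:
    return f"{prompt} {step}"


def toy_score(prompt: str) -> float:
    # Rewards prompts that mention margins and slightly penalizes length.
    return float("margin" in prompt) - 0.001 * len(prompt)


steps = ["State the metric: operating margin.", "Ask for a trend across quarters."]
best, best_score = optimize_prompt("Analyze the company.", steps, toy_refine, toy_score)
print(best)
```

Keeping the best-scoring prompt, rather than the last one, means a later revision that makes things worse does not overwrite an earlier improvement.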


Socratic guidance is core to MARS

The Teacher-Critic-Student pattern mimics classroom-style learning:

  • Teacher: Never gives direct answers, only probing questions.
  • Critic: Evaluates if those questions align with Socratic methods.
  • Student: Thinks aloud and generates better prompts.

MARS does not retrain the model. Instead, it steers the model toward clearer reasoning, like a student learning from guided dialogue.
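A single Teacher-Critic-Student exchange might look like the sketch below. The function names and the retry limit are assumptions made for illustration, and the toy Critic uses a deliberately crude rule (a Socratic turn must be phrased as a question); in practice each role would be an LLM call.

```python
from typing import Callable


def socratic_round(
    prompt: str,
    ask: Callable[[str, int], str],
    is_socratic: Callable[[str], bool],
    revise: Callable[[str, str], str],
    max_retries: int = 3,
) -> str:
    """Run one Teacher-Critic-Student exchange and return the revised prompt."""
    for attempt in range(max_retries):
        question = ask(prompt, attempt)         # Teacher proposes a question
        if is_socratic(question):               # Critic accepts or rejects it
            return revise(prompt, question)     # Student rewrites the prompt
    return prompt  # no acceptable question: keep the prompt unchanged


# Toy stand-ins (assumptions for illustration only).
def toy_ask(prompt: str, attempt: int) -> str:
    candidates = [
        "The answer is to add units.",
        "What would make the time frame unambiguous?",
    ]
    return candidates[min(attempt, len(candidates) - 1)]


def toy_is_socratic(question: str) -> bool:
    return question.strip().endswith("?")


def toy_revise(prompt: str, question: str) -> str:
    return f"{prompt} (revised after: {question})"


revised = socratic_round("Is margin improving?", toy_ask, toy_is_socratic, toy_revise)
print(revised)
```

Here the first Teacher turn gives a direct answer and is rejected, so the Student only sees the second, question-shaped turn.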



🔍 Simple example of what we want to do

import dspy
from dspy import Signature, InputField, OutputField, Module

lm = dspy.LM('ollama_chat/hf.co/ernanhughes/Fin-R1-Q8_0-GGUF', api_base='http://localhost:11434', api_key='')
dspy.configure(lm=lm)

# Step 1: Define DSPy Signature
class AnalyzeMargins(Signature):
    context = InputField(desc="Relevant financial data")
    question = InputField(desc="User's trading question")
    answer = OutputField(desc="Insightful, accurate answer")

# Step 2: Create a Module using the Signature
class MarginAnalyzer(Module):
    def __init__(self):
        super().__init__()
        self.chain = dspy.Predict(AnalyzeMargins)

    def forward(self, context, question):
        return self.chain(context=context, question=question)

context = '''
Tesla Inc Income Statement:
Q1 Revenue: $23B, Operating Income: $2.3B
Q2 Revenue: $24B, Operating Income: $2.5B
Q3 Revenue: $25B, Operating Income: $2.4B
'''
question = "Is Tesla's operating margin improving?"

analyzer = MarginAnalyzer()
result = analyzer(context=context, question=question)
print("📊 Answer:", result.answer)

This generated the following answer:

📊 Answer: To determine if Tesla's operating margin is improving, we need to calculate the operating margin for each quarter provided and analyze the trend. Operating margin is calculated as Operating Income divided by Revenue, expressed as a percentage.

**Q1 Calculation:**
Operating Margin = (Operating Income / Revenue) * 100
= ($2.3B / $23B) * 100 ≈ 10.04%
All right
**Q2 Calculation:**
Operating Margin = ($2.5B / $24B) * 100 ≈ 10.42%

**Q3 Calculation:**
Operating Margin = ($2.4B / $25B) * 100 ≈ 9.60%

**Analysis of Trend:**
- Q1: ~10.04%
- Q2: ~10.42% (increase from Q1)
- Q3: ~9.60% (decrease from Q2)

The operating margin increased slightly from Q1 to Q2 but then decreased in Q3. However, the overall trend shows a slight improvement over the three quarters despite the dip in Q3. This suggests that Tesla's operating margin is improving on average, with some volatility.

**Conclusion:**
Yes, Tesla's operating margin is improving when considering the upward movement from Q1 to Q2, even though there was a decline in Q3. The trend indicates an overall positive improvement over the quarters provided.
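It is worth checking the model's arithmetic before trusting its conclusion. The short script below recomputes the three margins from the figures in the context string; nothing in it comes from the model.

```python
# Figures copied from the context string in the example above (in $B):
# (revenue, operating income) per quarter.
quarters = {
    "Q1": (23.0, 2.3),
    "Q2": (24.0, 2.5),
    "Q3": (25.0, 2.4),
}

# Operating margin = operating income / revenue, as a percentage.
margins = {
    name: round(operating_income / revenue * 100, 2)
    for name, (revenue, operating_income) in quarters.items()
}

for name, margin in margins.items():
    print(f"{name}: {margin:.2f}%")

# Compare the first and last quarter to summarize the direction.
improving = margins["Q3"] > margins["Q1"]
print("Improving from Q1 to Q3:", improving)
```

This prints 10.00%, 10.42%, and 9.60%. The Q1 figure is 10.00% rather than the 10.04% the model reported, and Q3 ends below Q1, so the data does not support the model's "overall improvement" conclusion. This kind of gap is one reason to add checking steps around the raw model answer.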

💼 Get the company’s financial statements

Why we chose the income statement for this analysis

The Income Statement provides the most actionable short-term signals for forecasting a company’s stock price over the next few months.

| Statement | Key Info | Impact on Near-Term Stock Price |
| --- | --- | --- |
| Income Statement | Revenue, profit margins, EPS, growth trends | ✅ High: it reflects current performance & guidance |
| Balance Sheet | Assets, liabilities, equity | ⚠️ Moderate: useful for long-term stability, not short-term movement |
| Cash Flow | Operating, investing, financing flows | 🔄 Medium: strong if paired with income data for sustainability |

Why the Income Statement wins (short-term):

  • Revenue growth beats can cause 10–20% spikes.
  • EPS surprises (even by a few cents) move prices immediately.
  • Margins affect sentiment on efficiency and pricing power.
  • It contains forward guidance language (usually buried in MD&A but hinted by trends).
  • Analysts update price targets and ratings based on it first.

Getting the last n income statements

import os

import pandas as pd
from dotenv import load_dotenv
from edgar import Company, set_identity
# Assumption: recent edgartools releases expose XBRL here;
# some earlier releases shipped it under edgar.xbrl2 instead.
from edgar.xbrl import XBRL


class EDGARFetcher:
    def __init__(self, ticker: str, form: str = "10-Q", n: int = 3):
        load_dotenv()
        # Identity string for SEC requests, read from an IDENTITY entry
        # in the environment or a .env file
        self.identity = os.getenv("IDENTITY")
        self.ticker = ticker
        self.form = form
        self.n = n

        # Set identity for SEC API
        set_identity(self.identity)

    def fetch_markdown_statements(self):
        # Get the company's latest n filings of the requested form
        filings = Company(self.ticker).latest(form=self.form, n=self.n)
        statements = []
        for filing in filings:
            # Convert the filing to XBRL
            xbrl = XBRL.from_filing(filing)
            # Extract the income statement
            income_statement = xbrl.statements.income_statement()
            # Get it as a DataFrame
            df = income_statement.to_dataframe()
            # Convert to plain text
            statements.append(self.rich_report_to_text(df))
        return statements

    def rich_report_to_text(self, df: pd.DataFrame) -> str:
        """
        Convert a rich EDGAR report DataFrame to readable plain text for LLMs.
        """
        lines = []
        for _, row in df.iterrows():
            label = row.get("original_label") or row.get("label") or row.get("concept")
            # Keep only the period columns (named like "2024-03-31") that have a value
            values = [
                f"{col}: {row[col]}" for col in df.columns
                if isinstance(col, str) and col.startswith("20") and pd.notna(row[col])
            ]
            if values:
                lines.append(f"{label}: " + " | ".join(values))
        return "\n".join(lines)

    def run(self):
        return self.fetch_markdown_statements()

fetcher = EDGARFetcher(ticker="TSLA", n=3)
statements = fetcher.run()
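To see what the row-to-text step produces without calling EDGAR, here is a dependency-free sketch of the same flattening logic applied to a list of row dictionaries instead of a DataFrame. The rows, labels, and numbers are made up for illustration; real statements have many more lines, and the column names depend on the filing.

```python
def rows_to_text(rows: list[dict]) -> str:
    """Flatten statement rows into 'label: period: value | ...' lines."""
    lines = []
    for row in rows:
        label = row.get("original_label") or row.get("label") or row.get("concept")
        values = [
            f"{key}: {value}"
            for key, value in row.items()
            if key.startswith("20") and value is not None  # keep only period columns
        ]
        if values:
            lines.append(f"{label}: " + " | ".join(values))
    return "\n".join(lines)


# Hypothetical rows; the numbers are placeholders, not real filing data.
sample_rows = [
    {"concept": "us-gaap_Revenues", "label": "Total revenues",
     "2024-03-31": 100, "2023-03-31": 90},
    {"concept": "us-gaap_OperatingIncomeLoss", "label": "Income from operations",
     "2024-03-31": 12, "2023-03-31": None},
    {"concept": "us-gaap_Abstract", "label": "Header row"},
]

flattened = rows_to_text(sample_rows)
print(flattened)
```

Rows with no period values (such as section headers) are dropped, which keeps the text sent to the model compact.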

Before feeding the data into the model, it’s a good idea to review its size. A whole 10-Q can be large. We covered text splitting in the previous post: Fin-R1: a Financial Reasoning LLM with Reinforcement Learning and CoT. In this case the data is a lot smaller, so we do not need to worry about splitting.


🔢 Checking the token count for the data

We want to make sure that the number of tokens sent to the model is manageable. This function calculates an approximate token count for the statements.

# Create a utility function to estimate token count from a list of markdown statements

def estimate_token_count(markdown_list: list[str], chars_per_token: int = 4) -> int:
    """
    Estimate the number of tokens used by a list of markdown-formatted statements.

    Args:
        markdown_list (list[str]): A list of markdown text blocks.
        chars_per_token (int): Average number of characters per token. Default is 4.

    Returns:
        int: Estimated total token count.
    """
    combined_text = "\n\n".join(markdown_list)
    total_chars = len(combined_text)
    estimated_tokens = total_chars // chars_per_token
    return estimated_tokens

estimated = estimate_token_count(statements)
print(estimated)
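The estimate is only useful when compared against a budget. The sketch below applies the same characters-per-token heuristic and checks it against a context limit while reserving room for the reply. The limits used here are placeholders, not Fin-R1's actual context size, so substitute the value for the model you run.

```python
def fits_context(
    texts: list[str],
    context_limit: int,
    reserved_for_output: int = 1024,
    chars_per_token: int = 4,
) -> bool:
    """Return True if the estimated prompt tokens leave room for the model's reply."""
    estimated = len("\n\n".join(texts)) // chars_per_token
    return estimated + reserved_for_output <= context_limit


# Placeholder text standing in for three statements of 6,000 characters each.
sample = ["x" * 6000] * 3
print(fits_context(sample, context_limit=8192))  # True
print(fits_context(sample, context_limit=4096))  # False
```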

We have the financial data. Now let’s turn it into a structured natural language prompt the model can reason about.


✍️ Building the analysis prompt

We want to generate a clean, structured natural language prompt from a list of markdown-formatted income statements, which can be used as input context for a financial reasoning LLM like Fin-R1.

def build_analysis_prompt(ticker: str, markdown_list: list[str]) -> str:
    # Use the function's own argument rather than a global variable
    header = f"You are a financial analysis model. Below are the last {len(markdown_list)} income statements from {ticker}.\n\n"
    instructions = (
        "Analyze the trend in revenue and operating income.\n"
        "Decide if profitability is improving or declining.\n"
        "Then provide a trading signal.\n\n"
        "Respond with:\n"
        "Signal: <Bullish/Bearish/Neutral>\n"
        "Rationale: <short explanation>\n\n"
    )
    body = "\n\n".join(markdown_list)
    return header + instructions + body

ticker = "TSLA"
prompt = build_analysis_prompt(ticker, statements)
print(prompt[:300])
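Because the prompt asks for a fixed `Signal:` / `Rationale:` layout, the reply can be parsed mechanically. The helper below is a minimal sketch of one way to do that; `parse_signal` is a name introduced here for illustration, and the example response is made up.

```python
import re


def parse_signal(response: str) -> dict:
    """Extract the Signal and Rationale fields from a model response."""
    signal_match = re.search(
        r"Signal:\s*(Bullish|Bearish|Neutral)", response, re.IGNORECASE
    )
    rationale_match = re.search(
        r"Rationale:\s*(.+)", response, re.IGNORECASE | re.DOTALL
    )
    return {
        "signal": signal_match.group(1).capitalize() if signal_match else None,
        "rationale": rationale_match.group(1).strip() if rationale_match else None,
    }


# Made-up response text for illustration.
example = "Signal: bearish\nRationale: Operating income fell while revenue was flat."
parsed = parse_signal(example)
print(parsed)
```

Returning `None` for a missing field makes it easy to detect replies that ignored the requested format and retry or flag them.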

🧪 Using the model for evaluation

We use this model ollama_chat/hf.co/ernanhughes/Fin-R1-Q8_0-GGUF for evaluation.

This is a GGUF version of SUFE-AIFLM-Lab/Fin-R1.

This allows us to:

  1. Run the model locally.
  2. Use the model from Ollama.

Configuring DSPy to use the model

This tells DSPy to use a local Fin-R1 model hosted via Ollama, and sets it as the default language model (lm) for all your modules.

dspy.configure(lm=dspy.LM('ollama_chat/hf.co/ernanhughes/Fin-R1-Q8_0-GGUF', api_base='http://localhost:11434', api_key=''))

🧾 The DSPy signature classes

In DSPy, a Signature class defines the contract for a language model module. It describes the inputs and outputs for a reasoning step in your pipeline.

AnalyzeMargins is used by the Student agent (MarginAnalyzer) to make a final trading decision.

FinancialTrendAnalysis is used by a helper module, IncomeStatementAnalyzer, to analyze raw statements without going through the Teacher and Critic. It is useful when you want to quickly evaluate a prompt without the full MARS pipeline, or as a Target agent to validate another answer.

class AnalyzeMargins(Signature):
    context = InputField()
    question = InputField()
    signal = OutputField()
    rationale = OutputField()

class FinancialTrendAnalysis(Signature):
    statements = InputField()
    question = InputField()
    signal = OutputField()
    rationale = OutputField()

class PlannerSignature(Signature):
    base_question = InputField()
    steps = OutputField(desc="List of reasoning substeps to answer the question")

🧩 DSPy Modules in the MARS Pipeline

This implementation of the MARS architecture uses modular DSPy agents to simulate a multi-agent reasoning loop inspired by Socratic learning. Here’s how each module contributes:

🧠 PlannerModule

The Planner initiates the process by breaking down the user’s high-level task (e.g. “Is the company improving profitability?”) into a set of sub-questions or reasoning steps. This primes the system for structured analysis.

👨‍🏫 TeacherQuestioner

The Teacher proposes a Socratic question that encourages deeper reasoning. For example, it might ask, “What does the change in operating margin over time suggest?”

🧠 CriticJudge

The Critic evaluates whether the Teacher’s question is truly Socratic, that is, whether it encourages thoughtful analysis rather than surface-level answers.

👨‍🎓 MarginAnalyzer

The Student conducts the actual financial reasoning, using the provided statements and teacher-guided prompt. It returns a trading signal (Bullish, Bearish, or Neutral) and a rationale.

📊 IncomeStatementAnalyzer

A simplified analysis module used outside the core loop for quick evaluations, baseline comparisons, or as a fallback Target agent.

Together, these agents simulate iterative financial reasoning, with opportunities for traceability, optimization, and modular extension, all orchestrated using DSPy.

from dspy import ChainOfThought, Predict

class IncomeStatementAnalyzer(Module):
    def __init__(self):
        super().__init__()
        self.analyze = Predict(FinancialTrendAnalysis)

    def forward(self, statements, question):
        return self.analyze(statements=statements, question=question)

class TeacherQuestion(Signature):
    prompt = InputField()
    question = OutputField()

class TeacherQuestioner(Module):
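    def __init__(self, use_chain_of_thought: bool = True):
        super().__init__()
        # Store the flag before using it to choose the predictor type
        self.use_chain_of_thought = use_chain_of_thought
        self.generate = ChainOfThought(TeacherQuestion) if self.use_chain_of_thought else Predict(TeacherQuestion)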

    def forward(self, prompt):
        return self.generate(prompt=prompt)

class CritiqueQuestion(Signature):
    question = InputField()
    critique = OutputField()

class CriticJudge(Module):
    def __init__(self):
        super().__init__()
        self.evaluate = Predict(CritiqueQuestion)

    def forward(self, question):
        return self.evaluate(question=question)

class MarginAnalyzer(Module):
    def __init__(self):
        super().__init__()
        self.analyze = ChainOfThought(AnalyzeMargins)

    def forward(self, context, question, teacher_question=None):
        if teacher_question:
            question = f"{question} Consider also: {teacher_question}"
        return self.analyze(context=context, question=question)

class PlannerModule(Module):
    def __init__(self):
        super().__init__()
        self.plan = ChainOfThought(PlannerSignature)

    def forward(self, base_question):
        return self.plan(base_question=base_question)

┌────────────┐
│  UserProxy │  ← receives base question
└─────┬──────┘
      │
      ▼
┌────────────┐
│  Planner   │  ← breaks down question into reasoning substeps
└─────┬──────┘
      │
      ▼
┌────────────┐
│  Teacher   │  ← generates Socratic question
└─────┬──────┘
      │
      ▼
┌────────────┐
│  Critic    │  ← evaluates the Teacher's question
└─────┬──────┘
      │
      ▼
┌────────────┐
│  Student   │  ← performs analysis (Fin-R1 via DSPy)
└─────┬──────┘
      │
      ▼
┌────────────┐
│  Target?   │  ← (optional) evaluates final quality
└────────────┘

🚀 MARS analysis program

class MarsAnalysisProgram(dspy.Module):  # dspy.Module is the standard base class for DSPy programs
    def __init__(self, planner, teacher, critic, student):
        super().__init__()
        self.planner = planner
        self.teacher = teacher
        self.critic = critic
        self.student = student

    def forward(self, context: str, base_question: str):
        plan_out = self.planner(base_question=base_question)
        teacher_out = self.teacher(prompt=context + "\n\n" + base_question)
        critic_out = self.critic(question=teacher_out.question)

        if "yes" in critic_out.critique.lower():
            final_question = f"{base_question} Consider also: {teacher_out.question}"
        else:
            final_question = base_question

        student_out = self.student(context=context, question=final_question)

        return {
            "plan": plan_out.steps,
            "teacher_question": teacher_out.question,
            "critique": critic_out.critique,
            "final_question": final_question,
            "signal": student_out.signal,
            "rationale": student_out.rationale
        }
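The approval check above treats any critique containing the substring "yes" as approval, which also matches words like "eyes" or "yesterday". A stricter gate is sketched below. It assumes the Critic is instructed to begin its reply with an explicit yes or no, which the CritiqueQuestion signature in this post does not enforce; `critic_approves` is a hypothetical helper.

```python
import re


def critic_approves(critique: str) -> bool:
    """Return True only when the critique opens with an explicit 'yes' verdict."""
    # Skip leading punctuation or markup, then capture the first word.
    first_word = re.match(r"\W*(\w+)", critique)
    return bool(first_word) and first_word.group(1).lower() == "yes"


print(critic_approves("Yes, the question is open-ended."))           # True
print(critic_approves("No. It reveals the answer."))                  # False
print(critic_approves("The question eyes margins from yesterday."))   # False
```

The last example would pass a plain substring test, so a stricter rule changes which Teacher questions reach the Student.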

We create a function, analyze_ticker, that runs the full pipeline.


import logging

logger = logging.getLogger(__name__)


def analyze_ticker(ticker: str):
    """
    Run the full MARS analysis pipeline for a given stock ticker.

    Args:
        ticker (str): Stock symbol (e.g. 'TSLA')

    Returns:
        dict: MARS pipeline result containing plan, teacher_question, critique,
              final_question, signal, and rationale
    """
    fetcher = EDGARFetcher(ticker=ticker)
    statements = fetcher.fetch_markdown_statements()
    prompt = build_analysis_prompt(ticker, statements)

    planner = PlannerModule()
    teacher = TeacherQuestioner()
    critic = CriticJudge()
    student = MarginAnalyzer()

    # Pass all four agents; the constructor expects the planner first
    program = MarsAnalysisProgram(planner, teacher, critic, student)
    result = program(
        context=prompt,
        base_question="Is the company improving its profitability?"
    )
    logger.info(f"Analysis for stock {ticker} :\n{result}")
    return result

analyze_ticker("TSLA")

🧠 What the MARS Pipeline Returns

After running analyze_ticker("TSLA"), you get a dictionary like this:

{
"plan": 
	"- Identify key profitability metrics (net income, gross profit margin, ROE).
	- Collect historical financial statements for three to five years.\n- Calculate annual gross profit margins and net income growth rates.
	- Analyze operating expenses trend.
	- Compute ROE annually.
	- Compare against industry benchmarks.\n- Evaluate qualitative factors influencing performance.
	- Synthesize results to answer the question.", 
"teacher_question":   
	"Is there a specific period where the company’s profitability improved despite other challenges?", 
"critique":   
	"The question asks whether there was a specific period during which the company's profitability improved despite facing other challenges. 
	To address this, I need to analyze the financial data provided over different time intervals. 
		First, I should identify key metrics such as net income or operating profit for each relevant period. 
		Then, cross-reference these figures with any mentioned challenges (e.g., market conditions, operational issues) during those periods. 
		It's crucial to ensure that the improvement in profitability is not attributed to one-time events but reflects sustainable growth. 
		Additionally, comparing year-over-year changes can highlight trends and confirm if the improvement was indeed a period-specific event rather than a gradual process.", 
"final_question":   
	"Is the company improving its profitability?", 
"signal":   
	"The company's net income decreased significantly from $2,539 million in Q1 2023 to $1,144 million in Q1 2024. 
	Basic EPS dropped from $0.80 to $0.37 over the same period. 
	These declines suggest a deterioration in profitability. 
	While one quarter does not definitively indicate a long-term trend, the sharp drop warrants further investigation into underlying causes such as operational issues, market conditions, or financial restructuring.", 
"rationale":   
	"Profitability is assessed by analyzing key financial metrics like net income, earnings per share (EPS), and other ratios over time. 
	The provided data shows:
		1. **Net Income**: Q1 2023: $2,539 million vs. Q1 2024: $1,144 million. This represents a decrease of approximately 55%.
		2. **Basic EPS**: Q1 2023: $0.80 vs. Q1 2024: $0.37. A decline in EPS indicates lower profitability per share.
		3. **Diluted EPS**: Q1 2023: $0.73 vs. Q1 2024: $0.34, further confirming the downward trend.
		
		Additional factors contributing to reduced profitability include:
		**Interest Expense**: Increased from -$29 million (Q1 2023) to -$76 million (Q1 2024), reflecting higher debt costs.
		**Tax Provision**: Increased from -261 million (Q1 2023) to -409 million (Q1 2024), indicating higher tax expenses relative to net income.
		
		These trends suggest the company may be facing challenges such as:
		Reduced revenue or market share.
		Increased operational costs.
		Financial restructuring leading to higher debt and interest obligations.
		One-time charges affecting profitability.
		
		While quarterly data is limited, the sharp decline in both net income and EPS raises red flags about potential long-term issues. 
		Further analysis of annual results or longer-term trends would be necessary to confirm this trend and identify underlying causes."
}

Some of the figures mentioned above are highlighted in the consolidated income statement below.

[Figure: Income Statement]

Let’s break down what each field means in the context of the MARS multi-agent reasoning process:

📌 plan

  • What it is: A structured breakdown of reasoning steps generated by the Planner agent.
  • Purpose: Helps the system understand how to approach the user’s original question (e.g., “Is profitability improving?”).
  • Example: “Identify profitability metrics → Calculate margin trends → Compare YoY performance.”

👨‍🏫 teacher_question

  • What it is: A Socratic question generated by the Teacher agent to help improve the original question.
  • Purpose: Promotes deeper reflection and encourages the Student to consider nuances.
  • Example: “Is there a period where profitability improved despite other challenges?”

🧑‍⚖️ critique

  • What it is: An analysis by the Critic of the Socratic question’s quality.
  • Purpose: Ensures that the Teacher’s question is truly Socratic: open-ended, thought-provoking, and useful.
  • Example: “The question encourages time-series reasoning and isolates exceptional performance scenarios.”

💡 final_question

  • What it is: The version of the original question that the Student will ultimately answer.
  • How it’s formed: Either the original question, or the original question with the Teacher’s Socratic question appended, if the Critic approves it.
  • Example: "Is the company improving its profitability? Consider also: Is there a period where profitability improved despite other challenges?"

📈 signal

  • What it is: The Student’s conclusion about the financial trend.
  • Options: One of: Bullish, Bearish, or Neutral.
  • Example: "Bearish"
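In the sample run shown earlier, the signal field came back as a paragraph rather than a single label. One way to handle that is a small normalizer like the sketch below. `normalize_signal` and the "Unknown" fallback are assumptions introduced for illustration, not part of the pipeline above.

```python
LABELS = ("Bullish", "Bearish", "Neutral")


def normalize_signal(raw: str) -> str:
    """Map a free-text signal to one label, or 'Unknown' if none or several appear."""
    text = raw.lower()
    found = [label for label in LABELS if label.lower() in text]
    return found[0] if len(found) == 1 else "Unknown"


print(normalize_signal("Bearish"))
print(normalize_signal("Signal: bullish, given rising margins."))
print(normalize_signal("Net income decreased significantly."))
```

Returning "Unknown" instead of guessing keeps ambiguous outputs visible, so they can be retried or reviewed rather than silently turned into a trading signal.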

🧾 rationale

  • What it is: The Student’s chain-of-thought explanation for the signal.
  • Purpose: Makes the prediction interpretable and auditable.
  • Example: “Net income and EPS dropped by ~55% YoY. Increased interest and tax expenses suggest structural issues. Indicates bearish sentiment.”

🔁 Summary flow:

User → Planner → Teacher → Critic → Student → Result
                    ↓         ↓        ↓
                Socratic     Evaluate   Answer

This structure makes your system modular, interpretable, and traceable, all of which are key for financial LLM applications.


☁️ Deploying the solution to Hugging Face

To deploy to Hugging Face Spaces, we need to build a GUI.


import gradio as gr

from mars import analyze_ticker


def run_analysis(ticker):
    result = analyze_ticker(ticker)
    return f"""
### Plan
{result['plan']}

### Teacher Question
{result['teacher_question']}

### Critique
{result['critique']}

### Final Question
{result['final_question']}

### Signal
{result['signal']}

### Rationale
{result['rationale']}
""" 


with gr.Blocks() as iface:
    gr.Markdown("# MARS Financial Reasoning")
    ticker_input = gr.Textbox(label="Enter stock ticker", placeholder="e.g., TSLA")
    run_button = gr.Button("Analyze", variant="primary")
    output_box = gr.Markdown()

    run_button.click(fn=run_analysis,
                     inputs=ticker_input,
                     outputs=output_box,
                     show_progress="full")  # <-- shows a loading indicator while the analysis runs

if __name__ == "__main__":
    iface.launch()

We also need to detect whether the application is running in Spaces and change how DSPy loads the model accordingly. This way the application runs both in the cloud and locally.

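import os

import dspy

# Assumption: Hugging Face Spaces sets the SPACE_ID environment variable
# inside the running container, so its presence is used as the signal here.
running_in_spaces = os.getenv("SPACE_ID") is not None

if running_in_spaces:
    print("🔍 Detected: Running in Hugging Face Spaces")
    dspy.configure(
        lm=dspy.LM(
            model='huggingface/SUFE-AIFLM-Lab/Fin-R1',
            api_base='https://api-inference.huggingface.co',
            api_key=os.getenv("HF_API_KEY")
        )
    )
else:
    print("💻 Detected: Running locally")
    dspy.configure(
        lm=dspy.LM(
            model='ollama_chat/hf.co/ernanhughes/Fin-R1-Q8_0-GGUF',
            api_base='http://localhost:11434',
            api_key=''  # Ollama does not require key
        )
    )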
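The environment check can also be pulled into a small function that only returns settings, which makes it testable without contacting any service. This is a sketch under assumptions: `select_lm_config` is a hypothetical helper, and it relies on Spaces setting a `SPACE_ID` environment variable. The model names and endpoints are the ones used in this post.

```python
import os
from typing import Mapping


def select_lm_config(env: Mapping[str, str] = os.environ) -> dict:
    """Pick LM settings from the environment without contacting any service."""
    if env.get("SPACE_ID"):  # assumption: Spaces sets SPACE_ID in the container
        return {
            "model": "huggingface/SUFE-AIFLM-Lab/Fin-R1",
            "api_base": "https://api-inference.huggingface.co",
            "api_key": env.get("HF_API_KEY", ""),
        }
    return {
        "model": "ollama_chat/hf.co/ernanhughes/Fin-R1-Q8_0-GGUF",
        "api_base": "http://localhost:11434",
        "api_key": "",  # Ollama does not require a key
    }


# Simulated environments, so the example runs anywhere.
print(select_lm_config({"SPACE_ID": "user/mars-demo", "HF_API_KEY": "hf_xxx"})["model"])
print(select_lm_config({})["model"])
```

The returned dictionary could then be unpacked into the language model constructor, for example `dspy.configure(lm=dspy.LM(**select_lm_config()))`.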

🔚 Conclusion

✅ What we covered

  • 🚀 Introduced MARS, a multi-agent framework for prompt optimization using Socratic guidance.
  • 📊 Used the Fin-R1 model for financial reasoning, deployed locally via Ollama or remotely via Hugging Face.
  • 🧠 Implemented key DSPy agents:
    • Planner to break down tasks
    • Teacher to generate Socratic questions
    • Critic to evaluate question quality
    • Student to perform financial analysis
  • 🧩 Explained how DSPy Signatures and Modules encapsulate LLM tasks cleanly.
  • 📈 Pulled real income statements using a custom EDGARFetcher class.
  • 🛠 Built a complete pipeline as a DSPy program to simulate and trace MARS behavior.
  • 🧪 Demonstrated how to analyze a real company (Tesla) with prompt-driven LLM reasoning.
  • ☁️ Configured automatic LLM selection for local (Ollama) or cloud (Hugging Face) inference.
  • 🌐 Wrapped it all in a Gradio UI and deployed to Hugging Face Spaces.

🔧 Code

The code used in the post can be found here:

mars github


📚 References

MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization

MARS GitHub repository

DSPy GitHub repository

Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning

Fin-R1: Model page Hugging Face

Fin-R1 GitHub repository


📘 Glossary of terms

| Term | Description |
| --- | --- |
| MARS | Multi-Agent framework incorpoRating Socratic guidance; a system for automated prompt optimization using dialogue between agents. |
| DSPy | A Python framework for composing and optimizing LLM programs using signatures, modules, and teleprompters. |
| Signature | A DSPy class that defines the inputs and outputs of a module (e.g., question, context, signal). |
| Module | A class in DSPy that wraps an LLM call using a signature (e.g., MarginAnalyzer, TeacherQuestioner). |
| Planner | An agent in the MARS pipeline that decomposes the user’s task into reasoning steps. |
| Teacher | Generates Socratic questions to guide the Student’s reasoning. |
| Critic | Validates whether the Teacher’s question follows Socratic methodology. |
| Student | Analyzes the input using LLM reasoning, guided by Teacher and Critic feedback. |
| Target | (Optional) Final evaluator agent used to assess the quality of the Student’s answer. |
| AnalyzeMargins | Signature used by the Student to analyze financial statements and return a trading signal. |
| Socratic Guidance | The teaching method MARS is modeled after: using questions (not answers) to improve reasoning. |
| Prompt Optimization | The process of refining LLM prompts for improved accuracy, efficiency, or interpretability. |
| Trace | A DSPy object that stores detailed steps and outputs during a reasoning program’s execution. |
| Teleprompters | DSPy components that automate the process of prompt optimization. Essentially, they act as compilers for your LLM programs. Instead of manually crafting prompts, you define the desired behavior of your modules using declarative signatures (inputs and outputs), and the Teleprompter figures out the best prompts to achieve that behavior. |
| Hugging Face Spaces | Cloud platform where you deploy apps like MARS using Gradio or Streamlit interfaces. |