Optimizing Prompt Generation with MARS and DSPy
📌 TL;DR
- We explore MARS, a multi-agent prompt optimizer using Socratic dialogue.
- We implement it using DSPy + Fin-R1 + EDGAR, giving us an end-to-end financial reasoning pipeline.
- We deploy the whole thing to Hugging Face Spaces with a Gradio UI.
📖 Introduction
Prompt engineering has become the defining skill of the Large Language Model (LLM) era: a delicate balance between science and art. Crafting the perfect prompt often feels like an exercise in intuition, trial, and error. But what if we could take the guesswork out of the process? What if prompts could optimize themselves?
Enter MARS (Multi-Agent framework incorpoRating Socratic guidance), a groundbreaking system that automates prompt optimization through autonomous planning and dialogue-based reasoning. By simulating a classroom-like interaction between multiple agents (Planner, Teacher, Critic, and Student), MARS refines prompts iteratively, ensuring they are not only effective but also interpretable and adaptable.
In this post, we'll explore:
- 🧠 How MARS works: Breaking down its multi-agent architecture and the power of Socratic dialogue.
- 🔥 Why it stands out: Comparing MARS to prior methods like APE, ProTeGi, and OPRO.
- 🛠 Hands-on implementation: a complete Python example using open tools: DSPy, Fin-R1, and EDGAR.
By the end, you'll have a clear understanding of how MARS can transform financial reasoning tasks and why it's a game changer for anyone working with LLMs.
🧰 Tools we're using and why
In this post, we're tackling a key challenge in applied LLMs: how do we ask better questions to get better answers from the model? Rather than relying on human prompt-writing intuition, we'll build an automated, interpretable reasoning system that learns how to craft better prompts through agent interaction and feedback.
To make that possible, we combine:
- 🧠 MARS: A multi-agent framework that uses Socratic dialogue between agents (Planner, Teacher, Critic, Student) to refine and improve prompts.
- ⚙️ DSPy: A declarative Python framework that makes it easy to compose, trace, and optimize LLM workflows like MARS.
- 📈 SUFE-AIFLM-Lab/Fin-R1: A financial reasoning model fine-tuned for tasks like earnings analysis, trend detection, and signal generation.
- 🧾 EDGAR + edgartools: A toolset for extracting real-world financial statements (like 10-Q reports) from public company filings.
- 💻 Ollama / Hugging Face Spaces: These let us run the app locally (with full control) or deploy it to the cloud with a GUI for testing and sharing.
Together, these tools help us build an automated financial analysis agent that not only gives an answer but shows how and why it got there.
This post will build upon the previous post Fin-R1: a Financial Reasoning LLM with Reinforcement Learning and CoT.
❓ Why MARS?
Two common issues plague Automated Prompt Optimization (APO):
- Rigid Templates: existing approaches often use fixed structures that can’t adapt to diverse tasks.
- Inefficient Search: methods like beam search only scratch the surface of prompt space, often leading to suboptimal results.
MARS solves this with two core innovations:
- Multi-Agent Architecture: each agent has a specific role in planning and optimizing prompts.
- Socratic Guidance: A Teacher-Critic-Student dialogue improves prompts iteratively.
How Well Does MARS Work?
MARS was evaluated on:
- 12 general tasks from BBH and MMLU
- 5 domain-specific tasks (Chinese, Law, Math)
Results: MARS beat all prior state-of-the-art APO methods by a wide margin, including:
- APE
- ProTeGi
- OPRO
- PE2
It also scored the highest Prompt Efficiency (PE), a novel metric combining accuracy and token usage.
Why It Matters
MARS pushes the boundary of LLM performance without touching the model weights. Instead, it tunes the inputs: a low-cost, high-impact optimization technique.
It offers:
- Better generalization across tasks
- Superior understanding via Socratic dialogue
- Efficient use of LLM tokens and calls
Key point: in most of our interactions with models, we won't have access to the model's internals. Prompts are the most reliable and accessible interface for tuning model results.
How MARS Works
MARS uses seven LLM agents working in a pipeline:
- Manager: Oversees everything.
- UserProxy: Accepts task input and initial prompt.
- Planner: Breaks down prompt optimization into substeps.
- Teacher: Asks Socratic questions to guide the Student.
- Critic: Ensures the Teacher's questions follow Socratic style.
- Student: Iteratively improves the prompt.
- Target: Evaluates prompt quality using a task model.
This process continues in iterations, where prompts evolve step-by-step.
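To make the loop concrete, here is a minimal sketch in plain Python. The agent objects and the `score` helper are hypothetical stand-ins for illustration, not the paper's implementation:

```python
# Hypothetical sketch of the MARS optimization loop. The agent objects
# and target.score() are illustrative stand-ins, not the paper's code.
def mars_optimize(task, initial_prompt, planner, teacher, critic, student,
                  target, n_iters=5):
    plan = planner.plan(task)                         # Planner: substeps for the optimization
    prompt = initial_prompt
    for _ in range(n_iters):
        question = teacher.ask(prompt, plan)          # Teacher: poses a Socratic question
        if not critic.is_socratic(question):          # Critic: filters non-Socratic questions
            continue
        candidate = student.revise(prompt, question)  # Student: rewrites the prompt
        if target.score(task, candidate) > target.score(task, prompt):
            prompt = candidate                        # Target: keep the better-scoring prompt
    return prompt
```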
Socratic guidance is core to MARS
The Teacher-Critic-Student pattern mimics classroom-style learning:
- Teacher: Never gives direct answers, only probing questions.
- Critic: Evaluates if those questions align with Socratic methods.
- Student: Thinks aloud and generates better prompts.
MARS is training the model to think more clearly, like a student learning from guided dialogue.
🚀 Simple example of what we want to do
import dspy
from dspy import Signature, InputField, OutputField, Module

# Point DSPy at the local Fin-R1 model served by Ollama
lm = dspy.LM('ollama_chat/hf.co/ernanhughes/Fin-R1-Q8_0-GGUF', api_base='http://localhost:11434', api_key='')
dspy.configure(lm=lm)

# Step 1: Define a DSPy Signature
class AnalyzeMargins(Signature):
    context = InputField(desc="Relevant financial data")
    question = InputField(desc="User's trading question")
    answer = OutputField(desc="Insightful, accurate answer")

# Step 2: Create a Module using the Signature
class MarginAnalyzer(Module):
    def __init__(self):
        super().__init__()
        self.chain = dspy.Predict(AnalyzeMargins)

    def forward(self, context, question):
        return self.chain(context=context, question=question)

context = '''
Tesla Inc Income Statement:
Q1 Revenue: $23B, Operating Income: $2.3B
Q2 Revenue: $24B, Operating Income: $2.5B
Q3 Revenue: $25B, Operating Income: $2.4B
'''
question = "Is Tesla's operating margin improving?"

analyzer = MarginAnalyzer()
result = analyzer(context=context, question=question)
print("📊 Answer:", result.answer)
This generated the following answer:
📊 Answer: To determine if Tesla's operating margin is improving, we need to calculate the operating margin for each quarter provided and analyze the trend. Operating margin is calculated as Operating Income divided by Revenue, expressed as a percentage.
**Q1 Calculation:**
Operating Margin = (Operating Income / Revenue) * 100
= ($2.3B / $23B) * 100 ≈ 10.04%
**Q2 Calculation:**
Operating Margin = ($2.5B / $24B) * 100 ≈ 10.42%
**Q3 Calculation:**
Operating Margin = ($2.4B / $25B) * 100 ≈ 9.60%
**Analysis of Trend:**
- Q1: ~10.04%
- Q2: ~10.42% (increase from Q1)
- Q3: ~9.60% (decrease from Q2)
The operating margin increased slightly from Q1 to Q2 but then decreased in Q3. However, the overall trend shows a slight improvement over the three quarters despite the dip in Q3. This suggests that Tesla's operating margin is improving on average, with some volatility.
**Conclusion:**
Yes, Tesla's operating margin is improving when considering the upward movement from Q1 to Q2, even though there was a decline in Q3. The trend indicates an overall positive improvement over the quarters provided.
💼 Get the company's financial statements
Why we chose The Income Statement for this analysis
The Income Statement provides the most actionable short-term signals for forecasting a company’s stock price over the next few months.
Statement | Key Info | Impact on Near-Term Stock Price
---|---|---
Income Statement | Revenue, profit margins, EPS, growth trends | ✅ High: reflects current performance & guidance
Balance Sheet | Assets, liabilities, equity | ⚠️ Moderate: useful for long-term stability, not short-term movement
Cash Flow | Operating, investing, financing flows | 🔁 Medium: strong if paired with income data for sustainability
Why the Income Statement wins (short-term):
- Revenue growth beats can cause 10-20% spikes.
- EPS surprises (even by a few cents) move prices immediately.
- Margins affect sentiment on efficiency and pricing power.
- It contains forward guidance language (usually buried in MD&A but hinted by trends).
- Analysts update price targets and ratings based on it first.
Getting the last n income statements
import os

import pandas as pd
from dotenv import load_dotenv
from edgar import Company, set_identity
from edgar.xbrl import XBRL

class EDGARFetcher:
    def __init__(self, ticker: str, form: str = "10-Q", n: int = 3):
        load_dotenv()
        self.identity = os.getenv("IDENTITY")
        self.ticker = ticker
        self.form = form
        self.n = n
        # Set identity for the SEC API (required by EDGAR's fair-use policy)
        set_identity(self.identity)

    def fetch_markdown_statements(self):
        # Get the company's filings for the last n quarters
        filings = Company(self.ticker).latest(form=self.form, n=self.n)
        statements = []
        for filing in filings:
            # Parse the filing's XBRL data
            xbrl = XBRL.from_filing(filing)
            # Extract the income statement
            income_statement = xbrl.statements.income_statement()
            # Convert to a DataFrame
            df = income_statement.to_dataframe()
            # Convert to plain text for the LLM
            statements.append(self.rich_report_to_text(df))
        return statements

    def rich_report_to_text(self, df: pd.DataFrame) -> str:
        """
        Convert a rich EDGAR report DataFrame to readable plain text for LLMs.
        """
        lines = []
        for _, row in df.iterrows():
            label = row.get("original_label") or row.get("label") or row.get("concept")
            values = [
                f"{col}: {row[col]}" for col in df.columns
                if isinstance(col, str) and col.startswith("20") and pd.notna(row[col])
            ]
            if values:
                lines.append(f"{label}: " + " | ".join(values))
        return "\n".join(lines)

    def run(self):
        return self.fetch_markdown_statements()

fetcher = EDGARFetcher(ticker="TSLA", n=3)
statements = fetcher.run()
Before feeding the data into the model it's a good idea to review its size. The whole 10-Q itself can be large. We covered text splitting in the previous post: Fin-R1: a Financial Reasoning LLM with Reinforcement Learning and CoT. In this case the data is a lot smaller, so we do not need to worry about splitting.
🔢 Checking the token count for the data
We want to make sure the number of tokens sent to the model fits within its context window. This function calculates an approximate value for the statements.
# Utility function to estimate token count from a list of markdown statements
def estimate_token_count(markdown_list: list[str], chars_per_token: int = 4) -> int:
    """
    Estimate the number of tokens used by a list of markdown-formatted statements.

    Args:
        markdown_list (list[str]): A list of markdown text blocks.
        chars_per_token (int): Average number of characters per token. Default is 4.

    Returns:
        int: Estimated total token count.
    """
    combined_text = "\n\n".join(markdown_list)
    total_chars = len(combined_text)
    return total_chars // chars_per_token

estimated = estimate_token_count(statements)
print(estimated)
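With an estimate in hand, we can add a simple guard before calling the model. The 8,192-token limit below is an assumption; check the context window of the Fin-R1 build you are actually serving:

```python
# Assumed context limit; verify for your specific Fin-R1 build.
MAX_CONTEXT_TOKENS = 8192

if estimated > MAX_CONTEXT_TOKENS:
    # Fall back to splitting (see the previous post) or keep fewer quarters.
    statements = statements[-2:]  # e.g. keep only the two most recent filings
    print(f"Trimmed input to {estimate_token_count(statements)} tokens")
```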
We have the financial data; now let's turn it into a structured natural-language prompt the model can reason about.
✍️ Building the analysis prompt
We want to generate a clean, structured natural language prompt from a list of markdown-formatted income statements, which can be used as input context for a financial reasoning LLM like Fin-R1.
def build_analysis_prompt(ticker: str, markdown_list: list[str]) -> str:
    header = f"You are a financial analysis model. Below are the last {len(markdown_list)} income statements from {ticker}.\n\n"
    instructions = (
        "Analyze the trend in revenue and operating income.\n"
        "Decide if profitability is improving or declining.\n"
        "Then provide a trading signal.\n\n"
        "Respond with:\n"
        "Signal: <Bullish/Bearish/Neutral>\n"
        "Rationale: <short explanation>\n\n"
    )
    body = "\n\n".join(markdown_list)
    return header + instructions + body

prompt = build_analysis_prompt("TSLA", statements)
print(prompt[:300])
🧪 Using the model for evaluation
We use the model ollama_chat/hf.co/ernanhughes/Fin-R1-Q8_0-GGUF for evaluation.
This is a GGUF version of SUFE-AIFLM-Lab/Fin-R1, which allows us to:
- Run the model locally
- Serve the model through Ollama
Configuring DSPy to use the model
This tells DSPy to use a local Fin-R1 model hosted via Ollama, and sets it as the default language model (lm) for all your modules.
dspy.configure(lm=dspy.LM('ollama_chat/hf.co/ernanhughes/Fin-R1-Q8_0-GGUF', api_base='http://localhost:11434', api_key=''))
🧾 The DSPy signature classes
In DSPy, a Signature class defines the contract for a language model module. It describes the inputs and outputs for a reasoning step in your pipeline.
AnalyzeMargins is used by the Student agent (MarginAnalyzer) to make a final trading decision.
FinancialTrendAnalysis is used by a helper module, IncomeStatementAnalyzer, to analyze raw statements without going through the Teacher/Critic loop. It is useful when you want to quickly evaluate a prompt without the full MARS pipeline, or as a Target agent to validate another answer.
class AnalyzeMargins(Signature):
    context = InputField()
    question = InputField()
    signal = OutputField()
    rationale = OutputField()

class FinancialTrendAnalysis(Signature):
    statements = InputField()
    question = InputField()
    signal = OutputField()
    rationale = OutputField()

class PlannerSignature(Signature):
    base_question = InputField()
    steps = OutputField(desc="List of reasoning substeps to answer the question")
🧩 DSPy Modules in the MARS Pipeline
This implementation of the MARS architecture uses modular DSPy agents to simulate a multi-agent reasoning loop inspired by Socratic learning. Here’s how each module contributes:
🧠 PlannerModule
The Planner initiates the process by breaking down the user's high-level task (e.g. “Is the company improving profitability?”) into a set of sub-questions or reasoning steps. This primes the system for structured analysis.
👨‍🏫 TeacherQuestioner
The Teacher proposes a Socratic question that encourages deeper reasoning. For example, it might ask, “What does the change in operating margin over time suggest?”
🧠 CriticJudge
The Critic evaluates whether the Teacher's question is truly Socratic, that is, whether it encourages thoughtful analysis rather than surface-level answers.
👨‍🎓 MarginAnalyzer
The Student conducts the actual financial reasoning, using the provided statements and teacher-guided prompt. It returns a trading signal (Bullish, Bearish, or Neutral) and a rationale.
📊 IncomeStatementAnalyzer
A simplified analysis module used outside the core loop for quick evaluations, baseline comparisons, or as a fallback Target agent.
Together, these agents simulate iterative financial reasoning, with opportunities for traceability, optimization, and modular extension, all orchestrated using DSPy.
from dspy import ChainOfThought, Predict

class IncomeStatementAnalyzer(Module):
    def __init__(self):
        super().__init__()
        self.analyze = Predict(FinancialTrendAnalysis)

    def forward(self, statements, question):
        return self.analyze(statements=statements, question=question)

class TeacherQuestion(Signature):
    prompt = InputField()
    question = OutputField()

class TeacherQuestioner(Module):
    def __init__(self, use_chain_of_thought: bool = True):
        super().__init__()
        self.use_chain_of_thought = use_chain_of_thought
        self.generate = ChainOfThought(TeacherQuestion) if self.use_chain_of_thought else Predict(TeacherQuestion)

    def forward(self, prompt):
        return self.generate(prompt=prompt)

class CritiqueQuestion(Signature):
    question = InputField()
    critique = OutputField()

class CriticJudge(Module):
    def __init__(self):
        super().__init__()
        self.evaluate = Predict(CritiqueQuestion)

    def forward(self, question):
        return self.evaluate(question=question)

class MarginAnalyzer(Module):
    def __init__(self):
        super().__init__()
        self.analyze = ChainOfThought(AnalyzeMargins)

    def forward(self, context, question, teacher_question=None):
        if teacher_question:
            question = f"{question} Consider also: {teacher_question}"
        return self.analyze(context=context, question=question)

class PlannerModule(Module):
    def __init__(self):
        super().__init__()
        self.plan = ChainOfThought(PlannerSignature)

    def forward(self, base_question):
        return self.plan(base_question=base_question)
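Before wiring everything together, it can help to exercise the Teacher and Critic in isolation. A quick smoke test (the exact wording of the outputs will vary with the model):

```python
# Quick smoke test of the Teacher/Critic pair in isolation.
teacher = TeacherQuestioner()
critic = CriticJudge()

q = teacher(prompt="Tesla's operating income fell while revenue grew. Is profitability improving?")
print("Teacher:", q.question)

judgement = critic(question=q.question)
print("Critic:", judgement.critique)
```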
┌────────────┐
│ UserProxy  │ ← receives base question
└─────┬──────┘
      │
      ▼
┌────────────┐
│  Planner   │ ← breaks down question into reasoning substeps
└─────┬──────┘
      │
      ▼
┌────────────┐
│  Teacher   │ ← generates Socratic question
└─────┬──────┘
      │
      ▼
┌────────────┐
│   Critic   │ ← evaluates the Teacher's question
└─────┬──────┘
      │
      ▼
┌────────────┐
│  Student   │ ← performs analysis (Fin-R1 via DSPy)
└─────┬──────┘
      │
      ▼
┌────────────┐
│  Target?   │ ← (optional) evaluates final quality
└────────────┘
🔄 MARS analysis program
class MarsAnalysisProgram(dspy.Program):
    def __init__(self, planner, teacher, critic, student):
        super().__init__()
        self.planner = planner
        self.teacher = teacher
        self.critic = critic
        self.student = student

    def forward(self, context: str, base_question: str):
        plan_out = self.planner(base_question=base_question)
        teacher_out = self.teacher(prompt=context + "\n\n" + base_question)
        critic_out = self.critic(question=teacher_out.question)
        # Only append the Teacher's question if the Critic approves it
        if "yes" in critic_out.critique.lower():
            final_question = f"{base_question} Consider also: {teacher_out.question}"
        else:
            final_question = base_question
        student_out = self.student(context=context, question=final_question)
        return {
            "plan": plan_out.steps,
            "teacher_question": teacher_out.question,
            "critique": critic_out.critique,
            "final_question": final_question,
            "signal": student_out.signal,
            "rationale": student_out.rationale,
        }
We create a function, analyze_ticker, that performs the full pipeline.
import logging

logger = logging.getLogger(__name__)

def analyze_ticker(ticker: str):
    """
    Run the full MARS analysis pipeline for a given stock ticker.

    Args:
        ticker (str): Stock symbol (e.g. 'TSLA')

    Returns:
        dict: MARS pipeline result containing plan, teacher_question, critique,
              final_question, signal, and rationale
    """
    fetcher = EDGARFetcher(ticker=ticker)
    statements = fetcher.fetch_markdown_statements()
    prompt = build_analysis_prompt(ticker, statements)

    planner = PlannerModule()
    teacher = TeacherQuestioner()
    critic = CriticJudge()
    student = MarginAnalyzer()

    program = MarsAnalysisProgram(planner, teacher, critic, student)
    result = program(
        context=prompt,
        base_question="Is the company improving its profitability?"
    )
    logger.info(f"Analysis for stock {ticker}:\n{result}")
    return result

analyze_ticker("TSLA")
🧠 What the MARS Pipeline Returns
After running analyze_ticker("TSLA"), you get a dictionary like this:
{
  "plan":
    "- Identify key profitability metrics (net income, gross profit margin, ROE).
     - Collect historical financial statements for three to five years.
     - Calculate annual gross profit margins and net income growth rates.
     - Analyze operating expenses trend.
     - Compute ROE annually.
     - Compare against industry benchmarks.
     - Evaluate qualitative factors influencing performance.
     - Synthesize results to answer the question.",
  "teacher_question":
    "Is there a specific period where the company's profitability improved despite other challenges?",
  "critique":
    "The question asks whether there was a specific period during which the company's profitability improved despite facing other challenges.
     To address this, I need to analyze the financial data provided over different time intervals.
     First, I should identify key metrics such as net income or operating profit for each relevant period.
     Then, cross-reference these figures with any mentioned challenges (e.g., market conditions, operational issues) during those periods.
     It's crucial to ensure that the improvement in profitability is not attributed to one-time events but reflects sustainable growth.
     Additionally, comparing year-over-year changes can highlight trends and confirm if the improvement was indeed a period-specific event rather than a gradual process.",
  "final_question":
    "Is the company improving its profitability?",
  "signal":
    "The company's net income decreased significantly from $2,539 million in Q1 2023 to $1,144 million in Q1 2024.
     Basic EPS dropped from $0.80 to $0.37 over the same period.
     These declines suggest a deterioration in profitability.
     While one quarter does not definitively indicate a long-term trend, the sharp drop warrants further investigation into underlying causes such as operational issues, market conditions, or financial restructuring.",
  "rationale":
    "Profitability is assessed by analyzing key financial metrics like net income, earnings per share (EPS), and other ratios over time.
     The provided data shows:
     1. **Net Income**: Q1 2023: $2,539 million vs. Q1 2024: $1,144 million. This represents a decrease of approximately 55%.
     2. **Basic EPS**: Q1 2023: $0.80 vs. Q1 2024: $0.37. A decline in EPS indicates lower profitability per share.
     3. **Diluted EPS**: Q1 2023: $0.73 vs. Q1 2024: $0.34, further confirming the downward trend.
     Additional factors contributing to reduced profitability include:
     - **Interest Expense**: Increased from -$29 million (Q1 2023) to -$76 million (Q1 2024), reflecting higher debt costs.
     - **Tax Provision**: Increased from -$261 million (Q1 2023) to -$409 million (Q1 2024), indicating higher tax expenses relative to net income.
     These trends suggest the company may be facing challenges such as:
     - Reduced revenue or market share.
     - Increased operational costs.
     - Financial restructuring leading to higher debt and interest obligations.
     - One-time charges affecting profitability.
     While quarterly data is limited, the sharp decline in both net income and EPS raises red flags about potential long-term issues.
     Further analysis of annual results or longer-term trends would be necessary to confirm this trend and identify underlying causes."
}
Some of these figures are highlighted in the consolidated statement below.
Let's break down what each field means in the context of the MARS multi-agent reasoning process:
📋 plan
- What it is: A structured breakdown of reasoning steps generated by the Planner agent.
- Purpose: Helps the system understand how to approach the user's original question (e.g., “Is profitability improving?”).
- Example: “Identify profitability metrics → Calculate margin trends → Compare YoY performance.”
👨‍🏫 teacher_question
- What it is: A Socratic question generated by the Teacher agent to help improve the original question.
- Purpose: Promotes deeper reflection and encourages the Student to consider nuances.
- Example: “Is there a period where profitability improved despite other challenges?”
🧑‍⚖️ critique
- What it is: An analysis by the Critic of the Socratic question's quality.
- Purpose: Ensures that the Teacher's question is truly Socratic: open-ended, thought-provoking, and useful.
- Example: “The question encourages time-series reasoning and isolates exceptional performance scenarios.”
💡 final_question
- What it is: The version of the original question that the Student will ultimately answer.
- How it's formed: Either the original question, or, if the Critic approves, the original enhanced by appending the Teacher's Socratic question.
- Example:
"Is the company improving its profitability? Consider also: Is there a period where profitability improved despite other challenges?"
📈 signal
- What it is: The Student's conclusion about the financial trend.
- Options: one of Bullish, Bearish, or Neutral.
- Example: "Bearish"
🧾 rationale
- What it is: The Student's chain-of-thought explanation for the signal.
- Purpose: Makes the prediction interpretable and auditable.
- Example: “Net income and EPS dropped by ~55% YoY. Increased interest and tax expenses suggest structural issues. Indicates bearish sentiment.”
🔁 Summary flow:

User → Planner → Teacher → Critic → Student → Result
                    │         │         │
                Socratic   Evaluate   Answer
This structure makes your system modular, interpretable, and traceable, all key qualities for financial LLM applications.
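Traceability, in particular, comes almost for free with DSPy: after a run you can dump the most recent LLM calls, including the exact prompts each module sent. A quick way to do this (assuming a recent DSPy version that exposes `inspect_history`):

```python
# Inspect the raw LLM calls made by the pipeline, including the
# exact prompts DSPy generated for each module.
result = analyze_ticker("TSLA")
dspy.inspect_history(n=3)  # prints the three most recent calls
```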
☁️ Deploying the solution to Hugging Face
To deploy to Hugging Face Spaces we need to build a GUI.
import gradio as gr

from mars import analyze_ticker

def run_analysis(ticker):
    result = analyze_ticker(ticker)
    return f"""
### Plan
{result['plan']}

### Teacher Question
{result['teacher_question']}

### Critique
{result['critique']}

### Final Question
{result['final_question']}

### Signal
{result['signal']}

### Rationale
{result['rationale']}
"""

with gr.Blocks() as iface:
    gr.Markdown("# MARS Financial Reasoning")
    ticker_input = gr.Textbox(label="Enter stock ticker", placeholder="e.g., TSLA")
    run_button = gr.Button("Analyze", variant="primary")
    output_box = gr.Markdown()
    run_button.click(fn=run_analysis,
                     inputs=ticker_input,
                     outputs=output_box,
                     show_progress=True)  # shows loading and disables the button

if __name__ == "__main__":
    iface.launch()
We also need to detect whether the application is running in Spaces and change how DSPy loads the model, so the same code runs both in the cloud and locally.
import os

import dspy
from dspy import LM

# Hugging Face Spaces sets the SPACE_ID environment variable
running_in_spaces = os.getenv("SPACE_ID") is not None

if running_in_spaces:
    print("🚀 Detected: Running in Hugging Face Spaces")
    dspy.configure(
        lm=LM(
            model='huggingface/SUFE-AIFLM-Lab/Fin-R1',
            api_base='https://api-inference.huggingface.co',
            api_key=os.getenv("HF_API_KEY")
        )
    )
else:
    print("💻 Detected: Running locally")
    dspy.configure(
        lm=LM(
            model='ollama_chat/hf.co/ernanhughes/Fin-R1-Q8_0-GGUF',
            api_base='http://localhost:11434',
            api_key=''  # Ollama does not require a key
        )
    )
🏁 Conclusion
✅ What we covered
- 📌 Introduced MARS, a multi-agent framework for prompt optimization using Socratic guidance.
- 📈 Used the Fin-R1 model for financial reasoning, deployed locally via Ollama or remotely via Hugging Face.
- 🧠 Implemented key DSPy agents:
  - Planner to break down tasks
  - Teacher to generate Socratic questions
  - Critic to evaluate question quality
  - Student to perform financial analysis
- 🧩 Explained how DSPy Signatures and Modules encapsulate LLM tasks cleanly.
- 📄 Pulled real income statements using a custom EDGARFetcher class.
- 🔄 Built a complete pipeline using DSPy's Program and Teleprompter to simulate and trace MARS behavior.
- 🧪 Demonstrated how to analyze a real company (Tesla) with prompt-driven LLM reasoning.
- ⚙️ Configured automatic LLM selection for local (Ollama) or cloud (Hugging Face) inference.
- 🚀 Wrapped it all in a Gradio UI and deployed to Hugging Face Spaces.
💻 Code
The code used in the post can be found here:
📚 References
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning
Fin-R1: model page on Hugging Face
📖 Glossary of terms
Term | Description
---|---
MARS | Multi-Agent framework incorpoRating Socratic guidance; a system for automated prompt optimization using dialogue between agents.
DSPy | A Python framework for composing and optimizing LLM programs using signatures, modules, and teleprompters.
Signature | A DSPy class that defines the inputs and outputs of a module (e.g., question, context, signal).
Module | A class in DSPy that wraps an LLM call using a signature (e.g., MarginAnalyzer, TeacherQuestioner).
Planner | An agent in the MARS pipeline that decomposes the user's task into reasoning steps.
Teacher | Generates Socratic questions to guide the Student's reasoning.
Critic | Validates whether the Teacher's question follows Socratic methodology.
Student | Analyzes the input using LLM reasoning, guided by Teacher and Critic feedback.
Target | (Optional) Final evaluator agent used to assess the quality of the Student's answer.
AnalyzeMargins | Signature used by the Student to analyze financial statements and return a trading signal.
Socratic Guidance | The teaching method MARS is modeled after: using questions (not answers) to improve reasoning.
Prompt Optimization | The process of refining LLM prompts for improved accuracy, efficiency, or interpretability.
Trace | A DSPy object that stores detailed steps and outputs during a reasoning program's execution.
Teleprompters | A DSPy component that automates prompt optimization; essentially a compiler for your LLM programs. Instead of manually crafting prompts, you define the desired behavior of your modules using declarative signatures (inputs and outputs), and the Teleprompter figures out the best prompts to achieve that behavior.
Hugging Face Spaces | Cloud platform where you deploy apps like MARS using Gradio or Streamlit interfaces.