Prompt Engineering

General Reasoner: The smarter Local Agent

🔧 Summary

The General Reasoner paper shows how we can train LLMs to reason across domains using diverse data and a generative verifier. In this post, I walk through our open-source implementation showing how we built a modular reasoning agent capable of generating multiple hypotheses, evaluating them with an LLM-based judge, and selecting the best answer.

🧠 What We Built

We built a GeneralReasonerAgent that:

Dynamically generates multiple hypotheses using different reasoning strategies (e.g., cot, debate, verify_then_answer, etc.)
Evaluates each pair of hypotheses using either a local LLM judge or our custom MR.Q evaluator
Classifies the winning hypothesis using rubric dimensions
Logs structured results to a PostgreSQL-backed system

All of this was integrated with our existing co_ai framework, which includes:

Summary

This post provides a comprehensive guide to prompt engineering, the art of crafting effective inputs for Large Language Models (LLMs). Mastering prompt engineering is crucial for maximizing the potential of LLMs and achieving desired results.

Effective prompting is the easiest way to enhance your experience with Large Language Models (LLMs).

The prompts we make are our interface to LLMs. This is how we communicate with them. This is why it is important to understand how to do it well.