Technical Guides

Dimensions of Thought: A Smarter Way to Evaluate AI

📖 Summary

This post introduces a multidimensional reward modeling pipeline built on top of the CO_AI framework. It covers:

  • ✅ Structured Evaluation Setup: How to define custom evaluation dimensions using YAML or database-backed rubrics (see the YAML sketch after this list).

  • 🧠 Automated Scoring with LLMs: Using the ScoreEvaluator to produce structured, rationale-backed scores for each dimension.

  • 🧮 Embedding-Based Hypothesis Indexing: Embedding hypotheses efficiently and comparing them by similarity to surface candidates for contrastive learning.

  • 🔄 Contrast Pair Generation: Creating training pairs where one hypothesis outperforms another on a given dimension (a Python sketch follows this list).
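
For illustration, here is what a dimension rubric might look like in YAML. This is a minimal sketch; the field names (`name`, `description`, `scale`) are assumptions for this post, not the CO_AI framework's actual schema.

```yaml
# Hypothetical rubric schema for illustration only; the field names are
# assumptions, not the CO_AI framework's actual configuration format.
dimensions:
  - name: correctness
    description: "Is the hypothesis factually and logically sound?"
    scale: [1, 5]
  - name: novelty
    description: "Does the hypothesis go beyond restating the prompt?"
    scale: [1, 5]
```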
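
And here is a minimal Python sketch of the indexing and pair-generation steps. The `embedding` and `scores` fields and the `contrast_pairs` helper are hypothetical stand-ins, not the pipeline's actual API; the point is the shape of the logic: only contrast hypotheses that are semantically close but score differently on one dimension.

```python
# Minimal sketch of embedding-based indexing and contrast pair generation.
# Data layout and helper names are assumptions, not the CO_AI API.
from itertools import combinations

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def contrast_pairs(hypotheses, dimension, sim_threshold=0.8, margin=1.0):
    """Yield (winner, loser) pairs that are semantically close but
    score differently on the given dimension."""
    pairs = []
    for h1, h2 in combinations(hypotheses, 2):
        if cosine(h1["embedding"], h2["embedding"]) < sim_threshold:
            continue  # only contrast near-duplicates, where the signal is sharpest
        gap = h1["scores"][dimension] - h2["scores"][dimension]
        if abs(gap) >= margin:
            winner, loser = (h1, h2) if gap > 0 else (h2, h1)
            pairs.append((winner, loser))
    return pairs
```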

A Novel Approach to Autonomous Research: Implementing NOVELSEEK with Modular AI Agents

Summary

AI research tools today are often narrow: one generates summaries, another ranks models, a third suggests ideas. But real scientific discovery isn’t a single step—it’s a pipeline. It’s iterative, structured, and full of feedback loops.

In this post, I show how to build a modular AI system that mirrors this full research lifecycle. From initial idea generation to method planning, each phase is handled by a specialized agent, and the agents work in concert (a minimal sketch follows).
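
To make the architecture concrete, here is a minimal sketch of the phase-per-agent idea. The class and method names are illustrative assumptions, not the actual NOVELSEEK or co_ai interfaces.

```python
# Sketch of a phase-per-agent pipeline; names are illustrative assumptions,
# not the actual NOVELSEEK or co_ai interfaces.
from typing import Protocol


class Agent(Protocol):
    def run(self, context: dict) -> dict: ...


class IdeaGenerationAgent:
    def run(self, context: dict) -> dict:
        context["ideas"] = ["..."]  # e.g. call an LLM to propose candidate ideas
        return context


class MethodPlanningAgent:
    def run(self, context: dict) -> dict:
        context["plan"] = "..."  # turn the best idea into an experiment plan
        return context


def run_pipeline(agents: list[Agent], context: dict) -> dict:
    """Each specialized agent handles one phase and passes its output on,
    so feedback from later phases can be looped back into earlier ones."""
    for agent in agents:
        context = agent.run(context)
    return context


result = run_pipeline([IdeaGenerationAgent(), MethodPlanningAgent()], {"topic": "..."})
```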

Self-Improving Agents: Applying the Sharpening Framework to Local LLMs

This is the second post in a 100-part series where we take breakthrough AI papers and turn them into working code, building the next generation of AI one idea at a time.

🔧 Summary

In my previous post, I introduced co_ai, a modular implementation of the AI co-scientist concept, inspired by DeepMind’s recent paper Towards an AI Co-Scientist.

But now, we’re going deeper.

This isn’t just about running prompts through an agent system; it’s about building something radically different: