Thinking in Primitives: Why AI Reasoning Should Learn to Point
From visual primitives to context-filtered reasoning, grounded verification, and AI movie repair
TL;DR
This post argues that AI reasoning should not operate over everything it can see, read, or detect. It should operate over the right primitives for the current task.
The paper Thinking with Visual Primitives shows that multimodal models reason better when they can point to visual entities using boxes and points. That solves a Reference Gap: language is often too vague to anchor reasoning to the right part of an image.