The Eye That Sees

Using AI to Decode Symbols Without Assuming Meaning

Executive Summary: From Symbol to System

We set out to understand a single image. We ended up building a system that can understand structure itself.

We started with a constraint:

Assume we do not understand the symbol.

No prior knowledge. No accepted interpretations.

Just an image: an eye, a triangle, rays, an unfinished pyramid.

From that starting point, we:

  • decomposed the symbol into observable parts
  • grounded those parts in documented history
  • separated structure from speculation

That produces a defensible analysis.

But it also exposes something more important:

the symbol is not a message; it is a composition.

That shift changed the problem.

We stopped asking:

What does this symbol mean?

And started asking:

What is it made of, and can that structure be detected automatically?

To answer that, we built a working system:

  • detect symbolic primitives
  • represent them in a shared form (emoji)
  • compare symbols by structure
  • encode them into a searchable surface (ZeroModel)

This is not an interpretation engine.

It is a structure engine.

The result is simple: symbols can now be decomposed, compared, and searched without assuming meaning.


1. Solving the Symbol

At first, the Eye of Providence looked like a promising test case for symbolic analysis. Its persistence, recognizability, and reputation for hidden meaning made it a natural candidate for experimentation.

The historical record, however, is more explicit than expected. Documentation around the Great Seal describes the major elements of the symbol in clear terms:

  • an eye, enclosed in a triangle
  • surrounded by rays of light
  • positioned above an unfinished pyramid
  • accompanied by a date marking the founding period

The structure also developed over time. Early versions included the Eye without the pyramid, indicating that the final composition was assembled rather than discovered as a single unit.

This detail changes how the symbol should be approached.

Instead of treating it as a fixed image with a single meaning, it is more useful to treat it as a constructed arrangement of parts.

Each component carries a recognizable role:

  • the eye: observation or providence
  • the triangle: enclosure or structure
  • the rays: illumination or influence
  • the pyramid: a layered system or foundation
  • the spatial separation: perspective or external vantage

These elements combine into a coherent whole, but they remain separable. Their roles are stable even when the composition changes.

At that point, the central question shifts. Interpretation becomes less important than structure.

What matters is not what the symbol means, but how it is built and where those same components appear elsewhere.

That shift from meaning to composition is what makes the system in the next sections possible.


2. Symbols Are Compositions

The structure identified in the previous section suggests a different way to think about symbols.

The Eye of Providence does not behave like a single object with a fixed meaning. It behaves like a composition built from smaller, reusable elements. Those elements retain their roles even when the surrounding image changes.

This observation can be made concrete by isolating the components already identified:

| Primitive | Role |
| --- | --- |
| 👁️ Eye | observation / perception |
| 🔺 Triangle | enclosure / structure |
| ✨ Rays | illumination / influence |
| 🏛️ Pyramid | layered system / foundation |
| ⬆️ Above | external perspective / separation |

These primitives are not interpretations in the abstract sense. They are functional roles inferred from visible structure and documented context. Each can appear independently or in combination with others.

Once expressed in this form, comparison becomes straightforward.

The Eye of Providence can be represented as:

👁️ 🔺 ✨ 🏛️ ⬆️

A related symbol, such as a Masonic eye variant, typically resolves to a subset:

👁️ ✨ ⬆️

The overlap is explicit. Three primitives are shared. Two are absent. The relationship between the symbols can be described in structural terms rather than narrative ones.

This representation makes it possible to compare symbols without relying on surface appearance. Similarity emerges from shared components, not visual likeness alone.

The choice of emoji as a representation layer is practical rather than theoretical.

Emoji provide:

  • discrete, visually distinct tokens
  • a shared vocabulary that requires no additional training
  • enough semantic weight to act as stand-ins for symbolic roles

Other representations are possible. Embedding vectors or custom ontologies could serve the same purpose with greater precision. Emoji are sufficient to demonstrate the underlying idea while remaining inspectable and easy to manipulate.

The key property is not the specific symbols used, but the existence of a shared intermediate form.

Once a symbol can be decomposed into primitives and expressed in that form, it can be:

  • compared against other symbols
  • clustered by structural similarity
  • indexed for retrieval

At that point, the problem shifts again.

The task is no longer limited to explaining a single symbol. It becomes a question of identifying recurring structures across many symbols and measuring how those structures vary.

That transition leads directly into the implementation.


3. The Representation Layer

At this point we have primitives.

The remaining problem is not conceptual; it is technical.

How do we encode these primitives so they can be stored, compared, and searched?

The representation must satisfy four constraints:

  • Discrete: each element should be clearly separable
  • Composable: combinations should preserve structure
  • Comparable: similarity should be measurable
  • Inspectable: the output should remain understandable without additional tooling

The primitive set already provides structure. What remains is a way to encode it consistently.

Emoji provide a workable solution. They serve here as compact symbolic tokens: inspectable by humans, comparable by machines, and sufficient for a first intermediate representation.

Each primitive can be mapped to a token:

  • 👁️ → eye
  • 🔺 → triangle
  • ✨ → rays
  • 🏛️ → pyramid
  • ⬆️ → above

A symbol then becomes a sequence of tokens rather than a single image.

For example:

👁️ 🔺 ✨ 🏛️ ⬆️

This sequence captures the presence of components, but not yet their strength or relative importance. That information is preserved earlier in the pipeline as scores.

A more complete representation combines both:

eye=0.91, triangle=0.88, rays=0.72, pyramid=0.95, above=0.63
→ 👁️ 🔺 ✨ 🏛️ ⬆️

The numeric layer retains precision. The emoji layer provides a simplified view suitable for comparison and inspection.

This separation is intentional.

  • The score vector acts as the underlying data
  • The emoji sequence acts as a readable projection

The system operates on the scores. The emoji make the output usable.
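The projection from scores to tokens can be sketched in a few lines. This is a minimal illustration, not the original implementation: the `project` function and the `threshold` cutoff of 0.5 are assumptions introduced here, and only the token table comes from the mapping above.

```python
# Token table taken from the primitive-to-emoji mapping listed above.
EMOJI = {
    "eye": "👁️",
    "triangle": "🔺",
    "rays": "✨",
    "pyramid": "🏛️",
    "above": "⬆️",
}


def project(scores: dict[str, float], threshold: float = 0.5) -> str:
    """Return the emoji sequence for every primitive scoring at or above threshold.

    The threshold is an assumed cutoff, not a value from the original system.
    Tokens follow the fixed order of EMOJI so that sequences stay comparable.
    """
    return " ".join(
        token for name, token in EMOJI.items() if scores.get(name, 0.0) >= threshold
    )


providence = {"eye": 0.91, "triangle": 0.88, "rays": 0.72, "pyramid": 0.95, "above": 0.63}
print(project(providence))
```

Raising or lowering the cutoff changes which primitives survive the projection, which is one reason to keep the score vector as the underlying data.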

At this stage, two symbols can be compared directly by evaluating overlap in their primitive sets. A simple measure such as intersection over union is sufficient to establish similarity.

For example:

A: 👁️ 🔺 ✨ 🏛️ ⬆️  
B: 👁️ ✨ ⬆️  

overlap = 3 / 5

This produces a structural similarity score without requiring any interpretation of meaning.
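As a small sketch of that measure, intersection over union on primitive sets can be written as follows. The function name `jaccard` and the choice to return 0.0 for two empty sets are assumptions made for this example.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Intersection over union of two primitive sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


symbol_a = {"eye", "triangle", "rays", "pyramid", "above"}
symbol_b = {"eye", "rays", "above"}

# Three shared primitives out of five distinct ones.
print(jaccard(symbol_a, symbol_b))
```

Because the measure only looks at set membership, it is insensitive to how the two images look and to the order in which primitives were detected.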

The same approach extends beyond images.

Any input that can be reduced to a set of primitives can be mapped into the same representation. The primitives themselves may differ by modality, but the encoding layer remains consistent.

  • text → semantic roles
  • video → actions and interactions
  • audio → tempo and intensity patterns

The representation does not depend on the source. It depends only on the ability to extract a stable set of components.

At that point, the system gains a useful property.

Different types of data can be compared within a shared space, as long as they resolve to compatible primitives. The representation acts as a bridge between modalities.

This is sufficient for a prototype.

The goal is not to define a complete symbolic language. The goal is to establish a minimal intermediate form that supports decomposition, comparison, and extension.

With that in place, the next step is to show that the primitives can be extracted in practice.

This is the key transition.

The system does not treat symbols as labels.

It treats them as distributions over structure.

That changes what comparison means.

Two symbols no longer need to match exactly. They only need to share enough structure to overlap.

Similarity becomes a matter of shared composition, not identical identity.

This is what allows symbols to be compared, clustered, and searched without requiring a single “correct” interpretation.


4. Building the Prototype

At this point, the system is defined. This section demonstrates that the structure can be implemented.

The prototype takes an image as input and produces a set of primitive scores, along with a corresponding symbolic representation. Structural detection relies on classical computer vision techniques (CV2), which keeps that stage transparent and allows each step to be inspected. A pretrained CLIP model contributes semantic scores in parallel, as the worked example later in this section shows.

The pipeline follows a fixed sequence:

  1. Preprocessing: The image is converted to grayscale and normalized. Edge detection is applied to highlight structural boundaries.

  2. Contour Extraction: Contours are identified from the edge map. These serve as candidates for geometric primitives.

  3. Shape Approximation: Each contour is approximated to a geometric form:

    • ellipses suggest the presence of an eye
    • triangles suggest enclosure or structure
    • radial patterns suggest rays

  4. Spatial Analysis: The relative positions of detected shapes are evaluated:

    • vertical separation indicates “above”
    • central alignment increases confidence in composite structures

  5. Scoring: Each primitive is assigned a score based on detection confidence and spatial consistency. The result is a vector of primitive strengths.
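The spatial analysis step can be illustrated without any imaging library by working on bounding boxes alone. The sketch below is an assumption about how such scores might be computed, not the prototype's actual code: the `Box` type, `above_score`, `alignment_score`, and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    x: float
    y: float
    w: float
    h: float

    @property
    def cx(self) -> float:
        """Horizontal center of the box."""
        return self.x + self.w / 2


def above_score(upper: Box, lower: Box) -> float:
    """Fraction of `upper`'s height that lies above the top edge of `lower`, in [0, 1]."""
    if upper.h <= 0:
        return 0.0
    return max(0.0, min(1.0, (lower.y - upper.y) / upper.h))


def alignment_score(a: Box, b: Box, image_width: float) -> float:
    """1.0 when the horizontal centers coincide, falling to 0.0 at half an image width apart."""
    offset = abs(a.cx - b.cx)
    return max(0.0, 1.0 - offset / (image_width / 2))


# Hypothetical detections in a 100-pixel-wide image.
eye = Box(x=40, y=10, w=20, h=10)
pyramid = Box(x=20, y=40, w=60, h=50)

print(above_score(eye, pyramid), alignment_score(eye, pyramid, image_width=100))
```

A final primitive score could then multiply a detection confidence by these spatial terms, though how the prototype weights them is not stated here.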

A typical output from this process looks like this:

{
  "eye": 0.91,
  "triangle": 0.88,
  "rays": 0.72,
  "pyramid": 0.95,
  "above": 0.63
}

This vector is the primary output of the system. It captures the structural composition of the image in a form that can be compared numerically.

The emoji representation is derived from this vector:

👁️ 🔺 ✨ 🏛️ ⬆️

The prototype was evaluated against a small test set to validate behavior across different inputs.

Examples:

eye_symbol_1:
detected: ['triangle', 'eye', 'rays', 'above', 'sun']
expected: ['triangle', 'eye', 'rays']
precision: 0.6, recall: 1.0

star:
detected: ['sun', 'eye', 'star', 'rays']
expected: ['star']
precision: 0.25, recall: 1.0

tree:
detected: ['eye', 'tree', 'rays']
expected: ['tree']
precision: 0.33, recall: 1.0
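The precision and recall figures above follow from ordinary set arithmetic. A minimal sketch, assuming the helper name `precision_recall` and the convention that an empty detected or expected list yields 0.0:

```python
def precision_recall(detected: list[str], expected: list[str]) -> tuple[float, float]:
    """Precision = hits / detected, recall = hits / expected (0.0 when a denominator is empty)."""
    det, exp = set(detected), set(expected)
    hits = len(det & exp)
    precision = hits / len(det) if det else 0.0
    recall = hits / len(exp) if exp else 0.0
    return precision, recall


cases = {
    "eye_symbol_1": (["triangle", "eye", "rays", "above", "sun"], ["triangle", "eye", "rays"]),
    "star": (["sun", "eye", "star", "rays"], ["star"]),
    "tree": (["eye", "tree", "rays"], ["tree"]),
}

for name, (detected, expected) in cases.items():
    p, r = precision_recall(detected, expected)
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

Every expected primitive is found in each case, so recall is 1.0; precision drops in proportion to the number of extra detections.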

The results show a consistent pattern.

Recall remains high across cases. The system detects relevant primitives reliably. Precision is lower, particularly where primitives overlap conceptually. For example, stars and suns both trigger radial and brightness-based features, leading to shared detections.

This behavior is expected.

The prototype does not enforce strict classification. It identifies structural features and assigns them to multiple primitives when appropriate. Overlap is preserved rather than suppressed.

This allows the output to reflect ambiguity present in the input, rather than forcing a single label.

At this stage, the system satisfies the core requirement:

  • primitives can be detected from real images
  • detections can be expressed as structured scores
  • scores can be converted into a shared representation

This is sufficient to move beyond individual symbols.

The next step is to examine how overlapping detections affect comparison.

[Image panels: Original, Grayscale, Edges, Contours]

Figure 1: From Image to Symbolic Signature

The implementation can be summarized as a short symbolic pipeline: structural and semantic signals are merged into primitives, then projected into a shared representation.

    %%{init: {
  'theme': 'base',
  'themeVariables': {
    'primaryColor': '#e3f2fd',
    'primaryTextColor': '#0d2135',
    'primaryBorderColor': '#1e88e5',
    'lineColor': '#1565c0',
    'secondaryColor': '#bbdefb',
    'tertiaryColor': '#90caf9',
    'background': '#f0f8ff',
    'mainBkg': '#e3f2fd',
    'textColor': '#0d2135'
  },
  'flowchart': {
    'curve': 'basis',
    'padding': 20
  }
}}%%

graph TB
    subgraph Input["🖼️ INPUT"]
        A[("📷 Image<br><small>Eye of Providence</small>")]
        style A fill:#42a5f5,stroke:#1565c0,stroke-width:3px,color:#fff
    end

    subgraph Detection["🔍 DETECTION LAYER"]
        direction TB
        B1["🧠 CLIP<br><small>Semantic Scoring</small>"]
        B2["👁️ CV2<br><small>Structural Features</small>"]
        style B1 fill:#7e57c2,stroke:#4527a0,stroke-width:2px,color:#fff
        style B2 fill:#7e57c2,stroke:#4527a0,stroke-width:2px,color:#fff
    end

    subgraph Primitives["🧩 PRIMITIVE EXTRACTION"]
        direction LR
        C1["👁️ Eye<br><small>0.91</small>"]
        C2["🔺 Triangle<br><small>0.88</small>"]
        C3["✨ Rays<br><small>0.72</small>"]
        C4["🏛️ Pyramid<br><small>0.95</small>"]
        C5["⬆️ Above<br><small>0.63</small>"]
        style C1 fill:#26c6da,stroke:#00695c,stroke-width:2px,color:#fff
        style C2 fill:#26c6da,stroke:#00695c,stroke-width:2px,color:#fff
        style C3 fill:#26c6da,stroke:#00695c,stroke-width:2px,color:#fff
        style C4 fill:#26c6da,stroke:#00695c,stroke-width:2px,color:#fff
        style C5 fill:#26c6da,stroke:#00695c,stroke-width:2px,color:#fff
    end

    subgraph Graph["🕸️ SYMBOLIC GRAPH"]
        direction TB
        D1["🔺 contains 👁️"]
        D2["👁️ emits ✨"]
        D3["👁️ above 🏛️"]
        style D1 fill:#ffa726,stroke:#e65100,stroke-width:2px,color:#0d2135
        style D2 fill:#ffa726,stroke:#e65100,stroke-width:2px,color:#0d2135
        style D3 fill:#ffa726,stroke:#e65100,stroke-width:2px,color:#0d2135
    end

    subgraph IR["🔤 INTERMEDIATE REPRESENTATION"]
        E["👁️ 🔺 ✨ 🏛️ ⬆️<br><small>Emoji Sequence</small>"]
        style E fill:#ef5350,stroke:#b71c1c,stroke-width:3px,color:#fff
    end

    subgraph ZeroModel["🗺️ ZEROMODEL INDEXING"]
        direction LR
        F1["📊 Score Matrix"]
        F2["🖼️ PNG Surface"]
        F3["🧭 Spatial Navigation"]
        style F1 fill:#66bb6a,stroke:#1b5e20,stroke-width:2px,color:#fff
        style F2 fill:#66bb6a,stroke:#1b5e20,stroke-width:2px,color:#fff
        style F3 fill:#66bb6a,stroke:#1b5e20,stroke-width:2px,color:#fff
    end

    subgraph Output["🎯 OUTPUT"]
        G["🔎 Similar Symbols<br><small>Masonic Eye, Sun, Star...</small>"]
        style G fill:#42a5f5,stroke:#1565c0,stroke-width:3px,color:#fff
    end

    A --> B1
    A --> B2
    B1 --> C1
    B1 --> C2
    B1 --> C3
    B1 --> C4
    B2 --> C1
    B2 --> C2
    B2 --> C3
    B2 --> C4
    B2 --> C5

    C1 --> D1
    C2 --> D1
    C1 --> D2
    C3 --> D2
    C1 --> D3
    C4 --> D3
    C5 --> D3

    D1 --> E
    D2 --> E
    D3 --> E

    E --> F1
    F1 --> F2
    F2 --> F3
    F3 --> G

    %% Subgraph styling
    style Input fill:#e3f2fd,stroke:#1e88e5,stroke-width:2px
    style Detection fill:#ede7f6,stroke:#4527a0,stroke-width:2px
    style Primitives fill:#e0f7fa,stroke:#00695c,stroke-width:2px
    style Graph fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style IR fill:#ffebee,stroke:#b71c1c,stroke-width:2px
    style ZeroModel fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px
    style Output fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
  

The original image is progressively transformed into grayscale, edge maps, and contour overlays to expose its underlying geometry. Each stage removes visual noise while preserving structure, making implicit shapes explicit.
This sequence shows how the system moves from appearance to measurable structure, the foundation for all later symbolic detection.


A Concrete Example: From Image to Structure

To make the pipeline explicit, we can trace a single image all the way through the system.

Input

Eye of Providence (Great Seal variant)

Step 1: Structural Extraction (CV2)

The image is transformed through a sequence of structural filters:

| Stage | Output Description |
| --- | --- |
| Grayscale | Normalized intensity, removes color noise |
| Edges | Canny edge map highlights boundaries |
| Contours | Geometric shapes extracted from edges |

At this stage, the system is not identifying meaning; it is identifying geometry and structure.

From these signals, candidate primitives emerge:

  • elliptical contour → potential eye
  • triangular contour → potential triangle
  • radial edge dispersion → potential rays
  • lower mass region → potential pyramid

Step 2: Semantic Scoring (CLIP)

In parallel, the image is compared against primitive concepts:

"eye", "triangle", "rays", "pyramid", "above"

This produces similarity scores based on learned visual-language alignment.


Step 3: Combined Primitive Scores

The structural and semantic signals are combined:

eye       = 0.91  
triangle  = 0.88  
rays      = 0.72  
pyramid   = 0.95  
above     = 0.63  

This is the core representation of the image.

Not a label. Not a classification.

A distribution of symbolic primitives.
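How the two signals are merged is not specified here, so the following is only one plausible sketch: a weighted mean, with a primitive that only one detector scores keeping that score unchanged. The function `combine`, the weight `w_structural`, and the input numbers are all assumptions for illustration.

```python
def combine(
    structural: dict[str, float],
    semantic: dict[str, float],
    w_structural: float = 0.5,
) -> dict[str, float]:
    """Blend two score dictionaries with a weighted mean.

    A primitive scored by only one detector keeps that detector's score as-is.
    """
    combined = {}
    for name in sorted(structural.keys() | semantic.keys()):
        s, m = structural.get(name), semantic.get(name)
        if s is None:
            combined[name] = m
        elif m is None:
            combined[name] = s
        else:
            combined[name] = w_structural * s + (1 - w_structural) * m
    return combined


# Hypothetical per-detector scores; "above" comes from spatial analysis only.
structural = {"eye": 0.90, "above": 0.63}
semantic = {"eye": 0.80}

for name, score in combine(structural, semantic).items():
    print(f"{name} = {score:.2f}")
```

Other merge rules, such as taking the maximum or the product, would fit the same interface and give different trade-offs between the two detectors.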


Step 4: Graph Construction

The system then evaluates relationships between primitives:

triangle → contains → eye  
eye      → emits    → rays  
eye      → above    → pyramid  

This converts the flat vector into a structured symbolic graph.
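One simple way to represent that graph is a list of (subject, relation, object) triples. The sketch below keeps a relation only when both endpoint primitives are present; that gating rule, the `build_graph` name, and the 0.5 threshold are assumptions, and a fuller version would also test the geometry (containment, direction) rather than scores alone.

```python
# Candidate relations from the example above: (subject, relation, object).
CANDIDATE_RELATIONS = [
    ("triangle", "contains", "eye"),
    ("eye", "emits", "rays"),
    ("eye", "above", "pyramid"),
]


def build_graph(scores: dict[str, float], threshold: float = 0.5) -> list[tuple[str, str, str]]:
    """Keep a relation only when both endpoint primitives score at or above threshold."""
    return [
        (subject, relation, obj)
        for subject, relation, obj in CANDIDATE_RELATIONS
        if scores.get(subject, 0.0) >= threshold and scores.get(obj, 0.0) >= threshold
    ]


providence = {"eye": 0.91, "triangle": 0.88, "rays": 0.72, "pyramid": 0.95, "above": 0.63}
for subject, relation, obj in build_graph(providence):
    print(f"{subject} -> {relation} -> {obj}")
```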


Step 5: Intermediate Representation

The graph is projected into the shared symbolic layer:

👁️ 🔺 ✨ 🏛️ ⬆️

This is not decoration.

It is a compressed, comparable encoding of structure.


Step 6: Comparison Against Another Symbol

Now consider a related image:

Masonic Eye Variant

The system produces:

eye       = 0.84  
rays      = 0.69  
above     = 0.71  
triangle  = 0.12  
pyramid   = 0.08  

→ 👁️ ✨ ⬆️

Step 7: Structural Similarity

We can now compare the two symbols:

A: 👁️ 🔺 ✨ 🏛️ ⬆️  
B: 👁️ ✨ ⬆️  

Overlap: 3 / 5 primitives

This establishes similarity without relying on visual matching.

The images differ in appearance, but share structural components.


What This Demonstrates

This example shows the full pipeline:

  1. Image → structural features
  2. Features → primitive scores
  3. Scores → symbolic graph
  4. Graph → intermediate representation
  5. Representation → comparable structure

The system does not decode meaning.

It extracts structure and makes that structure measurable.


5. Symbolic Overlap

The prototype does not produce clean, mutually exclusive labels. A single image often activates multiple primitives at once, even when only one was expected.

This is visible in the evaluation results:

star:
detected: ['sun', 'eye', 'star', 'rays']

sun:
detected: ['eye', 'halo', 'sun', 'star', 'above', 'rays']

At first glance, this looks like reduced precision. A star should not necessarily be a sun. A sun should not necessarily include an eye.

That interpretation assumes that primitives are exclusive categories. The prototype shows that they are not.

The detection process operates on structural features:

  • radial symmetry
  • brightness gradients
  • central focus
  • contour density

These features are shared across multiple symbolic interpretations. A sun and a star both exhibit radial structure. Rays of light appear in both cases. The system responds to those shared signals.

As a result, the output is not a single label. It is a set of overlapping activations.

This behavior is consistent across other examples:

tree:
detected: ['eye', 'tree', 'rays']

eagle:
detected: ['eagle', 'eye', 'above', 'rays']

The additional primitives are not random. They correspond to partial structural matches:

  • “eye” can be triggered by elliptical regions or high-contrast focal points
  • “rays” can be triggered by directional edge patterns
  • “above” can be triggered by vertical separation

The system is capturing fragments of structure rather than enforcing a single interpretation.

This leads to a different way of reading the output.

Instead of asking which label is correct, it becomes more useful to ask:

  • which primitives are strongly activated
  • which primitives co-occur
  • how the activation pattern compares to other inputs
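The co-occurrence question can be made concrete with a small count over the detections listed earlier in this section. This is an illustrative sketch; the `cooccurrence` helper is a name introduced here.

```python
from collections import Counter
from itertools import combinations

# Detections copied from the evaluation examples in this section.
detections = {
    "star": ["sun", "eye", "star", "rays"],
    "sun": ["eye", "halo", "sun", "star", "above", "rays"],
    "tree": ["eye", "tree", "rays"],
    "eagle": ["eagle", "eye", "above", "rays"],
}


def cooccurrence(detections: dict[str, list[str]]) -> Counter:
    """Count how many images activate each unordered pair of primitives."""
    counts = Counter()
    for primitives in detections.values():
        # Sorting makes each pair a canonical (alphabetical) tuple.
        for pair in combinations(sorted(set(primitives)), 2):
            counts[pair] += 1
    return counts


counts = cooccurrence(detections)
print(counts[("eye", "rays")])   # → 4: the pair fires on all four images
print(counts[("star", "sun")])   # → 2: shared radial structure
```

A pair that fires on every input, as "eye" and "rays" do here, carries little discriminating power on its own, which is worth knowing before using it for comparison.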

A symbol can then be represented as a signature, not a classification.

For example:

Eye of Providence:
[eye, triangle, rays, pyramid, above]

Masonic variant:
[eye, rays, above]

Star:
[star, rays, sun]

Sun:
[sun, rays, star, above]

These signatures overlap in structured ways. Similar symbols share subsets of primitives. Dissimilar symbols diverge.

This has two immediate consequences.

First, similarity becomes a matter of degree, not identity. Two symbols do not need to match exactly to be related. Shared structure is sufficient.

Second, ambiguity is preserved rather than removed. The system does not force a decision between “sun” and “star” when both are supported by the data.

This behavior is closer to how continuous representations operate in embedding models. Multiple related concepts occupy nearby regions and can be activated together. The difference here is that the representation remains explicit and interpretable.

The prototype therefore establishes an important property:

Symbolic meaning is not exclusive. It is compositional and overlapping.

This property becomes useful in the next step.

If symbols can be represented as overlapping sets of primitives, then those sets can be encoded, stored, and compared at scale.

Example Results: Symbolic Overlap in Practice

The table shows detected primitives compared to expected ones across multiple images. Recall is 1.00 for every image that has expected primitives, while precision varies because symbols share overlapping structural features. The Landscape and Dog rows list no expected primitives, so both metrics are reported as 0.00 there.
This illustrates that symbols are not exclusive categories but compositional structures with shared components.

| Image | Expected Symbols | Detected Symbols | Precision | Recall |
| --- | --- | --- | --- | --- |
| Eye #1 | rays, eye, triangle | above, triangle, rays, eye, sun | 0.60 | 1.00 |
| Eye #2 | rays, eye, triangle | above, triangle, rays, eye, sun | 0.60 | 1.00 |
| Eye #3 | rays, eye, triangle | star, above, rays, triangle, eye, sun | 0.50 | 1.00 |
| Star | star | star, sun, eye, rays | 0.25 | 1.00 |
| Sun | sun, rays | star, above, rays, eye, sun, halo | 0.33 | 1.00 |
| Tree | tree | tree, eye, rays | 0.33 | 1.00 |
| Eagle | eagle | eagle, above, eye, rays | 0.25 | 1.00 |
| Landscape | (none) | above, tree, eye, rays | 0.00 | 0.00 |
| Dog | (none) | above, eye, rays | 0.00 | 0.00 |

The system treats symbols as distributions over structure. That is what makes comparison possible even when identity is ambiguous.


6. ZeroModel: From Symbols to Searchable Structure

Up to this point, we have a working pipeline:

  • an image goes in
  • primitives are detected
  • the result is a structured set of scores

For example:

eye       = 0.91  
triangle  = 0.88  
rays      = 0.72  
pyramid   = 0.95  
above     = 0.63  

This is already useful. It tells us what structures are present.

But it is not yet scalable.

We cannot compare millions of these objects efficiently by looping through vectors. We need a representation that is compact, indexable, and fast to query.

That is where ZeroModel enters.


The Core Idea

ZeroModel does something very simple:

It takes a matrix of primitive scores and encodes it into an image.

That’s it.

  • Each row represents an item (an image, a document, a scene)
  • Each column represents a primitive or metric
  • Each value is encoded as an intensity

The result is a dense visual structure, typically stored as a PNG.

This is not just a visualization.

It is the data structure itself.

The same representation that can be inspected visually is also the structure the system operates on.


From Vectors to a Surface

Instead of storing:

[0.91, 0.88, 0.72, 0.95, 0.63, ...]

we encode it as:

  • pixel intensity
  • spatial position
  • channel value (if needed)

So a dataset becomes:

  • a 2D grid of rows (items)
  • a wide axis of columns (primitives)
  • intensity representing strength
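To show what "scores as pixel intensity" can look like in practice, here is a minimal sketch that writes a score matrix as an 8-bit grayscale PNG using only the standard library. It is not ZeroModel's actual encoder: the function names, the single-channel layout, and the 0 to 255 quantization are assumptions made for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, then CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_surface(matrix: list[list[float]]) -> bytes:
    """Encode a rows-by-primitives matrix of scores in [0, 1] as a grayscale PNG.

    Assumes a non-empty matrix whose rows all have the same length.
    """
    height, width = len(matrix), len(matrix[0])
    raw = bytearray()
    for row in matrix:
        raw.append(0)  # filter type 0 (None) starts every scanline
        raw.extend(round(max(0.0, min(1.0, v)) * 255) for v in row)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), default methods.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(bytes(raw)))
        + _chunk(b"IEND", b"")
    )


# Columns: eye, triangle, rays, pyramid, above. One row per item.
matrix = [
    [0.91, 0.88, 0.72, 0.95, 0.63],
    [0.84, 0.12, 0.69, 0.08, 0.71],
]
png = encode_surface(matrix)
print(len(png), "bytes")
```

Writing `png` to a file with `open("surface.png", "wb")` gives an image that any viewer can open. Note that 8-bit quantization keeps only 256 levels per score, so the surface is a slightly lossy copy of the original vectors.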

At small scale, this looks like an image.

At large scale, it becomes a structured surface of meaning.


Why This Matters

Once the data is encoded this way, something changes.

We are no longer working with:

  • lists
  • dictionaries
  • embeddings in isolation

We are working with a spatially organized representation.

That enables:

  • fast slicing (rows, columns, regions)
  • locality-based search
  • hardware-accelerated operations (image-level ops)

In other words:

We can navigate structure instead of scanning data.

Once structure is spatial, similarity becomes distance.


The Important Constraint

ZeroModel only works because of what came before it.

If the primitives were noisy or inconsistent, the encoding would collapse into randomness.

But because:

  • primitives are reusable
  • scores are comparable
  • overlap is meaningful

the encoded surface retains structure.

Clusters form naturally.

Similar items appear near each other.

Outliers stand out immediately.


What This Gives Us

With this representation, we can:

  1. Store large numbers of symbolic decompositions compactly
  2. Compare items by spatial proximity instead of pairwise computation
  3. Build indexes that operate on structure instead of raw data
  4. Query patterns directly (e.g. “high eye + high rays + above”)
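A pattern query of the kind in item 4 amounts to selecting a few columns and testing each row against per-primitive minimums. The sketch below is a plain row scan and makes no claim about how ZeroModel executes such queries; the `query` function, the column order, and the numeric cutoffs standing in for "high" and "present" are assumptions.

```python
COLUMNS = ["eye", "triangle", "rays", "pyramid", "above"]

# One row of scores per item, in COLUMNS order.
rows = {
    "eye_of_providence": [0.91, 0.88, 0.72, 0.95, 0.63],
    "masonic_variant": [0.84, 0.12, 0.69, 0.08, 0.71],
}


def query(rows: dict[str, list[float]], minimums: dict[str, float]) -> list[str]:
    """Return the names of rows whose scores meet every per-primitive minimum."""
    column_of = {name: COLUMNS.index(name) for name in minimums}
    return [
        item
        for item, row in rows.items()
        if all(row[column_of[name]] >= minimum for name, minimum in minimums.items())
    ]


# "high eye + high rays + above", with assumed cutoffs.
print(query(rows, {"eye": 0.8, "rays": 0.6, "above": 0.5}))
# Adding a pyramid requirement narrows the result.
print(query(rows, {"eye": 0.8, "rays": 0.6, "above": 0.5, "pyramid": 0.5}))
```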

This is the first point where the system stops being a prototype and starts becoming infrastructure.


A Concrete Way to Think About It

Think of ZeroModel like this:

Instead of searching through a library book by book, you build a map of the entire library.

You don’t ask:

“Does this book match?”

You ask:

“Where on the map do similar books live?”

The PNG is that map.

Another way to understand this is in terms of resonance.

Because all items are encoded into the same surface, a query does not search the dataset.

It resonates with it.

The query activates regions where its structure is already present.

Similarity is not computed from scratch.

It emerges from where the structure “vibrates” most strongly on the surface.


What It Is Not

It is not:

  • a trained model
  • a neural network
  • a learned embedding space

It does not learn weights.

It preserves structure.

That is why we call it ZeroModel.


Where This Leads

Once primitive scores can be encoded into a shared surface:

  • images can be compared to images
  • images can be compared to text (via the same primitives)
  • patterns can be detected across large datasets

And crucially:

search becomes navigation.

That transition from comparison to navigation is what makes the next step possible.


7. Indexing: From Storage to Navigation

Once primitive scores are encoded into a ZeroModel surface, the problem changes.

We are no longer asking how to store data.

We are asking:

How do we move through it?


The Shift

In a traditional system, search looks like this:

  • take a query
  • compare it against every item
  • rank the results

Even with optimizations, this is still a form of scanning.

ZeroModel allows a different approach.

Because the data is already organized spatially, we can treat it as something to navigate.


The Structure

The encoded surface is not just a flat image.

It is organized:

  • rows represent items
  • columns represent primitives
  • regions represent clusters of similar structure

This gives us a natural hierarchy.

At a high level:

  • broad regions correspond to coarse patterns
  • smaller regions refine those patterns
  • individual rows represent specific items

So instead of searching linearly, we move:

  • from large regions
  • into smaller regions
  • down to specific matches

How a Query Works

A query in this system is just another set of primitive scores.

For example:

eye       = high  
rays      = high  
triangle  = medium  
above     = present  

This can be encoded in the same way as the dataset.

Once encoded, the query becomes a position on the surface.

Search becomes:

  1. locate the region that matches the query pattern
  2. move toward areas of higher similarity
  3. retrieve nearby items

There is no need to compare against every row.

We move directly to the relevant area.
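The three search steps can be approximated with a coarse-to-fine lookup: group items into buckets by which primitives they activate, visit only the buckets compatible with the query, then rank the candidates by distance. This is a hedged sketch of the idea rather than ZeroModel's mechanism; the bucket scheme, function names, threshold, and the `pyramid_only` item are hypothetical.

```python
import math
from collections import defaultdict

COLUMNS = ("eye", "triangle", "rays", "pyramid", "above")


def signature(scores: dict[str, float], threshold: float = 0.5) -> frozenset:
    """Set of primitives scoring at or above threshold; acts as a coarse region key."""
    return frozenset(p for p in COLUMNS if scores.get(p, 0.0) >= threshold)


def build_index(items: dict[str, dict[str, float]]) -> dict[frozenset, list[str]]:
    """Group item names into buckets keyed by their signature."""
    index = defaultdict(list)
    for name, scores in items.items():
        index[signature(scores)].append(name)
    return dict(index)


def search(query: dict[str, float], items, index, k: int = 3) -> list[str]:
    """Coarse step: keep buckets containing every queried primitive. Fine step: rank by distance."""
    wanted = signature(query)
    candidates = [name for sig, names in index.items() if wanted <= sig for name in names]

    def distance(name: str) -> float:
        # Euclidean distance measured only on the primitives the query asks for.
        return math.sqrt(sum((items[name][p] - query[p]) ** 2 for p in wanted))

    return sorted(candidates, key=distance)[:k]


items = {
    "eye_of_providence": {"eye": 0.91, "triangle": 0.88, "rays": 0.72, "pyramid": 0.95, "above": 0.63},
    "masonic_variant": {"eye": 0.84, "triangle": 0.12, "rays": 0.69, "pyramid": 0.08, "above": 0.71},
    "pyramid_only": {"pyramid": 0.90},  # hypothetical item that should not match
}
index = build_index(items)
print(search({"eye": 0.9, "rays": 0.8, "above": 0.7}, items, index))
```

The coarse step iterates over distinct signatures rather than over items, so its cost depends on how many activation patterns exist, not on the size of the dataset.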


Why This Is Fast

The speed comes from two properties:

1. Locality

Similar structures are stored near each other.

That means:

  • you don’t need global comparisons
  • you only explore a small region

2. Fixed Representation

Everything is already encoded in the same format.

The dataset needs no transformation step at query time; only the query itself is encoded.

The system is always “ready to search.”


What This Enables

With indexing in place, we can:

  • retrieve structurally similar symbols without scanning every item
  • cluster large datasets by primitive composition
  • explore variations of a pattern without predefined labels
  • detect outliers as regions with low density

More importantly:

we can search for structure directly.

Not keywords. Not labels. Not categories.

Structure.


A Simple Example

If we query for:

  • strong eye
  • strong rays
  • clear separation (above)

we land in the region containing:

  • Eye of Providence variants
  • Masonic eye symbols
  • other “observer + illumination” structures

Even if the images look different at the surface level, they cluster together because their primitives align.


The Practical Outcome

At this point, the system has three properties:

  1. Decomposition: complex inputs become primitives
  2. Representation: primitives become a shared surface
  3. Indexing: the surface becomes navigable

That combination is what allows the system to scale.


The Key Transition

Before this point, we were analyzing individual symbols.

After this point, we can explore entire datasets.

That is the transition:

from understanding one symbol to navigating symbolic structure at scale


Where This Leads

Once navigation is possible:

  • we can search across images
  • extend the same primitives to text
  • eventually unify multiple modalities

The system is no longer tied to a single input type.

It operates on structure.


8. The Bigger Question: Why Do These Patterns Persist?

Up to this point, we have stayed grounded in what we can measure:

  • symbols can be decomposed into primitives
  • those primitives can be detected across images
  • different symbols share overlapping structure
  • that structure can be encoded and indexed

That is the system.

What remains is the question that motivated it.


The Observation

Certain symbols appear again and again:

  • the eye
  • the star
  • the sun
  • the eagle
  • hierarchical structures

They persist across:

  • cultures
  • time periods
  • mediums

These are not isolated occurrences.

They are recurring patterns.


What We Can Now Say (Carefully)

Before this work, we could only describe that persistence.

Now we can begin to measure it.

We can:

  • extract primitives from large sets of images
  • compare their structure
  • cluster symbols by similarity
  • track how those clusters appear over time

This turns a qualitative observation into a quantitative one.


What This Changes

Before this system, symbolic analysis had two modes:

  • manual interpretation (subjective, slow)
  • visual similarity (fragile, surface-level)

This introduces a third:

structural comparison

Symbols can now be compared based on what they are made of, not how they look or how they are described.

That makes new types of analysis possible:

  • tracking primitive frequency across datasets
  • identifying recurring structural patterns
  • clustering symbols without predefined labels
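Tracking primitive frequency is the simplest of these analyses. As a sketch, the snippet below computes the share of images in which each primitive fires, using the detections from the evaluation table in Section 5; the `primitive_frequency` helper and the item keys are names introduced here.

```python
from collections import Counter

# Detected primitives per image, copied from the Section 5 results table.
detections = {
    "eye_1": ["above", "triangle", "rays", "eye", "sun"],
    "eye_2": ["above", "triangle", "rays", "eye", "sun"],
    "eye_3": ["star", "above", "rays", "triangle", "eye", "sun"],
    "star": ["star", "sun", "eye", "rays"],
    "sun": ["star", "above", "rays", "eye", "sun", "halo"],
    "tree": ["tree", "eye", "rays"],
    "eagle": ["eagle", "above", "eye", "rays"],
    "landscape": ["above", "tree", "eye", "rays"],
    "dog": ["above", "eye", "rays"],
}


def primitive_frequency(detections: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of images in which each primitive was detected, most frequent first."""
    counts = Counter(p for primitives in detections.values() for p in set(primitives))
    total = len(detections)
    return {name: count / total for name, count in counts.most_common()}


for name, share in primitive_frequency(detections).items():
    print(f"{name}: {share:.2f}")
```

On this small set, "eye" and "rays" fire on every image, including the landscape and the dog. A frequency table like this is therefore also a quick check on detector specificity before drawing conclusions about recurring patterns.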

This does not answer why symbols persist.

But it makes that question measurable.


The Hypothesis

A reasonable hypothesis emerges:

Persistent symbols are composed of reusable structural primitives, and those primitives reflect recurring patterns in how humans represent ideas.

This does not claim hidden meaning.

It does not require conspiracy.

It only requires that:

  • humans reuse structures that work
  • those structures become stable over time

Why This Matters

If symbolic structure can be measured, then it can be studied.

For example:

  • Do certain primitives appear more frequently in specific eras?
  • Do clusters of symbols shift during periods of change?
  • Do different cultures converge on similar structures independently?

These questions were previously difficult to approach in a systematic way.

Now they are at least technically feasible.


The Important Boundary

This system does not interpret symbols.

It does not assign meaning.

It identifies structure.

Any interpretation comes after.


Where This Leads

The immediate application is clear:

  • large-scale symbolic analysis
  • structural comparison across datasets
  • discovery of related forms without predefined labels

Beyond that, there is a broader direction:

using structure as a lens to study human expression.

That includes:

  • images
  • text
  • eventually video and audio

The system does not depend on the medium.

It depends on decomposition and representation.


The Position We Take

We are not claiming to explain why these symbols exist.

We are claiming something more modest and more useful:

we now have a way to detect and compare the structures that make them persist.

That is enough to move forward.

Figure 2: Full System Architecture

By this point, the Eye of Providence is only the entry point. The broader system can be understood as a general pipeline for extracting, representing, comparing, and indexing symbolic structure across modalities.

    
%%{init: {
  'theme': 'base',
  'themeVariables': {
    'primaryColor': '#e3f2fd',
    'primaryTextColor': '#0d2135',
    'primaryBorderColor': '#1e88e5',
    'lineColor': '#1565c0',
    'secondaryColor': '#bbdefb',
    'tertiaryColor': '#90caf9',
    'background': '#f0f8ff',
    'mainBkg': '#e3f2fd',
    'textColor': '#0d2135'
  },
  'flowchart': {
    'curve': 'basis',
    'padding': 20
  }
}}%%

flowchart TD
    subgraph Inputs["🌐 INPUT MODALITIES"]
        direction LR
        A1["🖼️ Images"]
        A2["📝 Text"]
        A3["🎬 Video"]
        A4["🎵 Audio / Music"]
        style A1 fill:#42a5f5,stroke:#1565c0,stroke-width:2px,color:#fff
        style A2 fill:#42a5f5,stroke:#1565c0,stroke-width:2px,color:#fff
        style A3 fill:#42a5f5,stroke:#1565c0,stroke-width:2px,color:#fff
        style A4 fill:#42a5f5,stroke:#1565c0,stroke-width:2px,color:#fff
    end

    subgraph Detection["🔍 PRIMITIVE DETECTION LAYER"]
        B["🧩 Extract Primitives<br><small>CV2 + CLIP / Semantic + Structural</small>"]
        style B fill:#7e57c2,stroke:#4527a0,stroke-width:2px,color:#fff
    end

    A1 --> B
    A2 --> B
    A3 --> B
    A4 --> B

    subgraph Scores["📊 PRIMITIVE SCORE VECTORS"]
        C["👁️ 0.91  🔺 0.88  ✨ 0.72  🏛️ 0.95  ⬆️ 0.63 ..."]
        style C fill:#26c6da,stroke:#00695c,stroke-width:2px,color:#fff
    end

    B --> C

    C --> D["🔤 Intermediate Representation<br/><small>Emoji / Symbolic Tokens</small>"]
    style D fill:#ef5350,stroke:#b71c1c,stroke-width:3px,color:#fff

    C --> E["🕸️ Symbol Graph Construction"]
    style E fill:#ffa726,stroke:#e65100,stroke-width:2px,color:#0d2135

    subgraph GraphAnalysis["🧠 GRAPH ANALYSIS"]
        direction LR
        F["📐 Graph Similarity"]
        G["🔬 Invariant Extraction"]
        H["🧬 Clustering"]
        style F fill:#66bb6a,stroke:#1b5e20,stroke-width:2px,color:#fff
        style G fill:#66bb6a,stroke:#1b5e20,stroke-width:2px,color:#fff
        style H fill:#66bb6a,stroke:#1b5e20,stroke-width:2px,color:#fff
    end

    E --> F
    E --> G
    E --> H

    subgraph Shared["🤝 SHARED SYMBOLIC SIGNATURE"]
        I["👁️ 🔺 ✨ 🏛️ ⬆️"]
        style I fill:#ab47bc,stroke:#6a1b9a,stroke-width:3px,color:#fff
    end

    D --> I
    F --> I
    G --> I
    H --> I

    subgraph ZeroModel["🗺️ ZEROMODEL ENCODING"]
        direction LR
        J["📊 Score Matrix"]
        K["🖼️ PNG Surface"]
        L["🧭 Spatial Navigation"]
        style J fill:#8d6e63,stroke:#4e342e,stroke-width:2px,color:#fff
        style K fill:#8d6e63,stroke:#4e342e,stroke-width:2px,color:#fff
        style L fill:#8d6e63,stroke:#4e342e,stroke-width:2px,color:#fff
    end

    I --> J
    J --> K
    K --> L

    subgraph Search["🔎 SEARCH & RETRIEVAL"]
        direction LR
        M["🧱 Structure Search"]
        N["🔍 Pattern Retrieval"]
        O["🔁 Cross‑Modal Comparison"]
        style M fill:#42a5f5,stroke:#1565c0,stroke-width:2px,color:#fff
        style N fill:#42a5f5,stroke:#1565c0,stroke-width:2px,color:#fff
        style O fill:#42a5f5,stroke:#1565c0,stroke-width:2px,color:#fff
    end

    L --> M
    L --> N
    L --> O

    O --> P["💡 Research Direction<br/><small>persistent symbolic structure<br/>across human expression</small>"]
    style P fill:#e3f2fd,stroke:#1e88e5,stroke-width:3px,color:#0d2135

    %% Subgraph backgrounds
    style Inputs fill:#e3f2fd,stroke:#1e88e5,stroke-width:2px
    style Detection fill:#ede7f6,stroke:#4527a0,stroke-width:2px
    style Scores fill:#e0f7fa,stroke:#00695c,stroke-width:2px
    style GraphAnalysis fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px
    style Shared fill:#f3e5f5,stroke:#6a1b9a,stroke-width:2px
    style ZeroModel fill:#efebe9,stroke:#4e342e,stroke-width:2px
    style Search fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
  

A single image is processed through structural extraction, semantic scoring, graph construction, and symbolic projection. The result is a compact representation of the image as a set of primitives and relationships.
This demonstrates that symbolic structure can be derived directly from visual input without requiring predefined labels.


8.5 A Symbolic Weather Station

If symbolic structure can be measured, then it can also be tracked.

Across time. Across datasets. Across cultures.

This suggests a different kind of system:

a symbolic weather station.

Instead of measuring temperature or pressure, it measures:

  • primitive frequency
  • structural overlap
  • symbolic density

Spikes in certain primitives or patterns may indicate shifts in:

  • cultural focus
  • collective attention
  • representation of authority or perception

This is not interpretation.

It is measurement.

But it creates the conditions for something new:

the ability to observe how symbolic structure evolves over time.
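As a minimal sketch of the first of those measurements, primitive frequency, assume each image has already been reduced to a list of detected primitives and tagged with a period label. The record format, the function name, and the toy data below are all hypothetical; none of it comes from the prototype.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (period label, primitives detected in one image).
Record = Tuple[str, List[str]]


def primitive_frequency(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """For each period, return the fraction of images in which each primitive appears."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    totals: Counter = Counter()
    for period, primitives in records:
        totals[period] += 1
        counts[period].update(set(primitives))  # count each primitive once per image
    return {
        period: {name: n / totals[period] for name, n in counter.items()}
        for period, counter in counts.items()
    }


# Toy data, invented for illustration only.
records = [
    ("period_a", ["eye", "rays"]),
    ("period_a", ["eye", "triangle", "rays"]),
    ("period_b", ["eye", "triangle", "rays", "pyramid"]),
    ("period_b", ["eagle", "star"]),
]
freq = primitive_frequency(records)
```

Comparing `freq` across periods is the "reading" such a station would report. Whether a change reflects anything cultural is a separate question that this measurement does not answer.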


9. Conclusion: From One Symbol to a System

We started with a single image.

A symbol that appears on a dollar bill. A symbol that has been discussed, interpreted, and debated for decades.

The original goal was simple:

use AI to understand what it meant.


What We Found

The historical answer exists.

The Eye of Providence is not an unsolved mystery. Its components are documented, its evolution is known, and its meaning, at least in its original context, is well understood.

That could have been the end of the investigation.

It wasn’t.


The Shift in Perspective

Once the symbol was broken down into its components, the problem changed.

Instead of asking:

“What does this symbol mean?”

we began asking:

“What is this symbol made of?”

That shift made the system possible.


What We Built

We built a pipeline that:

  • extracts primitives from images
  • represents them in a shared form
  • compares symbols by structure
  • encodes those structures into a searchable surface
  • and navigates that surface efficiently

Each step is simple on its own.

Together, they form something new:

a way to work with symbolic structure directly.


What It Shows

The results are not perfect.

Precision is low in some cases. Symbols overlap. Different images share the same primitives.

But that behavior is consistent.

And more importantly, it is explainable.

The system is not guessing.

It is detecting structure.


What Changed

At the beginning, the symbol looked like something to decode.

By the end, it looks like something to decompose.

That difference matters.

Because once something can be decomposed:

  • it can be compared
  • it can be indexed
  • it can be searched
  • it can be studied at scale

What This Enables

The immediate use case is clear:

  • finding structurally similar symbols
  • clustering visual patterns
  • exploring variation without predefined labels

Beyond that, the same approach can extend to:

  • text
  • video
  • other forms of human expression

The system does not depend on the medium.

It depends on structure.


Where This Leaves Us

We did not set out to build a general system.

We set out to understand one symbol.

That constraint kept the problem grounded.

It also revealed something broader:

the method scales beyond the original question.


The Final Position

We are not claiming to have explained symbolic meaning.

We are claiming something simpler:

symbolic structure can be extracted, represented, and compared in a consistent way.

That is enough to move forward.


The Closing Line

We started with a symbol.

Something fixed. Something already interpreted.

We expected to explain it.

Instead, we decomposed it.

And once it could be decomposed:

  • it could be compared
  • it could be indexed
  • it could be searched

The Eye of Providence was not the answer.

It was the first example.

The system is what we were actually building.


The Eye is not watching. It is the thing that made us start seeing structure.

The Recursive Observation

There is one final detail worth noting.

The symbol we chose to analyze is itself a representation of observation.

An eye, positioned above a structure, emitting rays.

A system that looks at a system.

That is exactly what this pipeline does.

It observes structure from above, decomposes it into components, and maps relationships between them.

In that sense, the Eye of Providence was not just a test case.

It was a blueprint.

We did not just analyze the symbol.

We built the mechanism it represents.


10. Methods

This section formalizes the system described in the post. The goal is to define the pipeline in precise, reproducible terms.


10.1 Problem Definition

Given an input \( x \) (image, text, video, or audio), we aim to compute a structured representation:

$$ x \rightarrow S(x) $$

Where:

  • \( S(x) \) is a symbolic signature
  • composed of primitives \( p_i \in P \)
  • each associated with a score \( s_i \in [0,1] \)
$$ S(x) = \{ (p_i, s_i) \} $$

The objective is not classification, but decomposition into reusable structural components.


10.2 Primitive Space

Let \( P \) be a finite set of primitives:

$$ P = \{ \text{eye}, \text{triangle}, \text{rays}, \text{pyramid}, \text{above}, \dots \} $$

Each primitive is defined by:

  • a symbolic identity
  • a detection interface
  • optional aliases and configuration

Primitives are:

  • reusable
  • composable
  • non-exclusive

10.3 Detection Function

We define a detection function:

$$ D(x, p_i) \rightarrow s_i $$

Where:

  • \( x \) is the input
  • \( p_i \) is a primitive
  • \( s_i \) is the confidence score

In the prototype, detection is hybrid:

$$ s_i = w_{cv2} \cdot s_i^{cv2} + w_{clip} \cdot s_i^{clip} $$

Where:

  • \( s_i^{cv2} \): structural score (edges, contours, geometry)
  • \( s_i^{clip} \): semantic score (vision-language similarity)
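
A short worked example may help. Take the triangle weights listed in Appendix A.4, \( (w_{cv2}, w_{clip}) = (0.7, 0.3) \), and assume (purely for illustration) component scores of \( s^{cv2} = 0.9 \) and \( s^{clip} = 0.6 \):

$$ s_{\text{triangle}} = 0.7 \cdot 0.9 + 0.3 \cdot 0.6 = 0.63 + 0.18 = 0.81 $$

Every default weight pair in Appendix A.4 sums to 1, so the combined score stays in \( [0,1] \) as long as both component scores do.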

10.4 Structural Representation

The primitive scores form a vector:

$$ \mathbf{s} = [s_1, s_2, \dots, s_n] $$

This vector is interpreted as a distribution over structure, not a label.


10.5 Graph Construction

A symbolic graph \( G(x) \) is constructed:

$$ G(x) = (V, E) $$

Where:

  • \( V = \{p_i\} \) are primitives
  • \( E \subseteq V \times R \times V \) are relations, where \( R \) is the set of relation types

Example relations:

  • contains
  • emits
  • above

Edges are added based on thresholded scores and spatial constraints.


10.6 Intermediate Representation

A projection function maps primitives to tokens:

$$ \phi: P \rightarrow T $$

Where \( T \) is a set of symbolic tokens (emoji in the prototype).

$$ S_{IR}(x) = \{ \phi(p_i) \mid s_i > \tau \} $$

Here \( \tau \) is a score threshold (0.2 in the prototype code in Appendix A.7). This produces a discrete, comparable representation.
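
The projection can be sketched in a few lines. This is an illustrative stand-in rather than the prototype's code: the `PHI` table and `project` function are hypothetical names, the pyramid and above tokens are inferred from the order of the Glossary's example signature, and the default threshold of 0.2 is taken from the selection rule in Appendix A.7.

```python
from typing import Dict, List

# Primitive -> token table (phi). Eye, triangle, and rays follow Appendix A.2;
# pyramid and above are inferred from the Glossary example and are assumptions.
PHI: Dict[str, str] = {
    "eye": "👁️",
    "triangle": "🔺",
    "rays": "✨",
    "pyramid": "🏛️",
    "above": "⬆️",
}


def project(scores: Dict[str, float], tau: float = 0.2) -> List[str]:
    """Return phi(p_i) for every primitive whose score exceeds tau, in PHI order."""
    return [token for name, token in PHI.items() if scores.get(name, 0.0) > tau]


# Hypothetical scores: the pyramid falls below the threshold and is dropped.
tokens = project({"eye": 0.91, "triangle": 0.88, "rays": 0.72, "pyramid": 0.05, "above": 0.63})
```

The equation defines a set. The sketch returns an ordered list only so that printed signatures are stable and easy to read.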


10.7 Similarity Metric

Similarity between two inputs is computed over primitives or graph structure.

Primitive Overlap (Jaccard-style)

$$ \text{sim}(A, B) = \frac{|P_A \cap P_B|}{|P_A \cup P_B|} $$
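
A minimal sketch of this metric follows, assuming \( P_A \) and \( P_B \) are the sets of primitives whose scores exceed a threshold. The function names and the second score dictionary are hypothetical. Treating two empty sets as 0.0 is a choice made here and is not taken from the prototype.

```python
from typing import Dict, Set


def primitive_set(scores: Dict[str, float], tau: float = 0.2) -> Set[str]:
    """P_A: the primitives whose score exceeds the threshold tau."""
    return {name for name, s in scores.items() if s > tau}


def primitive_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets are scored 0.0 here."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Scores for A follow the example vector in Figure 2; scores for B are made up.
A = primitive_set({"eye": 0.91, "triangle": 0.88, "rays": 0.72, "pyramid": 0.95, "above": 0.63})
B = primitive_set({"eye": 0.80, "rays": 0.55, "above": 0.70, "triangle": 0.10})
overlap = primitive_overlap(A, B)  # 3 shared primitives out of 5 in the union
```

This reproduces the 3 / 5 overlap used as the example in the Glossary.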

Graph Similarity

$$ \text{sim}(G_A, G_B) = \frac{\sum \min(w_A(e), w_B(e))}{|E_A \cup E_B|} $$

This captures relational similarity, not just presence.
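
A worked example with made-up edge weights shows how the formula behaves. Suppose \( G_A \) has the edges triangle-contains-eye (weight 0.8) and eye-emits-rays (weight 0.6), while \( G_B \) has only triangle-contains-eye (weight 0.5). The edge union has two members, and an edge missing from one graph contributes a weight of 0:

$$ \text{sim}(G_A, G_B) = \frac{\min(0.8, 0.5) + \min(0.6, 0)}{2} = \frac{0.5}{2} = 0.25 $$

One consequence of this normalization is that a graph compared with itself scores the mean of its edge weights rather than 1, so the values are best read relative to one another.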


10.8 ZeroModel Encoding

Primitive score vectors are aggregated into a matrix:

$$ M \in \mathbb{R}^{N \times |P|} $$

Where:

  • \( N \) = number of inputs
  • columns = primitives

The matrix is encoded as an image:

$$ M \rightarrow I_{png} $$

Where:

  • pixel intensity represents score magnitude
  • spatial layout preserves structure

This creates a symbolic surface.
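
To make the encoding step concrete, the sketch below writes a score matrix as an 8-bit grayscale PNG using only the standard library. It is not the ZeroModel implementation, whose layout and ordering logic are not shown in this post. The `encode_surface` name and the direct score-to-intensity mapping are assumptions made for illustration.

```python
import struct
import zlib
from typing import Sequence


def _chunk(tag: bytes, data: bytes) -> bytes:
    """One PNG chunk: length, tag, payload, CRC over tag + payload."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_surface(matrix: Sequence[Sequence[float]]) -> bytes:
    """Encode an N x |P| score matrix as a grayscale PNG (rows = inputs, columns = primitives)."""
    if not matrix or not matrix[0] or any(len(row) != len(matrix[0]) for row in matrix):
        raise ValueError("matrix must be non-empty and rectangular")
    height, width = len(matrix), len(matrix[0])
    raw = bytearray()
    for row in matrix:
        raw.append(0)  # PNG filter type 0 (none) starts every scanline
        # Clamp each score to [0, 1] and scale it to a 0..255 pixel intensity.
        raw.extend(round(max(0.0, min(1.0, s)) * 255) for s in row)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), default methods.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(bytes(raw)))
        + _chunk(b"IEND", b"")
    )


# Two inputs scored on three primitives (toy values).
png = encode_surface([[0.91, 0.88, 0.72], [0.10, 0.05, 0.80]])
```

Writing `png` to a file opened in binary mode gives an image that can be viewed directly, which is the point of the representation: the index itself is inspectable.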


10.9 Search via Navigation

Given a query \( q \):

  1. compute \( S(q) \)
  2. encode into surface space
  3. locate region of highest similarity

Search becomes:

$$ q \rightarrow \text{position on surface} \rightarrow \text{local traversal} $$

This replaces global comparison with localized navigation.
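
The post does not spell out how a query is positioned on the surface, so the following is only one simple way to make "position, then local traversal" concrete. It assumes rows are ordered by their score on a single primitive. The `build_surface` and `navigate` names, the single-column ordering, and the window size are all hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

Row = Tuple[str, Dict[str, float]]  # (item id, primitive scores)


def build_surface(items: List[Row], key: str) -> List[Row]:
    """Order rows by their score on one primitive so similar rows sit near each other."""
    return sorted(items, key=lambda row: row[1].get(key, 0.0))


def navigate(surface: List[Row], query: Dict[str, float], key: str, window: int = 1) -> List[str]:
    """Land on the query's position in the ordering, then inspect only nearby rows."""
    column = [row[1].get(key, 0.0) for row in surface]
    pos = bisect.bisect_left(column, query.get(key, 0.0))
    lo = max(0, pos - window)
    hi = min(len(surface), pos + window + 1)
    return [item_id for item_id, _ in surface[lo:hi]]


# Toy items, invented for illustration.
items = [("a", {"eye": 0.1}), ("b", {"eye": 0.5}), ("c", {"eye": 0.9}), ("d", {"eye": 0.95})]
surface = build_surface(items, "eye")
nearby = navigate(surface, {"eye": 0.9}, "eye", window=1)
```

Only the rows inside the window are candidates for a full similarity comparison, which is the sense in which navigation replaces a global scan.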


10.10 Summary

The full pipeline can be expressed as:

$$ x \rightarrow D(x) \rightarrow S(x) \rightarrow G(x) \rightarrow S_{IR}(x) \rightarrow Z(x) $$

Where:

  • \( D \): detection
  • \( S \): symbolic signature
  • \( G \): graph
  • \( S_{IR} \): intermediate representation
  • \( Z \): ZeroModel encoding

The system transforms raw input into structured, comparable, and navigable symbolic representations.


11. Limitations and Future Work

The system described in this post is a working prototype. While it demonstrates the feasibility of symbolic decomposition and structural comparison, it has important limitations.


11.1 Limited Primitive Vocabulary

The current system operates on a small, manually defined set of primitives.

Limitations:

  • incomplete coverage of symbolic space
  • bias toward predefined structures
  • limited generalization

Future Work:

  • automatic primitive discovery from data
  • hierarchical primitive taxonomies
  • dynamic expansion of symbolic vocabulary

11.2 Detection Noise and Overlap

The system intentionally preserves overlapping detections.

Limitations:

  • reduced precision
  • ambiguous outputs
  • sensitivity to shared structural features

While this behavior reflects compositional structure, it can make downstream interpretation difficult.

Future Work:

  • probabilistic modeling of primitive interactions
  • confidence calibration
  • context-aware disambiguation

11.3 Dependence on Heuristics

Graph construction and thresholds are currently rule-based.

Limitations:

  • brittle under distribution shift
  • manually tuned thresholds
  • limited adaptability

Future Work:

  • learned relation inference
  • adaptive thresholding
  • integration with graph neural networks

11.4 Limited Evaluation Scale

The prototype is evaluated on a small dataset.

Limitations:

  • lack of statistical validation
  • no large-scale benchmarking
  • unclear generalization across domains

Future Work:

  • evaluation on large symbolic datasets
  • cross-cultural symbol analysis
  • longitudinal studies of symbolic evolution

11.5 Cross-Modal Alignment (Early Stage)

The system proposes cross-modal mapping (image, text, audio), but implementation is limited.

Limitations:

  • primitives not yet standardized across modalities
  • alignment between modalities not validated
  • semantic consistency not guaranteed

Future Work:

  • unified primitive ontology across modalities
  • cross-modal training signals
  • shared symbolic embedding spaces

11.6 ZeroModel Constraints

ZeroModel provides a compact representation, but introduces trade-offs.

Limitations:

  • resolution vs precision trade-off
  • information loss during encoding
  • dependence on primitive quality

Future Work:

  • multi-resolution symbolic surfaces
  • adaptive encoding strategies
  • integration with learned representations

11.7 Interpretation Boundary

The system explicitly avoids assigning meaning.

This is a strength, but also a limitation.

Limitations:

  • no semantic grounding
  • no causal explanation
  • no interpretive layer

Future Work:

  • layered systems combining structure + interpretation
  • integration with language models for explanation
  • human-in-the-loop symbolic analysis

11.8 The Open Question

The system can detect and compare symbolic structure.

It cannot yet explain:

why certain structures persist across cultures and time.

This remains the central open problem.


11.9 Long-Term Direction

The long-term vision is to move from:

  • detecting symbolic structure

to:

  • modeling symbolic systems

This includes:

  • tracking symbolic evolution over time
  • measuring structural convergence across cultures
  • identifying emergent symbolic patterns

11.10 Closing Perspective

This work does not solve symbolic understanding.

It establishes a foundation.

symbols can be decomposed
structure can be measured
similarity can be computed

What comes next is to build on that foundation.


Appendix A: Core Code Architecture

The full prototype contains substantially more code than is useful to reproduce in the body of the post. This appendix extracts the smallest set of components needed to understand how the system works in practice: primitive definition, detector abstraction, graph construction, similarity scoring, and end-to-end interpretation.

The goal here is not to reproduce the entire repository. It is to make the symbolic pipeline readable.


A.1 Primitive Definition

The system begins with a simple abstraction: a primitive is a named symbolic unit with an emoji mapping, a type, and a set of aliases that detectors can use.

from dataclasses import dataclass, field
from typing import List, Dict

@dataclass(frozen=True)
class PrimitiveSpec:
    name: str
    emoji: str
    kind: str  # "shape", "object", "relation", "signal"
    aliases: List[str] = field(default_factory=list)
    config: Dict[str, float] = field(default_factory=dict)

This is the foundation of the whole design. Once a primitive is defined, it becomes available to every detector and every later stage of the pipeline.


A.2 Example Primitive Definitions

Each primitive lives in its own file. That allows the vocabulary to grow without changing the rest of the system.

from core.types import PrimitiveSpec

PRIMITIVE = PrimitiveSpec(
    name="eye",
    emoji="👁️",
    kind="object",
    aliases=["eye", "all seeing eye", "oval eye"],
)

# Separate file: the triangle primitive
from core.types import PrimitiveSpec

PRIMITIVE = PrimitiveSpec(
    name="triangle",
    emoji="🔺",
    kind="shape",
    aliases=[
        "a triangle shape",
        "a triangular structure",
        "a pyramid triangle"
    ]
)

# Separate file: the rays primitive
from core.types import PrimitiveSpec

PRIMITIVE = PrimitiveSpec(
    name="rays",
    emoji="✨",
    kind="signal",
    aliases=["rays", "light rays", "radiance"],
)

The important idea is not the emoji themselves. The important idea is that each primitive has a stable, inspectable identity.


A.3 Detector Interface

Detection is separated from primitive definition. A detector is simply a module that scores the presence of primitives in an input image.

from abc import ABC, abstractmethod
from typing import Dict, Sequence
import numpy as np

class PrimitiveDetector(ABC):
    name: str

    @abstractmethod
    def detect(
        self,
        image_rgb: np.ndarray,
        primitives: Sequence[object],
    ) -> Dict[str, float]:
        raise NotImplementedError

This separation is important. It allows the symbolic language to remain stable while the perception layer improves.


A.4 Hybrid Detection

The prototype uses a hybrid detector that combines classical CV-based scores with CLIP-based semantic scores. Different primitives can weight those sources differently.

class HybridPrimitiveDetector(PrimitiveDetector):
    name = "hybrid"

    def __init__(self, cv2_detector, clip_detector, weights=None):
        self.cv2 = cv2_detector
        self.clip = clip_detector

        self.weights = weights or {
            "eye": (0.5, 0.5),
            "triangle": (0.7, 0.3),
            "rays": (0.4, 0.6),
            "pyramid": (0.7, 0.3),
            "above": (1.0, 0.0),
        }

    def detect(self, image_rgb, primitives, debug_prefix=None):
        cv2_scores = self.cv2.detect(image_rgb, primitives, debug_prefix=debug_prefix)
        clip_scores = self.clip.detect(image_rgb, primitives, debug_prefix=debug_prefix)

        final = {}

        for p in primitives:
            w_cv2, w_clip = self.weights.get(p.name, (0.5, 0.5))
            final[p.name] = (
                w_cv2 * cv2_scores.get(p.name, 0.0) +
                w_clip * clip_scores.get(p.name, 0.0)
            )

        return final

This is one of the key architectural decisions in the system. The symbolic reasoning layer does not depend on a single detector. It depends only on receiving comparable primitive scores.


A.5 Graph Construction

Once primitives have been scored, the system converts them into a symbolic graph. This is where the representation stops being a flat list of features and becomes a structured object.

from typing import Dict
from core.graph import SymbolGraph

def build_graph(scores: Dict[str, float]) -> SymbolGraph:
    g = SymbolGraph()

    for k, v in scores.items():
        g.add_node(k, v)

    if scores.get("eye", 0) > 0.2 and scores.get("triangle", 0) > 0.1:
        g.add_edge("triangle", "contains", "eye", min(scores["eye"], scores["triangle"]))

    if scores.get("eye", 0) > 0.2 and scores.get("rays", 0) > 0.1:
        g.add_edge("eye", "emits", "rays", min(scores["eye"], scores["rays"]))

    if (
        scores.get("eye", 0) > 0.2
        and scores.get("pyramid", 0) > 0.05
        and scores.get("above", 0) > 0.5
    ):
        g.add_edge(
            "eye",
            "above",
            "pyramid",
            min(scores["eye"], scores["pyramid"], scores["above"]),
        )

    return g

This graph construction step is where symbolic structure becomes explicit. It is no longer just “eye present” or “triangle present.” It becomes “triangle contains eye” or “eye above pyramid.”
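
The `SymbolGraph` class imported from `core.graph` is not reproduced in this post. A minimal stand-in that satisfies the calls used here (`add_node`, `add_edge`, and edges exposing `src`, `rel`, `dst`, and `weight`) might look like the sketch below. The `Edge` dataclass and the field layout are assumptions, not the prototype's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Edge:
    """A directed, typed relation between two primitives."""
    src: str
    rel: str
    dst: str
    weight: float


@dataclass
class SymbolGraph:
    """Nodes are primitive names with scores; edges are weighted relations."""
    nodes: Dict[str, float] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, name: str, score: float) -> None:
        self.nodes[name] = score

    def add_edge(self, src: str, rel: str, dst: str, weight: float) -> None:
        self.edges.append(Edge(src, rel, dst, weight))


# Hypothetical scores: the edge weight is the weaker of the two endpoints.
g = SymbolGraph()
g.add_node("eye", 0.91)
g.add_node("triangle", 0.88)
g.add_edge("triangle", "contains", "eye", min(0.91, 0.88))
```

Using the minimum of the endpoint scores as the edge weight mirrors `build_graph`: a relation is only as well supported as its weakest primitive.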


A.6 Similarity and Invariants

Similarity is computed over graph edges rather than raw labels. Invariants are extracted by finding edges that recur across many graphs.

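def graph_similarity(g1, g2):
    # Index each graph's edges by a "src:rel:dst" key so they can be matched.
    w1 = {f"{e.src}:{e.rel}:{e.dst}": e.weight for e in g1.edges}
    w2 = {f"{e.src}:{e.rel}:{e.dst}": e.weight for e in g2.edges}

    keys = set(w1) | set(w2)

    # Two graphs with no edges at all are treated as identical.
    if not keys:
        return 1.0

    # An edge missing from one graph contributes a weight of 0.
    score = 0.0
    for k in keys:
        score += min(w1.get(k, 0.0), w2.get(k, 0.0))

    # Normalize by the size of the edge union.
    return score / len(keys)


from collections import Counter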

def extract_invariants(graphs, min_support=0.6):
    counter = Counter()

    for g in graphs:
        edges = set(f"{e.src}:{e.rel}:{e.dst}" for e in g.edges)
        for e in edges:
            counter[e] += 1

    total = len(graphs)

    return {
        edge for edge, count in counter.items()
        if (count / total) >= min_support
    }

These two functions are central to the system’s behavior. They allow images to be compared by structure and recurring symbolic relations to be extracted across a dataset.


A.7 End-to-End Interpretation

The full interpretation path can be summarized in a short function: load the image, detect primitives, build the graph, and project the result into emoji.

def interpret_image(path: str, detector):
    image = load_image(path)

    primitives = load_primitive_specs()
    scores = detector.detect(image, primitives)

    graph = build_graph(scores)

    selected = [p.name for p in primitives if scores.get(p.name, 0) > 0.2]
    emojis = [p.emoji for p in primitives if p.name in selected]

    return {
        "path": path,
        "scores": scores,
        "selected": selected,
        "emojis": emojis,
        "edges": [(e.src, e.rel, e.dst) for e in graph.edges],
    }

This is the shortest useful summary of the entire prototype:

  • primitives are loaded from the registry
  • a detector scores them
  • scores become a symbolic graph
  • the graph is projected into a compact intermediate representation

That is the full pipeline in miniature.


A.8 Closing Note

These excerpts are enough to explain the core architecture of the prototype. The full implementation includes additional computer vision logic, CLIP prompt handling, evaluation scaffolding, debug image generation, and a larger primitive vocabulary. None of that changes the basic structure shown here.

The essential point is simple:

a symbol is decomposed into primitives, primitives become scores, scores become structure, and structure becomes searchable.


Appendix B: From Pixels to Primitives (Detection Pipeline)

Appendix A described the architecture of the system. This appendix focuses on how symbolic signals are extracted from raw images.

The goal is simple: take an image and produce a set of primitive scores that can be used to build a symbolic representation.


B.1 Two Sources of Signal

The prototype combines two complementary approaches:

  • Structural detection (CV2) extracts shapes, edges, and spatial relationships
  • Semantic detection (CLIP) measures similarity between the image and textual concepts

These two signals are combined to produce the final primitive scores.


B.2 Structural Detection (CV2 Pipeline)

The CV2 pipeline extracts geometric and spatial features directly from the image.

The process can be visualized as a sequence of transformations:

| Step | Description |
| --- | --- |
| Input | Original RGB image |
| Grayscale | Normalized intensity image |
| Edges | Edge map via Canny detection |
| Contours | Detected shapes and structures |

In the implementation, each step is saved as a debug image:

  • *_01_gray.png
  • *_02_edges.png
  • *_03_contours.png

This allows the reader (and the developer) to see exactly what the system is responding to.


Pipeline Overview

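import cv2

# low/high: Canny hysteresis thresholds (values not given in the post)
gray = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2GRAY)
edges = cv2.Canny(gray, low, high)
contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)  # two return values in OpenCV 4.x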

Contours are filtered and approximated to detect structural primitives.


Shape Detection

Triangles and ellipses are detected using contour approximation and ellipse fitting:

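# cnt is one contour from cv2.findContours; strength is defined elsewhere
# in the prototype and is not shown in this excerpt.
peri = cv2.arcLength(cnt, True)  # perimeter of the closed contour
approx = cv2.approxPolyDP(cnt, 0.04 * peri, True)

# Loose polygon test: 3 to 6 vertices are accepted, not strictly 3.
if 3 <= len(approx) <= 6:
    triangle_score = max(triangle_score, strength)

# cv2.fitEllipse needs at least 5 contour points.
if len(cnt) >= 5:
    ellipse = cv2.fitEllipse(cnt)
    ellipse_score = max(ellipse_score, strength)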

These signals contribute to primitives such as:

  • triangle → polygon detection
  • eye → ellipse detection
  • pyramid → lower-region mass
  • rays → edge density + radial dispersion

Spatial Relationships

The system also extracts positional relationships:

  • vertical ordering (eye above pyramid)
  • symmetry across the vertical axis
  • radial patterns around a central point

These signals produce relational primitives such as:

  • above
  • emits (via rays)
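
The prototype's rule for the above primitive is not shown in this post. As one plausible sketch, vertical ordering can be scored from two bounding boxes in the `(x, y, width, height)` form that `cv2.boundingRect` returns. The `above_score` function and its scoring rule are assumptions made for illustration, and horizontal alignment is deliberately ignored.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), as returned by cv2.boundingRect


def above_score(upper: Box, lower: Box) -> float:
    """Fraction of `upper` that lies above the top edge of `lower`, in [0, 1].

    Image coordinates grow downward, so "above" means a smaller y value.
    """
    _, uy, _, uh = upper
    _, ly, _, _ = lower
    if uh <= 0:
        return 0.0
    # Height of the part of the upper box that sits above the lower box's top edge.
    height_above = min(uy + uh, ly) - uy
    return max(0.0, min(1.0, height_above / uh))


# Hypothetical boxes: a small eye region over a larger pyramid region.
eye_box = (40, 10, 20, 10)
pyramid_box = (20, 30, 60, 50)
score = above_score(eye_box, pyramid_box)
```

A score like this could feed the `above` entry in the primitive score vector, where `build_graph` applies its 0.5 threshold before adding the eye-above-pyramid edge.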

B.3 Semantic Detection (CLIP)

Where CV2 detects structure, CLIP measures semantic similarity.

Each primitive is represented as one or more text prompts. The image is embedded using a pretrained vision-language model, and similarity is computed between the image and each prompt.

image_features = self.model.encode_image(image_tensor)
image_features /= image_features.norm(dim=-1, keepdim=True)

text_features = self.model.encode_text(text_tokens)
text_features /= text_features.norm(dim=-1, keepdim=True)

similarity = (image_features @ text_features.T).squeeze(0)
score = similarity.max().item()

This produces a score for each primitive based on semantic similarity.


What CLIP Adds

CLIP detects primitives that are difficult to capture geometrically:

  • eagle → object recognition
  • sun → semantic identity vs shape
  • halo → contextual light structures

This complements the CV2 pipeline, which is strong on geometry but limited in semantic interpretation.


B.4 Hybrid Scoring

The final primitive score is a weighted combination of CV2 and CLIP outputs.

final[p.name] = (
    w_cv2 * cv2_scores.get(p.name, 0.0) +
    w_clip * clip_scores.get(p.name, 0.0)
)

Different primitives rely on different signal sources:

| Primitive | Signal Source |
| --- | --- |
| above | CV2 only (purely spatial) |
| triangle | mostly CV2 |
| rays | hybrid |
| eye | hybrid (shape + semantic) |
| eagle | mostly CLIP |

This allows the system to remain interpretable while still benefiting from learned representations.


B.5 Failure Modes and Overlap

The system does not produce clean, single-label outputs. Instead, it produces overlapping symbolic signals.

Examples from evaluation:

  • A sun may trigger sun, star, and rays
  • A star may trigger star, sun, and eye
  • Background images may produce weak eye or rays signals

This is expected.

Symbols are not mutually exclusive. They share structural and semantic features.

Rather than forcing a single label, the system produces a symbolic signature: a set of overlapping primitives with associated scores.


B.6 Why This Works

The detection pipeline does not attempt to classify images.

It extracts a distributed representation of symbolic structure:

  • primitives are detected independently
  • signals are combined rather than resolved
  • structure emerges from relationships between primitives

This is what enables the rest of the system:

  • graph construction
  • similarity comparison
  • clustering
  • invariant extraction

Without this step, the system would reduce to standard image classification.


B.7 Closing Note

The CV2 and CLIP pipelines are not final. They are interchangeable components.

What matters is the interface they satisfy:

given an image, produce a consistent set of primitive scores

Everything else in the system (graphs, similarity, clustering, and indexing) depends only on that contract.

Glossary

This glossary defines the key concepts used throughout the post. The goal is not to introduce new ideas, but to make the system precise and reusable.


Primitive

A fundamental symbolic unit detected in an input.

Examples:

  • eye
  • triangle
  • rays
  • pyramid
  • above

Primitives are not meanings. They are structural components inferred from visual or semantic signals.


Primitive Score

A numeric value representing the confidence that a primitive is present in an input.

Example:

eye = 0.91
triangle = 0.88

Scores form a vector representation of structure.


Symbolic Signature

The set of primitives (and optionally scores) that describe an input.

Example:

[eye, triangle, rays, pyramid, above]

This replaces traditional classification with a compositional representation.


Intermediate Representation (IR)

A shared encoding used to represent primitives in a comparable form.

In this prototype:

  • emoji act as symbolic tokens

Example:

👁️ 🔺 ✨ 🏛️ ⬆️

The IR is:

  • human-readable
  • machine-comparable
  • modality-agnostic

Symbolic Graph

A structured representation of relationships between primitives.

Example:

triangle → contains → eye
eye → emits → rays
eye → above → pyramid

Graphs capture structure beyond presence.


Symbolic Overlap

The degree to which two symbolic signatures share primitives.

Example:


A: 👁️ 🔺 ✨ 🏛️ ⬆️
B: 👁️ ✨ ⬆️

Overlap = 3 / 5

This defines similarity without requiring identical representations.


Hybrid Detection

A detection approach combining:

  • CV2 (structural features) → shapes, edges, geometry
  • CLIP (semantic features) → learned visual-language similarity

Final scores are a weighted combination of both.


ZeroModel

A representation method that encodes primitive score matrices into images (e.g., PNG).

Properties:

  • no learned weights
  • fully inspectable
  • spatially organized

It converts data into a searchable surface rather than a list.


Symbolic Surface

The output of ZeroModel.

A 2D spatial structure where:

  • rows = items
  • columns = primitives
  • intensity = score

This enables navigation instead of scanning.


Structural Similarity

A measure of similarity based on shared primitives or graph structure.

Unlike visual similarity:

  • ignores surface appearance
  • focuses on composition

Invariant

A recurring structural relationship observed across multiple symbolic graphs.

Example:

triangle → contains → eye

Invariants represent stable symbolic patterns.
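
One simple way to surface candidate invariants is to count how many graphs contain each relationship. The `min_count` cutoff and the second example graph are assumptions for illustration.

```python
from collections import Counter

Triple = tuple[str, str, str]

def invariants(graphs: list[list[Triple]], min_count: int = 2) -> list[Triple]:
    """Return relationships that occur in at least min_count graphs."""
    counts = Counter(edge for graph in graphs for edge in set(graph))
    return sorted(edge for edge, count in counts.items() if count >= min_count)

seal = [("triangle", "contains", "eye"), ("eye", "emits", "rays")]
other = [("triangle", "contains", "eye"), ("eye", "above", "pyramid")]  # hypothetical
print(invariants([seal, other]))  # [('triangle', 'contains', 'eye')]
```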


Symbolic Density (Conceptual)

A proposed measure of how frequently symbolic primitives and patterns appear within a dataset or time period.

Higher density may indicate:

  • cultural convergence
  • increased symbolic communication
  • structural reuse of ideas

Structure Engine

The system described in this post.

Unlike traditional models:

  • it does not interpret meaning
  • it extracts and compares structure

Decomposition

The process of breaking an input into primitives.

This is the foundational operation of the system.


Representation

The encoding of primitives into a shared form (scores + emoji).


Navigation

Instead of comparing all items:

  • the system moves through a structured space

Search becomes:

locating regions of similar structure
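
A rough sketch of this idea, under the assumption that "locating a region" means ordering rows so the items that best match the query primitives sit together at the top. The `top_region` helper and the sample scores are hypothetical and do not describe the ZeroModel implementation.

```python
def top_region(
    surface: dict[str, dict[str, float]], query: list[str], k: int = 2
) -> list[str]:
    """Order items by summed score on the query primitives and return the top k."""
    def relevance(item: str) -> float:
        return sum(surface[item].get(primitive, 0.0) for primitive in query)
    return sorted(surface, key=relevance, reverse=True)[:k]

# Illustrative rows keyed by item name.
surface = {
    "seal": {"eye": 0.91, "triangle": 0.88},
    "logo": {"eye": 0.10, "triangle": 0.80},
    "coin": {"eye": 0.85, "triangle": 0.30},
}
print(top_region(surface, ["eye", "triangle"]))  # ['seal', 'coin']
```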


References

Primary Sources

  1. The Great Seal of the United States
    https://www.greatseal.com/ Historical documentation of the Eye of Providence and its symbolic components.

  2. Eye of Providence (Wikipedia)
    General overview and historical context.


System and Architecture

  1. ZeroModel: Visual AI You Can Scrutinize
    Describes the ZeroModel encoding approach used to convert symbolic scores into spatial representations.

Technical Foundations

  1. OpenCV (Computer Vision Library)
    Used for structural feature extraction (edges, contours, geometry).

  2. CLIP: Learning Transferable Visual Models From Natural Language Supervision
    https://arxiv.org/abs/2103.00020 Vision-language model used for semantic primitive detection.


  1. Compositional Representation in AI
    arXiv:1902.09738 Research on representing complex structures as combinations of simpler components.

  2. Graph-Based Representation Learning
    arXiv:1812.08434 Foundations for representing structured relationships between entities.


Interpretability and Structure

  1. The Mythos of Model Interpretability (Contextual)
    https://distill.pub/ Background on making machine learning systems understandable and inspectable.