An architecture for grounded memory
Bernard is the long-term research direction that our published work is building toward: an embodied cognitive system whose intelligence emerges from lived experience rather than static training.
The name is a placeholder for the architecture, not a product. What follows is a description of where the research programme is headed and what we've validated so far.
Current AI systems have no memory in any meaningful sense. A language model processes each conversation from scratch. A retrieval system finds documents by surface similarity. An agent executes tool calls without remembering what worked last time.
The missing ingredient isn't more parameters or longer context windows. It's experience — the accumulated structure that emerges from an agent actually doing things in the world over time. A human mechanic doesn't diagnose an engine by searching a manual. They hear a sound and recall the last time they heard something similar, in a different car, three years ago, and what turned out to be wrong. That recall is specific (that car, that day), associative (linked by experiential co-occurrence, not surface similarity), and grounded in physical interaction.
No existing architecture produces this kind of memory. We're building one that does.
The core of the architecture is two complementary JEPA-style predictors operating over a shared embedding space. Both make predictions in latent space rather than pixel space, following the Joint-Embedding Predictive Architecture framework. They differ in what they predict.
The Outward predictor faces forward into real time. Given the current state, it predicts the next state. This is a world model — it learns how things generally behave, what objects afford, how spatial layouts constrain movement. Over time, it builds what amounts to semantic memory: general knowledge about regularities in the world. Things that share functional properties cluster together in its embedding space. Drills and impact drivers end up nearby. Stairs and ladders end up nearby. This is similarity structure, and it's what current AI does well.
The Inward predictor faces sideways into memory. Given the current state, it predicts which past states are experientially reachable — which stored experiences are associatively linked to this moment, across the agent's entire history. The training signal is temporal co-occurrence: states that were experienced within the same temporal window become associated, regardless of whether they're representationally similar.
Same architecture, same prediction-in-latent-space mechanism, different target domain. The Outward predictor learns what things are like each other. The Inward predictor learns what things were experienced together. Both are forms of memory, built through different mechanisms.
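A minimal numpy sketch of this split — the dimensions, weight matrices, and function names below are illustrative assumptions, not the published implementation. One head regresses the next latent state; the other scores stored states for associative reachability over the same embedding space:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # shared embedding dimension

# Outward: predicts the NEXT latent state (world model, semantic memory).
W_out = rng.normal(scale=0.1, size=(D, D))

# Inward: scores stored past states for associative reachability.
W_in = rng.normal(scale=0.1, size=(D, D))

def outward_predict(z):
    """Predict the next state in latent space (not pixel space)."""
    return W_out @ z

def inward_scores(z, memory):
    """Score every stored state: high = experientially linked to z."""
    return memory @ (W_in @ z)  # (N, D) @ (D,) -> (N,)

memory = rng.normal(size=(100, D))  # N stored episodic embeddings
z_now = rng.normal(size=D)

z_next_hat = outward_predict(z_now)                            # forward, into real time
recalled = np.argsort(inward_scores(z_now, memory))[::-1][:5]  # sideways, into memory
```

Both heads share the embedding space and the prediction-in-latent-space mechanism; only the prediction target differs.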
The interesting behaviour emerges from their interaction.
The Outward predictor alone gives you recognition without context: "that looks like a drill." The Inward predictor alone gives you priming without content: "something about this moment reminds me of Tuesday." Neither is specific.
When both converge on the same target, you get episodic recall: "that's the drill that stripped the screw on Tuesday." Intersection narrows the search space: where similarity retrieval returns all drills and association retrieval returns everything from Tuesday, their intersection returns that specific drill on that specific occasion.
This is an architectural prediction, not yet an empirical result. Our published work validates the Inward channel in isolation — showing that temporal co-occurrence alone produces faithful associative recall across representational boundaries. The dual-channel interaction is the next experimental milestone.
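The predicted interaction can be shown with a toy example — the scores below are fabricated for illustration, not measured outputs:

```python
import numpy as np

# Ten stored episodes; index 3 is "the drill that stripped the screw on Tuesday".
similarity  = np.array([.1, .9, .2, .8, .9, .1, .2, .1, .8, .1])  # Outward: every drill scores high
association = np.array([.1, .1, .8, .9, .2, .8, .1, .9, .1, .2])  # Inward: everything from Tuesday scores high

k = 4
top_sim   = set(map(int, np.argsort(similarity)[::-1][:k]))   # all the drills
top_assoc = set(map(int, np.argsort(association)[::-1][:k]))  # all of Tuesday

episodic = top_sim & top_assoc
print(episodic)  # -> {3}: the one episode that is both drill-like AND Tuesday-linked
```

Neither channel alone isolates episode 3; the intersection does.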
Biological memory consolidation happens during sleep. The hippocampus replays recent experiences, and through repeated replay, specific episodic memories gradually transfer into generalised knowledge in the neocortex. This is the complementary learning systems framework: fast episodic encoding during waking, slow statistical extraction during sleep.
Bernard implements this as an explicit computational cycle:
Wake. The agent experiences the world. The Outward predictor processes incoming sensory data. The Inward predictor encodes new associations. Raw episodic traces accumulate in a memory buffer.
Sleep. Experience replay: the system re-processes stored episodes, and the compression mechanism extracts recurring patterns. Specific associations ("the red mug was next to the keyboard this morning") either consolidate into lasting memory or decay. Patterns that recur across many episodes ("mugs are usually near keyboards in offices") transfer into the world model as general knowledge.
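A minimal sketch of the sleep step, assuming episodes are represented as sets of symbolic relations — a simplification for illustration; the actual system operates on embeddings:

```python
from collections import Counter

# Hypothetical episodic traces: each episode is a set of observed relations.
episodes = [
    {("mug", "near", "keyboard"), ("red_mug", "left_of", "keyboard")},
    {("mug", "near", "keyboard"), ("stapler", "on", "shelf")},
    {("mug", "near", "keyboard"), ("drill", "in", "drawer")},
]

def sleep_consolidate(buffer, min_recurrence=2):
    """Replay stored episodes; promote recurring patterns toward the world
    model, and mark one-off specifics for decay."""
    counts = Counter(rel for ep in buffer for rel in ep)
    semantic = {rel for rel, n in counts.items() if n >= min_recurrence}
    decayed  = {rel for rel, n in counts.items() if n < min_recurrence}
    return semantic, decayed

semantic, decayed = sleep_consolidate(episodes)
# ("mug", "near", "keyboard") recurs in every episode -> general knowledge;
# the one-off specifics decay (or consolidate as episodic memory instead).
```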
The concept discovery paper provides early evidence that this compression mechanism works. When PAM is trained on 10,000 novels, the compression from 25 million raw text chunks to k=100 cluster centroids forces the model to discover recurring narrative structures — the "concepts" that appear across thousands of different books. The same mechanism, applied to an agent's lived experience rather than text, would extract the functional regularities of the agent's world.
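The compression step itself is clustering in embedding space. A toy k-means sketch on synthetic data — far from the paper's actual scale of 25 million chunks down to k=100, but the same mechanism:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "experience chunks": 300 embeddings drawn around 3 latent regularities.
true_centres = rng.normal(scale=5.0, size=(3, 8))
chunks = np.concatenate([c + rng.normal(size=(100, 8)) for c in true_centres])

def kmeans(X, k, iters=20, seed=0):
    """Compress many raw chunks into k centroids — the discovered 'concepts'."""
    r = np.random.default_rng(seed)
    centroids = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

centroids, labels = kmeans(chunks, k=3)
# 300 chunks compressed to 3 centroids: recurring structure survives, specifics don't.
```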
The research programme is building from foundations upward. Here's what's established and what remains open.
What's established:

- Temporal co-occurrence produces faithful associative recall across representational boundaries. The predictor returns a true associate 97% of the time in cases where similarity scores zero.
- Association and similarity produce different rankings, and association captures relationships that similarity systematically misses. This holds in both text and biological domains.
- The compression mechanism discovers meaningful hierarchical structure without supervision. Clusters correspond to recognisable narrative functions and generalise to unseen material.
- Contrastive learning on co-occurrence (the Inward predictor's training signal) is PAM adapted to fixed, replayable timelines. Multi-epoch training is analogous to hippocampal sleep replay.

What remains open:

- Dual-channel (Outward + Inward) interaction, and whether specificity emerges from intersection as predicted.
- Creative bridging across episodes via persistent entities: theoretically grounded, but it requires entity persistence that current benchmarks don't provide.
- The sleep cycle as a computational mechanism: the concept discovery results suggest the compression works, but the full wake/sleep loop hasn't been implemented.
- Scaling to continuous multimodal experience streams rather than pre-chunked text.
- Affective and salience signals modulating association strength.
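The established training regime — contrastive learning on temporal co-occurrence over a fixed, replayable timeline — can be sketched as an InfoNCE-style objective. The window size, temperature, and weight matrix below are illustrative assumptions, not the published hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(2)
D, N = 16, 50

# A fixed, replayable timeline of state embeddings (multi-epoch training
# revisits this same sequence, analogous to sleep replay).
timeline = rng.normal(size=(N, D))

def co_occurrence_loss(W, t, window=3, temp=0.1):
    """InfoNCE-style loss: states within `window` steps of t are positives,
    every other state on the timeline is a negative."""
    z = W @ timeline[t]
    logits = timeline @ z / temp
    logits[t] = -np.inf                # the anchor is neither positive nor negative
    m = logits.max()
    log_p = logits - (m + np.log(np.exp(logits - m).sum()))  # stable log-softmax
    positives = [i for i in range(N) if i != t and abs(i - t) <= window]
    return -np.mean([log_p[i] for i in positives])

W = np.eye(D)  # stand-in for the learned Inward projection
losses = [co_occurrence_loss(W, t) for t in range(3, N - 3)]
```

Minimising this pulls co-occurring states together regardless of their representational similarity, which is exactly the association-without-similarity behaviour the results above measure.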
The dominant approach to AI memory is retrieval-augmented generation: embed everything, search by similarity, paste into context. This works for finding documents that resemble a query. It doesn't work for the kind of memory that makes intelligence useful in the real world — specific, experiential, associative, grounded in what actually happened.
An AI system that remembers what it experienced, from the perspective from which it experienced it, and can recall those experiences based on structural association rather than surface similarity, would be qualitatively different from current systems. Not because it has more knowledge, but because its knowledge is organised by the structure of lived experience rather than the geometry of embedding space.
That's what we're building toward. The foundations are in place. The open questions are empirical, not conceptual — we know what to build and what to test. The answers will come from experiments, not from scaling.
Bernard is an active research direction at Eridos AI. The published work (PAM, AAR, concept discovery) establishes the foundations. If you're interested in the programme — as a researcher, collaborator, or funder — reach out at jason@eridos.ai.