Open Source AI Memory System

HA5H Crystal Clear Memory for AI Agents

Memories crystallize into position based on what they are. Six indexed dimensions. Zero API calls. Fully offline.

SimHash + FTS5 + sentence embeddings. Single SQLite file. Open source.


CRYSTAL MEMORY* SIX INDEXED FACETS* ZERO API CALLS* SINGLE SQLITE FILE* FULLY OFFLINE* OPEN SOURCE MIT*

How It Works

Write. Recall. Resume.

01

Crystallize

Feed a memory into the crystal. HA5H computes a 64-bit fingerprint, extracts entities, compresses it into an inclusion, and indexes it across six facets, in 3.8 ms. No LLM call. No API key. Your data stays local.
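The fingerprinting step can be sketched in a few lines. This is not ha5h's actual implementation (its tokenizer and hash function aren't shown here); it's a minimal SimHash assuming whitespace tokens and blake2b token hashes:

```python
import hashlib

def simhash64(text: str) -> int:
    """Minimal 64-bit SimHash sketch: hash each token, then let every
    token vote +1/-1 on each bit position; the sign of the vote sum
    becomes that bit of the fingerprint."""
    votes = [0] * 64
    for token in text.lower().split():
        h = int.from_bytes(
            hashlib.blake2b(token.encode(), digest_size=8).digest(), "big"
        )
        for bit in range(64):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    fp = 0
    for bit in range(64):
        if votes[bit] > 0:
            fp |= 1 << bit
    return fp

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Because each token votes on every bit, texts that share most of their tokens tend to land within a few bits of each other, which is what makes band lookup on the fingerprint useful at recall time.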

02

Recall

Query the crystal and it lights up everything relevant. FTS5 keyword search and SimHash band lookup run in parallel, then hybrid scoring ranks the results: 9 ms at 5,000 memories. Every result is the original, unmodified text.
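The hybrid step can be sketched as a weighted blend of the two candidate sets. The weights (0.6 keyword / 0.4 fingerprint) and the score shapes are illustrative assumptions, not ha5h's published formula:

```python
def hybrid_rank(fts_hits, simhash_hits, k=10, w_kw=0.6, w_sim=0.4):
    """fts_hits: {memory_id: keyword score normalized to [0, 1]}.
    simhash_hits: {memory_id: Hamming distance, 0..64}.
    Blend both signals into one score; a memory found by only one
    path simply gets zero from the other."""
    scores = {}
    for mid, s in fts_hits.items():
        scores[mid] = scores.get(mid, 0.0) + w_kw * s
    for mid, dist in simhash_hits.items():
        # Convert distance to similarity: 0 bits apart -> 1.0
        scores[mid] = scores.get(mid, 0.0) + w_sim * (1 - dist / 64)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A memory that both paths agree on outranks one found by keywords or fingerprint alone, which is the point of running them in parallel.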

03

Wake up

One call generates ~139 tokens of startup context. Identity line plus your top critical memories. Paste into any agent's system prompt. The crystal remembers so the agent doesn't have to.
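A wake-up builder along these lines is easy to picture. The L0/L1 line formats below are modeled on the Quick Start output further down; the `top_n` cutoff and the `(salience, text)` tuples are assumptions for illustration:

```python
def wake_up(memories, top_n=5):
    """memories: list of (salience, text) for active memories.
    Emit an identity line plus the top-N memories by salience,
    ready to paste into an agent's system prompt."""
    ranked = sorted(memories, key=lambda m: m[0], reverse=True)
    lines = [f"[L0:IDENTITY] HA5H crystal | {len(ranked)} active memories"]
    for salience, text in ranked[:top_n]:
        lines.append(f"[L1:CRITICAL] {'★' * salience} {text}")
    return "\n".join(lines)
```

The budget stays small because only the identity line and the few highest-salience memories are emitted; everything else waits for an on-demand facet query.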

The Metaphor

A crystal grows itself.

Every memory you store gets a fingerprint: a 64-bit SimHash computed from the text itself. Similar memories produce similar fingerprints automatically. No one decides where anything goes. No taxonomy. No filing.

The crystal has six faces you can look through. Rotate it: see memories by content similarity. Rotate again: see them by when they were true. Again: by who was involved, by importance, by origin, by meaning. Same crystal, six views. Each facet is independently indexed, so any query finds its answer through whichever face catches the light first.

As memories accumulate, the crystal grows. Similar memories cluster in the lattice. Contradictions are detected and the older fact is retired. Growth rings mark temporal epochs: sprint boundaries, project phases, context compression events. The structure emerges from the data, never imposed on it.

The 5 in HA5H refers to five-fold quasicrystalline symmetry, the "impossible" structure Dan Shechtman discovered in 1982. The first five facets use SimHash fingerprinting and keyword indexing. The sixth facet adds semantic embeddings: 384-dimensional sentence vectors that catch what keywords can't. "Auth provider" finds "Clerk" because the model understands they refer to the same thing. Install with pip install ha5h[embeddings] or leave it off; the other five facets work on their own.

L0: Identity + manifest (~50 tokens)
L1: Top-5 salience inclusions (~90 tokens)
L2: Facet query results (on demand)

The Architecture

Six Indexed Facets

Every memory exists in six-dimensional facet space. Each facet is independently queryable.

FACET 01

Content

Semantic fingerprint of what was said

64-bit SimHash captures structural similarity without rewriting a single word. FTS5 handles keyword matches with prefix expansion. Together they find candidates in O(log n).
SimHash 64-bit + FTS5
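The keyword half of this facet can be sketched with Python's stdlib sqlite3 (it needs an SQLite build with FTS5 compiled in, which most CPython distributions ship). The schema below is illustrative, not ha5h's actual one:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# FTS5 virtual table; the prefix= option builds extra indexes so
# prefix-expansion queries like 'auth*' stay fast.
con.execute("CREATE VIRTUAL TABLE mem_fts USING fts5(body, prefix='2 3')")
con.execute("INSERT INTO mem_fts VALUES ('Decided to use Clerk for auth')")
con.execute("INSERT INTO mem_fts VALUES ('Sprint review moved to Friday')")

# MATCH with a trailing * expands the prefix; ORDER BY rank returns
# best matches first.
rows = con.execute(
    "SELECT body FROM mem_fts WHERE mem_fts MATCH 'auth*' ORDER BY rank"
).fetchall()
```

FTS5 gives the exact-word and prefix matches; the SimHash fingerprint covers near-duplicate text that shares structure but not exact keywords.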
FACET 02

Temporal

When it was true (validity windows)

Every memory has valid_from and optional valid_to. Invalidated memories stay in the crystal but are hidden from default queries. Growth rings mark epoch boundaries like geological strata.
B-tree range index
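A validity-window query can be sketched directly in SQL. The table and column names mirror the valid_from/valid_to fields described above, but the schema itself is an assumption:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE memories (
        id         INTEGER PRIMARY KEY,
        body       TEXT NOT NULL,
        valid_from TEXT NOT NULL,   -- ISO-8601 date
        valid_to   TEXT             -- NULL means "still true"
    )
""")
# B-tree index backing temporal range queries
con.execute("CREATE INDEX idx_validity ON memories(valid_from, valid_to)")
con.executemany(
    "INSERT INTO memories (body, valid_from, valid_to) VALUES (?, ?, ?)",
    [
        ("Using Firebase for auth", "2024-01-01", "2024-06-01"),  # retired
        ("Using Clerk for auth", "2024-06-01", None),             # current
    ],
)
# Default queries hide invalidated memories without deleting them
rows = con.execute("SELECT body FROM memories WHERE valid_to IS NULL").fetchall()
```

The retired Firebase memory stays in the table, so a history query over valid_from/valid_to can still surface it; it just drops out of the default view.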
FACET 03

Relational

Entity graph connecting memories

People, tools, projects, decisions extracted automatically. Lattice edges connect related memories. Walk the graph to discover connections. Contradictions are detected and the older fact is auto-retired.
Entity table + lazy lattice
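A lattice walk is essentially a bounded breadth-first traversal over the edge set. A sketch, assuming an in-memory adjacency map rather than ha5h's actual storage:

```python
from collections import deque

def lattice_walk(edges, start, max_hops=2):
    """edges: {memory_id: [connected memory_ids]}. Walk outward from
    `start` breadth-first, up to max_hops, returning (memory_id, hops)
    pairs in discovery order."""
    seen = {start}
    frontier = deque([(start, 0)])
    reached = []
    while frontier:
        node, hops = frontier.popleft()
        if hops:                      # don't report the start node itself
            reached.append((node, hops))
        if hops < max_hops:
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, hops + 1))
    return reached
```

Bounding the hop count keeps the walk cheap even as the lattice grows: two hops is usually enough to surface "this decision touched that person who owns that project."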
FACET 04

Salience

Importance weight (1–5 stars)

Critical decisions float to the top. Wake-up context is built from the highest-salience memories first. Salience is auto-detected from content signals like "decided," "critical," and "must."
Numeric index
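Signal-based salience detection might look like this. The signal words come from the description above ("decided," "critical," "must"); the weights and base score are invented for illustration:

```python
# Hypothetical signal weights; ha5h's actual list and scoring differ.
SIGNALS = {"decided": 2, "critical": 2, "must": 1, "never": 1}

def auto_salience(text: str, base: int = 2) -> int:
    """Start from a neutral base score and bump it for each decision
    signal found in the text, clamped to the 1-5 star range."""
    lowered = text.lower()
    score = base + sum(w for sig, w in SIGNALS.items() if sig in lowered)
    return max(1, min(5, score))
```

A plain observation stays at the base score, while "critical" decisions climb toward five stars and therefore surface first in wake-up context.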
FACET 05

Context

Origin: session, project, trigger

Know where every memory came from. Filter by project, session, or capture method: manual entry, auto-save hook, conversation import, or mining.
Tag-based index
FACET 06 New

Semantic

Meaning-level similarity via embeddings

SimHash catches structural similarity. FTS5 catches keywords. Neither catches paraphrases. "Auth provider" won't match "Clerk" without understanding meaning. The sixth facet adds 384-dim sentence embeddings for true semantic recall. Runs on CPU, ~5ms per memory. Optional: install with pip install ha5h[embeddings]. Degrades gracefully when not installed.
sentence-transformers/all-MiniLM-L6-v2 + cosine similarity
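Semantic recall reduces to cosine similarity between sentence vectors. The sketch below uses toy 4-dimensional vectors as stand-ins for real 384-dimensional MiniLM embeddings, so the similarity pattern is contrived for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stand-ins for real 384-dim sentence embeddings:
query  = [0.9, 0.1, 0.0, 0.2]   # "auth provider"
clerk  = [0.8, 0.2, 0.1, 0.3]   # "Clerk" -- close in meaning
budget = [0.0, 0.1, 0.9, 0.1]   # "quarterly budget" -- unrelated
```

In the real system the model, not hand-placed coordinates, puts "auth provider" and "Clerk" near each other; the ranking step is just this dot product over normalized vectors.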

Early Benchmarks

Performance

3.8 ms write*
9 ms recall*
~139 tokens wake-up

Zero API calls. Zero LLM rewrites. Fully offline. Single SQLite file.

*Speed numbers from local testing at 5,000 memories. Reproducible benchmark script included in the repo. Accuracy benchmarks (LongMemEval) are in progress. We'll publish those numbers when they're real, not before.

Install

Quick Start

$ pip install ha5h
$ ha5h init .
  Crystal initialized at .ha5h

$ ha5h crystallize "Decided to use Clerk for auth" -s 5
  Crystallized [a1b2c3d4] ★★★★★

$ ha5h recall "auth decision"
  Found 1 memory

$ ha5h wake-up
  [L0:IDENTITY] HA5H crystal | 1 active memories
  [L1:CRITICAL] ★★★★★ Decided to use Clerk for auth

Claude Code

MCP Integration

claude mcp add ha5h -- python -m ha5h.mcp.server
ha5h_crystallize    Store a new memory
ha5h_recall         Search across all 6 facets
ha5h_invalidate     Mark a memory as no longer valid
ha5h_lattice_walk   Traverse memory connections
ha5h_wake_up        Generate startup context
ha5h_stats          Crystal statistics