Relevance Gating

Relevance gating is how Chorum decides which memories to inject into each conversation—balancing context richness against speed and cost.

Why This Matters

Without intelligent gating, you’d face two bad options:

  1. Inject everything — Slow, expensive, and the relevant signal drowns in irrelevant noise
  2. Inject nothing — Fast, but the AI lacks context and hallucinates

Chorum injects exactly the memory that makes each response better, and nothing more.


The Problem It Solves

| Too Much Context | Too Little Context |
| --- | --- |
| Token costs explode ($0.50+ per message) | AI lacks context, makes mistakes |
| Latency increases (more tokens = slower) | You repeat yourself constantly |
| Signal drowns in noise | “I told you this already” frustration |

The goal: Match memory injection depth to query complexity.


How It Works

User Message
        ↓
┌─────────────────────────────────────┐
│      Query Classification           │  ← Fast (local, <50ms)
│  (complexity, intent, domain)       │
└─────────────────────────────────────┘
        ↓
┌─────────────────────────────────────┐
│      Token Budget Assignment        │  ← Based on classification
│  (500 → 2K → 5K → 8K tokens)        │
└─────────────────────────────────────┘
        ↓
┌─────────────────────────────────────┐
│      Relevance Scoring              │  ← Embedding similarity + recency + type
│  (score each memory item 0-1)       │
└─────────────────────────────────────┘
        ↓
┌─────────────────────────────────────┐
│      Memory Selection               │  ← Greedy fill within budget
│  (highest relevance first)          │
└─────────────────────────────────────┘
        ↓
┌─────────────────────────────────────┐
│      Context Assembly               │  ← Format for injection
│  (structured, not raw dump)         │
└─────────────────────────────────────┘
        ↓
Prompt sent to LLM

Step 1: Query Classification

Chorum classifies your query in under 50ms using local heuristics (no LLM call required).

Complexity Levels

| Complexity | Characteristics | Example |
| --- | --- | --- |
| Trivial | Greetings, thanks, short affirmations | “hi”, “thanks!” |
| Simple | Quick questions, single concepts | “What port does this run on?” |
| Moderate | Standard development tasks | “Write a function to validate email” |
| Complex | Debugging, multi-file work | “Why is this test failing?” |
| Deep | Architecture, analysis | “Review this system design” |

Classification Signals

| Signal | What It Indicates |
| --- | --- |
| Message < 50 chars | Likely trivial or simple |
| Contains code blocks | Needs accuracy, likely complex |
| References “we”, “our”, “before” | History-dependent, needs memory |
| Conversation turn > 5 | Deep context already built |
| Technical jargon density | Domain-specific, needs patterns |
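
As a rough illustration, a heuristic classifier built on the signals above might look like the sketch below. The thresholds, regexes, and function name are assumptions for illustration, not Chorum’s actual code.

```typescript
// Illustrative heuristic classifier; thresholds and regexes are assumptions.
type Complexity = "trivial" | "simple" | "moderate" | "complex" | "deep";

const HISTORY_HINTS = /\b(we|our|before|earlier|as discussed)\b/i;
const JARGON = /\b(async|schema|migration|middleware|orm|mutex|grpc)\b/i;
const CODE_FENCE = "`".repeat(3); // avoids embedding a literal fence in this snippet

function classifyQuery(message: string, conversationTurn: number): Complexity {
  const hasCode = message.includes(CODE_FENCE);
  const referencesHistory = HISTORY_HINTS.test(message);

  // Very short messages without code are greetings or quick checks.
  if (message.length < 50 && !hasCode) {
    return referencesHistory ? "simple" : "trivial";
  }

  // Long, jargon-dense prompts without code lean toward architecture/analysis.
  if (JARGON.test(message) && message.length > 400 && !hasCode) return "deep";

  // Code blocks or a long-running conversation suggest debugging / multi-file work.
  if (hasCode || conversationTurn > 5) return "complex";

  return message.length < 150 ? "simple" : "moderate";
}
```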

Step 2: Token Budget Assignment

Each complexity level gets a memory token budget:

| Complexity | Memory Budget | Typical Use Case |
| --- | --- | --- |
| Trivial | 0 tokens | Skip memory entirely |
| Simple | 500 tokens | Quick factual questions |
| Moderate | 2,000 tokens | Standard code generation |
| Complex | 5,000 tokens | Debugging, multi-file work |
| Deep | 8,000 tokens | Architecture discussions |

Budget Modifiers

The base budget is adjusted by:

  • +50% if you reference history (“as we discussed”)
  • +25% for long conversations (turn > 10)
  • -50% if you’ve enabled “prioritize speed” preference
  • Hard ceiling: 10,000 tokens — beyond this, context becomes noise
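
A minimal sketch of the budget logic, assuming the base budgets from the table and the modifiers above; the function signature and the `prioritizeSpeed` flag are illustrative assumptions.

```typescript
// Illustrative budget assignment; base budgets and modifiers follow the text above.
type Complexity = "trivial" | "simple" | "moderate" | "complex" | "deep";

const BASE_BUDGET: Record<Complexity, number> = {
  trivial: 0,
  simple: 500,
  moderate: 2_000,
  complex: 5_000,
  deep: 8_000,
};

const HARD_CEILING = 10_000; // beyond this, context becomes noise

function assignBudget(
  complexity: Complexity,
  message: string,
  conversationTurn: number,
  prioritizeSpeed = false, // hypothetical "prioritize speed" preference
): number {
  let budget = BASE_BUDGET[complexity];

  // +50% when the user explicitly leans on shared history.
  if (/\b(as we discussed|we decided|earlier|before)\b/i.test(message)) budget *= 1.5;

  // +25% once the conversation is long enough to have built real context.
  if (conversationTurn > 10) budget *= 1.25;

  // -50% when speed is preferred over depth.
  if (prioritizeSpeed) budget *= 0.5;

  return Math.min(Math.round(budget), HARD_CEILING);
}
```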

Step 3: Relevance Scoring

Every learning in your project memory is scored from 0.0 to 1.0.

Scoring Formula

Final Score =
    (Semantic Similarity × 0.50) +
    (Recency Score × 0.15) +
    (Domain Match × 0.15) +
    (Usage Frequency × 0.05) +
    Type Boost (varies by type)

Components

Semantic Similarity (50%): How closely does this learning match the meaning of your query? Uses vector embeddings for comparison.

Recency Score (15%): More recent learnings score higher; the score decays exponentially over 30 days:

recencyScore = e^(-daysSince / 30)

Domain Match (15%): If your query is about TypeScript and a learning is tagged “typescript”, it gets a +0.15 boost.

Usage Frequency (5%): Patterns that get referenced frequently are more valuable. Capped at +0.15 after 10 uses.

Type Boost (varies)

| Type | Boost |
| --- | --- |
| Invariant | +0.25 |
| Pattern | +0.10 |
| Decision | +0.10 |
| Antipattern | +0.10 |

Invariants get the highest boost because they prevent mistakes.
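
Taken together, a scoring function might look roughly like the sketch below. The weights, 30-day decay, and type boosts follow the formula above; the item shape, domain matching, and usage normalization are assumptions.

```typescript
// Illustrative scoring; weights and decay follow the formula above,
// while the Learning shape and normalization details are assumptions.
interface Learning {
  embeddingSimilarity: number; // cosine similarity to the query, 0–1
  daysSinceLearned: number;
  domains: string[];           // e.g. ["typescript", "auth"]
  useCount: number;
  type: "invariant" | "pattern" | "decision" | "antipattern" | "fact";
}

const TYPE_BOOST: Record<Learning["type"], number> = {
  invariant: 0.25,
  pattern: 0.1,
  decision: 0.1,
  antipattern: 0.1,
  fact: 0,
};

function scoreLearning(item: Learning, queryDomains: string[]): number {
  // Recency decays exponentially with a 30-day time constant.
  const recency = Math.exp(-item.daysSinceLearned / 30);

  // Domain match is all-or-nothing: any tag overlap earns the full boost.
  const domainMatch = item.domains.some((d) => queryDomains.includes(d)) ? 1 : 0;

  // Usage saturates after ~10 references (the normalization is an assumption).
  const usage = Math.min(item.useCount / 10, 1);

  return (
    item.embeddingSimilarity * 0.5 +
    recency * 0.15 +
    domainMatch * 0.15 +
    usage * 0.05 +
    TYPE_BOOST[item.type]
  );
}
```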


Step 4: Memory Selection

With scores calculated, Chorum selects memories to fill the token budget:

  1. Sort all memories by score (highest first)
  2. Add highest-scoring items until budget is reached
  3. Skip items scoring below 0.30 (noise threshold)
  4. Exception: Always include invariants scoring > 0.70, even if over budget

The 0.30 Threshold

Items below 0.30 relevance are likely noise. Better to inject nothing than confuse the model with unrelated context.
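
A sketch of the greedy fill, assuming each memory arrives with a relevance score and an estimated token cost; the field names and token estimates are illustrative.

```typescript
// Illustrative greedy selection within the token budget.
interface ScoredMemory {
  text: string;
  score: number;  // relevance from the scoring step, 0–1
  tokens: number; // estimated cost when injected (assumed precomputed)
  type: "invariant" | "pattern" | "decision" | "antipattern" | "fact";
}

const NOISE_THRESHOLD = 0.3;
const INVARIANT_OVERRIDE = 0.7;

function selectMemories(memories: ScoredMemory[], budget: number): ScoredMemory[] {
  const sorted = [...memories].sort((a, b) => b.score - a.score);
  const selected: ScoredMemory[] = [];
  let used = 0;

  for (const item of sorted) {
    // Below the noise threshold, an item is more likely to confuse than help.
    if (item.score < NOISE_THRESHOLD) continue;

    const fitsBudget = used + item.tokens <= budget;
    const mustInclude = item.type === "invariant" && item.score > INVARIANT_OVERRIDE;

    // Highly relevant invariants are injected even if they exceed the budget.
    if (fitsBudget || mustInclude) {
      selected.push(item);
      used += item.tokens;
    }
  }

  return selected;
}
```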


Step 5: Context Assembly

Selected memories are formatted for clean injection:

<chorum_context>
## Active Invariants
- Always use Zod for runtime validation (learned: Jan 15)
- Never store secrets in environment variables without encryption
 
## Relevant Patterns
- This project uses the repository pattern for data access
- Error handling follows the Result<T, E> pattern
 
## Recent Decisions
- Chose PostgreSQL over SQLite for multi-user support (Jan 20)
 
## Project Facts
- Tech stack: Next.js, TypeScript, Drizzle ORM
</chorum_context>

Why structured format:

  • Clear sections help the model parse relevance
  • Labeled dates help the model weight recency
  • Consistent format allows the model to learn the pattern
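
For illustration, assembly could group selected memories into those sections along these lines; the item shape, section mapping, and date formatting are assumptions about how Chorum labels items.

```typescript
// Illustrative context assembly into the structured block shown above.
interface SelectedMemory {
  text: string;
  type: "invariant" | "pattern" | "decision" | "fact";
  learnedAt?: Date;
}

const SECTION_TITLES: Record<SelectedMemory["type"], string> = {
  invariant: "Active Invariants",
  pattern: "Relevant Patterns",
  decision: "Recent Decisions",
  fact: "Project Facts",
};

function assembleContext(memories: SelectedMemory[]): string {
  const sections: string[] = [];

  // Emit sections in a fixed order so the model sees a consistent layout.
  for (const [type, title] of Object.entries(SECTION_TITLES)) {
    const items = memories.filter((m) => m.type === type);
    if (items.length === 0) continue;

    const lines = items.map((m) => {
      const date = m.learnedAt
        ? ` (learned: ${m.learnedAt.toLocaleDateString("en-US", { month: "short", day: "numeric" })})`
        : "";
      return `- ${m.text}${date}`;
    });

    sections.push(`## ${title}\n${lines.join("\n")}`);
  }

  return `<chorum_context>\n${sections.join("\n\n")}\n</chorum_context>`;
}
```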

Performance Targets

| Metric | Target |
| --- | --- |
| Query classification | < 50ms |
| Embedding generation | < 100ms |
| Relevance scoring | < 50ms |
| Memory selection | < 10ms |
| Context assembly | < 10ms |
| Total overhead | < 220ms |

Simple queries stay fast. Complex queries get rich context.


User Controls

Context Depth Toggle

You can override automatic gating:

| Mode | Behavior |
| --- | --- |
| Auto (default) | Chorum decides based on query complexity |
| Minimal | Prioritize speed, inject less |
| Full | Inject everything, ignore budget |
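
As a hedged sketch, the toggle could map onto the token budget like this; the function name and the minimal-mode cap are assumptions, not Chorum’s actual settings API.

```typescript
// Illustrative mapping from the depth toggle to the gating budget.
type ContextDepth = "auto" | "minimal" | "full";

function applyDepthSetting(depth: ContextDepth, autoBudget: number): number {
  switch (depth) {
    case "minimal":
      return Math.min(autoBudget, 500); // prioritize speed (cap is an assumption)
    case "full":
      return Number.POSITIVE_INFINITY;  // inject everything, ignore the budget
    default:
      return autoBudget;                // Auto: let classification decide
  }
}
```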

Per-Query Override

In conversation, you can hint at depth:

  • “Quick question: what’s the port?” → Minimal context
  • “Help me understand the full auth flow” → Deep context

Why Invariants Get Priority

Invariants are special because violating them causes real problems:

  • They protect against security issues
  • They enforce team standards
  • They prevent known bugs

A relevant invariant is like a senior engineer tapping you on the shoulder: “Hey, don’t forget about this rule.”

Even if an invariant would push you over the token budget, if it scores > 0.70 relevance, it gets injected anyway.