Relevance Gating

Relevance gating is how Chorum decides which memories to inject into each conversation—balancing context richness against speed and cost.

This is the Tier 3 system used for large-context models. For small and medium models, Chorum uses Tiered Context with pre-compiled memory summaries.

Why This Matters

Without intelligent gating, you’d face two bad options:

  1. Inject everything — Slow, expensive, and the AI ignores irrelevant noise
  2. Inject nothing — Fast but the AI lacks context and hallucinates

Chorum injects exactly the memory that makes this response better, and nothing more.


The Problem It Solves

| Too Much Context | Too Little Context |
|---|---|
| Token costs explode ($0.50+ per message) | AI lacks context, makes mistakes |
| Latency increases (more tokens = slower) | You repeat yourself constantly |
| Signal drowns in noise | “I told you this already” frustration |

The goal: Match memory injection depth to query complexity.


How It Works

User Message
                  ↓
┌─────────────────────────────────────┐
│      Query Classification           │  ← Fast (local, <50ms)
│  (complexity, intent, domain)       │
└─────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────┐
│      Token Budget Assignment        │  ← Based on classification
│  (500 → 2K → 5K → 8K tokens)        │
└─────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────┐
│      Relevance Scoring              │  ← Embedding similarity + recency + type
│  (score each memory item 0-1)       │
└─────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────┐
│      Memory Selection               │  ← Greedy fill within budget
│  (highest relevance first)          │
└─────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────┐
│      Context Assembly               │  ← Format for injection
│  (structured, not raw dump)         │
└─────────────────────────────────────┘
                  ↓
Prompt sent to LLM

Step 1: Query Classification

Chorum classifies your query in under 50ms using local heuristics (no LLM call required).

Complexity Levels

| Complexity | Characteristics | Example |
|---|---|---|
| Trivial | Greetings, thanks, short affirmations | “hi”, “thanks!” |
| Simple | Quick questions, single concepts | “What port does this run on?” |
| Moderate | Standard development tasks | “Write a function to validate email” |
| Complex | Debugging, multi-file work, analysis | “Why is this test failing?”, “Debug this error” |
| Deep | Architecture, analysis | “Review this system design” |

Query Intent

Chorum also classifies the intent of your query to adjust scoring weights:

| Intent | When Detected | Scoring Adjustments |
|---|---|---|
| Question | “what”, “how”, “?” | Balanced weights (default) |
| Generation | “write”, “create”, “implement” | Patterns +2.0x, Golden Paths +1.5x |
| Analysis | “why”, “analyze”, “explain” | Decisions +2.0x |
| Debugging | “debug”, “fix”, “error”, “bug”, “trace”, “stack”, “exception”, “breakpoint”, etc. | Recency +35%, Antipatterns +2.0x |
| Discussion | General conversation | Balanced weights |
| Continuation | Follow-up in long thread | Recency +30% |

Classification Signals

| Signal | What It Indicates |
|---|---|
| Message < 50 chars | Likely trivial or simple |
| Contains code blocks | Needs accuracy, likely complex |
| References “we”, “our”, “before” | History-dependent, needs memory |
| Conversation turn > 5 | Deep context already built |
| Technical jargon density | Domain-specific, needs patterns |
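Taken together, these signals can be approximated with simple local rules. The sketch below is illustrative only — the function name, keyword lists, and cutoffs are assumptions for this example, not Chorum's actual implementation:

```typescript
// Illustrative heuristic classifier; keyword lists and cutoffs are assumptions.
type Complexity = "trivial" | "simple" | "moderate" | "complex" | "deep";

function classifyComplexity(message: string): Complexity {
  const text = message.trim().toLowerCase();
  // Greetings, thanks, and short affirmations need no memory at all.
  if (text.length < 20 && /^(hi|hey|hello|thanks|thank|ok|okay|cool|great)\b/.test(text)) {
    return "trivial";
  }
  // Code blocks or debugging vocabulary signal complex, accuracy-critical work.
  if (/`{3}/.test(message) || /\b(debug|error|bug|fix|stack|exception|failing)\b/.test(text)) {
    return "complex";
  }
  // Architecture and design-review vocabulary signals a deep discussion.
  if (/\b(architecture|system design|design review)\b/.test(text)) {
    return "deep";
  }
  // Generation verbs indicate a standard development task.
  if (/\b(write|create|implement|build)\b/.test(text)) {
    return "moderate";
  }
  // Short factual questions are simple; anything longer is a standard task.
  return text.length < 50 ? "simple" : "moderate";
}
```

Because every rule is a string test, this runs in microseconds — which is what makes a sub-50ms classification budget feasible without an LLM call.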

Step 2: Token Budget Assignment

Each complexity level gets a memory token budget:

| Complexity | Memory Budget | Typical Use Case |
|---|---|---|
| Trivial | 0 tokens | Skip memory entirely |
| Simple | 500 tokens | Quick factual questions |
| Moderate | 2,000 tokens | Standard code generation |
| Complex | 5,000 tokens | Debugging, multi-file work |
| Deep | 8,000 tokens | Architecture discussions |

Budget Modifiers

The base budget is adjusted by:

  • +50% if you reference history (“as we discussed”)
  • +25% for long conversations (turn > 10)
  • -50% if you’ve enabled “prioritize speed” preference
  • Hard ceiling: 10,000 tokens — beyond this, context becomes noise
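Putting the base budgets and modifiers together, the calculation looks roughly like this (a sketch — the function shape, option names, and rounding are assumptions):

```typescript
// Illustrative budget calculation; option names and rounding are assumptions.
const BASE_BUDGET: Record<string, number> = {
  trivial: 0,
  simple: 500,
  moderate: 2_000,
  complex: 5_000,
  deep: 8_000,
};

function assignBudget(opts: {
  complexity: string;         // output of query classification
  referencesHistory: boolean; // "as we discussed", "before", ...
  turn: number;               // current conversation turn
  prioritizeSpeed: boolean;   // user preference
}): number {
  let budget = BASE_BUDGET[opts.complexity] ?? 0;
  if (opts.referencesHistory) budget *= 1.5; // +50% for history references
  if (opts.turn > 10) budget *= 1.25;        // +25% for long conversations
  if (opts.prioritizeSpeed) budget *= 0.5;   // -50% when speed is prioritized
  return Math.min(Math.round(budget), 10_000); // hard ceiling: 10,000 tokens
}
```

For example, a complex query that references history in a long conversation gets 5,000 × 1.5 × 1.25 = 9,375 tokens, while the same modifiers on a deep query would exceed the ceiling and clamp to 10,000.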

Step 3: Relevance Scoring

Every learning in your project memory is scored from 0.0 to 1.0.

Scoring Formula

Final Score =
    (Semantic Similarity × W_semantic) +
    (Recency Score × W_recency) +
    (Domain Match × W_domain) +
    (Usage Frequency × W_usage) +
    (Co-occurrence Bonus) +
    (Type Boost × intent multiplier)

Base weights come from the intent profile (e.g., “question” uses semantic=0.55, recency=0.10) and are then dynamically shifted based on conversation context:

| Context Signal | Weight Shift | Rationale |
|---|---|---|
| Deep conversation (>10 turns) | Recency +0.10, Semantic -0.10 | Recent context matters more as conversation evolves |
| Code context present | Domain +0.08, Usage +0.02, Semantic -0.10 | Domain matching is critical when code is present |
| History references (“we discussed…”) | Semantic +0.10, Recency -0.05, Domain -0.05 | Past conversations need strong semantic matching |

Weights are clamped to [0, 1] and re-normalized to sum to 1.0 after shifting.
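The shift, clamp, and re-normalize sequence can be sketched as follows. The shifts follow the table above; note that the base profile's `domain` and `usage` values in the test are hypothetical fills, since only semantic=0.55 and recency=0.10 are specified for the question intent:

```typescript
// Sketch of the shift, clamp, re-normalize sequence for scoring weights.
type Weights = { semantic: number; recency: number; domain: number; usage: number };

function adjustWeights(
  base: Weights,
  ctx: { turns: number; hasCode: boolean; referencesHistory: boolean }
): Weights {
  const w: Weights = { ...base };
  // Apply the context-driven shifts from the table above.
  if (ctx.turns > 10) { w.recency += 0.10; w.semantic -= 0.10; }
  if (ctx.hasCode) { w.domain += 0.08; w.usage += 0.02; w.semantic -= 0.10; }
  if (ctx.referencesHistory) { w.semantic += 0.10; w.recency -= 0.05; w.domain -= 0.05; }
  // Clamp each weight to [0, 1] ...
  const keys = Object.keys(w) as (keyof Weights)[];
  for (const k of keys) w[k] = Math.min(1, Math.max(0, w[k]));
  // ... then re-normalize so the weights sum to 1.0.
  const total = keys.reduce((sum, k) => sum + w[k], 0);
  for (const k of keys) w[k] /= total;
  return w;
}
```

Re-normalizing after clamping keeps the final score on a stable 0-1 scale regardless of how many shifts fired.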

Components

Semantic Similarity (~50%) How closely does this learning match the meaning of your query? Uses vector embeddings for comparison.

Recency Score (~15%) More recent learnings score higher. Uses per-type decay curves (see Tiered Context).

Domain Match (~15%)

Proportional scoring based on domain overlap. If your query mentions “database” and “security”, a learning tagged with both scores higher than one tagged with just “database”:

domainScore = 0.2 × (overlap / max(|queryDomains|, |learningDomains|))

This rewards precision—learnings that match more of your query’s domains rank higher.
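A minimal sketch of that formula, assuming domains are represented as tag arrays:

```typescript
// The domain-overlap formula above, as code. Assumes domains are tag arrays.
function domainScore(queryDomains: string[], learningDomains: string[]): number {
  // Count tags that appear in both the query and the learning.
  const overlap = learningDomains.filter((d) => queryDomains.includes(d)).length;
  // Divide by the larger of the two tag counts, so extra tags dilute the score.
  const denom = Math.max(queryDomains.length, learningDomains.length);
  return denom === 0 ? 0 : 0.2 * (overlap / denom);
}
```

For a query tagged “database” and “security”, a learning tagged with both scores 0.2 × (2/2) = 0.2, while one tagged only “database” scores 0.2 × (1/2) = 0.1.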

Usage Frequency (~5%) Patterns that get referenced frequently are more valuable. Logarithmic curve that plateaus after ~20 uses, capped at +0.15.

Co-occurrence Bonus (up to +0.10) Items that frequently co-occur with high-scoring items in positive-feedback contexts get a bonus. This surfaces knowledge that consistently works well together — e.g., if “use Zod validation” and “early returns” always appear together in successful responses, both get a retrieval boost.

Type Boost (varies by intent)

| Type | Base Boost | Modified by Intent |
|---|---|---|
| Invariant | +0.25 | Stable across intents |
| Golden Path | +0.15 | +1.5x for debugging, generation |
| Pattern | +0.10 | +2.0x for generation |
| Decision | +0.10 | +2.0x for analysis, 0.5x for debugging |
| Antipattern | +0.05 | +2.0x for debugging |

Invariants get the highest boost because they prevent mistakes.


Step 4: Memory Selection

With scores calculated, Chorum selects memories to fill the token budget:

  1. Sort all memories by score (highest first)
  2. Apply intent-adaptive thresholds — skip items below the minimum score for this intent
  3. Add highest-scoring items until budget is reached

Intent-Adaptive Thresholds

Different intents use different noise thresholds. Debugging casts a wider net to catch relevant antipatterns and golden paths, while generation demands precision:

| Intent | General Threshold | Invariant Threshold |
|---|---|---|
| Debugging | 0.25 | 0.15 |
| Continuation | 0.30 | 0.18 |
| Question / Analysis / Discussion | 0.35 | 0.20 |
| Generation | 0.40 | 0.20 |
| Greeting | 0.50 | 0.30 |

Invariants always have a lower threshold than general items because they prevent mistakes.
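The selection loop can be sketched as follows. The `Memory` shape and field names are illustrative assumptions; pinned handling follows the Pin & Mute behavior described under User Controls:

```typescript
// Illustrative greedy selection within budget; types and fields are assumptions.
type Memory = {
  id: string;
  score: number;       // relevance score from Step 3
  tokens: number;      // estimated size when injected
  pinned?: boolean;    // pinned items bypass the threshold
  isInvariant?: boolean;
};

function selectMemories(
  memories: Memory[],
  budget: number,
  thresholds: { general: number; invariant: number }
): Memory[] {
  // Pinned items rank first; the rest sort by score, highest first.
  const ranked = [...memories].sort(
    (a, b) => Number(b.pinned ?? false) - Number(a.pinned ?? false) || b.score - a.score
  );
  const selected: Memory[] = [];
  let used = 0;
  for (const m of ranked) {
    const floor = m.isInvariant ? thresholds.invariant : thresholds.general;
    if (!m.pinned && m.score < floor) continue; // below the noise threshold
    if (used + m.tokens > budget) continue;     // doesn't fit; try smaller items
    selected.push(m);
    used += m.tokens;
  }
  return selected;
}
```

Note the second `continue`: when a high-scoring item doesn't fit, the loop still considers smaller, lower-scoring items, so the budget is filled rather than abandoned at the first oversized candidate.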


Step 5: Context Assembly

Selected memories are formatted for clean injection:

<chorum_context>
## Active Invariants
- Always use Zod for runtime validation (learned: Jan 15)
- Never store secrets in environment variables without encryption
 
## Relevant Patterns
- This project uses the repository pattern for data access
- Error handling follows the Result<T, E> pattern
 
## Recent Decisions
- Chose PostgreSQL over SQLite for multi-user support (Jan 20)
 
## Project Facts
- Tech stack: Next.js, TypeScript, Drizzle ORM
</chorum_context>

Why structured format:

  • Clear sections help the model parse relevance
  • Labeled dates help the model weight recency
  • Consistent format allows the model to learn the pattern

Performance Targets

| Metric | Target |
|---|---|
| Query classification | < 50ms |
| Embedding generation | < 100ms |
| Relevance scoring | < 50ms |
| Memory selection | < 10ms |
| Context assembly | < 10ms |
| Total overhead | < 220ms |

Simple queries stay fast. Complex queries get rich context.


User Controls

The Conductor gives you several ways to influence relevance gating without touching the scoring math directly.

Memory Depth (Conductor Lens)

A project-level setting that shifts the scoring thresholds and token budget:

DepthBudget EffectThreshold EffectBest For
Light0.7x budget+0.10 (stricter)Established projects, speed priority
Normal (default)No changeNo changeMost projects
Rich1.3x budget-0.10 (looser)New projects, complex domains

Set this in Settings > Memory & Learning.

Pin & Mute

Direct steering of individual items:

  • Pinned items bypass the relevance threshold entirely — they’re always injected (budget permitting)
  • Muted items are filtered out before scoring — they never appear in context

Focus Areas

Project-level domain tags that give a permanent +0.05 bonus to matching items, even when the current query doesn’t mention that domain.

Per-Query Adaptation

The Conductor adapts automatically based on your query:

  • “Quick question: what’s the port?” → Lower complexity, smaller budget
  • “Help me understand the full auth flow” → Higher complexity, larger budget, lower thresholds

See The Conductor for the complete guide to all controls.


Why Invariants Get Priority

Invariants are special because violating them causes real problems:

  • They protect against security issues
  • They enforce team standards
  • They prevent known bugs

A relevant invariant is like a senior engineer tapping you on the shoulder: “Hey, don’t forget about this rule.”

Invariants get the highest type boost (+0.25), the lowest score thresholds, and pinned invariants bypass scoring entirely.