Relevance Gating
Relevance gating is how Chorum decides which memories to inject into each conversation—balancing context richness against speed and cost.
This is the Tier 3 system used for large-context models. For small and medium models, Chorum uses Tiered Context with pre-compiled memory summaries.
Why This Matters
Without intelligent gating, you’d face two bad options:
- Inject everything — Slow, expensive, and the AI ignores irrelevant noise
- Inject nothing — Fast but the AI lacks context and hallucinates
Chorum injects exactly the memory that makes this response better, and nothing more.
The Problem It Solves
| Too Much Context | Too Little Context |
|---|---|
| Token costs explode ($0.50+ per message) | AI lacks context, makes mistakes |
| Latency increases (more tokens = slower) | You repeat yourself constantly |
| Signal drowns in noise | “I told you this already” frustration |
The goal: Match memory injection depth to query complexity.
How It Works
User Message
↓
┌─────────────────────────────────────┐
│ Query Classification │ ← Fast (local, <50ms)
│ (complexity, intent, domain) │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Token Budget Assignment │ ← Based on classification
│ (500 → 2K → 5K → 8K tokens) │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Relevance Scoring │ ← Embedding similarity + recency + type
│ (score each memory item 0-1) │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Memory Selection │ ← Greedy fill within budget
│ (highest relevance first) │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Context Assembly │ ← Format for injection
│ (structured, not raw dump) │
└─────────────────────────────────────┘
↓
Prompt sent to LLM
Step 1: Query Classification
Chorum classifies your query in under 50ms using local heuristics (no LLM call required).
Complexity Levels
| Complexity | Characteristics | Example |
|---|---|---|
| Trivial | Greetings, thanks, short affirmations | “hi”, “thanks!” |
| Simple | Quick questions, single concepts | “What port does this run on?” |
| Moderate | Standard development tasks | “Write a function to validate email” |
| Complex | Debugging, multi-file work, analysis | “Why is this test failing?”, “Debug this error” |
| Deep | Architecture, system-wide analysis | “Review this system design” |
Query Intent
Chorum also classifies the intent of your query to adjust scoring weights:
| Intent | When Detected | Scoring Adjustments |
|---|---|---|
| Question | “what”, “how”, “?” | Balanced weights (default) |
| Generation | “write”, “create”, “implement” | Patterns +2.0x, Golden Paths +1.5x |
| Analysis | “why”, “analyze”, “explain” | Decisions +2.0x |
| Debugging | “debug”, “fix”, “error”, “bug”, “trace”, “stack”, “exception”, “breakpoint”, etc. | Recency +35%, Antipatterns +2.0x |
| Discussion | General conversation | Balanced weights |
| Continuation | Follow-up in long thread | Recency +30% |
Classification Signals
| Signal | What It Indicates |
|---|---|
| Message < 50 chars | Likely trivial or simple |
| Contains code blocks | Needs accuracy, likely complex |
| References “we”, “our”, “before” | History-dependent, needs memory |
| Conversation turn > 5 | Deep context already built |
| Technical jargon density | Domain-specific, needs patterns |
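The signals above can be combined into a fast local classifier. The sketch below is illustrative only: the function name, regexes, and thresholds are assumptions, not Chorum's actual heuristics.

```typescript
// A minimal sketch of a local heuristic classifier (no LLM call).
// Regexes and length thresholds are illustrative assumptions.
type Complexity = "trivial" | "simple" | "moderate" | "complex" | "deep";

function classifyComplexity(message: string): Complexity {
  const lower = message.toLowerCase().trim();
  // Greetings and short affirmations: skip memory entirely.
  if (lower.length < 20 && /^(hi|hello|thanks?!?|ok|yes|no)\b/.test(lower)) {
    return "trivial";
  }
  // Architecture / review language suggests a deep discussion.
  if (/\b(architecture|system design|review)\b/.test(lower)) return "deep";
  // Code blocks or debugging vocabulary need accuracy and context.
  const hasCode = message.includes("```");
  if (hasCode || /\b(debug|error|bug|failing|trace|stack|exception)\b/.test(lower)) {
    return "complex";
  }
  // Short questions are quick factual lookups.
  if (lower.endsWith("?") && lower.length < 50) return "simple";
  return "moderate";
}
```

Because every check is a cheap string test, a classifier like this comfortably fits the sub-50ms target without a network round trip.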
Step 2: Token Budget Assignment
Each complexity level gets a memory token budget:
| Complexity | Memory Budget | Typical Use Case |
|---|---|---|
| Trivial | 0 tokens | Skip memory entirely |
| Simple | 500 tokens | Quick factual questions |
| Moderate | 2,000 tokens | Standard code generation |
| Complex | 5,000 tokens | Debugging, multi-file work |
| Deep | 8,000 tokens | Architecture discussions |
Budget Modifiers
The base budget is adjusted by:
- +50% if you reference history (“as we discussed”)
- +25% for long conversations (turn > 10)
- -50% if you’ve enabled “prioritize speed” preference
- Hard ceiling: 10,000 tokens — beyond this, context becomes noise
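Putting the base budgets and modifiers together, the assignment step might look like this sketch (option names are assumptions; the numbers match the tables above):

```typescript
// Illustrative budget assignment. Base budgets and modifier values
// come from the documentation; the option names are assumptions.
const BASE_BUDGET = {
  trivial: 0, simple: 500, moderate: 2000, complex: 5000, deep: 8000,
} as const;

function assignBudget(opts: {
  complexity: keyof typeof BASE_BUDGET;
  referencesHistory: boolean; // e.g. "as we discussed"
  turn: number;
  prioritizeSpeed: boolean;
}): number {
  let budget = BASE_BUDGET[opts.complexity];
  if (opts.referencesHistory) budget *= 1.5; // +50%
  if (opts.turn > 10) budget *= 1.25;        // +25% for long conversations
  if (opts.prioritizeSpeed) budget *= 0.5;   // -50% speed preference
  return Math.min(Math.round(budget), 10_000); // hard ceiling
}
```

Note that the modifiers multiply: a complex query that references history in a long conversation lands at 9,375 tokens, still under the ceiling.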
Step 3: Relevance Scoring
Every learning in your project memory is scored from 0.0 to 1.0.
Scoring Formula
Final Score =
(Semantic Similarity × W_semantic) +
(Recency Score × W_recency) +
(Domain Match × W_domain) +
(Usage Frequency × W_usage) +
(Co-occurrence Bonus) +
(Type Boost × intent multiplier)
Base weights come from the intent profile (e.g., “question” uses semantic=0.55, recency=0.10) and are then shifted dynamically based on conversation context:
| Context Signal | Weight Shift | Rationale |
|---|---|---|
| Deep conversation (>10 turns) | Recency +0.10, Semantic -0.10 | Recent context matters more as conversation evolves |
| Code context present | Domain +0.08, Usage +0.02, Semantic -0.10 | Domain matching is critical when code is present |
| History references (“we discussed…”) | Semantic +0.10, Recency -0.05, Domain -0.05 | Past conversations need strong semantic matching |
Weights are clamped to [0, 1] and re-normalized to sum to 1.0 after shifting.
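The shift-clamp-normalize sequence can be sketched as follows. The shift values match the table above; the base profile's domain and usage weights (0.25 and 0.10) are assumed completions of the documented question profile, not published numbers:

```typescript
// A sketch of context-driven weight shifting. Shift amounts match the
// table above; the full base profile is an assumption.
type ScoreWeights = { semantic: number; recency: number; domain: number; usage: number };

function shiftWeights(
  base: ScoreWeights,
  ctx: { deepConversation: boolean; hasCode: boolean; referencesHistory: boolean }
): ScoreWeights {
  const w = { ...base };
  if (ctx.deepConversation) { w.recency += 0.10; w.semantic -= 0.10; }
  if (ctx.hasCode) { w.domain += 0.08; w.usage += 0.02; w.semantic -= 0.10; }
  if (ctx.referencesHistory) { w.semantic += 0.10; w.recency -= 0.05; w.domain -= 0.05; }
  const keys = Object.keys(w) as (keyof ScoreWeights)[];
  // Clamp each weight to [0, 1]...
  for (const k of keys) w[k] = Math.min(1, Math.max(0, w[k]));
  // ...then re-normalize so the weights sum to 1.0.
  const sum = keys.reduce((s, k) => s + w[k], 0);
  for (const k of keys) w[k] /= sum;
  return w;
}
```

The re-normalization step matters: without it, stacked shifts could silently inflate or deflate every memory's final score.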
Components
Semantic Similarity (~50%) How closely does this learning match the meaning of your query? Uses vector embeddings for comparison.
Recency Score (~15%) More recent learnings score higher. Uses per-type decay curves (see Tiered Context).
Domain Match (~15%)
Proportional scoring based on domain overlap. If your query mentions “database” and “security”, a learning tagged with both scores higher than one tagged with just “database”:
domainScore = 0.2 × (overlap / max(queryDomains, learningDomains))
This rewards precision: learnings that match more of your query’s domains rank higher.
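As a worked example of that formula (the helper name is hypothetical):

```typescript
// Worked example of the domain-match component. "overlap" is the number
// of shared domain tags; the denominator is the larger of the two tag sets.
function domainScore(queryDomains: string[], learningDomains: string[]): number {
  const overlap = learningDomains.filter((d) => queryDomains.includes(d)).length;
  const denom = Math.max(queryDomains.length, learningDomains.length);
  return denom === 0 ? 0 : 0.2 * (overlap / denom);
}
```

For a query tagged ["database", "security"], a learning tagged with both scores 0.2, while one tagged only ["database"] scores 0.1.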
Usage Frequency (~5%) Patterns that get referenced frequently are more valuable. Logarithmic curve that plateaus after ~20 uses, capped at +0.15.
Co-occurrence Bonus (up to +0.10) Items that frequently co-occur with high-scoring items in positive-feedback contexts get a bonus. This surfaces knowledge that consistently works well together — e.g., if “use Zod validation” and “early returns” always appear together in successful responses, both get a retrieval boost.
Type Boost (varies by intent)
| Type | Base Boost | Modified by Intent |
|---|---|---|
| Invariant | +0.25 | Stable across intents |
| Golden Path | +0.15 | +1.5x for debugging, generation |
| Pattern | +0.10 | +2.0x for generation |
| Decision | +0.10 | +2.0x for analysis, 0.5x for debugging |
| Antipattern | +0.05 | +2.0x for debugging |
Invariants get the highest boost because they prevent mistakes.
Step 4: Memory Selection
With scores calculated, Chorum selects memories to fill the token budget:
- Sort all memories by score (highest first)
- Apply intent-adaptive thresholds — skip items below the minimum score for this intent
- Add highest-scoring items until budget is reached
Intent-Adaptive Thresholds
Different intents use different noise thresholds. Debugging casts a wider net to catch relevant antipatterns and golden paths, while generation demands precision:
| Intent | General Threshold | Invariant Threshold |
|---|---|---|
| Debugging | 0.25 | 0.15 |
| Continuation | 0.30 | 0.18 |
| Question / Analysis / Discussion | 0.35 | 0.20 |
| Generation | 0.40 | 0.20 |
| Greeting | 0.50 | 0.30 |
Invariants always have a lower threshold than general items because they prevent mistakes.
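The selection step, combining the thresholds above with greedy budget fill, might look like this sketch (the `MemoryItem` shape is an assumption; the threshold values match the table):

```typescript
// A sketch of threshold filtering plus greedy fill within the token budget.
// Threshold values match the table above; the item shape is illustrative.
interface MemoryItem { score: number; tokens: number; type: string; pinned?: boolean }

const THRESHOLDS: Record<string, { general: number; invariant: number }> = {
  debugging: { general: 0.25, invariant: 0.15 },
  continuation: { general: 0.30, invariant: 0.18 },
  question: { general: 0.35, invariant: 0.20 },
  generation: { general: 0.40, invariant: 0.20 },
  greeting: { general: 0.50, invariant: 0.30 },
};

function selectMemories(items: MemoryItem[], intent: string, budget: number): MemoryItem[] {
  const t = THRESHOLDS[intent] ?? THRESHOLDS.question;
  // Pinned items bypass the threshold; others must clear the intent floor.
  const eligible = items.filter(
    (m) => m.pinned || m.score >= (m.type === "invariant" ? t.invariant : t.general)
  );
  eligible.sort((a, b) => b.score - a.score); // highest relevance first
  const selected: MemoryItem[] = [];
  let used = 0;
  for (const m of eligible) {
    if (used + m.tokens <= budget) { selected.push(m); used += m.tokens; }
  }
  return selected;
}
```

Greedy fill keeps selection fast (a single sort plus one pass), at the cost of occasionally skipping a large high-scoring item in favor of several smaller ones.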
Step 5: Context Assembly
Selected memories are formatted for clean injection:
<chorum_context>
## Active Invariants
- Always use Zod for runtime validation (learned: Jan 15)
- Never store secrets in environment variables without encryption
## Relevant Patterns
- This project uses the repository pattern for data access
- Error handling follows the Result<T, E> pattern
## Recent Decisions
- Chose PostgreSQL over SQLite for multi-user support (Jan 20)
## Project Facts
- Tech stack: Next.js, TypeScript, Drizzle ORM
</chorum_context>
Why structured format:
- Clear sections help the model parse relevance
- Labeled dates help the model weight recency
- Consistent format allows the model to learn the pattern
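A formatter producing that structure could be as simple as the sketch below (section headings match the example above; the `Learning` shape and function name are assumptions):

```typescript
// Illustrative assembly of selected memories into the structured block
// shown above. The Learning shape is an assumption.
interface Learning { type: string; text: string }

function assembleContext(learnings: Learning[]): string {
  const sections: Record<string, string> = {
    invariant: "Active Invariants",
    pattern: "Relevant Patterns",
    decision: "Recent Decisions",
    fact: "Project Facts",
  };
  const lines = ["<chorum_context>"];
  for (const [type, heading] of Object.entries(sections)) {
    const items = learnings.filter((l) => l.type === type);
    if (items.length === 0) continue; // omit empty sections entirely
    lines.push(`## ${heading}`);
    for (const l of items) lines.push(`- ${l.text}`);
  }
  lines.push("</chorum_context>");
  return lines.join("\n");
}
```

Omitting empty sections keeps the injected block tight: the model never sees a heading with nothing under it.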
Performance Targets
| Metric | Target |
|---|---|
| Query classification | < 50ms |
| Embedding generation | < 100ms |
| Relevance scoring | < 50ms |
| Memory selection | < 10ms |
| Context assembly | < 10ms |
| Total overhead | < 220ms |
Simple queries stay fast. Complex queries get rich context.
User Controls
The Conductor gives you several ways to influence relevance gating without touching the scoring math directly.
Memory Depth (Conductor Lens)
A project-level setting that shifts the scoring thresholds and token budget:
| Depth | Budget Effect | Threshold Effect | Best For |
|---|---|---|---|
| Light | 0.7x budget | +0.10 (stricter) | Established projects, speed priority |
| Normal (default) | No change | No change | Most projects |
| Rich | 1.3x budget | -0.10 (looser) | New projects, complex domains |
Set this in Settings > Memory & Learning.
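The depth setting reduces to two multiplicative-and-additive adjustments, as in this sketch (values match the table; the function name is an assumption):

```typescript
// Illustrative Memory Depth adjustment. Modifier values match the table
// above; the function name and signature are assumptions.
type Depth = "light" | "normal" | "rich";

function applyDepth(depth: Depth, budget: number, threshold: number) {
  const mods = {
    light: { budget: 0.7, threshold: +0.10 },  // stricter, smaller
    normal: { budget: 1.0, threshold: 0 },
    rich: { budget: 1.3, threshold: -0.10 },   // looser, larger
  }[depth];
  return {
    budget: Math.round(budget * mods.budget),
    threshold: threshold + mods.threshold,
  };
}
```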
Pin & Mute
Direct steering of individual items:
- Pinned items bypass the relevance threshold entirely — they’re always injected (budget permitting)
- Muted items are filtered out before scoring — they never appear in context
Focus Areas
Project-level domain tags that give a permanent +0.05 bonus to matching items, even when the current query doesn’t mention that domain.
Per-Query Adaptation
The Conductor adapts automatically based on your query:
- “Quick question: what’s the port?” → Lower complexity, smaller budget
- “Help me understand the full auth flow” → Higher complexity, larger budget, lower thresholds
See The Conductor for the complete guide to all controls.
Why Invariants Get Priority
Invariants are special because violating them causes real problems:
- They protect against security issues
- They enforce team standards
- They prevent known bugs
A relevant invariant is like a senior engineer tapping you on the shoulder: “Hey, don’t forget about this rule.”
Invariants get the highest type boost (+0.25), the lowest score thresholds, and pinned invariants bypass scoring entirely.
Related Documentation
- Memory Overview — How the memory system works
- The Conductor — User-facing controls for steering injection
- Tiered Context — How memory adapts to different model sizes
- Learning Types — What each memory type means
- Confidence Scoring — How project confidence affects injection