Local-First Operation

Chorum can run entirely on your machine—no cloud, no API calls, no data leaving your computer. Run local models through a runtime like Ollama for complete privacy.

Why This Matters

Sometimes you need absolute privacy:

  • Working with classified or sensitive projects
  • Air-gapped environments
  • Avoiding API costs entirely
  • Maximum control over your data

Local-first mode gives you all of Chorum’s features without any external dependencies.


What “Local-First” Means

Component               Cloud Mode                  Local-First Mode
LLM inference           OpenAI, Anthropic, etc.     Ollama, LM Studio
Memory storage          Your machine                Your machine
Embeddings              Local by default            Local by default
Authentication          Supabase Auth               Local/optional
Data sent externally    Prompts to LLM providers    Nothing

Setting Up Ollama

1. Install Ollama

Download and install from ollama.ai:

# macOS
brew install ollama
 
# Linux
curl -fsSL https://ollama.ai/install.sh | sh
 
# Windows
# Download installer from ollama.ai
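
Once installed, a quick check that the CLI is on your PATH:

ollama --version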

2. Pull a Model

Download a model to use:

# Good all-around model
ollama pull llama3.2
 
# Smaller, faster model
ollama pull mistral
 
# Larger, more capable
ollama pull mixtral
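
Once the downloads finish, ollama list shows what's available locally:

ollama list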

3. Start Ollama

ollama serve

Ollama runs on http://localhost:11434 by default.
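
To confirm the server is reachable and see which models it's serving, query Ollama's tags endpoint:

curl http://localhost:11434/api/tags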

4. Configure in Chorum

  1. Go to Settings → Providers
  2. Find Ollama in the provider list
  3. Configure:
    • Base URL: http://localhost:11434 (default)
    • Model: Select from detected models
  4. Verify connection shows “Connected - X model(s) found”

Ollama Configuration


Complete Local Setup

To run Chorum with zero external connections:

Step 1: Disable Cloud Providers

  1. Go to Settings → Providers
  2. Remove or disable API keys for:
    • OpenAI
    • Anthropic
    • Google
    • Perplexity
    • Any other cloud providers

Step 2: Configure Local Provider

  1. Set up Ollama (see above) or LM Studio
  2. Verify connection in Chorum
  3. Set as default provider

Step 3: Verify Local-Only

In Settings → Providers, you should see:

  • No cloud providers configured
  • Ollama (or LM Studio) connected with models

Now all LLM calls go to your local model.
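
As a sanity check that inference is happening entirely on your machine, you can call the local Ollama API directly (the model name is just the example pulled earlier):

# A completion served by the local Ollama instance; no external call is made
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Say hello in one short sentence.",
  "stream": false
}'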


Local Model Options

Ollama

Best for: General use, easy setup, good model variety

Model        Size     Best For
llama3.2     4.7GB    General coding, chat
codellama    4.7GB    Code generation
mistral      4.1GB    Fast responses
mixtral      26GB     Complex reasoning
phi3         2.3GB    Lightweight use

LM Studio

Best for: GUI-based management, experimentation

  1. Download from lmstudio.ai
  2. Browse and download models from the UI
  3. Start the local server
  4. Configure in Chorum’s provider settings
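
LM Studio's local server exposes an OpenAI-compatible API, typically at http://localhost:1234. A quick way to confirm it's up and see which models are loaded:

# Assumes LM Studio's local server is running on its usual default port
curl http://localhost:1234/v1/models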

Other Local Options

  • GPT4All — Cross-platform, easy setup
  • llamafile — Single-file executable models
  • Text Generation WebUI — Advanced options

Performance Considerations

Hardware Requirements

Model Size     RAM Needed    GPU VRAM
7B params      8GB+          6GB+
13B params     16GB+         10GB+
34B+ params    32GB+         24GB+

Without GPU: Models run on CPU (slower, but it works).
With GPU: Much faster inference, especially for larger models.
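
As a rough rule of thumb (an assumption, not an exact figure) for estimating memory from parameter count:

# ~0.5-0.6 GB per billion parameters for 4-bit quantized weights,
# plus a couple of GB of overhead for the KV cache and runtime:
#   7B model:  7 x 0.55 + 2  = ~6 GB   (matches the 6GB+ VRAM / 8GB+ RAM row)
#   34B model: 34 x 0.55 + 2 = ~21 GB  (matches the 24GB+ VRAM row)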

Response Speed

Local models are generally slower than cloud APIs:

Model                 Typical Speed
Small (7B) on GPU     30-50 tokens/sec
Small (7B) on CPU     5-15 tokens/sec
Large (34B) on GPU    15-25 tokens/sec

For most use cases, this is acceptable. For complex tasks, patience is required.


Budget and Local Models

When using local models:

  • No per-token cost — Run unlimited queries
  • Budget settings ignored — They don’t apply to local
  • Daily budget UI hidden — Not relevant

The cost is your hardware and electricity, not API fees.


Routing with Local Models

Chorum’s intelligent routing still works:

Query Complexity    Local Behavior
Simple questions    Smaller model if available
Complex coding      Larger model if available
Reasoning tasks     Best available model

If you only have one local model, all queries go to it.
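
If you want the router to have a real choice, keep at least one small and one large model pulled (these are just the examples from the table above):

# Lightweight model for simple queries
ollama pull phi3

# Larger model for complex reasoning
ollama pull mixtral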


Mixing Local and Cloud

You can use local models for some things and cloud for others:

Example: Local for Privacy, Cloud for Power

  • Sensitive projects: Route to Ollama
  • General projects: Route to cloud providers

Configure in Settings → Resilience:

  • Set Ollama as primary for specific projects
  • Cloud providers as fallback

Example: Local as Fallback

  • Primary: Cloud providers (OpenAI, Anthropic)
  • Fallback: Ollama when cloud is unavailable

This gives you speed normally, with local backup.


Limitations of Local Models

Be aware of trade-offs:

Aspect              Cloud Models             Local Models
Response quality    Generally better         Varies by model
Context window      100K+ tokens             Usually 4-32K
Speed               Fast                     Depends on hardware
Cost                Per-token                Free (after hardware)
Privacy             Data sent to provider    Stays local

For many tasks, local models are excellent. For complex reasoning or long documents, cloud models may perform better.


Troubleshooting

“Connection refused” to Ollama

  1. Verify Ollama is running: ollama list
  2. Check the server: curl http://localhost:11434
  3. Verify the port matches your Chorum config
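
A quick diagnostic sequence for the steps above (assumes the default port):

# Is the Ollama CLI installed and are any models present?
ollama list

# Is the server answering on the default port?
curl http://localhost:11434

# Is anything listening on 11434? (macOS/Linux)
lsof -i :11434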

“Model not found”

  1. Check available models: ollama list
  2. Pull the model: ollama pull <model-name>
  3. Refresh the model list in Chorum settings

Slow responses

  1. Use a smaller model
  2. Reduce max tokens in response
  3. Consider GPU acceleration
  4. Check system resources (RAM, CPU)

Memory issues

Large models need significant RAM:

  • Close other applications
  • Use a smaller model
  • Enable swap space (slower but works)

FAQ

Can I use local models with MCP?

Yes. When external agents query via MCP, they can trigger local model inference. The agent never knows whether you’re using a local or a cloud model.

Do embeddings use local models?

By default, Chorum uses local embeddings (no cloud call needed). This is independent of your LLM choice.

Can I run without internet at all?

Yes. Once models are downloaded and Chorum is set up, no internet is required. You can even run in airplane mode.

How do I update local models?

ollama pull <model-name>  # Re-pulls latest version