Ollama
Ollama runs open-source models entirely on your machine. No API keys, no network calls, no data leaving your laptop.
Upgrade: Local Agent Engine
Groove now includes a built-in agentic runtime for local models that goes far beyond what the basic Ollama CLI integration offers. The Local Agent Engine gives your local models tool calling (read, write, edit, run commands, search), context rotation, journalist synthesis, and full team coordination -- all running on your machine.
Recommended: Use the Local Models provider in the spawn wizard instead of the Ollama CLI provider. You get the same models with dramatically more capability.
Installation
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.ai/install.sh | shPull a model with tool calling support:
ollama pull qwen2.5-coder:7bVerify it's running:
ollama listTwo Ways to Use Ollama with Groove
1. Local Agent Engine (Recommended)
Select Local Models as the provider in the spawn wizard. Groove runs its built-in agent loop that talks to Ollama's OpenAI-compatible API (localhost:11434/v1). Your agent gets:
- 7 tools: read files, write code, edit files, run commands, search files, grep content, list directories
- Real-time streaming to the GUI
- Context rotation with handoff briefs
- Token tracking from API responses
- Interactive chat -- send messages to running agents
- Full team coordination with cloud and local agents
2. Ollama CLI (Legacy)
Select Ollama (Local) as the provider. Groove spawns ollama run <model> as a child process. This is the simpler integration:
- Text-only output (no tool calling)
- One-shot: prompt in, response out
- Estimated token tracking from text length
- Still participates in team coordination (introductions, file locks, journalist)
The CLI mode is kept for backward compatibility. For new projects, use the Local Agent Engine.
Recommended Models
| Model | Size | RAM | Best For |
|---|---|---|---|
| Qwen 2.5 Coder 7B | 4.7 GB | 8 GB | General coding, fast iteration |
| Qwen 2.5 Coder 14B | 9 GB | 16 GB | Complex features, debugging |
| Qwen 2.5 Coder 32B | 20 GB | 24 GB | Architecture-level work, rivals GPT-4o |
| DeepSeek R1 14B | 8.5 GB | 12 GB | Chain-of-thought debugging |
| Llama 3.1 8B | 4.7 GB | 8 GB | Large 128K context window |
| Codestral 25B | 14 GB | 18 GB | Multi-language, autocomplete |
Apple Silicon
Macs with Apple Silicon (M1/M2/M3/M4) use unified memory -- all RAM is GPU RAM. A MacBook Pro with 36 GB can comfortably run a 32B model. This is the best local inference hardware available.
When to Use Local Models
- Privacy -- your code never leaves the machine
- Cost -- zero API spend, run as many agents as your hardware supports
- Offline -- works on air-gapped machines and restricted networks
- Experimentation -- try different models and quantizations at no cost
- Mixed teams -- pair a cloud planner (Opus) with local builders (Qwen) for best of both worlds
Next Steps
- Local Agent Engine -- full guide on the agentic runtime, model browser, and tool calling
- Model Routing -- how Groove picks the right model tier for each task
