Ollama

Ollama runs open-source models entirely on your machine. No API keys, no network calls, no data leaving your laptop.

Upgrade: Local Agent Engine

Groove now includes a built-in agentic runtime for local models that goes far beyond what the basic Ollama CLI integration offers. The Local Agent Engine gives your local models tool calling (read, write, edit, run commands, search), context rotation, journalist synthesis, and full team coordination -- all running on your machine.

Recommended: Use the Local Models provider in the spawn wizard instead of the Ollama CLI provider. You get the same models with dramatically more capability.

Installation

bash

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

Pull a model with tool calling support:

bash

ollama pull qwen2.5-coder:7b

Verify it's running:

bash

ollama list

Two Ways to Use Ollama with Groove

1. Local Agent Engine (Recommended)

Select Local Models as the provider in the spawn wizard. Groove runs its built-in agent loop that talks to Ollama's OpenAI-compatible API (localhost:11434/v1). Your agent gets:

7 tools: read files, write code, edit files, run commands, search files, grep content, list directories
Real-time streaming to the GUI
Context rotation with handoff briefs
Token tracking from API responses
Interactive chat -- send messages to running agents
Full team coordination with cloud and local agents

2. Ollama CLI (Legacy)

Select Ollama (Local) as the provider. Groove spawns ollama run <model> as a child process. This is the simpler integration:

Text-only output (no tool calling)
One-shot: prompt in, response out
Estimated token tracking from text length
Still participates in team coordination (introductions, file locks, journalist)

The CLI mode is kept for backward compatibility. For new projects, use the Local Agent Engine.

Recommended Models

Model	Size	RAM	Best For
Qwen 2.5 Coder 7B	4.7 GB	8 GB	General coding, fast iteration
Qwen 2.5 Coder 14B	9 GB	16 GB	Complex features, debugging
Qwen 2.5 Coder 32B	20 GB	24 GB	Architecture-level work, rivals GPT-4o
DeepSeek R1 14B	8.5 GB	12 GB	Chain-of-thought debugging
Llama 3.1 8B	4.7 GB	8 GB	Large 128K context window
Codestral 25B	14 GB	18 GB	Multi-language, autocomplete

Apple Silicon

Macs with Apple Silicon (M1/M2/M3/M4) use unified memory -- all RAM is GPU RAM. A MacBook Pro with 36 GB can comfortably run a 32B model. This is the best local inference hardware available.

When to Use Local Models

Privacy -- your code never leaves the machine
Cost -- zero API spend, run as many agents as your hardware supports
Offline -- works on air-gapped machines and restricted networks
Experimentation -- try different models and quantizations at no cost
Mixed teams -- pair a cloud planner (Opus) with local builders (Qwen) for best of both worlds

Next Steps

Local Agent Engine -- full guide on the agentic runtime, model browser, and tool calling
Model Routing -- how Groove picks the right model tier for each task

Ollama ​

Installation ​

Two Ways to Use Ollama with Groove ​

1. Local Agent Engine (Recommended) ​

2. Ollama CLI (Legacy) ​

Recommended Models ​

When to Use Local Models ​

Next Steps ​

Ollama

Installation

Two Ways to Use Ollama with Groove

1. Local Agent Engine (Recommended)

2. Ollama CLI (Legacy)

Recommended Models

When to Use Local Models

Next Steps