Skip to content

Ollama

Ollama runs open-source models entirely on your machine. No API keys, no network calls, no data leaving your laptop.

Upgrade: Local Agent Engine

Groove now includes a built-in agentic runtime for local models that goes far beyond what the basic Ollama CLI integration offers. The Local Agent Engine gives your local models tool calling (read, write, edit, run commands, search), context rotation, journalist synthesis, and full team coordination -- all running on your machine.

Recommended: Use the Local Models provider in the spawn wizard instead of the Ollama CLI provider. You get the same models with dramatically more capability.

Installation

bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

Pull a model with tool calling support:

bash
ollama pull qwen2.5-coder:7b

Verify it's running:

bash
ollama list

Two Ways to Use Ollama with Groove

Select Local Models as the provider in the spawn wizard. Groove runs its built-in agent loop that talks to Ollama's OpenAI-compatible API (localhost:11434/v1). Your agent gets:

  • 7 tools: read files, write code, edit files, run commands, search files, grep content, list directories
  • Real-time streaming to the GUI
  • Context rotation with handoff briefs
  • Token tracking from API responses
  • Interactive chat -- send messages to running agents
  • Full team coordination with cloud and local agents

2. Ollama CLI (Legacy)

Select Ollama (Local) as the provider. Groove spawns ollama run <model> as a child process. This is the simpler integration:

  • Text-only output (no tool calling)
  • One-shot: prompt in, response out
  • Estimated token tracking from text length
  • Still participates in team coordination (introductions, file locks, journalist)

The CLI mode is kept for backward compatibility. For new projects, use the Local Agent Engine.

ModelSizeRAMBest For
Qwen 2.5 Coder 7B4.7 GB8 GBGeneral coding, fast iteration
Qwen 2.5 Coder 14B9 GB16 GBComplex features, debugging
Qwen 2.5 Coder 32B20 GB24 GBArchitecture-level work, rivals GPT-4o
DeepSeek R1 14B8.5 GB12 GBChain-of-thought debugging
Llama 3.1 8B4.7 GB8 GBLarge 128K context window
Codestral 25B14 GB18 GBMulti-language, autocomplete

Apple Silicon

Macs with Apple Silicon (M1/M2/M3/M4) use unified memory -- all RAM is GPU RAM. A MacBook Pro with 36 GB can comfortably run a 32B model. This is the best local inference hardware available.

When to Use Local Models

  • Privacy -- your code never leaves the machine
  • Cost -- zero API spend, run as many agents as your hardware supports
  • Offline -- works on air-gapped machines and restricted networks
  • Experimentation -- try different models and quantizations at no cost
  • Mixed teams -- pair a cloud planner (Opus) with local builders (Qwen) for best of both worlds

Next Steps

  • Local Agent Engine -- full guide on the agentic runtime, model browser, and tool calling
  • Model Routing -- how Groove picks the right model tier for each task

FSL-1.1-Apache-2.0