What is Olympus?
Olympus is a multi-agent DevOps system: one human, a coordinator, and a team of LLM specialists that operate real infrastructure. The vibe-coding tools let anyone build anything they can imagine — but keeping it running is still a DevOps problem. Olympus is the smallest viable answer: built by one person, used by one person, to operate infrastructure that used to take a whole DevOps team.
It was built for CS 153: Frontier Systems at Stanford. There's a live instance at demo.0lympu5.com.
The two ways you talk to it
- CLI: olympus "..." routes your task to the single best specialist via an LLM router (or deterministic keyword routing offline) and runs it to completion, prompting on stdin before anything destructive.
- Dashboard chat: a group-chat ticket where you, a main coordinator, and any specialists it pulls in are participants in one thread. The coordinator does no domain work itself; it dispatches subtasks to specialists and asks them direct questions, narrating its reasoning as a live, interleaved "thinking" trace. Destructive actions surface inline approval cards.
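The offline path of the CLI router can be pictured as a keyword lookup. The following is a minimal sketch, not the project's actual router: the agent names come from the team table, while the keyword lists, the `route_offline` function, and the fallback to `sysadmin` are assumptions made for illustration.

```python
# Hypothetical keyword table. Agent names match the team table;
# the keywords themselves are illustrative assumptions.
KEYWORDS = {
    "sysadmin": ("kubectl", "pod", "logs", "dns"),
    "programmer": ("dockerfile", "compose", "helm", "script"),
    "terraform": ("terraform", "tf", "stack"),
    "ansible": ("ansible", "playbook"),
    "hpc": ("slurm", "gpu"),
}


def route_offline(task: str, default: str = "sysadmin") -> str:
    """Pick the agent whose keywords occur most often in the task text.

    Deterministic: the same task always routes to the same agent.
    Ties go to the agent listed first; no match falls back to `default`
    (the choice of fallback is an assumption).
    """
    words = task.lower().split()
    scores = {
        agent: sum(words.count(keyword) for keyword in keywords)
        for agent, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    print(route_offline("restart the crashing pod and show logs"))
    print(route_offline("run the nightly playbook"))
```

A real router would likely need stemming or phrase matching; whole-word matching is used here only to keep the sketch short.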
The team
| Agent | Does | Destructive verbs |
|---|---|---|
| sysadmin | Kubernetes runtime ops (kubectl), logs, events, ssh_run; + NetDB DNS/IPAM over MCP | delete_pod, ssh_run |
| programmer | Authors files — Dockerfiles, compose, Helm values, scripts | write_file, edit_file, delete_file |
| terraform | Runs existing Terraform stacks | tf_apply, tf_destroy |
| ansible | Runs playbooks + host introspection over SSH | run_playbook, run_module |
| hpc | Slurm scheduling + GPU health (via MCP) | gated Slurm ops |
Two non-routable agents round it out: main (the group-chat coordinator) and terminal_companion (a read-only observer for the in-browser SSH terminal).
The four safety invariants
Everything in Olympus is a different arrangement of these four:
- Tool-gated execution. An agent declares a fixed tool set and a fixed set of destructive_verbs. The runtime wraps every tool so the agent cannot call anything outside its declaration, no matter what the LLM emits. The sysadmin agent cannot run terraform apply even if asked.
- Human-in-the-loop on destructive ops. Any destructive verb re-enters the runtime through an approval hook before it executes. A self-protection policy sits in front of approval: calls targeting Olympus's own cluster/hosts are hard-denied, so no one can escalate by managing the system that gates them.
- Append-only audit. Every tool call is logged twice — pre-execution (with the approval decision) and post-execution (with the result).
- Bus-based observability. The orchestrator publishes task/agent/tool/approval/result events to a bus; the dashboard projects them into a per-ticket transcript streamed to the browser.
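The first three invariants compose naturally into a single call path. Below is a minimal sketch under stated assumptions, not Olympus's real runtime: the `Agent` and `Runtime` classes, the `target` keyword argument, the decision labels, and the `get_logs` tool are all hypothetical names chosen for illustration. It also assumes that undeclared calls are rejected before any audit record is written and that denied calls get a pre-execution record only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Agent:
    name: str
    tools: dict[str, Callable[..., Any]]            # the fixed tool set
    destructive_verbs: frozenset[str] = frozenset()


@dataclass
class Runtime:
    approve: Callable[[str, str, dict], bool]        # approval hook (human decision)
    protected_targets: frozenset[str] = frozenset()  # Olympus's own cluster/hosts
    audit: list[dict] = field(default_factory=list)  # only ever appended to

    def call(self, agent: Agent, verb: str, **kwargs: Any) -> Any:
        # Tool gate: anything outside the declaration is refused,
        # regardless of what the LLM asked for.
        if verb not in agent.tools:
            raise PermissionError(f"{agent.name} has not declared {verb}")

        decision = "not_required"
        if verb in agent.destructive_verbs:
            # Self-protection sits in front of approval.
            if kwargs.get("target") in self.protected_targets:
                decision = "hard_denied"
            elif self.approve(agent.name, verb, kwargs):
                decision = "approved"
            else:
                decision = "rejected"

        # Pre-execution record carries the approval decision.
        self.audit.append(
            {"phase": "pre", "agent": agent.name, "verb": verb, "decision": decision}
        )
        if decision in ("hard_denied", "rejected"):
            raise PermissionError(f"{verb} {decision}")

        result = agent.tools[verb](**kwargs)

        # Post-execution record carries the result.
        self.audit.append(
            {"phase": "post", "agent": agent.name, "verb": verb, "result": result}
        )
        return result


if __name__ == "__main__":
    sysadmin = Agent(
        name="sysadmin",
        tools={
            "get_logs": lambda target: f"logs for {target}",
            "delete_pod": lambda target: f"deleted {target}",
        },
        destructive_verbs=frozenset({"delete_pod"}),
    )
    runtime = Runtime(
        approve=lambda agent, verb, args: input(f"{agent}: {verb} {args}? [y/N] ") == "y",
        protected_targets=frozenset({"olympus-cluster"}),
    )
    print(runtime.call(sysadmin, "get_logs", target="web-1"))
```

Because the gate checks the agent's declaration rather than the model's output, a call such as `runtime.call(sysadmin, "tf_apply")` fails before approval is even considered.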
What else it does
- Memory + feedback — writes a compact transcript of every settled task and retrieves the most-similar prior runs at the next task start; 👍/👎/correction tunes future retrieval.
- Per-verb rollback — captures the inverse of a destructive op before it fires; undoing re-prompts approval.
- Cost telemetry — per-invocation cost tracked on the agent; a group-chat turn aggregates the cost of every specialist it dispatched, with a per-agent breakdown and per-user daily caps.
- MCP — third-party tools graft onto a named agent over stdio or HTTP, gated and audited like native tools.
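Per-verb rollback can be illustrated with a toy example. This is a sketch under stated assumptions rather than the project's implementation: an in-memory dict stands in for a filesystem, and `delete_file_with_inverse` and `undo` are hypothetical helpers. The point is the ordering: the inverse is captured before the destructive op fires, and undoing passes through approval again.

```python
from typing import Callable


def delete_file_with_inverse(files: dict[str, str], path: str) -> Callable[[], None]:
    """Delete `path` from the toy file store and return a function that restores it."""
    # Capture the inverse BEFORE the destructive op fires.
    saved = files[path]

    def inverse() -> None:
        files[path] = saved

    del files[path]
    return inverse


def undo(inverse: Callable[[], None], approve: Callable[[str], bool]) -> bool:
    """Run the captured inverse only if the approval hook agrees."""
    # Undoing is itself gated: it re-prompts approval.
    if not approve("undo"):
        return False
    inverse()
    return True


if __name__ == "__main__":
    files = {"values.yaml": "replicas: 3\n"}
    inverse = delete_file_with_inverse(files, "values.yaml")
    restored = undo(inverse, approve=lambda action: True)
    print(restored, files)
```

Not every destructive verb has a clean inverse (a destroyed Terraform stack, for instance, may not be recoverable from a snapshot of its inputs), so a real implementation would presumably define inverses verb by verb.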
Next: Architecture & concepts for how it fits together, or jump to the Quick start.