
What is Olympus?

Olympus is a multi-agent DevOps system: one human, a coordinator, and a team of LLM specialists that operate real infrastructure. Vibe-coding tools let anyone build anything they can imagine, but keeping it running is still a DevOps problem. Olympus is the smallest viable answer: built by one person, used by one person, to operate infrastructure that used to take a whole DevOps team.

It was built for CS 153: Frontier Systems at Stanford. There's a live instance at demo.0lympu5.com.

The two ways you talk to it

  • CLI: olympus "..." routes your task to the single best-fit specialist, using an LLM router (or deterministic keyword routing when offline), and runs it to completion, prompting on stdin before anything destructive.
  • Dashboard chat: a group-chat ticket in which you, a main coordinator, and any specialists it pulls in are participants in one thread. The coordinator does no domain work itself; it dispatches subtasks to specialists and asks them direct questions, narrating its reasoning as a live, interleaved "thinking" trace. Destructive actions surface inline approval cards.
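The offline fallback is described only as "deterministic keyword routing". A minimal sketch of that idea, assuming a hypothetical keyword table, a hypothetical route_offline function, and a fallback to sysadmin when nothing matches (none of these are documented), could look like this:

```python
import re

# Hypothetical keyword table; the real offline router's vocabulary is not documented.
# main and terminal_companion are non-routable, so they never appear here.
KEYWORDS = {
    "sysadmin": ("pod", "kubectl", "logs", "events", "dns", "ipam"),
    "programmer": ("dockerfile", "compose", "helm", "script"),
    "terraform": ("terraform", "stack", "apply", "destroy"),
    "ansible": ("ansible", "playbook", "module"),
    "hpc": ("slurm", "gpu", "sbatch"),
}
FALLBACK_AGENT = "sysadmin"  # assumption: used when no keyword matches


def route_offline(task: str) -> str:
    """Deterministically pick the specialist whose keywords occur most often."""
    words = re.findall(r"[a-z0-9_]+", task.lower())
    scores = {
        agent: sum(words.count(kw) for kw in kws) for agent, kws in KEYWORDS.items()
    }
    # max() returns the first maximal key, so ties resolve in dict order.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else FALLBACK_AGENT
```

For example, route_offline("Run the nginx playbook on web hosts") picks ansible. Resolving ties by dictionary order is an arbitrary but reproducible choice; the property that matters offline is that the same task text always routes to the same agent.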

The team

Agent | Does | Destructive verbs
--- | --- | ---
sysadmin | Kubernetes runtime ops (kubectl), logs, events, ssh_run; + NetDB DNS/IPAM over MCP | delete_pod, ssh_run
programmer | Authors files: Dockerfiles, compose, Helm values, scripts | write_file, edit_file, delete_file
terraform | Runs existing Terraform stacks | tf_apply, tf_destroy
ansible | Runs playbooks + host introspection over SSH | run_playbook, run_module
hpc | Slurm scheduling + GPU health (via MCP) | gated Slurm ops
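Each row of the table amounts to a declaration: a fixed tool set plus the subset of it that is destructive. One hedged way to model that, assuming a hypothetical AgentSpec dataclass and hypothetical read-only tool names (only the destructive verbs come from the table; hpc is left out because its "gated Slurm ops" are not enumerated), is:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    name: str
    tools: frozenset[str]              # everything the agent may call
    destructive_verbs: frozenset[str]  # the subset that needs approval

    def __post_init__(self) -> None:
        # A destructive verb outside the tool set would be a declaration bug.
        missing = self.destructive_verbs - self.tools
        if missing:
            raise ValueError(f"{self.name}: undeclared destructive verbs {sorted(missing)}")


def spec(name: str, read_only: set[str], destructive: set[str]) -> AgentSpec:
    return AgentSpec(name, frozenset(read_only | destructive), frozenset(destructive))


# Destructive verbs are taken from the table; read-only tool names are hypothetical.
REGISTRY = {
    s.name: s
    for s in (
        spec("sysadmin", {"kubectl_get", "get_logs", "get_events"}, {"delete_pod", "ssh_run"}),
        spec("programmer", {"read_file"}, {"write_file", "edit_file", "delete_file"}),
        spec("terraform", {"tf_plan"}, {"tf_apply", "tf_destroy"}),
        spec("ansible", {"gather_facts"}, {"run_playbook", "run_module"}),
    )
}
```

Freezing the spec keeps the declaration fixed for the life of the process, which is what lets a runtime treat it as the authority on what an agent may call.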

Two non-routable agents round it out: main (the group-chat coordinator) and terminal_companion (a read-only observer for the in-browser SSH terminal).

The four safety invariants

Everything in Olympus is a different arrangement of these four:

  1. Tool-gated execution. An agent declares a fixed tool set and a fixed set of destructive_verbs. The runtime wraps every tool so the agent cannot call anything outside its declaration — no matter what the LLM emits. The sysadmin agent cannot run terraform apply even if asked.
  2. Human-in-the-loop on destructive ops. Any destructive verb re-enters the runtime through an approval hook before it executes. A self-protection policy sits in front of approval: calls targeting Olympus's own cluster/hosts are hard-denied, so no one can escalate by managing the system that gates them.
  3. Append-only audit. Every tool call is logged twice — pre-execution (with the approval decision) and post-execution (with the result).
  4. Bus-based observability. The orchestrator publishes task/agent/tool/approval/result events to a bus; the dashboard projects them into a per-ticket transcript streamed to the browser.
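The four invariants compose around a single call path. The sketch below is one illustrative arrangement, not Olympus's actual runtime: GatedRuntime, ToolDenied, and PROTECTED_TARGETS are hypothetical names, the self-protection check assumes destructive calls carry a "target" argument, and plain Python lists stand in for the audit log and the event bus.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical names for Olympus's own cluster/hosts (the self-protection set).
PROTECTED_TARGETS = frozenset({"olympus-cluster", "olympus-host"})


class ToolDenied(Exception):
    """Raised when a call is outside the declaration or not approved."""


@dataclass
class GatedRuntime:
    tools: dict[str, Callable[..., Any]]   # the agent's fixed tool set
    destructive_verbs: frozenset[str]      # subset that needs approval
    approve: Callable[[str, dict], bool]   # human-in-the-loop hook
    audit: list[dict] = field(default_factory=list)            # only ever appended to
    bus: list[tuple[str, dict]] = field(default_factory=list)  # stand-in event bus

    def call(self, verb: str, **args: Any) -> Any:
        # Invariant 1: undeclared tools never run, whatever the LLM emitted.
        if verb not in self.tools:
            raise ToolDenied(f"{verb} is not declared for this agent")
        decision = "not_required"
        if verb in self.destructive_verbs:
            # Invariant 2: self-protection is checked before a human is asked.
            if args.get("target") in PROTECTED_TARGETS:
                decision = "hard_denied"
            else:
                decision = "approved" if self.approve(verb, args) else "rejected"
            self.bus.append(("approval", {"verb": verb, "decision": decision}))
        # Invariant 3: the pre-execution record carries the approval decision.
        self.audit.append({"phase": "pre", "verb": verb, "args": args, "decision": decision})
        if decision in ("hard_denied", "rejected"):
            raise ToolDenied(f"{verb}: {decision}")
        self.bus.append(("tool", {"verb": verb}))
        result = self.tools[verb](**args)
        # Invariant 3: the post-execution record carries the result.
        self.audit.append({"phase": "post", "verb": verb, "result": result})
        # Invariant 4: publish the outcome for the dashboard to project.
        self.bus.append(("result", {"verb": verb, "result": result}))
        return result
```

Because the hard deny happens before approve() is called, a human cannot approve an action against the system's own infrastructure even by mistake. Raising on denial is one design choice; a real runtime might instead return the denial to the agent as a tool result.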

What else it does

  • Memory + feedback — writes a compact transcript of every settled task and retrieves the most-similar prior runs at the next task start; 👍/👎/correction tunes future retrieval.
  • Per-verb rollback — captures the inverse of a destructive op before it fires; undoing re-prompts approval.
  • Cost telemetry — per-invocation cost tracked on the agent; a group-chat turn aggregates the cost of every specialist it dispatched, with a per-agent breakdown and per-user daily caps.
  • MCP — third-party tools graft onto a named agent over stdio or HTTP, gated and audited like native tools.
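Per-verb rollback hinges on computing the inverse before the forward op runs, since afterwards the prior state is gone. A minimal sketch for the programmer agent's write_file and delete_file verbs, assuming an in-memory dict as a stand-in filesystem and hypothetical helpers capture_inverse, apply_op, and RollbackLog (forward approval is omitted to keep the focus on undo), might be:

```python
from typing import Callable, Optional

Files = dict[str, str]  # in-memory stand-in for the filesystem


def capture_inverse(files: Files, verb: str, args: dict) -> Optional[tuple[str, dict]]:
    """Return the call that undoes (verb, args), computed before it runs."""
    if verb not in ("write_file", "delete_file"):
        return None  # no inverse known for this verb in the sketch
    path = args["path"]
    if path in files:
        # Existing file: undo restores its current content.
        return ("write_file", {"path": path, "content": files[path]})
    # New file: undo a write by deleting it; deleting nothing needs no undo.
    return ("delete_file", {"path": path}) if verb == "write_file" else None


def apply_op(files: Files, verb: str, args: dict) -> None:
    if verb == "write_file":
        files[args["path"]] = args["content"]
    elif verb == "delete_file":
        files.pop(args["path"], None)
    else:
        raise ValueError(f"unknown verb {verb}")


class RollbackLog:
    def __init__(self, approve: Callable[[str, dict], bool]):
        self.approve = approve
        self.stack: list[tuple[str, dict]] = []

    def run(self, files: Files, verb: str, args: dict) -> None:
        inverse = capture_inverse(files, verb, args)  # captured before the op fires
        apply_op(files, verb, args)
        if inverse is not None:
            self.stack.append(inverse)

    def undo(self, files: Files) -> bool:
        """Undo the latest op; the inverse is itself destructive, so re-ask."""
        if not self.stack:
            return False
        verb, args = self.stack[-1]
        if not self.approve(verb, args):
            return False  # rejected undo stays on the stack
        self.stack.pop()
        apply_op(files, verb, args)
        return True
```

Keeping a rejected inverse on the stack means the undo can be retried later rather than silently lost.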

Next: Architecture & concepts for how it fits together, or jump to the Quick start.