Skip to content

Type a request. The right agent answers.

An open source, local-first agent dispatch system. One chat box on your phone, a routing layer that knows which of two dozen specialized agents should take the job, and models that run on your own hardware.

View on GitHub

MIT licensed. Built on the Hermes AI harness.

what this is

Hermes Dispatch turns a fleet of local LLM agents into something you can use from your phone like a single assistant. You type a request in plain language. A dispatch layer figures out which agent is the right one, expands your prompt into a proper brief, and hands it off. The agent runs locally through Ollama and LiteLLM and streams the answer back.

There are 24 or more agents out of the box, covering developer advocacy, content and go-to-market, finance, sales, client delivery, productivity, and legal. Each one is a Hermes profile: a system prompt in a SOUL.md file plus a pinned model alias. You decide which model sits behind each alias.

why it exists

A pile of well-tuned agents is only useful if you can reach the right one without thinking about it. The usual options are a wall of dropdowns, a folder of prompt files, or a desktop app you have to be sitting at. None of that works when the thought hits you on a walk.

Dispatch removes the routing decision. You describe what you want and the system picks the agent. Because everything runs on your own models, your prompts and the artifacts they produce stay on your machine. Nothing is exposed to the public internet.

how dispatch works

Every request makes two quick LLM calls before it reaches an agent. The first is a fast router on the structured alias that reads your message and chooses the target agent. The second is a prompt enhancer on the reasoning alias that turns your one-line request into a fuller brief the agent can act on. Then the chosen agent runs on its own pinned alias.

  1. 01

    Route

    A fast classifier reads the request and names the agent that should handle it.

  2. 02

    Expand

    A prompt enhancer rewrites your short request into a complete brief, so the agent starts with context instead of a fragment.

  3. 03

    Run

    The chosen agent runs on its pinned models and streams the answer back to your chat.

model aliases

Agents never name a model directly. They name an alias, and you map the alias to whatever you actually have. Aliases come in two axes. The three tiers are the capability and cost spine, and they are all you have to map to get running. The four task roles are optional specializations: point one at a dedicated model when a specialist beats the tier, or leave it blank and it inherits the tier shown. Swap models later without touching a single agent.

tiers, required

fast
Smallest and quickest: triage, routing, extraction, short chat.
balanced
Mid capability: a solid default for most work.
max
Largest: best quality, reserved for high-consequence work.

task roles, optional

structured inherits fast
A model that reliably emits clean JSON and fixed schemas.
code inherits balanced
A dedicated coding model.
writing inherits balanced
A prose-tuned model for long-form.
reasoning inherits max
A chain-of-thought model for analysis and arithmetic.

what you get

The mobile UI keeps persistent chat sessions with streaming output. Artifacts the agents produce are saved to disk, with an optional Obsidian integration if you keep a vault. Access from your phone runs over Tailscale, so the whole thing stays on your private network with nothing public-facing.

It works with any OpenAI-compatible backend. Ollama and LiteLLM are the default local pair, but you can point an alias at Groq, OpenAI, or anything else that speaks the same API. Local-first is the default, not the only option.

three ways to set it up

  1. 01

    Do it Yourself

    Documentation only. Clone the repo, read the setup guide, wire your own models and agents. Free, and you keep full control of every config file.

  2. 02

    Do it With Me

    An interactive setup.sh walks you through model mapping, agent selection, and Tailscale config one prompt at a time. You answer questions, it writes the files.

  3. 03

    Do it For Me

    A Docker Compose stack that brings up dispatch, the agents, and the mobile UI together. Map your model aliases, point it at a backend, and it runs.

get it

Clone it and point it at your own models.

The repo has the agents, the dispatch layer, the mobile UI, and the setup paths. MIT licensed, so fork it and make it yours.