David Ondrej

Hermes Agent + Mixture of Agents is insane…

⏱ 36 min video · 3 min read · 30 Jun 2026

TL;DR

David Ondrej demonstrates how to set up Hermes Agent with Mixture of Agents (MOA) on a Hostinger VPS, using Claude Code and Pi Agent to automate the installation. MOA runs multiple frontier models (GPT-5.5, Opus 4.8, GLM 5.2, Kimi K2.7) in parallel and feeds their outputs to an aggregator model; according to the video, the combined result benchmarks above any single publicly available model.
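The fan-out-then-aggregate flow described above can be illustrated with a minimal Python sketch. It is not Hermes Agent's implementation: the `mixture_of_agents` function, the `make_stub` helper, and the stub aggregator are hypothetical names, and each "model" is a plain callable standing in for a real API client.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A "model" is any callable that maps a prompt to a text reply.
# Real clients would wrap an API call; these are placeholders (assumption).
Model = Callable[[str], str]


def mixture_of_agents(prompt: str, references: List[Model], aggregator: Model) -> str:
    """Fan the prompt out to every reference model, then aggregate the drafts."""
    if not references:
        raise ValueError("at least one reference model is required")

    # Run the reference models concurrently; map() preserves input order.
    with ThreadPoolExecutor(max_workers=len(references)) as pool:
        drafts = list(pool.map(lambda model: model(prompt), references))

    # Build one prompt containing the question and every draft answer.
    numbered = "\n\n".join(
        f"[Draft {i}]\n{draft}" for i, draft in enumerate(drafts, start=1)
    )
    aggregate_prompt = (
        f"Question:\n{prompt}\n\n"
        f"Candidate answers:\n{numbered}\n\n"
        "Synthesize the single best answer."
    )
    return aggregator(aggregate_prompt)


def make_stub(name: str) -> Model:
    """Hypothetical stand-in for a real model client."""
    return lambda prompt: f"{name} answer"


def stub_aggregator(prompt: str) -> str:
    """Stand-in aggregator: reports how many drafts it received."""
    return f"synthesized from {prompt.count('[Draft ')} drafts"


result = mixture_of_agents(
    "How do I fix this bug?",
    [make_stub("model-a"), make_stub("model-b"), make_stub("model-c")],
    stub_aggregator,
)
print(result)
```

Threads are used here because real reference calls would be network-bound; the total latency is then roughly that of the slowest reference model plus the aggregator call.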

Key points

Mixture of Agents (MOA) runs multiple AI models in parallel as reference agents, then feeds all outputs to a single aggregator model that synthesizes the final answer — benchmarking above GPT-5.5 and Opus 4.8 individually.

MOA is now a native preset feature in Hermes Agent, appearing as a swappable model with full tool calling, memory, and session context preserved.

The full VPS setup (SSH, Hermes installation, OpenRouter API key, and MOA configuration) was handled almost entirely by Claude Code operating in the terminal, with minimal manual input.

Pi Agent (running GLM 5.2 Fast via Vercel AI Gateway) was used to monitor and steer the Hermes Agent during a long task, automatically sending steering prompts when Hermes stalled for over 3 minutes.

The MOA demo built and publicly deployed a 3D Flappy Bird game end-to-end without the creator providing deployment credentials — total API spend was approximately $20 for that single task.
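The stall-detection behaviour described for Pi Agent (a steering prompt once Hermes has been idle for over 3 minutes) could look roughly like the sketch below. This is an illustration only, not Pi Agent's actual mechanism: the `StallWatchdog` class, its `steer` callback, and the wording of the steering prompt are all assumptions. A fake clock is injected so the example runs instantly.

```python
import time
from typing import Callable, List

STALL_SECONDS = 3 * 60  # threshold from the summary: over 3 minutes idle


class StallWatchdog:
    """Tracks a worker's last activity and nudges it after a stall (hypothetical)."""

    def __init__(
        self,
        steer: Callable[[str], None],
        stall_seconds: float = STALL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._steer = steer
        self._stall_seconds = stall_seconds
        self._clock = clock
        self._last_activity = clock()

    def record_activity(self) -> None:
        """Call whenever the worker produces new output."""
        self._last_activity = self._clock()

    def check(self) -> bool:
        """Send a steering prompt if the worker has been idle too long."""
        idle = self._clock() - self._last_activity
        if idle <= self._stall_seconds:
            return False
        self._steer(f"No output for {int(idle)}s. Summarize your status and continue.")
        # Reset the timer so the worker is not nudged again immediately.
        self._last_activity = self._clock()
        return True


# Demo with a fake clock instead of real waiting.
fake_now = [0.0]
sent: List[str] = []
watchdog = StallWatchdog(steer=sent.append, clock=lambda: fake_now[0])

fake_now[0] = 120.0  # 2 minutes idle: below the threshold
first = watchdog.check()
fake_now[0] = 200.0  # 3 min 20 s idle: counts as a stall
second = watchdog.check()
print(first, second, sent)
```

In a real setup the manager would call `record_activity()` whenever new terminal output appears and `check()` on a timer, with `steer` sending text to the other agent's session.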

Actionable insights

→ Use MOA only for high-value tasks such as hard debugging, code architecture, or security review. It costs significantly more and runs slower than a single model because every request spends tokens on several models across multiple providers.

→ Set reference model temperature to ~0.9 (creative/diverse) and aggregator temperature to ~0.2 (consistent/predictable) to get the most from the MOA architecture.

→ Use a tmux-style multiplexer such as Zellij, or a terminal with split panes such as Warp (the creator uses CMAX), to run multiple agents in parallel panes. One manager agent can then monitor and steer another agent on a remote VPS without your involvement.

→ Set a spending limit on your OpenRouter API key before enabling MOA, as running four reference models plus an aggregator can consume tokens rapidly; the demo hit $20 for one complex task.

→ Shift more token spend toward open-weight models like GLM 5.2 and Kimi K2.7 Code, which the creator argues match closed frontier models in coding tasks at a fraction of the cost and without data privacy concerns.
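The temperature split and the spending cap from the list above can be captured in a small Python sketch. The `MoaSettings` and `SpendTracker` names and their fields are hypothetical and do not reflect Hermes Agent's or OpenRouter's configuration schema. The model names are copied from this summary as plain labels, not real API identifiers, and the per-call costs in the demo are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MoaSettings:
    """Hypothetical settings container; field names are illustrative only."""
    reference_models: List[str]
    aggregator_model: str
    reference_temperature: float = 0.9   # higher: more diverse drafts
    aggregator_temperature: float = 0.2  # lower: more consistent synthesis
    budget_usd: float = 20.0             # the demo task cost roughly this much

    def calls_per_request(self) -> int:
        """Every request hits each reference model once, plus the aggregator."""
        return len(self.reference_models) + 1


@dataclass
class SpendTracker:
    """Accumulates estimated cost and refuses charges past the budget."""
    budget_usd: float
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.budget_usd:
            raise RuntimeError(f"budget of ${self.budget_usd:.2f} would be exceeded")
        self.spent_usd += cost_usd


settings = MoaSettings(
    reference_models=["GPT-5.5", "Opus 4.8", "GLM 5.2", "Kimi K2.7"],
    aggregator_model="Opus 4.8",  # assumption: the summary does not name the aggregator
)
tracker = SpendTracker(budget_usd=settings.budget_usd)

tracker.charge(12.5)  # made-up cost of an earlier step
try:
    tracker.charge(10.0)  # would bring the total to 22.50, above the cap
    blocked = False
except RuntimeError:
    blocked = True

print(settings.calls_per_request(), tracker.spent_usd, blocked)
```

A client-side guard like this complements a limit set on the API key itself rather than replacing it, since only the provider sees the true billed cost.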

Notable quotes

“The defining skill of the AI era is the ability to ask better questions. It is to have better logic. It is to think more abstract rather than saying do this task, do this task.”

“Do not build your business and life on top of closed models. You will absolutely regret this in two to three years where access to AI will be more essential than access to water.”

“These companies do not have your best interest in mind. They distilled all of the web's knowledge, all of humanity's knowledge, art, code, literature, everything, and they created for their own closed models using it to generate disgusting amounts of profits.”

Worth watching?

The key setup steps, architecture explanation, and cost figures are all captured here — only watch if you want to follow the live terminal walkthrough or need the exact Claude Code prompts, which are also available via the free resource link mentioned in the video.

Topics

AI & Tech · Hermes Agent
