summree
I Tested the Fable 5 Killer (Hermes Agent)
Claude
Jack Roberts

I Tested the Fable 5 Killer (Hermes Agent)

⏱ 19 min video · 3 min read24 Jun 2026
TL;DR
Jack Roberts benchmarks two hyped AI models — Sakana Fugu (an intelligent model router) and GLM 5.2 (a Chinese open-weight model) — against Claude Opus 4.8 inside his Hermes agent system. Neither kills Claude Fable 5, but GLM 5.2 emerges as a surprisingly strong value pick at 1/6th the price.
Key points
1
Sakana Fugu is not a model but an intelligent router that secretly calls a pool of frontier models (Gemini, Opus 4.8, ChatGPT, Minimax, etc.) via one API — it added significant latency, often 2-3x slower than Opus 4.8, and used far more tokens for similar or worse output.
2
GLM 5.2 is an open-weight Chinese model that costs 1/6th the price of Claude Opus 4.8, offers a 1 million token context window, and performed comparably or better than Opus 4.8 in coding and web creation tasks.
3
In tool-calling tests (retrieving an Outlook email via Zapier MCP), Fugu won but used 230,000 tokens vs Opus 4.8's fewer tokens; GLM 5.2 failed initially and required a third retry, raising robustness concerns.
4
In website creation and code improvement tasks, GLM 5.2 produced the best visual output and used the fewest tokens, beating Opus 4.8 on both quality and efficiency in those specific tests.
5
Jack's recommended strategy is to make Hermes itself a black-box router — assigning specific models to specific task types (e.g. GLM 5.2 or Opus 4.8 for big-brain tasks, DeepSeek for high-volume cheap tasks) rather than relying on Fugu to do the routing.
Actionable insights
Use GLM 5.2 via OpenRouter for coding, website generation, and cost-sensitive tasks — it delivers near-Opus performance at 1/6th the price and is the best value-per-token model tested.
Avoid Sakana Fugu for time-sensitive agentic workflows — its latency overhead (2-3x slower, massively inflated token counts) outweighs the benefit of its automatic model routing in real-world use.
In Hermes agent, manually assign models to task types using the Pantheon skill system rather than relying on a black-box router like Fugu — this gives you speed, cost control, and quality without unpredictable latency.
When giving GLM 5.2 API access, use OpenRouter as the easiest integration path; for Claude Code integration, use the direct API key setup documented in Jack's linked Notion guide.
Always pass API keys to Hermes via terminal commands (not in chat) to keep credentials secure — Hermes can generate the exact terminal command for you on request.
Notable quotes

It is literally like Game of Thrones. It is impossible for many people to keep up with what the hell is actually going on with these models because it feels like there is a brand new king every week.

Fugu Ultra is 776,000 tokens, which is preposterously large for its output. And the wait time on that guys 3.5x greater than Opus 4.8 for an output that is probably a little bit better in its first shot.

GLM 5.2 is surprisingly good. It is a sixth of the price which is crazy. So in terms of power per token ratio, GLM 5.2 is really impressive.

Worth watching?
⏭️
Worth watching the full video?
The key benchmarks, conclusions, and setup steps are all captured here — only watch the full video if you want to see the live website comparisons or the Hermes agent UI walkthroughs in action.
Topics
AI & TechClaude

Explore more summaries on these topics →

Saved you some time? The creator still deserves a like.

Watch on YouTube →
More like this

Want this for your own channels?

Add the channels you follow. Every new video summarised and in your inbox the moment it drops. From £4/month.

Try it free