Greg Isenberg

GLM 5.2 Clearly Explained (and how to set it up)

⏱ 23 min video · 3 min read23 Jun 2026

TL;DR

Greg Isenberg and guest Amir break down GLM 5.2, a new open-source AI model from ZAI that rivals Claude Opus 4.8 on coding benchmarks at roughly 5x lower token cost. They explain how to set it up in Cursor or Codex via Open Router, and why a hybrid model strategy (planning with a frontier model, executing with GLM 5.2) is the smart workflow for builders watching token spend.

Key points

GLM 5.2 scores 81 on Terminal Bench 2.1 and 62.1 on coding benchmarks, about 4-7 points behind Claude Opus 4.8, but costs roughly 5x less per token via Open Router (44 cents vs $2.38 for a typical 50k input / 85k output token job).

Setup in Cursor: get a ZAI API key, paste it into the OpenAI key field in Cursor settings, override the OpenAI endpoint with ZAIs endpoint, then add GLM 5.2 as a custom model. In Codex, use your Open Router key and create a custom profile.

GLM 5.2 currently lacks vision/image capabilities, but you can work around this by using Opus 4.8 to describe screenshots in text, then feeding that description to GLM 5.2 to act on it.

Open Router supports a fusion or sequencing approach: plan with a powerful thinking model like Opus 4.8, execute with GLM 5.2, then review with Codex or Gemini 2.5 to get frontier-quality output at a fraction of the cost.

Companies are now hitting token budget limits and starting governance conversations about which models employees should use for which tasks, making cost-aware model selection an emerging enterprise concern.

Actionable insights

→

Start with Open Router today: load $20 in credits, add GLM 5.2, and test it inside Cursor or Codex CLI without buying any hardware.

→

Use a plan-execute-review chain: Opus 4.8 to plan and interpret images, GLM 5.2 to execute code changes, and a review model to QA — this preserves quality while cutting token costs by up to 5x on execution tasks.

→

If you are scaling a team, audit who is using high-thinking frontier models for low-complexity tasks like formatting emails; swapping those to cheaper models is the fastest governance win on token spend.

→

Consider making an upfront hardware investment now if you anticipate heavy local model usage, since AI subsidies are expected to decrease as providers move toward profitability.

Notable quotes

“You should be token minimizing as much as possible and output maxing instead.”

“What if I plan with Opus, execute with 5.2, and then review with Gemini 2.5 or Codex? There are a lot of ways, and I think we can be really effective, and I think that is what the smart people are going to be doing in the near future.”

“Sooner or later, the subsidy is going to run out.”

Worth watching?

⏭️

Worth watching the full video?

The key setup steps, benchmark context, and hybrid workflow strategy are all captured here — skip the video unless you want to see the live demo of GLM 5.2 refining a front-end UI in real time.

Topics

AI & Tech Open Router

Explore more summaries on these topics →

Saved you some time? The creator still deserves a like.

Watch on YouTube →

More like this