Model routing

Lower AI cost without giving up quality or control

Standard Agent costs can fall when lower-cost models pass quality checks, while premium fallbacks, budgets, client data policy, and human review stay in place.

Review Agent OS Review pricing

Gateway policy

You get lower model cost without hidden quality risk

The buyer benefit is not a specific model name. It is controlled access, fallback behavior, and spend visibility before cheaper routes are trusted with client work.

AI spend stays governed

Every standard model call stays behind approved budgets, fallback rules, provider allowlists, and spend records before it touches paid client work.

Lower-cost work is tested first

Drafting, summarization, classification, and long-context tasks move to GLM-5.2 only after quality, latency, privacy, and fallback checks prove the cheaper route is safe.

Risky work keeps stronger fallbacks

Higher-risk, regulated, latency-sensitive, media-specific, or quality-critical work can still use premium or client-specific routes when a cheaper model is not good enough.

Provider secrets stay server-side

OpenRouter, Z.AI, OpenAI, Anthropic, and routing-admin secrets do not reach public pages, browsers, setup forms, or client-visible bundles.

Sensitive work does not choose its own route

Data classification, approval policy, output QA, human supervision, and legal review decide where sensitive or regulated work may run.

Cost changes stay explainable

Usage, cost, fallback reason, Agent, template version, and client context stay reviewable when spend rises or quality needs investigation.

Alias contract

Your Agent should not get stuck on one provider

Server-side route aliases let AI Team switch providers after cost, quality, latency, privacy, and fallback review without rebuilding every Agent template.

Standard work: Routine tasks can use the lower-cost route after quality checks pass.
Long-context work: Large-context tasks get stricter checks for output quality, latency, cache behavior, and cost.
Direct provider option: Direct Z.AI access stays behind a backend route so provider secrets and routing changes do not reach public pages or client-visible bundles.
Premium fallback: Premium routes are kept for quality, data policy, modality, client contract, or failed quality-check cases.
Low-risk classifiers: Simple routing, spam checks, tagging, and extraction can use a separate approved low-cost route.

Gates

GLM-5.2 is a workhorse candidate, not a blanket replacement

Lower token cost is useful only if the route passes the same quality, privacy, fallback, and approval gates required for managed Agents.

1
Representative AI Team quality checks against setup review, CRM hygiene, support triage, inbox triage, reporting, SEO brief drafting, lead response, document collection, and exception detection.
2
Structured-output reliability and tool-call reliability checks for Agent templates that use the route.
3
Latency, retry, cache hit-rate, token usage, and cost evidence captured before the route is used for paid client work.
4
Fallback and outage behavior verified through LiteLLM routing policy and Agent OS exception paths.
5
Privacy, DPA, subprocessor, retention, data-classification, and security review before sensitive or regulated client data uses a route.

Related controls

Cheaper AI still has to pass cost, security, and QA checks

Model choices stay connected to provider ownership, human review, pricing, subprocessors, and go-live evidence.

Agent OS

Pricing

API keys and pass-through costs

Credential governance

FAQ

Model routing questions

Short answers for buyers and investors checking whether lower model cost creates hidden quality, privacy, or fallback risk.

Are OpenRouter or Z.AI secrets exposed to public pages?

No. Provider calls stay behind server-side aliases so AI Team can change routes, budgets, and fallbacks without exposing provider secrets or changing public pages.

Can GLM-5.2 become the default AI Team model?

Yes, for standard work only after AI Team quality checks, structured-output checks, tool-call checks, latency and cost budgets, fallback tests, and privacy/security review pass. Premium or client-specific routes remain available when the workhorse route is not suitable.

Why use OpenRouter first for GLM-5.2?

OpenRouter is currently the simpler and lower-cost launch route for GLM-5.2 access. Direct Z.AI remains available as a backend option if price, privacy terms, reliability, or latency become better there.

Who pays for model usage?

Standard model usage can be included when it stays inside AI Team budgets. Premium fallback, unusual volume, regulated data requirements, dedicated routes, and client-specific provider constraints can become custom, pass-through, or margin-monitored usage.

Does lower model cost remove human QA or approval rules?

No. Cheaper inference improves margin only when model quality, approval rules, data classification, spend caps, human supervision, and deployment QA remain intact.