Five savings levers for your Anthropic / OpenAI / Google API bill. Every dollar number anchored to a public vendor pricing page — same trust contract as our seat-compression catalog.
Public benchmarks suggest 60% of premium-tier work (code review, drafting, summarization) holds quality on the next-tier-down model. Re-pricing that slice at published rates yields the savings shown — engineering team picks the routing logic; CFO sees a per-month delta.
Verify on sourceAt $80K/mo you're squarely in enterprise territory. Both Anthropic and OpenAI offer 20–40% off list for committed annual contracts; 25% is a defensible mid-range anchor. Caveat: confidence is low because actual discount depends on your commit term + customer leverage.
Verify on sourceAnything that doesn't need an answer in <1 minute (nightly evals, bulk doc analysis, offline summarization) qualifies for Batch API at 50% off list. Engineering ships the queue change, CFO sees the discount kick in next billing cycle.
Verify on sourceAnthropic discounts cached reads by 90% (with a 25% write premium amortized over many reads). Caching requires explicit cache_control breakpoints on the API — engineering owns the change. At a typical 80% hit rate this nets ~67% off the cacheable input slice.
Verify on sourceInput tokens are a meaningful share of your bill. Audit per-request context: drop irrelevant retrieved chunks, prune chat history beyond the last N turns, switch from "send everything" to retrieval-then-LLM. Savings vary by workload; we don't quote a number without your token telemetry, but operational teams typically see 10–25% reduction on input cost from disciplined context management.
Verify on sourceSavings are composed sequentially. Enterprise rate negotiates the list price first; subsequent levers (model swap, batch API, prompt cache) discount the remaining spend. Per-lever sums equal the total exactly — no double-counting tokens that are hit by multiple levers.