Is consumption really the only viable pricing model for AI? Marcos Rivera, founder & CEO of Pricing I/O, makes the case that it is… but that doesn’t mean every company should be charging by the token tomorrow.
In this session from Schematic’s Monetizing AI summit, Marcos breaks down why “clumsy consumption” models fail, how to avoid the dreaded “oh sh*t” billing moment, and what frameworks help SaaS and AI companies balance clarity, fairness, and profitability in usage-based pricing.
If you’re building AI products, working on pricing strategy, or trying to make sense of credits, tiers, or overage policies... this one’s worth watching in full.
Marcos Rivera: Is consumption really the only viable model in the world of AI? I think it is. But there’s more than one path to get there — and it doesn’t mean everyone should start charging by the token tomorrow.
The “oh sh*t” moment everyone’s worried about is the massive bill that makes someone walk into their boss’s office asking for more budget. It’s nerve-wracking to sign a software agreement without knowing what the bill will be. Real CFOs don’t buy that way. They won’t accept “it depends how much you use it” as an answer.
I’m Marcos Rivera, CEO of Pricing I/O and former head of pricing at Vista Equity Partners. I’ve done over 500 pricing engagements across 25 years of software evolution — from on-prem to SaaS to AI.
I like to keep pricing fun. You’re literally building the systems that make money. So let’s talk about where consumption models go wrong, what I call “clumsy consumption,” and how to fix them.
As software evolved, value shifted:
Access era: You bought systems — licenses, servers — and value was tied to owning technology.
Activity era: Value moved to what people did with those systems.
Action era (now): Value comes from what systems do for people. AI handles more of the work itself.
That changes how value is exchanged — and how we should price.
I track pricing mistakes closely because they’re great learning moments.
Jasper: Charged by word count. Customers tried to game it — a clear sign the metric didn’t align with perceived value.
Cursor: Marketed “unlimited” usage, then hit customers with surprise overage bills. Result: backlash and churn.
Salesforce (Agentforce): Tried per-conversation pricing, got pushback, then pivoted to a credit-based “Flex” model that gave customers more control and predictability.
If customers are confused, angry, or trying to bypass your metric, that’s clumsy consumption.
About 18 years ago, I ran pricing for a product sold to insurance carriers like State Farm and Progressive. They wanted a flat fee with unlimited usage. The product was originally priced per user — per field adjuster — but the software actually made them more efficient, meaning fewer users and less revenue for us. That’s when I switched to a consumption model based on the number of claims processed.
It worked — revenue jumped from $300K to $30M in ~2.5 years — because we aligned pricing with value. But it wasn’t perfect. Natural disasters caused unpredictable spikes in claims, and customers feared surprise bills. We eventually added safeguards to manage that risk — a lesson that applies directly to AI pricing today.
Looking across 70+ AI companies, most pricing models share three layers:
Base: The plan or access tier (features, support, performance).
Included volume: Credits, requests, or actions included in the plan.
Scale volume: What happens beyond the included limit — where risk and confusion live.
Most of the pain in consumption pricing happens in that third layer.
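To make the three layers concrete, here is a minimal sketch of how they combine into a bill. The plan price, included volume, and overage rate are hypothetical numbers for illustration, not figures from the talk.

```python
def monthly_bill(base_fee: float, included_units: int, used_units: int,
                 overage_rate: float) -> float:
    """Combine the three layers: base plan, included volume, and scale volume."""
    overage_units = max(0, used_units - included_units)  # layer 3: scale volume
    return base_fee + overage_units * overage_rate

# Hypothetical plan: $500/month base, 10,000 actions included, $0.06 per extra action.
print(monthly_bill(base_fee=500, included_units=10_000, used_units=14_200,
                   overage_rate=0.06))  # 500 + 4,200 * 0.06 = $752
```

With that anatomy in mind, the common packaging patterns look like this: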
Tiered plans: Classic good / better / best packaging. Access itself gates consumption — higher tiers unlock heavier workloads, more advanced models, or multi-step tasks. You’re indirectly managing usage behind feature gates. Fair-use limits (“unlimited within reason”) help protect you without shocking the customer.
Two-part pricing: A fixed fee plus a per-unit charge. Example: Intercom — $59 per agent + $0.99 per outcome. Common for agentic AI tools. Predictable baseline, scalable upside.
Three-part tariff: A base fee, an included allocation, and an overage rate. Feels fair to customers but introduces anxiety — “Will I go over?” This model dominates AI SaaS today. It’s where most design mistakes happen.
Credits: Prepaid consumption with flexible burn rates. Works if the value of a credit is clear and consistent. Fails when credits are opaque or redefined midstream. You can discount upfront bundles or roll unused credits forward, but be strict about transparency (a simple credit-wallet sketch follows the last model below).
Multi-credit: Multiple types of credits for different activities (e.g., daily vs. monthly, or integration vs. flow credits). Useful in complex products, but clarity drops fast beyond two credit types.
Pure pay-as-you-go: Simple, but risky for customers. Great for developers and hobby use; less ideal for enterprise, where budgets need predictability.
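Since credits are where rollover and burn-rate questions usually come up, here is a rough sketch of a prepaid credit wallet; the class name, lot structure, and numbers are assumptions for the example, not anything prescribed above.

```python
from collections import deque

class CreditWallet:
    """Prepaid credits with simple rollover: oldest purchased credits burn first."""
    def __init__(self) -> None:
        self.lots: deque[int] = deque()  # each lot is one purchased bundle of credits

    def purchase(self, credits: int) -> None:
        self.lots.append(credits)

    def burn(self, credits: int) -> None:
        """Spend credits against the wallet; refuse the spend if the balance is too low."""
        if credits > self.balance:
            raise ValueError("insufficient credits")
        while credits:
            spend = min(credits, self.lots[0])
            self.lots[0] -= spend
            credits -= spend
            if self.lots[0] == 0:
                self.lots.popleft()

    @property
    def balance(self) -> int:
        return sum(self.lots)

wallet = CreditWallet()
wallet.purchase(1_000)   # upfront bundle
wallet.burn(250)         # a multi-step agent run burns 250 credits
print(wallet.balance)    # 750 credits roll forward to the next cycle
```

The point is simply that purchases, burn, and rollover stay explicit and auditable, which is the transparency the credit model depends on.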
Whatever model you choose, the key is how you handle overages — don’t surprise customers.
The 3 P’s of overage management:
Protection: Caps that limit how far over they can go.
Performance: Throttling. Let them exceed their allocation, but slow down performance instead of billing instantly.
Passive: Delay billing or true-up at renewal. Give customers time to adjust or upgrade before charging.
These small design choices dramatically reduce anxiety around consumption.
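As a rough illustration of the three P’s, here is one way an overage policy could be encoded; the policy names, cap multiple, and return values are assumptions made up for this sketch.

```python
from enum import Enum

class OveragePolicy(Enum):
    PROTECTION = "hard_cap"    # cap how far over the customer can go
    PERFORMANCE = "throttle"   # allow overage, but slow things down
    PASSIVE = "true_up"        # keep serving, settle the difference at renewal

def handle_overage(used: int, included: int, policy: OveragePolicy,
                   cap_multiple: float = 1.2) -> str:
    """Decide what happens once usage passes the included allocation."""
    if used <= included:
        return "serve"  # still inside the plan
    if policy is OveragePolicy.PROTECTION:
        # Protection: serve up to a hard cap (here 20% over), then stop instead of billing more.
        return "serve" if used <= included * cap_multiple else "block"
    if policy is OveragePolicy.PERFORMANCE:
        # Performance: keep serving, but throttle instead of billing instantly.
        return "serve_throttled"
    # Passive: keep serving, record the overage, and true up at renewal.
    return "serve_and_true_up_at_renewal"
```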
Fairness means something different to every buyer — usually “what’s fair to me.” Two practical tips:
Don’t oversell credits. If reps are paid on total contract value, they’ll push too many upfront and cause buyer remorse.
Use “first-year forgiveness.” Estimate usage in good faith. If they exceed it, don’t penalize them. Reset allocations at renewal once real patterns emerge. It builds trust.
A few factors to consider:
How much work does the product do vs. the user?
More human activity → tiers or simple unit pricing.
More autonomous product → credits or three-part tariff.
Predictability and complexity: infra and LLM APIs lean toward the credit / usage end of the spectrum; workflow SaaS leans toward tiers and flat pricing. Most companies converge somewhere in the middle — hybrid models that blend predictability with scalability.
When setting unit prices, anchor on variable costs, not total costs. Don’t bury implementation or support costs in your usage rate — that leads to a death spiral as volume grows. AI economics can shift overnight as models get cheaper or workloads move to lower-cost engines. Keep variable margin flexibility.
For infra and dev tools, simplicity wins early. Three stages usually work:
Starter: Small free or hobby tier for testing (limited scale, no production).
Mid: Transparent tiering with visible overage rules — developers hate hidden fees.
Scale: Discounts or credits for long-term, high-volume usage.
Developers will churn if they detect excessive markup. A 10–20% markup on pass-through costs is tolerated; 40–60% is not.
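To put that markup guidance in concrete terms, here is a tiny worked example; the upstream per-unit cost is a made-up number.

```python
def resale_price(passthrough_cost: float, markup: float) -> float:
    """Price a metered unit as upstream cost plus a percentage markup."""
    return passthrough_cost * (1 + markup)

upstream = 0.010  # hypothetical cost per 1K tokens paid to the model provider
print(round(resale_price(upstream, 0.15), 4))  # 0.0115: inside the tolerated 10-20% band
print(round(resale_price(upstream, 0.50), 4))  # 0.015: the 40-60% territory developers resent
```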
My framework: the T-Model — focus on Transparency and Type.
Give tools to estimate and monitor usage. Buyers need predictability.
Publish fair-use policies and define caps clearly.
Let credit resets align with billing cycles (monthly resets for monthly plans; annual for annual).
Reward long-term commitments with either discounts or bonus usage.
Have indisputable definitions of what a unit or credit is — and what state counts (e.g., “completed action,” not “attempted request”); a short metering sketch follows this list.
Keep it under three types of units or credits. Clarity breaks down fast beyond that.
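Here is a minimal sketch of the “completed action, not attempted request” idea; the event shape, state names, and account IDs are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ActionEvent:
    account_id: str
    state: str  # e.g. "completed", "attempted", "failed"

# Only bill units whose state matches the published definition of a unit.
BILLABLE_STATE = "completed"

def billable_units(events: list[ActionEvent]) -> dict[str, int]:
    """Count billable units per account, ignoring attempts that never finished."""
    counts: dict[str, int] = {}
    for e in events:
        if e.state == BILLABLE_STATE:
            counts[e.account_id] = counts.get(e.account_id, 0) + 1
    return counts

events = [ActionEvent("acme", "completed"), ActionEvent("acme", "attempted"),
          ActionEvent("acme", "completed")]
print(billable_units(events))  # {'acme': 2}: attempts don't count
```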
A common concern: “Doesn’t usage pricing kill value-based pricing?” It can, if you rely purely on the unit rate. That’s why you need a base access layer.
Differentiate in that layer — through features, data, services, or integrations — and keep your consumption pricing fair. The unit captures cost; the base captures differentiation.
Consumption pricing isn’t the enemy. Clumsy consumption is. The winners in AI pricing will manage risk, communicate clearly, and design fairness into their models.
Marcos Rivera: I post about this constantly — AI pricing, SaaS pricing, and the messy middle where the two meet. Follow along if you want to keep learning.