We Built a Dashboard to Tame LLM Costs — And It Changed How We Practice

There is a moment in every yoga class when a student stops trying to force a pose and simply notices what the body is doing. The hip is tight. The breath is shallow. The shoulder is compensating. That moment of awareness — non-judgmental, precise — is where change begins. Not in the forcing. In the noticing.

We built the AI Model Pricing Dashboard for exactly the same reason. Not because we wanted to spend more on Large Language Models, but because we could not tell you — honestly, precisely — what we were already spending. Nobody on the team knew. The bills came in, and nobody could answer the simple question: which models are we actually using, and at what cost per provider?

The mess under the hood

If you've run AI workloads across multiple providers, you know the pain. OpenRouter lists separate prompt and completion rates for each of the many models it routes to. OpenAI and Anthropic each price input and output tokens separately, on their own rate cards. Kilo AI Gateway layers subscription plans with overage math on top. Tokenizers differ, so the same text counts as a different number of tokens on each provider. And pricing pages change, often without anything resembling a release note.

It is a bit like trying to practice yoga by reading five different books in five different languages, none of which agree on what "Downward Dog" means. You can muddle through. But you will compensate. And compensation has a cost — paid in money on the infrastructure side, paid in alignment on the mat.

What the dashboard does

The AI Model Pricing Dashboard is an open-source tool built with a FastAPI backend and a Next.js frontend. It aggregates pricing from more than 10 LLM providers, normalizes everything to US dollars per million tokens, and presents a single filterable, sortable table that makes comparison effortless.
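Normalization is mostly unit conversion. The sketch below is a minimal illustration, not the project's code: it assumes each provider's quote has already been parsed into a price string plus one of three hypothetical unit labels (`per_token`, `per_1k_tokens`, `per_1m_tokens`), and `to_usd_per_million` is a name chosen for this example. It uses `Decimal` so that tiny per-token prices do not pick up floating-point rounding error.

```python
from decimal import Decimal

# Hypothetical unit labels: real provider APIs each name these fields differently.
TOKENS_PER_UNIT = {
    "per_token": Decimal(1),
    "per_1k_tokens": Decimal(1_000),
    "per_1m_tokens": Decimal(1_000_000),
}


def to_usd_per_million(price_usd: str, unit: str) -> Decimal:
    """Convert a quoted USD price into USD per million tokens."""
    if unit not in TOKENS_PER_UNIT:
        raise ValueError(f"unknown pricing unit: {unit!r}")
    # Reduce the quote to a single-token price, then scale to one million tokens.
    per_token = Decimal(price_usd) / TOKENS_PER_UNIT[unit]
    return per_token * Decimal(1_000_000)
```

For example, a quote of $0.0015 per 1K tokens and a quote of $0.0000015 per token both come out as $1.50 per million, which is what lets them sit in the same table column.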

But the real value is not the table. The real value is what the table reveals:

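Input vs. output token math. Most teams think in terms of "per-call cost." But providers usually price output tokens (the model's responses) several times higher than input tokens, so the split matters as much as the total. A model that charges $2 per million input tokens but $15 per million output tokens has a completely different cost profile from one that charges $5 and $6 respectively: the first wins on input-heavy workloads such as long-document summarization, the second on output-heavy ones such as long-form generation. The dashboard makes this visible instantly.
Context window context. A model with a 128K context window might seem more expensive per token than one with a 4K window. But if you are currently chaining multiple calls to the cheaper model to handle long prompts, every call in the chain re-sends its instructions and any overlapping context, and you pay for those tokens each time. The larger-window model can end up cheaper overall, and a single pass over the whole document often produces better results than stitched-together fragments.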
Trend tracking. Pricing changes. The dashboard stores 30-day price history. That means you see provider price moves before they hit your next invoice.
Kilo plan projection. If you are a Kilo AI Gateway user, the dashboard maps actual usage against each tier and tells you which plan minimizes your total cost including overages.
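To make the input/output and context-window points concrete, here is a small cost calculator. It is a sketch, not the dashboard's code: `ModelPrice`, `request_cost`, and `chained_cost` are hypothetical names, and `chained_cost` assumes each call in a chain re-sends a fixed block of shared instructions alongside its chunk. The $2/$15 and $5/$6 rates are the ones from the input vs. output example; the `small-4k` and `large-128k` prices in the usage note are invented for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal

MILLION = Decimal(1_000_000)


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_per_m: Decimal   # USD per million input tokens
    output_per_m: Decimal  # USD per million output tokens


def request_cost(model: ModelPrice, input_tokens: int, output_tokens: int) -> Decimal:
    """USD cost of a single call, pricing input and output tokens separately."""
    return (input_tokens * model.input_per_m + output_tokens * model.output_per_m) / MILLION


def chained_cost(model: ModelPrice, chunk_tokens: list[int],
                 instruction_tokens: int, output_tokens_per_call: int) -> Decimal:
    """Cost of covering a long document with several smaller calls.

    Assumes every call re-sends the same instruction block alongside its chunk,
    so those instruction tokens are billed once per call.
    """
    return sum(
        (request_cost(model, instruction_tokens + chunk, output_tokens_per_call)
         for chunk in chunk_tokens),
        Decimal(0),
    )
```

With the $2/$15 and $5/$6 rates, an output-heavy call (1,000 input tokens, 2,000 output) costs $0.032 versus $0.017, while an input-heavy call (10,000 input, 200 output) flips the ranking: $0.023 versus $0.0512. With the made-up `small-4k` ($1/$2) and `large-128k` ($1.50/$3) prices, splitting a 12,000-token document into six 2,000-token chunks, each sent with 2,000 tokens of instructions and returning 300 tokens, costs $0.0276; one 14,000-token call on the larger model returning the same 1,800 tokens costs $0.0264.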
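The plan projection boils down to "fee plus overage, minimized over tiers." The sketch below assumes a simple shape for each plan (a monthly fee, an included token allowance, and a flat overage rate per million tokens); Kilo AI Gateway's real plan terms may be structured differently, and the `EXAMPLE_PLANS` tiers and numbers are invented, not Kilo's prices.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_fee: Decimal        # USD per month
    included_m_tokens: Decimal  # millions of tokens covered by the fee
    overage_per_m: Decimal      # USD per million tokens beyond the allowance


# Invented tiers for illustration only; these are NOT Kilo AI Gateway's prices.
EXAMPLE_PLANS = [
    Plan("starter", Decimal("0"), Decimal("0"), Decimal("10")),
    Plan("team", Decimal("50"), Decimal("10"), Decimal("6")),
    Plan("scale", Decimal("200"), Decimal("50"), Decimal("4")),
]


def monthly_total(plan: Plan, used_m_tokens: Decimal) -> Decimal:
    """Fee plus overage for one month of usage on a given plan."""
    overage = max(Decimal(0), used_m_tokens - plan.included_m_tokens)
    return plan.monthly_fee + overage * plan.overage_per_m


def cheapest_plan(plans: list[Plan], used_m_tokens: Decimal) -> tuple[Plan, Decimal]:
    """Pick the plan with the lowest projected monthly total."""
    best = min(plans, key=lambda plan: monthly_total(plan, used_m_tokens))
    return best, monthly_total(best, used_m_tokens)
```

With these made-up tiers, 3M tokens a month points to `starter` ($30), 20M to `team` ($110), and 60M to `scale` ($240). The useful output is less the winner than the crossover points where a bigger fee starts paying for itself.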

The architecture underneath

The system runs on Postgres and Redis, with Kubernetes manifests for production deployment. The FastAPI backend aggregates pricing behind a Redis cache with a 900-second (15-minute) TTL, so dashboard requests are served from the cache instead of hitting provider APIs on every page load. CronJobs refresh pricing every 15 minutes, matching the cache TTL, and send daily email reports at 9 AM UTC.
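A 900-second pricing cache like this is typically implemented as cache-aside. A minimal sketch, assuming a `pricing:<provider>` key scheme and a `fetch` callable that queries one provider (both hypothetical, not the project's code). It only needs a client with `get` and `setex`, which the redis-py `Redis` client provides, so in production you could pass a `redis.Redis(...)` instance; a dictionary-backed fake works for tests.

```python
import json
from typing import Any, Callable, Protocol

# 900 seconds = 15 minutes, the same interval as the pricing CronJob.
PRICING_TTL_SECONDS = 900


class CacheClient(Protocol):
    """The two redis-py client methods this sketch relies on."""

    def get(self, name: str) -> Any: ...

    def setex(self, name: str, time: int, value: str) -> Any: ...


def get_pricing(cache: CacheClient, provider: str,
                fetch: Callable[[str], dict]) -> dict:
    """Cache-aside lookup: serve from the cache, fall back to the provider API."""
    key = f"pricing:{provider}"  # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        # redis-py returns bytes by default; json.loads accepts bytes or str.
        return json.loads(cached)
    data = fetch(provider)  # slow call to the provider's pricing endpoint
    cache.setex(key, PRICING_TTL_SECONDS, json.dumps(data))
    return data
```

On a hit, provider APIs are not touched at all; on a miss, fresh data is written back with the 900-second TTL, so each key triggers at most one provider call per 15-minute window.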

For local development, the entire stack spins up with docker-compose.yaml — Postgres, Redis, the API, and the Next.js frontend — in under two minutes.

We went with Kubernetes not because the dashboard needs massive scale, but because the deployment pattern should be boring. Traefik for ingress, cert-manager for TLS, horizontal pod autoscaling for the API and web replicas. Boring is reliable. Reliable is what you want when this tool is informing budget decisions.

Where yoga meets the dashboard

Here is the part no architecture blog will tell you: cost optimization and yoga share the same first step.

In yoga, we say awareness precedes change. You cannot correct a misaligned hip if you do not know it is misaligned. You cannot soften a chronically gripped jaw if you have never noticed it was tight. The noticing is the practice. The noticing is where agency begins.

The dashboard exists because we noticed. For the first time, we had a precise, honest picture of what our AI workloads actually cost — not an estimate, not a vague sense that "it seems expensive," but a number. And once you have a number, you can make a decision. Do we switch providers for this workload? Do we use a cheaper model for non-critical tasks? Do we cache more aggressively? Do we pick the model with the larger context window and eliminate call chaining?

None of those decisions were visible before. Not because the data did not exist, but because it was scattered across five provider dashboards, three billing pages, and a spreadsheet that nobody updated.

We built the scraper. We built the normalizer. We built the dashboard. And the first week we used it, we found a $40/month inefficiency that had been running for four months. That is $160 we spent because we were not looking. The dashboard cost less than an evening to set up.

What we are learning from the data

Since deploying the dashboard, three things have changed in how we operate:

Rightsizing by task. We now assign specific models to specific tasks based on actual cost-per-task, not brand preference. High-repetition, low-complexity tasks go to cheaper, faster models. Complex reasoning tasks go where quality justifies the price. This is not a new idea — it is the same principle as choosing a restorative practice versus a power flow based on what your body actually needs that day.
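Rightsizing can be expressed as a tiny routing rule: among the models good enough for a task, pick the one with the lowest expected cost per task. In this sketch the `quality_tier` rating, the model names, and the prices are all hypothetical; in practice the tier would come from your own evaluations and the prices from the dashboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    quality_tier: int     # hypothetical internal rating: 1 = basic, 3 = strongest
    input_per_m: float    # USD per million input tokens
    output_per_m: float   # USD per million output tokens


# Hypothetical catalog; real prices would come from the dashboard.
CATALOG = [
    Candidate("fast-small", 1, 0.15, 0.60),
    Candidate("mid", 2, 1.00, 4.00),
    Candidate("frontier", 3, 5.00, 15.00),
]


def cost_per_task(model: Candidate, avg_input: int, avg_output: int) -> float:
    """Expected USD cost of one task, from its average token counts."""
    return (avg_input * model.input_per_m + avg_output * model.output_per_m) / 1_000_000


def pick_model(candidates: list[Candidate], min_tier: int,
               avg_input: int, avg_output: int) -> Candidate:
    """Cheapest model whose quality tier meets the task's requirement."""
    eligible = [m for m in candidates if m.quality_tier >= min_tier]
    if not eligible:
        raise ValueError(f"no model meets quality tier {min_tier}")
    return min(eligible, key=lambda m: cost_per_task(m, avg_input, avg_output))
```

Here a short classification task that only needs tier 1 lands on `fast-small`, a tier-2 task lands on `mid`, and a tier-3 reasoning task goes to `frontier` even though it costs the most per task.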

Caching as pranayama. In yoga, pranayama (breath control) is the practice of making each breath count: less volume, more intention, better oxygen exchange. Caching LLM responses works the same way. When an identical prompt can safely reuse an earlier answer, you store the answer and do not pay for the same computation twice. The dashboard already caches provider pricing in Redis with a 900-second TTL; applying that same pattern to our own repeated LLM calls has reduced our repeat-token spend by roughly 30%. Breathing less, getting more.
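A response cache only works if "the same request" is defined precisely. The sketch below keys each entry on a SHA-256 hash of the model name, prompt, and sampling parameters, and expires entries after 900 seconds. It is an in-memory stand-in for Redis with an injectable clock for testing; `ResponseCache` and `get_or_call` are hypothetical names, and the approach assumes you only route requests through it where reusing an earlier answer is acceptable (for example, deterministic, temperature-0 tasks).

```python
import hashlib
import json
import time
from typing import Callable


class ResponseCache:
    """In-memory TTL cache for LLM responses (a stand-in for Redis)."""

    def __init__(self, ttl_seconds: float = 900,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # key -> (expires_at, response)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model: str, prompt: str, params: dict) -> str:
        # Canonical JSON (sorted keys) so identical requests hash identically.
        payload = json.dumps({"model": model, "prompt": prompt, "params": params},
                             sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, params: dict,
                    call: Callable[[str, str, dict], str]) -> str:
        """Return a cached response if still fresh, otherwise call the model."""
        k = self.key(model, prompt, params)
        now = self.clock()
        entry = self._entries.get(k)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        response = call(model, prompt, params)  # the paid API call
        self._entries[k] = (now + self.ttl, response)
        return response
```

Because sampling parameters are part of the key, the same prompt at a different temperature is treated as a different request and is never served a stale answer.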

Trend awareness. Tracking pricing trends over 30 days means we see provider price drops before we see blog posts about them. When a provider drops their rate by 20%, we know within the hour. When a provider raises rates, we can re-evaluate before the next billing cycle. Awareness. Agency. Decision.
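Spotting a price move is a comparison between two snapshots. A minimal sketch, assuming each refresh produces a `{model: usd_per_million}` mapping (a hypothetical shape, not the project's schema) and using a 5% alert threshold that you would tune yourself:

```python
from decimal import Decimal


def price_changes(previous: dict[str, Decimal], current: dict[str, Decimal],
                  threshold_pct: Decimal = Decimal("5")) -> list[tuple[str, Decimal]]:
    """Return (model, percent change) for models whose price moved by >= threshold_pct."""
    changes = []
    for model, old in previous.items():
        new = current.get(model)
        if new is None or old == 0:
            continue  # model was removed, or there is no meaningful baseline
        pct = (new - old) / old * 100
        if abs(pct) >= threshold_pct:
            changes.append((model, pct))
    # Biggest drops first, biggest increases last.
    return sorted(changes, key=lambda item: item[1])
```

If model `a` goes from $10 to $8 and model `c` from $5 to $6, both are flagged (-20% and +20%), while a move from $2.00 to $2.05 (+2.5%) stays below the threshold.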

Open source, because hoarding tools is bad yoga

The project is MIT-licensed on GitHub: github.com/andreab67/ai-models-pricing. The Kubernetes manifests are included. The API documentation is in FUNCTIONAL.md. The SBOM — every dependency, its license, its security notes — is in SBOM.md.

We open-sourced this because the problem is not ours alone. Every team running multi-provider AI workloads has this problem. And hoarding useful tools is, to use a yoga analogy, like refusing to teach a student a pose that would help them. The practice is not diminished by sharing. It is deepened.

One move for this week

If your team runs LLM workloads and you cannot immediately answer "what are we spending per provider per task" — you are compensating. Not badly. Just expensively.

Set up the dashboard locally. It takes two minutes with Docker. Run it. Look at the numbers. Let the numbers tell you something you didn't notice.

Then decide.

That is the same practice we do on the mat: arrive, notice, choose. The tools change. The principle does not.

The AI Model Pricing Dashboard is open source at github.com/andreab67/ai-models-pricing. The production instance runs on our Kubernetes cluster. If you want help deploying it or integrating it with your own cost-alerting pipeline, reach out.