What does an AI assistant really cost? Run the numbers before you decide.

AI deployment cost isn't the price of a single model call. This calculator shows the full RAG pipeline — embeddings, rerank, generation — in USD or EUR.

One of the first questions we hear when companies consider rolling out an AI assistant is: “How much will this cost per month?” And one of the most common answers they get from vendors: “It depends.”

That’s true — cost depends on the model, the number of queries, the size of the knowledge base. But “it depends” doesn’t help a CFO approve a budget. So we built a calculator that gives a concrete answer: how much the full RAG pipeline costs for your company, per month, in your currency.

Open the calculator →

Why “token calculators” aren’t enough

Most calculators you’ll find online compute the cost of a single model call. You enter a token count, multiply by the price, get a number. The problem is that in a real AI deployment a single user question is not a single model call.

Every query that hits the assistant kicks off several stages in the background, and each of them has a cost. If you only count the last one, three months in you’ll get an invoice two or three times larger than your forecast. Unpleasant surprise.

That’s why our calculator shows the whole pipeline, not just one slice of it.

What makes up the cost of a single query

When your employee asks the AI assistant “What’s our returns procedure for orders from Germany?”, the system runs, in order:

  1. Question reformulation (multi-query). The question is expanded into several alternative phrasings to search the knowledge base more effectively. “DE returns procedure”, “foreign return process”, “how to return from Germany” — each of these wordings increases the chance of hitting the right document.

  2. Embeddings. The query text is converted into vectors that allow documents to be found by semantic similarity, not just keyword match.

  3. Reranking. The retrieved fragments are re-evaluated by a more precise model that picks the most relevant ones. That’s the difference between “probably matches” and “definitely matches”.

  4. Answer generation. The main LLM (Claude, GPT, Gemini, or an open-source model) composes the answer from the best fragments in the knowledge base. This is the part the user sees.

  5. Document summaries. At indexing time each page of a new document gets a summary — it helps the assistant understand what a given document is about as a whole. A one-off cost, paid once per new document.

The calculator takes all of these stages into account and shows the total cost — per query, per month, per year.
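The arithmetic behind those stages can be sketched in a few lines. The token counts and per-million-token prices below are illustrative placeholders, not the rates of any real provider or of the calculator itself:

```python
# Sketch of the per-query cost maths a full RAG pipeline implies.
# All prices are placeholder USD per 1M tokens, for illustration only.

PRICES = {
    "chat_in": 3.00,    # generation model, input tokens
    "chat_out": 15.00,  # generation model, output tokens
    "embed": 0.02,      # embedding model
    "rerank": 1.00,     # reranker, input only
}

def per_query_cost(
    reformulations: int = 3,     # multi-query expansions per question
    reform_tokens: int = 120,    # tokens per reformulation call (in + out)
    embed_tokens: int = 200,     # query + expansions embedded
    rerank_tokens: int = 4000,   # retrieved fragments scored by the reranker
    context_tokens: int = 3000,  # fragments fed to the chat model
    answer_tokens: int = 500,    # generated answer
) -> float:
    m = 1 / 1_000_000  # prices are quoted per 1M tokens
    reformulation = reformulations * reform_tokens * PRICES["chat_in"] * m
    embedding = embed_tokens * PRICES["embed"] * m
    reranking = rerank_tokens * PRICES["rerank"] * m
    generation = (context_tokens * PRICES["chat_in"]
                  + answer_tokens * PRICES["chat_out"]) * m
    return reformulation + embedding + reranking + generation

monthly = 10_000 * per_query_cost()  # e.g. 10k queries per month
```

Even with made-up numbers, the shape of the result is instructive: generation dominates, but reranking and reformulation together are far from a rounding error — which is exactly what a single-call token calculator misses.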

How to run the numbers for your company

Two inputs are enough:

  • How many queries per month your employees or customers will ask
  • How many pages of new documents you add to the knowledge base per month

You pick a chat model (e.g. Claude Sonnet 4.6, GPT-5.4, Gemini 3 Flash), pick a currency (USD or EUR at the current ECB rate), and you immediately see the cost of the full pipeline.

For advanced users there’s a detailed mode — it lets you pick a separate model for each stage (embeddings, rerank, chat) and see an exact cost breakdown. That helps when you want to optimise the budget: a more expensive model for answer generation, a cheaper one for reformulations.
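The savings from that kind of per-stage optimisation are easy to see on paper. A minimal sketch, assuming placeholder prices (USD per 1M tokens) for a hypothetical flagship and a hypothetical small model:

```python
# Illustrative comparison: flagship model for everything vs. a cheap
# small model for reformulations only. Prices are placeholders.

FLAGSHIP = {"in": 3.00, "out": 15.00}
SMALL = {"in": 0.10, "out": 0.40}

def reformulation_cost(model: dict, calls: int = 3,
                       tok_in: int = 80, tok_out: int = 40) -> float:
    """Cost of the multi-query reformulation step for one user question."""
    m = 1 / 1_000_000
    return calls * (tok_in * model["in"] + tok_out * model["out"]) * m

all_flagship = reformulation_cost(FLAGSHIP)  # reformulate on the big model
mixed = reformulation_cost(SMALL)            # reformulate on the small model
savings_per_query = all_flagship - mixed
```

Multiplied by tens of thousands of queries per month, moving a low-stakes stage like reformulation to a cheaper model adds up, while answer quality — which depends on the generation model — stays untouched.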

Run the numbers for your company →

European open-source providers — a cheaper, GDPR-native alternative

In addition to the global players (OpenAI, Anthropic, Google), the calculator includes European providers that host open-source models:

  • Scaleway (Paris) — Qwen 3, Llama 3.3, Mistral Small, DeepSeek R1
  • OVH AI Endpoints (Gravelines) — Llama 3.3, Mistral Small, DeepSeek R1

For European companies, this is a real alternative. Data is processed in the EU, GDPR compliance is built in, and costs can be up to ten times lower than with frontier models from US providers.

For many scenarios — a customer-support chatbot, an internal FAQ assistant, a complaints-handling assistant — an open-source model from a European provider is more than enough. You can reserve flagship models like Claude or GPT for tasks where answer quality decides everything: contract analysis, complex reports, VIP customer service.

On-premise vs Ragen Cloud — what the calculator covers

The calculator shows AI model costs for on-premise deployments. That’s the scenario in which your company pays AI API vendors directly for every processed token. You see everything plainly: each stage, each model, each cost.

If you’re on Ragen Cloud, it’s different.

In Ragen Cloud, model costs are included in the subscription.

You don’t count tokens, you don’t manage agreements with AI vendors, you don’t get separate invoices from OpenAI or Anthropic. One monthly fee, end of story.

The calculator is therefore especially useful if:

  • you’re considering on-premise and want to estimate real model costs
  • you’re comparing your own infrastructure costs with the Ragen Cloud subscription
  • you’re looking to optimise the AI budget of an existing deployment
  • you’re preparing a business case for the board or CFO

What you get after a few seconds

Once you enter the numbers and pick a model, the calculator shows:

  • Cost per query — so you understand unit economics
  • Monthly cost — for operating budget
  • Annual cost — for the annual financial forecast
  • Breakdown across pipeline stages — so you see where the money actually goes
  • Model comparison — detailed mode lets you swap models and see the difference in the bill instantly

This isn’t a marketing tool. It’s a decision-making tool.

Open the AI cost calculator →


Questions about the results, or want to discuss a deployment for your company? Book a 30-minute fit call — we’ll show you exactly where the savings sit for your case.

Before booking, you can also check whether Ragen is for you — customer profiles and a readiness checklist are there.