One of the most common challenges in RAG (Retrieval-Augmented Generation) systems is the quality of the documents in the knowledge base. Even the best AI model won’t find the right answer if the documents are poorly structured: text blocks that are too long, concrete data that is missing, sections that make no sense without the context of the whole document.
That’s why we’re introducing Document preparation for the knowledge base, a tool that automatically formats your materials and scores them for AI retrieval quality.
The problem: not every document belongs in a knowledge base
Imagine uploading a company HR policy or an e-commerce T&Cs document to the knowledge base. It’s 15 pages of continuous text, no clear sections, no concrete contact details next to the relevant topics. When a customer asks “how many days of leave do I get?”, the system has to search the whole document and pick the right passage.
The problem is that:
- Large blocks of text are hard to retrieve precisely
- Without headings, there are no natural boundaries between fragments
- Answers scattered across a document force the system to stitch multiple fragments together
- Vague phrasing (“contact HR”) instead of specifics (“Anna Schmidt, anna@company.com, ext. 210”) lowers the value of the answer
The solution: automatic preparation and scoring
Q&A document generator
A new option “Optimise for RAG” is available in the add-document menu in the knowledge base. It works like this:
- Paste the raw document text (company policy, T&Cs, FAQ, instructions)
- The AI reformats it into a question–answer structure with numbered sections
- Watch the real-time preview to see the document as it is being processed
- Save the result to the knowledge base with one click
What the generator does:
- Splits content into self-contained Q&A sections (80–150 words each)
- Adds numbered headings (### 1.1, ### 1.2) that create natural fragment boundaries
- Preserves all concrete data: amounts, dates, legal article numbers, contact details
- Repeats key information in every section so that each one can be understood on its own
- Detects the document’s language and writes the output in that language
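To make the idea of natural fragment boundaries concrete, here is a minimal Python sketch of how a document with numbered headings could be split and checked against the 80–150 word target. It is not the generator's actual code: the function names `split_sections` and `in_target_range` and the exact heading pattern are assumptions made for illustration.

```python
import re

# Assumed heading pattern, modelled on the "### 1.1" style described above.
HEADING = re.compile(r"^### (\d+\.\d+)[ \t]*(.*)$", re.MULTILINE)


def split_sections(text):
    """Split text into (number, title, body) tuples at each '### N.M' heading."""
    matches = list(HEADING.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        # A section's body runs from the end of its heading line
        # to the start of the next heading (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        sections.append((match.group(1), match.group(2).strip(), body))
    return sections


def in_target_range(body, low=80, high=150):
    """Return True if the body's word count falls within the target range."""
    return low <= len(body.split()) <= high


if __name__ == "__main__":
    sample = (
        "### 1.1 How many days of leave do I get?\n"
        + "word " * 100
        + "\n### 1.2 Who do I contact?\nAnna Schmidt, ext. 210\n"
    )
    for number, title, body in split_sections(sample):
        print(number, title, len(body.split()), in_target_range(body))
```

Because every heading starts a new section, each fragment carries its own question as a title, which is what lets a retriever return one section instead of stitching several together.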
Automatic RAG Score
Every document uploaded to the knowledge base is now automatically scored for AI-retrieval readiness. The score is visible as a colour label next to the file name:
- Green (70–100) — document well prepared for RAG
- Orange (40–69) — document needs improvement
- Red (0–39) — document poorly suited to retrieval
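The thresholds above can be expressed as a small lookup. The sketch below is only an illustration of the three bands; the function name `rag_label` is hypothetical and not part of any Ragen API.

```python
def rag_label(score: int) -> str:
    """Map a 0-100 RAG Score to the colour band described above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 70:
        return "green"   # well prepared for RAG
    if score >= 40:
        return "orange"  # needs improvement
    return "red"         # poorly suited to retrieval


if __name__ == "__main__":
    for value in (92, 55, 12):
        print(value, rag_label(value))
```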
The score is based on five dimensions:
- Section split: whether the document has headings that create natural divisions
- Section size: whether sections are the optimal length (80–150 words)
- Entity density: how much concrete data (names, amounts, dates) the document contains
- Self-sufficiency: whether each section can be understood without reading the rest
- Q&A format: whether the document has a question–answer structure
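This post does not say how the five dimensions are measured or weighted. Purely as an illustration of how five sub-scores might roll up into one number, the sketch below assumes each dimension is already rated from 0 to 100 and that all five count equally. The class name `DimensionScores` and both methods are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DimensionScores:
    """Five per-dimension ratings, each assumed to be on a 0-100 scale."""
    section_split: float
    section_size: float
    entity_density: float
    self_sufficiency: float
    qa_format: float

    def _values(self):
        values = {f.name: getattr(self, f.name) for f in fields(self)}
        if any(not 0 <= v <= 100 for v in values.values()):
            raise ValueError("each dimension must be between 0 and 100")
        return values

    def overall(self) -> int:
        """Equal-weight average (an assumption, not the documented formula)."""
        values = self._values()
        return round(sum(values.values()) / len(values))

    def weakest(self) -> str:
        """Name of the lowest-rated dimension, i.e. where to improve first."""
        values = self._values()
        return min(values, key=values.get)


if __name__ == "__main__":
    scores = DimensionScores(80, 60, 40, 70, 50)
    print(scores.overall(), scores.weakest())
```

Looking at the weakest dimension is one way to decide what to fix in a document that lands in the orange or red band.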
Example: an e-commerce returns policy
We uploaded a sample returns and complaints policy (electronics retailer, 7 sections, ~30 questions). After optimisation the document received a RAG Score of 92/100.
Here are a few queries that show what a well-prepared document makes possible:
- “The washing machine doesn’t fit, they picked it up — what about the cost?” — hits the large-appliance returns section
- “Device broke down after a year and a half” — finds the nuance about burden of proof after 12 months
- “I bought an iPhone, signed in, want to return it” — identifies the Activation Lock issue
- “I returned a product bought with a voucher code — does the code come back?” — a precise answer from the T&Cs
The key point is that every answer contains concrete data: legal article numbers, deadlines, amounts and the contact details of the responsible person.
How to get started
- Open Knowledge base in the Ragen panel
- Click “Add document” and select “Optimise for RAG”
- Paste the document text and click “Generate”
- Check the preview and save to the knowledge base
Documents already in the knowledge base can be scored by clicking the menu (three dots) next to the file and selecting “Score for RAG”.
Document preparation is available to all Ragen AI users. If you have questions or suggestions, drop us a line at hello@ragen.ai.
