Internal Custom Tools

RAG SYSTEM ARCHITECTURE · 2025

Document Chatbot
RAG Planner

Decision support for designing retrieval architectures and estimating ingestion plus operational costs before committing to implementation.
Vector database platform for semantic search and retrieval workloads | L: Claude | $43 / month


67%

Failure Rate Cut

Contextual retrieval + hybrid retrieval (combining multiple search signals instead of only one) + reranking (a second-pass model that reorders retrieved chunks so the most relevant come first)

93-97%

Peak Accuracy

Stack B on structured corpora

3.5x

Recall Gain

Hybrid retrieval (combining multiple search signals instead of only one) vs. vector-only search (vector similarity alone, without keyword/BM25 scoring)

Hours

Time to Production

Stack C (LlamaCloud managed)

The 3-Technique Foundation

Regardless of stack, these three techniques deliver a 67% reduction in retrieval failures.

  1. Hybrid search: combines lexical search and vector search so either signal can surface relevant content.
  2. Contextual retrieval: enriches chunks and queries with context before search and ranking.
  3. Reranking: a second-pass model reorders retrieved chunks so the most relevant ones come first.
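
The hybrid search and second-pass reranking steps listed above can be sketched as follows. This is a minimal illustration, not the implementation behind any of the stacks: reciprocal rank fusion is one common way to merge lexical and vector rankings and is an assumption here, as are the function names, the sample chunks, and the `term_overlap` scorer, which is a hypothetical stand-in for a real reranking model.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk ids into one hybrid ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns more credit the higher it ranks in each list.
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by id so ties are deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def rerank(query: str, chunk_ids: List[str], texts: Dict[str, str],
           score: Callable[[str, str], float], top_n: int = 3) -> List[str]:
    """Second pass: rescore the fused candidates and keep the best top_n."""
    ordered = sorted(chunk_ids, key=lambda cid: score(query, texts[cid]), reverse=True)
    return ordered[:top_n]


def term_overlap(query: str, text: str) -> float:
    """Hypothetical scorer: fraction of query terms that appear in the chunk."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


# Illustrative data only.
texts = {
    "c1": "refund policy for annual plans",
    "c2": "how to reset a password",
    "c3": "annual plans renew every twelve months",
}
lexical_hits = ["c1", "c3"]        # e.g. from a keyword/BM25 index
vector_hits = ["c3", "c2", "c1"]   # e.g. from a vector index

fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
best = rerank("refund policy annual plans", fused, texts, term_overlap, top_n=2)
print(fused, best)
```

Passing the scorer in as a function keeps the fusion step independent of whichever reranking model a given stack uses.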

Image & Diagram Handling

A · OCR First

An Azure document AI service (OCR, layout extraction, and structured field parsing) extracts text and layout metadata for enterprise compliance workflows.

Cost note: predictable page-based pricing.
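
Page-based pricing makes a monthly parsing estimate simple arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the per-page price in the usage line is a placeholder, not a published rate. The optional `free_pages` allowance covers plans that include a monthly free tier.

```python
def monthly_parse_cost(pages: int, price_per_page: float, free_pages: int = 0) -> float:
    """Estimate one month's parsing cost under page-based pricing."""
    if pages < 0 or free_pages < 0 or price_per_page < 0:
        raise ValueError("pages, free_pages, and price_per_page must be non-negative")
    # Only pages beyond the free allowance are billed.
    billable_pages = max(0, pages - free_pages)
    return billable_pages * price_per_page


# Placeholder inputs: 5,000 pages, a made-up price, and a 1,000-page free allowance.
estimate = monthly_parse_cost(pages=5000, price_per_page=0.01, free_pages=1000)
print(f"Estimated monthly parsing cost: ${estimate:.2f}")
```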
B · Vision + Context

GPT-4o vision plus contextual retrieval (chunks and queries enriched with context before search and ranking) captures chart semantics and caption references.

Cost note: image-heavy corpora increase ingest costs fastest.
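
One way to picture the context-enrichment step is to prepend document-level context and any figure caption to each chunk before it is embedded. The sketch below assumes a fixed template; in practice the context is often generated by a model, and the `Chunk` fields and `contextualize` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str
    section: str
    text: str
    caption: str = ""  # caption of a chart or diagram the chunk refers to, if any


def contextualize(chunk: Chunk) -> str:
    """Build the text that gets embedded: context lines first, then the chunk."""
    parts = [f"Document: {chunk.doc_title}", f"Section: {chunk.section}"]
    if chunk.caption:
        parts.append(f"Figure caption: {chunk.caption}")
    parts.append(chunk.text)
    return "\n".join(parts)


# Illustrative chunk only.
chunk = Chunk(
    doc_title="Q3 Operations Report",
    section="Throughput",
    text="Output rose sharply after the second line was added.",
    caption="Figure 2: Weekly throughput by production line",
)
enriched = contextualize(chunk)
print(enriched)
```

A chunk like "Output rose sharply..." is ambiguous on its own; the added lines give both the lexical and the vector index something to match against.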
C · Managed Parse

A LlamaIndex parsing service (complex PDFs, tables, and layout-aware extraction) handles complex PDFs and forms with automatic managed chunking (splitting documents into smaller sections before indexing).

Cost note: free tier for first 1,000 pages each month.
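
For readers unfamiliar with chunking, the sketch below shows the general idea using fixed-size word windows with overlap. It is an assumption-laden illustration of the concept, not how the managed service chunks documents; the function name and the default sizes are hypothetical.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_words(sample, size=4, overlap=1)
print(pieces)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of indexing some words twice.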

Quick Stack Chooser

☁️ Stack A

Large regulated teams that need end-to-end governance inside Azure.

Accuracy: 95%
Setup: 1-2 days
Lock-in: High (dependency on a single vendor, which makes migration harder and slower)
Framework: Azure AI Suite

🧩 Stack B

Accuracy-first teams willing to compose best-in-class providers.

Accuracy: 93-97%
Setup: 4-7 days
Lock-in: Low
Framework: LangGraph / Custom

🦙 Stack C

Small teams that need managed ingestion (the pipeline stage that parses, chunks, embeds, and indexes documents) and quick production rollout.

Accuracy: 94%
Setup: Hours
Lock-in: Medium
Framework: LlamaIndex
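
The chooser cards can also be read as a simple filter over setup time and lock-in. The sketch below encodes only those two fields plus the framework from the cards above; the `Stack` type, the `shortlist` function, and the mapping of "Hours" to a one-day upper bound are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Ordering of lock-in levels, lowest to highest.
LOCK_IN_LEVELS = {"Low": 0, "Medium": 1, "High": 2}


@dataclass(frozen=True)
class Stack:
    name: str
    framework: str
    max_setup_days: float  # upper bound of the stated setup range
    lock_in: str


STACKS = [
    Stack("A", "Azure AI Suite", 2, "High"),
    Stack("B", "LangGraph / Custom", 7, "Low"),
    Stack("C", "LlamaIndex", 1, "Medium"),  # "Hours" assumed to fit within one day
]


def shortlist(setup_budget_days: float, max_lock_in: str) -> List[str]:
    """Return the stacks that fit the setup budget and lock-in tolerance."""
    limit = LOCK_IN_LEVELS[max_lock_in]
    return [
        stack.name
        for stack in STACKS
        if stack.max_setup_days <= setup_budget_days
        and LOCK_IN_LEVELS[stack.lock_in] <= limit
    ]


print(shortlist(setup_budget_days=2, max_lock_in="High"))
print(shortlist(setup_budget_days=7, max_lock_in="Low"))
```

A real decision would also weigh accuracy targets and governance needs, which this filter deliberately leaves out.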