A fully local, 100% free RAG (Retrieval Augmented Generation) app that lets you upload PDFs and chat with them using AI — no API keys, no cloud, runs entirely on your machine.

RAG Pipeline is a fully offline, cost-free Retrieval Augmented Generation application built to demonstrate how modern AI systems answer questions from private documents without hallucinating or relying on cloud services. Users upload one or more PDF documents, which are automatically chunked, embedded, and stored in a local ChromaDB vector database. They can then ask natural-language questions and receive accurate, grounded answers powered by Llama 3.1 running locally via Ollama, with zero API costs.

Key Features:

- Upload and chat with multiple PDFs simultaneously
- 100% local: no internet required after setup, no API keys, no cost
- Persistent storage: documents are remembered across restarts, no re-uploading
- Source transparency: every answer shows exactly which chunks were used
- Modular architecture: cleanly separated into config, RAG, vectorstore, and tracker modules
- Multi-document search: queries search across all uploaded documents at once

How It Works:

1. Each PDF is loaded and split into overlapping chunks of 1000 characters
2. Each chunk is converted to a 384-dimensional vector using HuggingFace all-MiniLM-L6-v2
3. Vectors are stored persistently in ChromaDB on disk
4. For each question, the query is embedded and the top 5 most similar chunks are retrieved
5. The retrieved chunks plus the question are sent to Llama 3.1 via Ollama
6. The LLM answers ONLY from the retrieved context, preventing hallucination

Purpose:

Built as a learning project to deeply understand the RAG architecture, from document ingestion and vector embeddings to semantic search and local LLM inference, using industry-standard tools like LangChain, ChromaDB, and Ollama.
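The chunking step (step 1 above) can be sketched in plain Python. The app itself uses a LangChain text splitter; this is a minimal character-level illustration of the same idea. The 200-character overlap is an assumed value chosen for illustration — the source only specifies the 1000-character chunk size.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap, so a sentence
    cut at a chunk boundary still appears whole in the neighbor chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

The overlap matters for retrieval quality: without it, a fact split across a chunk boundary would never appear intact in any single retrieved chunk.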
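The "top 5 similar chunks" retrieval in step 4 is a nearest-neighbor search over embedding vectors; ChromaDB handles this internally. As a rough sketch of what such a similarity search does, here is a stdlib-only top-k ranking by cosine similarity (the function names are illustrative, not ChromaDB's API):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product of the vectors divided by the
    product of their magnitudes; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 5) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

In the real app the vectors are the 384-dimensional MiniLM embeddings, and the index is persisted on disk rather than held in a Python list.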