CollegeInfo-Agent: RAG Academic Assistant
[FastAPI][ChromaDB][LangChain][OpenAI/Gemini][React]
01_Context & Problem
Students and faculty struggle to find specific information buried in hundreds of unstructured PDF documents (syllabi, notices, calendars), which creates administrative bottlenecks.
02_Architecture & Design
Built a Retrieval-Augmented Generation (RAG) pipeline that ingests PDFs, chunks and embeds them into a local vector store (ChromaDB), and retrieves relevant context for an LLM to answer queries accurately with citations.
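The ingest, chunk, embed, and retrieve flow described above can be sketched in plain Python. This is a toy illustration, not the project's code: `chunk_text`, `embed`, and `ToyVectorStore` are hypothetical names, the bag-of-words `embed` stands in for a real embedding model, and the in-memory list stands in for ChromaDB. PDF text extraction is assumed to have already happened.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping windows of `size` words (requires size > overlap)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy embedding: term counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector store; keeps a citation label per chunk."""

    def __init__(self):
        self.records = []  # (chunk text, vector, citation label)

    def add_document(self, source, text, size=40, overlap=10):
        for i, chunk in enumerate(chunk_text(text, size, overlap)):
            self.records.append((chunk, embed(chunk), f"{source}#chunk{i}"))

    def retrieve(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.records, key=lambda r: cosine(q, r[1]), reverse=True)
        return [(chunk, cite) for chunk, _, cite in ranked[:k]]


store = ToyVectorStore()
store.add_document("calendar.pdf",
                   "The mid-semester exams begin on 14 October and end on 21 October.")
store.add_document("notice.pdf",
                   "The library will remain closed on Sunday for maintenance work.")
hits = store.retrieve("When do the exams begin?", k=1)
print(hits)  # the retrieved chunk plus the citation label passed on to the LLM
```

Keeping the citation label next to each chunk at ingest time is what lets the final answer point back to a specific document.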
03_Key Technical Decisions
- Decision: Chose ChromaDB for local, lightweight vector storage without cloud overhead.
- Decision: Implemented multi-tenancy by tagging stored content with College ID metadata, so each college's knowledge base stays isolated.
- Decision: Used Hybrid Search (Keyword + Semantic) to improve retrieval accuracy for specific terms (e.g., course codes).
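The tenant isolation and hybrid search decisions can be illustrated together. The sketch below is an assumption about how the pieces might fit, not the project's implementation: `hybrid_search`, the `alpha` weight, the weighted-sum fusion, and the hand-written two-dimensional vectors are all hypothetical. In ChromaDB itself, the tenant filter would typically be expressed through the `where` metadata filter of `collection.query` rather than a Python list comprehension.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear verbatim in the chunk text."""
    q_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    t_terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def hybrid_search(chunks, query, query_vec, college_id, alpha=0.5, k=3):
    """Filter by tenant first, then rank by a weighted semantic + keyword score."""
    scoped = [c for c in chunks if c["college_id"] == college_id]
    scored = [
        (alpha * cosine(query_vec, c["vec"])
         + (1 - alpha) * keyword_score(query, c["text"]), c)
        for c in scoped
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


# Hand-written vectors for illustration; a real system would use model embeddings.
chunks = [
    {"college_id": "college_a", "text": "CS301 requires CS201 as a prerequisite.",
     "vec": [0.8, 0.6]},
    {"college_id": "college_a", "text": "CS310 requires CS210 as a prerequisite.",
     "vec": [1.0, 0.0]},
    {"college_id": "college_b", "text": "CS301 has no prerequisite.",
     "vec": [1.0, 0.0]},
]
query, query_vec = "CS301 prerequisite", [1.0, 0.0]

hybrid = hybrid_search(chunks, query, query_vec, "college_a", k=2)
semantic_only = hybrid_search(chunks, query, query_vec, "college_a", alpha=1.0, k=2)
print(hybrid[0]["text"])
print(semantic_only[0]["text"])
```

In this toy data the purely semantic ranking (`alpha=1.0`) prefers the CS310 chunk because its vector is closest to the query, while the hybrid ranking puts the CS301 chunk first because the exact course code matches. The `college_b` chunk never appears in either result because it is filtered out before scoring.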
04_Challenges & Resolutions
! WARNING: Hallucinations: Mitigated by strictly enforcing "Answer only from context" prompt rules.
! WARNING: Latency: Reduced by using smaller, faster embedding models for initial retrieval.
! WARNING: PDF Parsing: Handled complex table structures in syllabi using specialized extraction logic.
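As an illustration of the "Answer only from context" rule, a prompt builder might look like the sketch below. The exact wording of the project's prompt is not given in this write-up, so the instruction text, the fallback sentence, the sample passages, and the `build_grounded_prompt` name are all assumptions.

```python
FALLBACK = "I could not find this in the provided documents."


def build_grounded_prompt(question, passages):
    """Build an LLM prompt that restricts the answer to the retrieved passages.

    passages: list of (source_label, text) pairs returned by the retriever.
    """
    if passages:
        context = "\n".join(
            f"[{i}] ({source}) {text}"
            for i, (source, text) in enumerate(passages, start=1)
        )
    else:
        context = "(no context retrieved)"
    return (
        "Answer only from the context below. "
        "Cite the bracketed number of every passage you use. "
        f'If the context does not contain the answer, reply exactly: "{FALLBACK}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_grounded_prompt(
    "When do the mid-semester exams begin?",
    [("calendar.pdf", "The mid-semester exams begin on 14 October."),
     ("notice.pdf", "The library will remain closed on Sunday.")],
)
print(prompt)
```

Numbering the passages gives the model something concrete to cite, and a fixed fallback sentence makes "no answer in context" responses easy to detect downstream.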