CollegeInfo-Agent: RAG Academic Assistant

[FastAPI][ChromaDB][LangChain][OpenAI/Gemini][React]

01_Context & Problem

Students and faculty struggle to find specific information buried in hundreds of unstructured PDF documents (syllabi, notices, calendars), which creates administrative bottlenecks.

02_Architecture & Design

Built a Retrieval-Augmented Generation (RAG) pipeline that ingests PDFs, chunks and embeds them into a local vector store (ChromaDB), and retrieves relevant context for an LLM to answer queries accurately with citations.
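The flow described above (chunk, embed, store, retrieve with a citation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the `Chunk` class, the word-window chunker, the token-count `embed` function (a toy stand-in for a real embedding model), the in-memory `index` list (a stand-in for ChromaDB), and the sample page text are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # file name, kept so the answer can cite it
    page: int


def chunk_text(text: str, source: str, page: int,
               size: int = 40, overlap: int = 10) -> list:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(Chunk(" ".join(words[start:start + size]), source, page))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Made-up sample pages standing in for parsed PDF text.
pages = [
    ("calendar.pdf", 1, "Mid-semester exams begin on 14 October and end on 21 October."),
    ("notices.pdf", 3, "The library will remain closed on Sunday for maintenance."),
]

index = []  # list of (Chunk, embedding) pairs
for source, page, text in pages:
    for chunk in chunk_text(text, source, page):
        index.append((chunk, embed(chunk.text)))

top = retrieve("When do exams begin?", index, k=1)[0]
print(f"{top.text} [{top.source}, p.{top.page}]")
```

Keeping `source` and `page` on every chunk is what makes citations possible later: the retrieved text always carries the location it came from.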

03_Key Technical Decisions

Decision: Chose ChromaDB for local, lightweight vector storage without cloud overhead.
Decision: Implemented multi-tenancy by tagging stored content with College ID metadata, isolating each college's knowledge base.
Decision: Used Hybrid Search (Keyword + Semantic) to improve retrieval accuracy for specific terms (e.g., course codes).
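As an illustration of how the tenant filter and hybrid scoring can fit together, here is a small sketch. Everything in it is an assumption made for the example: the `DOCS` records, the `college_id` field name, the `alpha` weight, and the scoring functions. In particular, `semantic_score` is a toy token-count cosine standing in for real embedding similarity.

```python
import math
import re
from collections import Counter

# Hypothetical records; each carries the tenant tag in its metadata.
DOCS = [
    {"id": "a1", "college_id": "college_a",
     "text": "CS301 Operating Systems covers scheduling and memory management."},
    {"id": "a2", "college_id": "college_a",
     "text": "The course on systems programming covers processes and threads."},
    {"id": "b1", "college_id": "college_b",
     "text": "CS301 Database Systems covers SQL and transactions."},
]


def tokens(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def semantic_score(query: str, text: str) -> float:
    """Toy stand-in (assumption) for embedding similarity."""
    a, b = Counter(tokens(query)), Counter(tokens(text))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query tokens that appear verbatim in the text."""
    q, t = set(tokens(query)), set(tokens(text))
    return len(q & t) / len(q) if q else 0.0


def hybrid_search(query: str, college_id: str,
                  alpha: float = 0.5, k: int = 3) -> list:
    """Filter by tenant first, then rank by a weighted blend of both scores."""
    candidates = [d for d in DOCS if d["college_id"] == college_id]
    scored = [
        (alpha * semantic_score(query, d["text"])
         + (1 - alpha) * keyword_score(query, d["text"]), d)
        for d in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d["id"] for _, d in scored[:k]]


print(hybrid_search("CS301 syllabus", "college_a"))
```

Filtering before scoring means a document from another college can never appear in the results, whatever its score. With ChromaDB, the same isolation would typically be expressed as a `where` metadata filter on the query rather than a Python list comprehension.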

04_Challenges & Resolutions

! WARNING: Hallucinations: Solved by strictly enforcing "Answer only from context" prompt rules.
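One way to enforce an "answer only from context" rule is to build it into the prompt template itself. The sketch below is hypothetical: the function name `build_prompt`, the chunk dictionary keys, the refusal sentence, and the sample chunk are assumptions, and the project's actual prompt wording is not shown in the source.

```python
def build_prompt(question: str, chunks: list) -> str:
    """Assemble a grounded prompt: numbered context, strict rules, then the question."""
    context = "\n".join(
        f"[{i}] ({c['source']}, p.{c['page']}) {c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer only from the context below. "
        "Cite sources by their bracketed number.\n"
        "If the context does not contain the answer, reply exactly: "
        '"I could not find this in the provided documents."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up retrieved chunk for illustration.
sample_chunks = [
    {"source": "calendar.pdf", "page": 1,
     "text": "Mid-semester exams begin on 14 October."},
]
prompt = build_prompt("When do mid-semester exams begin?", sample_chunks)
print(prompt)
```

Giving the model an explicit fallback sentence matters as much as the restriction: without a permitted way to say "not found", a model is more likely to guess.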

! WARNING: Latency: Optimized embedding generation by using smaller, faster models for initial retrieval.

! WARNING: PDF Parsing: Handled complex table structures in syllabi using specialized extraction logic.
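The source does not describe the extraction logic, so the following is only one plausible post-processing step, assuming a PDF library has already returned each table as a list of rows with `None` for empty cells. The function name `table_to_records` and the sample rows are hypothetical. The idea is to flatten each row into a self-describing line so that a chunk containing it still makes sense without the header row.

```python
from typing import Optional


def table_to_records(rows: list) -> list:
    """Turn a table (header row first) into one 'Header: value' line per data row."""
    if not rows:
        return []

    def clean(cell: Optional[str]) -> str:
        return (cell or "").strip()

    header = [clean(cell) for cell in rows[0]]
    records = []
    for row in rows[1:]:
        # Pair each value with its column name; skip empty cells and unnamed columns.
        pairs = [f"{h}: {v}" for h, v in zip(header, map(clean, row)) if h and v]
        if pairs:
            records.append("; ".join(pairs))
    return records


# Made-up syllabus table for illustration.
sample_rows = [
    ["Code", "Title", "Credits"],
    ["CS301", "Operating Systems", "4"],
    ["CS302", None, "3"],
]
records = table_to_records(sample_rows)
for line in records:
    print(line)
```

Because each line repeats its column names, a query such as "credits for CS301" can match a single record instead of depending on a header that may have landed in a different chunk.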

[View Source Code on GitHub]