CollegeInfo-Agent: RAG Academic Assistant

[FastAPI][ChromaDB][LangChain][OpenAI/Gemini][React]

01_Context & Problem

Students and faculty struggle to find specific information buried in hundreds of unstructured PDF documents (syllabi, notices, calendars), which creates administrative bottlenecks.

02_Architecture & Design

> Ingestion Service: PDF text extraction -> Chunking -> Embedding
> Vector Store: ChromaDB (Local persistence)
> Retrieval: Semantic search with metadata filtering (College ID)
> Generation: LLM (Gemini/OpenAI) grounded in retrieved context
> Frontend: React Admin Dashboard for document management
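The chunking step of the ingestion service can be illustrated with a minimal sketch. It assumes fixed-size character windows with overlap; the function name `chunk_text` and the default sizes are hypothetical and not taken from the project, which may split on tokens, sentences, or document sections instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Assumed strategy for illustration only: each chunk repeats the last
    `overlap` characters of the previous one, so a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each returned chunk would then be embedded and written to the vector store together with its metadata (for example the College ID and the source file).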

Built a Retrieval-Augmented Generation (RAG) pipeline that ingests PDFs, chunks and embeds them into a local vector store (ChromaDB), and retrieves relevant context so that an LLM can answer queries with citations to the source documents.
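The retrieval step with College ID filtering can be sketched in plain Python. This is an in-memory stand-in, not the project's code: the `Chunk` dataclass, the `retrieve` function, and the `source` field are assumptions for illustration. In ChromaDB itself, the same effect is typically achieved by passing a `where` metadata filter to the collection query.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    embedding: list[float]
    college_id: str  # tenant tag used to isolate knowledge bases
    source: str      # hypothetical field kept so answers can cite a document


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_embedding: list[float], chunks: list[Chunk],
             college_id: str, k: int = 3) -> list[Chunk]:
    """Return the k most similar chunks belonging to one college."""
    # Filter by tenant first so another college's documents can never be ranked.
    candidates = [c for c in chunks if c.college_id == college_id]
    ranked = sorted(candidates,
                    key=lambda c: cosine(query_embedding, c.embedding),
                    reverse=True)
    return ranked[:k]
```

The `source` value of each returned chunk is what a generation step could surface as a citation.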

03_Key Technical Decisions

  • Decision: Chose ChromaDB for local, lightweight vector storage without cloud overhead.
  • Decision: Implemented Multi-tenancy via College ID metadata tagging to isolate knowledge bases.
  • Decision: Used Hybrid Search (Keyword + Semantic) to improve retrieval accuracy for specific terms (e.g., course codes).
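One common way to combine keyword and semantic results is reciprocal rank fusion (RRF). The source does not say how the two result lists are merged, so the sketch below is an assumption: `keyword_rank` and `reciprocal_rank_fusion` are hypothetical helpers, and the keyword scorer is a deliberately simple term-overlap count rather than a full BM25 implementation.

```python
def keyword_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Rank document IDs by how many of their words match a query term."""
    terms = set(query.lower().split())
    hits = {
        doc_id: sum(1 for word in text.lower().split() if word in terms)
        for doc_id, text in docs.items()
    }
    # Keep only documents with at least one match; break ties by ID.
    return [d for d in sorted(hits, key=lambda d: (-hits[d], d)) if hits[d] > 0]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists; each list adds 1 / (k + rank) per ID."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))
```

With this scheme, a chunk that contains an exact course code ranks first in the keyword list and gains score in the fused list even when its embedding similarity is only moderate, which is the behaviour the hybrid-search decision is aiming for.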

04_Challenges & Resolutions

! WARNING: Hallucinations: Mitigated by strictly enforcing "Answer only from context" prompt rules.
! WARNING: Latency: Optimized embedding generation by using smaller, faster models for initial retrieval.
! WARNING: PDF Parsing: Handled complex table structures in syllabi using specialized extraction logic.
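
The "Answer only from context" rule can be sketched as a prompt builder. The exact wording of the project's prompt is not given in the source, so the instruction text, the function name `build_grounded_prompt`, and the numbered citation format below are all assumptions.

```python
def build_grounded_prompt(question: str, contexts: list[tuple[str, str]]) -> str:
    """Build a prompt that restricts the LLM to the retrieved context.

    `contexts` is a list of (source, text) pairs from the retrieval step.
    """
    if contexts:
        # Number each passage so the model can cite it as [1], [2], ...
        context_block = "\n\n".join(
            f"[{i}] ({source}) {text}"
            for i, (source, text) in enumerate(contexts, start=1)
        )
    else:
        context_block = "(no context retrieved)"

    return (
        "Answer only from the context below. "
        "If the context does not contain the answer, reply \"I don't know.\" "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Giving the model an explicit fallback reply for missing context tends to matter as much as the restriction itself, since otherwise it has no sanctioned way to decline.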
[View Source Code on GitHub]