
Build Your Own Quiz Master AI: A Complete Beginner's Guide to Local RAG Pipelines

Understand the core components of a Retrieval-Augmented Generation (RAG) pipeline and how they work together to make AI smarter, faster, and more reliable.


This tutorial walks through building a local RAG (Retrieval-Augmented Generation) pipeline from scratch, using only tools that run entirely on your machine. By the end you will have a working system that ingests documents, creates embeddings, stores them in a vector database, and generates grounded answers through a locally-running LLM. No GPU required for the core pipeline; Apple Silicon runs everything smoothly.

The project is a Quiz Master AI: feed it your notes or a PDF, and it generates quiz questions from the actual content.

Step 1: Set Up Your Environment – Install Python and Ollama

Before coding, get the basics running. These steps target macOS on either Apple Silicon or Intel. The installers are universal binaries, so the steps are identical on both. On other operating systems, adapt the steps using the official docs; this guide focuses on macOS.

Install Python (3.9+ Recommended)

  1. Download: Head to python.org/downloads and grab the latest release (e.g., 3.12 or newer). The macOS installer is universal, so there is no difference between chips.
  2. Run Installer: Double-click the .pkg and follow the prompts; the default options work.
  3. Verify: Open Terminal (Spotlight search: Terminal) and run:
    ```shell
    python3 --version
    ```
    Expect "Python 3.x.x". On macOS, type python3 (not python) in all the commands that follow.

Pro Tip: pip, Python's package installer, ships with Python, so there is nothing extra to install yet.

Install Ollama (Local LLM Runner)

Ollama powers our generation step with models like Llama 3. It's cross-platform, efficient on M-series chips (faster inference via Metal), but fine on Intel too.

  1. Download: Visit ollama.com/download and get the macOS .dmg.
  2. Install: Mount the .dmg, drag Ollama.app to Applications.
  3. Launch: Open it. Ollama keeps running in the background.
  4. Pull a Model: In Terminal:
    ```shell
    ollama pull llama3.1:8b
    ```
    (About 4-5 GB; try phi3:mini for lighter setups.)
  5. Verify: Run ollama run llama3.1:8b and prompt "Hello!" to test. Type /bye to exit the chat.

Nuance for Beginners: Ollama's not the only option (LM Studio for GUIs, llama.cpp for speed), but it's simplest for RAG prototypes.
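If you prefer to confirm from Python that Ollama is running and the model is pulled, here is a minimal standard-library sketch. It assumes Ollama's default local address (http://localhost:11434) and its GET /api/tags endpoint, which lists pulled models. list_local_models is a hypothetical helper name, not part of the ollama package.

```python
import json
import urllib.error
import urllib.request

# Assumption: Ollama's local server listens on its default address
OLLAMA_URL = "http://localhost:11434"


def list_local_models(base_url=OLLAMA_URL, timeout=2.0):
    """Return names of locally pulled models, or None if the server is unreachable."""
    try:
        # GET /api/tags returns JSON like {"models": [{"name": "llama3.1:8b", ...}]}
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply all mean "not usable"
        return None
    return [m.get("name", "") for m in data.get("models", [])]


if __name__ == "__main__":
    models = list_local_models()
    if models is None:
        print("Ollama is not reachable. Is the app running?")
    else:
        print("Installed models:", models)
        if not any(name.startswith("llama3.1:8b") for name in models):
            print("Model missing. Run: ollama pull llama3.1:8b")
```

Using only urllib means this check works even before you install the Python libraries in the next step.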

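Install the Python libraries next:

```shell
python3 -m pip install sentence-transformers faiss-cpu torch pypdf ollama
```
  • sentence-transformers: For embeddings.
  • faiss-cpu: Vector store.
  • pypdf: PDF handling (for real notes).
  • torch: The deep-learning backend that sentence-transformers runs on.
  • ollama: The Python client for talking to the local Ollama server.
NumPy is installed automatically as a dependency.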

Try It Out: Run python3 -c "import sentence_transformers; print('Ready!')" to confirm.
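The one-liner above checks only sentence-transformers. A small sketch that checks every module the tutorial relies on, without importing the heavy ones, can use importlib.util.find_spec from the standard library. check_modules is a hypothetical helper name. The module list mirrors the imports in quiz_rag.py plus torch. Note that some import names differ from pip names; for example, faiss-cpu is imported as faiss.

```python
import importlib.util

# Import names (not pip names) of everything quiz_rag.py needs
REQUIRED_MODULES = ("sentence_transformers", "faiss", "torch", "pypdf", "ollama", "numpy")


def check_modules(names=REQUIRED_MODULES):
    """Map each top-level module name to True if Python can find it (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = check_modules()
    for name, ok in status.items():
        print(f"{name:22} {'OK' if ok else 'MISSING'}")
    if not all(status.values()):
        print("Install what is missing with: python3 -m pip install <package>")
```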

Step 2: Understand RAG Components – The Why Behind the Pipeline

RAG combines retrieval (finding information) with generation (LLM answers), grounding responses in your data to cut hallucinations.

Core Pieces:

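  1. Ingestion & Embedding: Load notes, split them into chunks, and convert each chunk to a vector (a list of numbers that encodes its meaning) with an embedding model such as MiniLM.
  2. Vector Store: Index the vectors for fast similarity search (FAISS: local and efficient).
  3. Retrieval: Embed the query, then fetch the most similar chunks (tune top-k and the score threshold for relevance).
  4. Generation: Feed the retrieved context plus the query to the LLM (via Ollama) to produce quizzes.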

Nuance: Embeddings capture meaning (e.g., "France capital" matches "Paris is French hub" even though few words overlap). Chunk size affects precision, though. Chunks that are too big add noise, and chunks that are too small lose context.
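To make "matching meaning" concrete, retrieval compares vectors by direction, usually with cosine similarity. This sketch uses made-up 3-dimensional toy vectors (real all-MiniLM-L6-v2 embeddings have 384 dimensions). It also shows that once vectors are normalized to unit length, a plain inner product gives the same score as cosine similarity. This is why an inner-product index works as a cosine search on normalized embeddings.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, 0 = unrelated."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy "embeddings", invented for illustration only
query = [2.0, 1.0, 0.0]
close = [4.0, 2.5, 0.5]  # points in nearly the same direction as the query
far = [0.0, 0.5, 3.0]    # points somewhere else entirely

print(round(cosine_similarity(query, close), 3))
print(round(cosine_similarity(query, far), 3))
# After normalization, the plain inner product equals the cosine similarity
print(round(dot(normalize(query), normalize(close)), 3))
```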

For our Quiz Master: We'll ingest notes/PDFs, retrieve on topics, and generate quizzes. Create quiz_rag.py for the code.

Step 3: Ingest Notes and Create Embeddings – Prep Your Knowledge Base

Start by chunking your notes into quiz-friendly snippets. Use a sample your_notes.txt or a PDF, then swap in your own!

Add to quiz_rag.py:

```python
from sentence_transformers import SentenceTransformer
from pypdf import PdfReader
import faiss
import numpy as np
import ollama

# Load embedder (MiniLM: fast; try 'all-mpnet-base-v2' for accuracy, but slower)
embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Ingest function: handles TXT/PDF, chunks with overlap
def ingest_notes(file_path):
    if file_path.lower().endswith('.pdf'):
        reader = PdfReader(file_path)
        # Extract each page once; image-only pages yield no text
        pages = [page.extract_text() or '' for page in reader.pages]
        text = '\n'.join(p for p in pages if p)
    else:
        with open(file_path, 'r', encoding='utf-8') as f:
            text = f.read()

    # Chunk: ~800 chars, 200 overlap (tweak for your notes!)
    chunk_size = 800
    overlap = 200
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunk = text[i:i + chunk_size].strip()
        if chunk:  # skip whitespace-only chunks
            chunks.append(chunk)
    return chunks

# Your notes here! E.g., history_notes.txt or startup_pitch.pdf
documents = ingest_notes('your_notes.txt')
if not documents:
    raise SystemExit("No text found; check the file path and contents.")

# normalize_embeddings=True returns unit-length vectors (inner product = cosine similarity)
embeddings = embedder.encode(documents, normalize_embeddings=True)
embeddings = np.array(embeddings).astype('float32')
print(f"Embedded {len(documents)} chunks!")
```

Hands-On Encouragement: Create your_notes.txt with study material (e.g., AI basics). Run this section with python3 quiz_rag.py. Tweak chunk_size; smaller chunks give more precise quizzes.

Gotcha: pypdf extracts only a PDF's text layer, so images are skipped, and scanned pages without embedded text come back empty. Stick to text-rich files.
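The range() loop in ingest_notes can leave a short final chunk that only repeats text already covered by the previous chunk. A slightly tidier sketch keeps the same character-based chunking and defaults. It stops as soon as a chunk reaches the end of the text, and it validates the overlap up front. chunk_text is a hypothetical helper name, not from the original script.

```python
def chunk_text(text, chunk_size=800, overlap=200):
    """Split text into character chunks of up to chunk_size that overlap by `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # This chunk already reaches the end, so any later chunk would be pure repetition
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    text = "abcdefghij"
    print(chunk_text(text, chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
    # The original range() loop also emits a tail 'j' that 'ghij' already covers
    print([text[i:i + 4] for i in range(0, len(text), 4 - 1)])  # ['abcd', 'defg', 'ghij', 'j']
```

To use it, the chunking lines inside ingest_notes could become return chunk_text(text, 800, 200).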

Step 4: Build the Vector Store – Index for Speedy Searches

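FAISS stores the vectors and searches them. We use IndexFlatIP, which ranks by inner product. On unit-length (normalized) vectors, inner product equals cosine similarity, so results are ranked by meaning.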

Add:

```python
dimension = embeddings.shape[1]
# IndexFlatIP ranks by inner product; with normalized vectors that is cosine similarity
index = faiss.IndexFlatIP(dimension)
index.add(embeddings)
faiss.write_index(index, 'quiz_index.faiss')  # Save for quick reloads
```

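Tip: IndexFlatIP performs exact, brute-force search, which is quick for typical personal note collections. For very large collections, FAISS also offers approximate indexes such as IndexHNSWFlat. Next time, reload the saved index with faiss.read_index('quiz_index.faiss') instead of rebuilding it. The index stores only vectors, though, so keep the chunk texts as well.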
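faiss.read_index restores only the vectors. The chunk texts live in the documents list, which is lost when the script exits. This minimal sketch assumes a JSON file stored next to the index. quiz_chunks.json, save_chunks, and load_chunks are hypothetical names. The sketch saves the chunks in the order they were added, so FAISS result ids still map back to the right text.

```python
import json


def save_chunks(chunks, path="quiz_chunks.json"):
    """Save chunk texts in index order; position i matches FAISS id i. Returns the path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(chunks, f, ensure_ascii=False)
    return path


def load_chunks(path="quiz_chunks.json"):
    """Load chunk texts saved by save_chunks, in the same order."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Usage in quiz_rag.py (hypothetical wiring):
#   faiss.write_index(index, 'quiz_index.faiss')
#   save_chunks(documents)
# On a later run, skip ingesting and re-embedding the notes:
#   index = faiss.read_index('quiz_index.faiss')
#   documents = load_chunks()
```

You still need the embedder loaded on later runs, because each new query has to be encoded with the same model.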

Step 5: Implement Retrieval – Find Quiz-Worthy Context

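Retrieve the top chunks and filter out weak matches so quizzes stay on topic. Higher scores mean closer matches. Treat the 0.4 threshold as a starting point and tune it for your notes.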

Add:

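```python
def retrieve(query, top_k=5, threshold=0.4):
    # Embed the query like the chunks: a (1, dim) float32 array of unit-length vectors
    query_embedding = embedder.encode([query], normalize_embeddings=True).astype('float32')
    scores, indices = index.search(query_embedding, top_k)
    # Skip FAISS's -1 placeholders (returned when top_k > number of chunks) and weak matches.
    # Returning an empty list lets generate_quiz report "no context" instead of
    # building a quiz from an error message.
    return [documents[i] for i, score in zip(indices[0], scores[0])
            if i != -1 and score > threshold]
```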
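To see what index.search plus the threshold filter are doing, here is a pure-Python stand-in for an exact inner-product search. flat_ip_search is a hypothetical name, not a FAISS function. Like a flat index, it scores every stored vector, keeps the top_k, and then drops weak matches. It assumes the vectors are already normalized, so the scores are cosine similarities. One difference: when top_k exceeds the number of vectors, this sketch simply returns fewer results. FAISS instead fills the missing slots with id -1, so FAISS-based retrieval code should skip -1 ids.

```python
import heapq


def flat_ip_search(query, vectors, top_k=5, threshold=0.4):
    """Exhaustive inner-product search returning (index, score) pairs, best first.

    Assumes all vectors (and the query) are L2-normalized, so scores are cosine similarities.
    """
    # Score every stored vector against the query (this is what "flat" means: no shortcuts)
    scores = ((i, sum(q * v for q, v in zip(query, vec))) for i, vec in enumerate(vectors))
    # Keep the top_k highest scores
    best = heapq.nlargest(top_k, scores, key=lambda pair: pair[1])
    # Drop weak matches below the similarity threshold
    return [(i, s) for i, s in best if s > threshold]


if __name__ == "__main__":
    stored = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # toy unit-length vectors
    print(flat_ip_search([1.0, 0.0], stored, top_k=2))
```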

Step 6: Generate Quizzes – The Fun Augmentation Step

LLM crafts questions from context. Prompt engineering adds variety.

Add:

```python
def generate_quiz(context, num_questions=5, style="multiple-choice"):
    if not context:
        return "No context found. Try another topic or add more notes!"
    # Separate chunks clearly so the model sees where each one ends
    context_text = "\n\n---\n\n".join(context)
    prompt = (
        f"Context:\n{context_text}\n\n"
        f"Using only the context above, create {num_questions} {style} quiz questions. "
        "For multiple-choice: 4 options each. Put the answer key at the end. "
        "Make it engaging for learners!"
    )
    response = ollama.generate(model='llama3.1:8b', prompt=prompt)
    return response['response']

# Full flow: your query here!
query = "Key Python features"  # Swap for your topic
retrieved = retrieve(query)
quiz = generate_quiz(retrieved, num_questions=4, style="true-false")
print("Retrieved:", retrieved)
print("\nQuiz:\n", quiz)
```

Make It Yours: Run the whole script! Change style to "open-ended", or add an instruction such as "Tailor for solopreneurs." to the prompt. Then compare the quiz with and without the retrieved context to see how grounding reduces hallucinations. Later, try other models out of curiosity: run ollama pull gemma2:9b and change the model argument.
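The with/without-context comparison is easier if the prompt is built in one place. build_quiz_prompt is a hypothetical helper, not part of the original script. It produces a grounded prompt from retrieved chunks, or a bare no-context baseline prompt when given an empty list.

```python
def build_quiz_prompt(chunks, num_questions=5, style="multiple-choice"):
    """Return a quiz prompt grounded in `chunks`, or a no-context baseline if `chunks` is empty."""
    instructions = f"Create {num_questions} {style} quiz questions."
    if style == "multiple-choice":
        instructions += " Give 4 options per question."
    instructions += " Put the answer key at the end."
    if not chunks:
        # Baseline: the model can rely only on its training data, so facts may be made up
        return instructions
    # Separators help the model see where one retrieved chunk ends and the next begins
    context = "\n\n---\n\n".join(chunks)
    return f"Context:\n{context}\n\nUse only the context above.\n{instructions}"
```

With the names from quiz_rag.py, you could pass build_quiz_prompt(retrieved) and build_quiz_prompt([]) to ollama.generate(model='llama3.1:8b', prompt=...) and read the two 'response' texts side by side.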

Step 7: Full Script and Experimentation – Put It All Together

Here's the complete quiz_rag.py—copy, paste, run!

```python
from sentence_transformers import SentenceTransformer
from pypdf import PdfReader
import faiss
import numpy as np
import ollama

embedder = SentenceTransformer('all-MiniLM-L6-v2')

def ingest_notes(file_path):
    if file_path.lower().endswith('.pdf'):
        reader = PdfReader(file_path)
        pages = [page.extract_text() or '' for page in reader.pages]
        text = '\n'.join(p for p in pages if p)
    else:
        with open(file_path, 'r', encoding='utf-8') as f:
            text = f.read()
    chunk_size = 800
    overlap = 200
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunk = text[i:i + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
    return chunks

documents = ingest_notes('your_notes.txt')
if not documents:
    raise SystemExit("No text found; check the file path and contents.")

# Unit-length vectors make inner product equal to cosine similarity
embeddings = embedder.encode(documents, normalize_embeddings=True)
embeddings = np.array(embeddings).astype('float32')

dimension = embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(embeddings)

def retrieve(query, top_k=5, threshold=0.4):
    query_embedding = embedder.encode([query], normalize_embeddings=True).astype('float32')
    scores, indices = index.search(query_embedding, top_k)
    # Skip FAISS's -1 placeholders and weak matches; may return an empty list
    return [documents[i] for i, score in zip(indices[0], scores[0])
            if i != -1 and score > threshold]

def generate_quiz(context, num_questions=5, style="multiple-choice"):
    if not context:
        return "No context found. Try another topic or add more notes!"
    context_text = "\n\n---\n\n".join(context)
    prompt = (
        f"Context:\n{context_text}\n\n"
        f"Using only the context above, create {num_questions} {style} quiz questions. "
        "For multiple-choice: 4 options each. Put the answer key at the end. "
        "Make it engaging for learners!"
    )
    response = ollama.generate(model='llama3.1:8b', prompt=prompt)
    return response['response']

query = "Retrieval Augmented Generation"  # E.g., "AI startup trends"
retrieved = retrieve(query)
quiz = generate_quiz(retrieved)
print("Retrieved:", retrieved)
print("\nQuiz:\n", quiz)
```

GitHub code link

See the code here: https://github.com/ambikaiyer29/ragyfied-tutorials/tree/main/build-your-local-rag

Troubleshooting for Newbies:

  • Errors? Check file paths, imports, or restart Ollama.
  • Slow generation? Use smaller models or fewer questions.
  • No quizzes? Broaden query or add notes.
  • Need more help? Email us at algocattech@gmail.com.

Next Steps and Real-World Twists – Keep Building!

  • Experiments: Add a loop for multi-topic quizzes, or integrate Streamlit (pip install streamlit) for a web UI where you upload notes and get quizzes in the browser!
  • DIY Ideas: Solopreneurs—quiz your business plans. Funded startups—train teams on product docs. Non-tech? Quiz family on recipes!
  • Scale Nuance: For big datasets, explore hybrid search or cloud vector stores (but stay local for privacy).
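The multi-topic loop from the Experiments idea can be as small as this sketch. quiz_many_topics is a hypothetical helper. It takes the retrieve and generate_quiz functions from quiz_rag.py as arguments, which also makes it easy to test with stand-in functions.

```python
def quiz_many_topics(topics, retrieve, generate_quiz, **quiz_options):
    """Run retrieval and quiz generation for each topic; return {topic: quiz_text}."""
    results = {}
    for topic in topics:
        context = retrieve(topic)  # chunks relevant to this topic (may be empty)
        results[topic] = generate_quiz(context, **quiz_options)
    return results


# Usage in quiz_rag.py, after retrieve and generate_quiz are defined:
#   quizzes = quiz_many_topics(["Python basics", "RAG"], retrieve, generate_quiz,
#                              num_questions=3, style="true-false")
#   for topic, quiz in quizzes.items():
#       print(f"=== {topic} ===\n{quiz}\n")
```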

Built your Quiz Master? Post results on X and tag us. This pipeline's your gateway to GenAI mastery—dive in, tweak, and share!

Reach us at algocattech@gmail.com with any questions or feedback.

Stay Ragyfied!
