Build Your Own Quiz Master AI: A Complete Beginner's Guide to Local RAG Pipelines
Understand the five essential components of a Retrieval-Augmented Generation (RAG) pipeline and how they work together to make AI smarter, faster, and more reliable.

Hey Ragyfied community! Welcome back to our GenAI learning hub, where we're all about demystifying LLMs for everyone—from curious non-tech folks to solopreneurs bootstrapping AI tools. As of August 23, 2025, the local AI scene is hotter than ever, with breakthroughs like efficient SLMs making DIY projects more accessible. Inspired by Andrej Karpathy's recent X threads on practical RAG hacks and Greg Kamradt's tips on knowledge amplification, we're compiling our evolving tutorial into one seamless guide.
This isn't your run-of-the-mill "hello world"—we're building a Quiz Master AI, a real-world RAG (Retrieval-Augmented Generation) pipeline that turns your notes into custom quizzes. Perfect for studying, team training, or even startup pitches (quiz investors on your deck!). Why local? Privacy, zero costs, and full control—ideal for solopreneurs handling sensitive data or funded startups prototyping edtech apps.
By the end, you'll have a working prototype to experiment with, grasping RAG's core: Ingest docs, embed, store, retrieve, and generate. We'll keep it nuanced (chunking trade-offs, prompt tweaks) to encourage tinkering. Total time: 45-90 minutes. No GPU required, but Apple Silicon shines. Let's roll—step by step from setup to quiz time!
Step 1: Set Up Your Environment – Install Python and Ollama
Before coding, get the basics running. This works on macOS (Apple Silicon or Intel—same steps, thanks to universal binaries). For other OS? Adapt via official docs, but we'll focus here.
Install Python (3.8+ Recommended)
- Download: Head to python.org/downloads and grab the latest (e.g., 3.12+ as of now). The macOS installer is universal—no chip differences.
- Run Installer: Double-click the .pkg, follow prompts (install for all users, add to PATH).
- Verify: Open Terminal (Spotlight search: Terminal) and run:
Expect "Python 3.x.x". If not, useshell
python3 --version
python3
in commands.
Pro Tip: For AI work, pip is key—it's included. No extras needed yet.
Install Ollama (Local LLM Runner)
Ollama powers our generation step with models like Llama 3. It's cross-platform, efficient on M-series chips (faster inference via Metal), but fine on Intel too.
- Download: Visit ollama.com/download and get the macOS .dmg.
- Install: Mount the .dmg, drag Ollama.app to Applications.
- Launch: Open it—it runs in the background.
- Pull a Model: In Terminal:
```shell
ollama pull llama3.1:8b
```
(About 4-5GB; try `phi3:mini` for lighter setups.)
- Verify: Run `ollama run llama3.1:8b` and prompt "Hello!" to test.
Nuance for Beginners: Ollama's not the only option (LM Studio for GUIs, llama.cpp for speed), but it's simplest for RAG prototypes.
Install Python libs next:
```shell
pip install sentence-transformers faiss-cpu torch pypdf2 ollama
```
- `sentence-transformers`: For embeddings.
- `faiss-cpu`: Vector store.
- `pypdf2`: PDF handling (for real notes).
- Others: Backends (`torch`) and the Ollama Python client (`ollama`).
Try It Out: Run `python3 -c "import sentence_transformers; print('Ready!')"` to confirm.
Step 2: Understand RAG Components – The Why Behind the Pipeline
RAG combines retrieval (finding information) with generation (LLM answers), grounding responses in your data to cut hallucinations.
Core Pieces:
- Ingestion & Embedding: Load notes, chunk into snippets, convert to vectors (semantic numbers via models like MiniLM).
- Vector Store: Index for fast searches (FAISS: Local, efficient).
- Retrieval: Query embeddings, fetch top matches (tune for relevance).
- Generation: Feed context + query to LLM (Ollama) for quizzes.
Nuance: Embeddings capture meaning (e.g., "France capital" matches "Paris is French hub"), but chunk size affects precision—too big = noise, too small = lost context.
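To feel the trade-off concretely, here's a toy, standalone sketch (the sizes are arbitrary; the real chunker arrives in Step 3):
```python
# Toy illustration (not part of the pipeline): how chunk size changes granularity.
text = "Paris is the capital of France. It is known for the Eiffel Tower. " * 20

def chunk(text, size, overlap):
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

small_chunks = chunk(text, 80, 20)   # tight: precise matches, but sentences get split
large_chunks = chunk(text, 500, 50)  # broad: more context per chunk, more noise too
print(len(small_chunks), "small chunks vs.", len(large_chunks), "large chunks")
print("A small chunk:", repr(small_chunks[1]))
```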
For our Quiz Master: We'll ingest notes/PDFs, retrieve on topics, and generate quizzes. Create `quiz_rag.py` for the code.
Step 3: Ingest Notes and Create Embeddings – Prep Your Knowledge Base
Start with chunking for quiz-friendly snippets. Use a sample `notes.txt` or PDF, then swap in yours! Add to `quiz_rag.py`:
```python
from sentence_transformers import SentenceTransformer
from PyPDF2 import PdfReader
import faiss
import numpy as np
import ollama

# Load embedder (MiniLM: Fast; try 'all-mpnet-base-v2' for accuracy, but slower)
embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Ingest function: Handles TXT/PDF, chunks with overlap
def ingest_notes(file_path):
    if file_path.endswith('.pdf'):
        reader = PdfReader(file_path)
        text = ''.join(page.extract_text() for page in reader.pages if page.extract_text())
    else:
        with open(file_path, 'r') as f:
            text = f.read()
    # Chunk: ~800 chars, 200 overlap (tweak for your notes!)
    chunk_size = 800
    overlap = 200
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks

# Your notes here! E.g., history_notes.txt or startup_pitch.pdf
documents = ingest_notes('your_notes.txt')
# Normalize so FAISS inner-product search behaves like cosine similarity
embeddings = embedder.encode(documents, normalize_embeddings=True)
embeddings = np.array(embeddings).astype('float32')
print(f"Embedded {len(documents)} chunks!")
```
Hands-On Encouragement: Create `your_notes.txt` with study material (e.g., AI basics). Run this section with `python3 quiz_rag.py`. Tweak `chunk_size`: smaller chunks give more precise quizzes.
Gotcha: PDFs with images? Text extraction skips them; focus on content-rich files.
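If you're unsure whether a PDF yielded usable text, a quick sanity check before embedding can save debugging time. This is a hypothetical helper (the `min_chars` cutoff is an arbitrary choice):
```python
from PyPDF2 import PdfReader

def pdf_text_looks_ok(file_path, min_chars=200):
    """Hypothetical helper: warn if a PDF yields little or no extractable text."""
    reader = PdfReader(file_path)
    text = ''.join(page.extract_text() or '' for page in reader.pages)
    if len(text) < min_chars:
        print(f"Warning: only {len(text)} chars extracted from {file_path}; "
              "scanned or image-heavy PDFs need OCR first.")
        return False
    return True
```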
Step 4: Build the Vector Store – Index for Speedy Searches
FAISS stores the vectors; because we normalized the embeddings, inner-product indexing gives us true cosine similarity for semantic search.
Add:
```python
dimension = embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)  # Inner product = cosine similarity on normalized vectors
index.add(embeddings)
faiss.write_index(index, 'quiz_index.faiss')  # Save for quick reloads
```
Tip: For huge notes, FAISS scales well locally. Reload with `faiss.read_index('quiz_index.faiss')` next time to skip re-indexing!
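One way to wire that up, as a sketch that could replace the Step 4 block in `quiz_rag.py` (note that `documents` must still come from the same notes, or retrieved indices will point at the wrong text):
```python
import os

INDEX_PATH = 'quiz_index.faiss'  # matches the save path above

if os.path.exists(INDEX_PATH):
    index = faiss.read_index(INDEX_PATH)  # fast reload: skips re-indexing
else:
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)
    faiss.write_index(index, INDEX_PATH)
```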
Step 5: Implement Retrieval – Find Quiz-Worthy Context
Retrieve top chunks, filter weak matches to avoid fluff.
Add:
```python
def retrieve(query, top_k=5, threshold=0.4):
    # Normalize the query the same way as the documents
    query_embedding = embedder.encode([query], normalize_embeddings=True)[0].astype('float32')
    # For IndexFlatIP, "distances" are similarity scores (higher = better)
    distances, indices = index.search(np.array([query_embedding]), top_k)
    results = [(documents[i], distances[0][j]) for j, i in enumerate(indices[0]) if distances[0][j] > threshold]
    return [doc for doc, score in results] or ["No matches—expand your notes!"]
```
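A quick way to eyeball retrieval quality before wiring in generation (the query string is just an example; use a topic from your own notes):
```python
# Peek at what retrieval returns for a sample query.
for i, doc in enumerate(retrieve("What are embeddings?", top_k=3), 1):
    print(f"--- Match {i} ---\n{doc[:200]}\n")
```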
Step 6: Generate Quizzes – The Fun Augmentation Step
LLM crafts questions from context. Prompt engineering adds variety.
Add:
```python
def generate_quiz(context, num_questions=5, style="multiple-choice"):
    if not context:
        return "No context found—try another topic!"
    prompt = f"Context: {' '.join(context)}\n\nCreate {num_questions} {style} quiz questions. For multiple-choice: 4 options, answer key at end. Make it engaging for learners!"
    response = ollama.generate(model='llama3.1:8b', prompt=prompt)
    return response['response']

# Full flow: Your query here!
query = "Key Python features"  # Swap for your topic
retrieved = retrieve(query)
quiz = generate_quiz(retrieved, num_questions=4, style="true-false")
print("Retrieved:", retrieved)
print("\nQuiz:\n", quiz)
```
Make It Yours: Run the whole script! Tweak style to "open-ended" or prompt with "Tailor for solopreneurs." Compare with/without context—see RAG's anti-hallucination power.
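To run that with/without comparison yourself, here's a sketch reusing the functions above (the topic string is a placeholder):
```python
# Grounded vs. ungrounded: same topic, with and without retrieved context.
topic = "Key Python features"

grounded = generate_quiz(retrieve(topic), num_questions=3)

# No retrieval: the model answers purely from its training data.
bare = ollama.generate(
    model='llama3.1:8b',
    prompt=f"Create 3 multiple-choice quiz questions about: {topic}"
)['response']

print("With RAG:\n", grounded)
print("\nWithout RAG:\n", bare)
```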
Out of curiosity, try other models later: `ollama pull gemma2:9b`.
Step 7: Full Script and Experimentation – Put It All Together
Here's the complete `quiz_rag.py`. Copy, paste, run!
```python
from sentence_transformers import SentenceTransformer
from PyPDF2 import PdfReader
import faiss
import numpy as np
import ollama

embedder = SentenceTransformer('all-MiniLM-L6-v2')

def ingest_notes(file_path):
    if file_path.endswith('.pdf'):
        reader = PdfReader(file_path)
        text = ''.join(page.extract_text() for page in reader.pages if page.extract_text())
    else:
        with open(file_path, 'r') as f:
            text = f.read()
    chunk_size = 800
    overlap = 200
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks

documents = ingest_notes('your_notes.txt')
# Normalized embeddings make inner-product search equal to cosine similarity
embeddings = embedder.encode(documents, normalize_embeddings=True)
embeddings = np.array(embeddings).astype('float32')

dimension = embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(embeddings)

def retrieve(query, top_k=5, threshold=0.4):
    query_embedding = embedder.encode([query], normalize_embeddings=True)[0].astype('float32')
    distances, indices = index.search(np.array([query_embedding]), top_k)
    results = [(documents[i], distances[0][j]) for j, i in enumerate(indices[0]) if distances[0][j] > threshold]
    return [doc for doc, score in results] or ["No matches—expand your notes!"]

def generate_quiz(context, num_questions=5, style="multiple-choice"):
    if not context:
        return "No context found—try another topic!"
    prompt = f"Context: {' '.join(context)}\n\nCreate {num_questions} {style} quiz questions. For multiple-choice: 4 options, answer key at end. Make it engaging for learners!"
    response = ollama.generate(model='llama3.1:8b', prompt=prompt)
    return response['response']

query = "Retrieval Augmented generation"  # E.g., "AI startup trends"
retrieved = retrieve(query)
quiz = generate_quiz(retrieved)
print("Retrieved:", retrieved)
print("\nQuiz:\n", quiz)
```
GitHub code link: see the full code at https://github.com/ambikaiyer29/ragyfied-tutorials/tree/main/build-your-local-rag
Troubleshooting for Newbies:
- Errors? Check file paths, imports, or restart Ollama.
- Slow generation? Use smaller models or fewer questions.
- No quizzes? Broaden query or add notes.
- Need more help? Shoot us an email at algocattech@gmail.com.
Next Steps and Real-World Twists – Keep Building!
- Experiments: Add a loop for multi-topic quizzes; integrate Streamlit (`pip install streamlit`) for a web UI so you can upload notes and get quizzes via browser. See the sketch after this list.
- DIY Ideas: Solopreneurs, quiz your business plans. Funded startups, train teams on product docs. Non-tech? Quiz family on recipes!
- Scale Nuance: For big datasets, explore hybrid search or cloud vector stores (but stay local for privacy).
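Here's a minimal sketch of the Streamlit idea from the Experiments bullet. It assumes `quiz_rag.py` sits in the same folder; because that script builds its index at import time, the first interaction will be slow (refactoring the pipeline into functions is a great next exercise):
```python
# streamlit_app.py - run with: streamlit run streamlit_app.py
import streamlit as st

from quiz_rag import retrieve, generate_quiz  # importing also triggers the indexing

st.title("Quiz Master AI")
topic = st.text_input("Quiz topic", "Key Python features")
num_q = st.slider("Number of questions", 1, 10, 4)
style = st.selectbox("Style", ["multiple-choice", "true-false", "open-ended"])

if st.button("Generate quiz"):
    with st.spinner("Retrieving and generating..."):
        context = retrieve(topic)
        st.write(generate_quiz(context, num_questions=num_q, style=style))
```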
Built your Quiz Master? Post results on X and tag us. This pipeline's your gateway to GenAI mastery—dive in, tweak, and share!
Reach us at algocattech@gmail.com for any questions or feedback.
Stay Ragyfied!