Project
Tiny Search Engine
Crawler, indexer, and query engine for 15K+ pages with efficient memory & I/O management
C, Data Structures, Memory, Algorithms

I implemented a miniature search engine in C with three components (Crawler, Indexer, and Querier), focusing on correctness, memory safety, and performance.
Backstory
This project was a deep dive into IR fundamentals: fetching pages, building an index, and answering free‑text queries efficiently without the crutch of high‑level libraries.
Architecture
- Crawler: respectful fetcher with URL normalization, deduplication, and domain scoping.
- Indexer: inverted index over tokens with document frequencies; compact in‑memory representation plus on‑disk persistence.
- Querier: tokenizes and normalizes queries, ranks results using index statistics, and prints annotated matches.
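To make the Indexer's role concrete, here is a minimal sketch of an inverted index: a chained hash table that maps each word to a list of (document, count) postings and tracks document frequency. This is an illustration under stated assumptions, not the project's actual code; the names `index_t`, `index_add`, `index_count`, `index_doc_freq`, and the fixed `NBUCKETS` size are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024 /* hypothetical fixed table size */

typedef struct posting {
    int doc_id;
    int count;              /* occurrences of the word in doc_id */
    struct posting *next;
} posting_t;

typedef struct entry {
    char *word;
    int doc_freq;           /* number of distinct documents containing word */
    posting_t *postings;
    struct entry *next;     /* chain for hash collisions */
} entry_t;

typedef struct {
    entry_t *buckets[NBUCKETS];
} index_t;

static size_t hash_word(const char *s) {
    unsigned long h = 5381; /* djb2 string hash */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return (size_t)(h % NBUCKETS);
}

index_t *index_new(void) {
    index_t *idx = malloc(sizeof *idx);
    if (idx != NULL)
        for (size_t i = 0; i < NBUCKETS; i++) idx->buckets[i] = NULL;
    return idx;
}

static entry_t *index_find(const index_t *idx, const char *word) {
    for (entry_t *e = idx->buckets[hash_word(word)]; e != NULL; e = e->next)
        if (strcmp(e->word, word) == 0) return e;
    return NULL;
}

/* Record one occurrence of word in doc_id. Returns 0, or -1 if out of memory. */
int index_add(index_t *idx, const char *word, int doc_id) {
    assert(idx != NULL && word != NULL);
    entry_t *e = index_find(idx, word);
    if (e == NULL) {
        e = malloc(sizeof *e);
        if (e == NULL) return -1;
        e->word = malloc(strlen(word) + 1);
        if (e->word == NULL) { free(e); return -1; }
        strcpy(e->word, word);
        e->doc_freq = 0;
        e->postings = NULL;
        size_t b = hash_word(word);
        e->next = idx->buckets[b];
        idx->buckets[b] = e;
    }
    for (posting_t *p = e->postings; p != NULL; p = p->next)
        if (p->doc_id == doc_id) { p->count++; return 0; }
    posting_t *p = malloc(sizeof *p);
    if (p == NULL) return -1;
    p->doc_id = doc_id;
    p->count = 1;
    p->next = e->postings;
    e->postings = p;
    e->doc_freq++;
    return 0;
}

/* Occurrences of word in doc_id (0 if absent). */
int index_count(const index_t *idx, const char *word, int doc_id) {
    const entry_t *e = index_find(idx, word);
    for (const posting_t *p = e ? e->postings : NULL; p != NULL; p = p->next)
        if (p->doc_id == doc_id) return p->count;
    return 0;
}

/* Number of documents containing word (0 if absent). */
int index_doc_freq(const index_t *idx, const char *word) {
    const entry_t *e = index_find(idx, word);
    return e ? e->doc_freq : 0;
}

void index_free(index_t *idx) {
    if (idx == NULL) return;
    for (size_t i = 0; i < NBUCKETS; i++) {
        entry_t *e = idx->buckets[i];
        while (e != NULL) {
            entry_t *next_e = e->next;
            posting_t *p = e->postings;
            while (p != NULL) { posting_t *next_p = p->next; free(p); p = next_p; }
            free(e->word);
            free(e);
            e = next_e;
        }
    }
    free(idx);
}
```

A querier built on a structure like this could use `index_count` as a term-frequency signal and `index_doc_freq` to down-weight very common words. The ranking itself is left out, since the formula used in the project is not described here.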
Engineering Practices
- Defensive programming throughout; all allocations checked, and errors surfaced with clear codes.
- Valgrind‑driven iteration to eliminate leaks and undefined behavior.
- Tight inner loops and careful data layout to reduce cache misses on lookups.
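The "checked allocations, clear error codes" practice can be sketched roughly as follows. This is a small illustration, not code from the project: the status enum `tse_status_t`, the helper `normalize_word`, and `tse_strerror` are hypothetical names chosen for the example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes; every fallible function returns one. */
typedef enum {
    TSE_OK = 0,
    TSE_ERR_ARG = 1,   /* caller passed a NULL pointer */
    TSE_ERR_NOMEM = 2  /* allocation failed */
} tse_status_t;

/* Copy src into a freshly allocated, lowercased string stored in *out.
 * On any failure *out is left NULL (when out itself is valid) and a
 * non-zero status is returned. The caller owns and frees *out. */
tse_status_t normalize_word(const char *src, char **out) {
    if (out == NULL) return TSE_ERR_ARG;
    *out = NULL;
    if (src == NULL) return TSE_ERR_ARG;

    size_t n = strlen(src);
    char *buf = malloc(n + 1);
    if (buf == NULL) return TSE_ERR_NOMEM; /* never dereference unchecked */

    for (size_t i = 0; i < n; i++)
        buf[i] = (char)tolower((unsigned char)src[i]);
    buf[n] = '\0';

    *out = buf;
    return TSE_OK;
}

/* Map a status code to a short human-readable message. */
const char *tse_strerror(tse_status_t s) {
    switch (s) {
    case TSE_OK:        return "ok";
    case TSE_ERR_ARG:   return "invalid argument";
    case TSE_ERR_NOMEM: return "out of memory";
    }
    return "unknown error";
}
```

Returning a status and passing the result through an out-parameter keeps failure paths explicit, which also makes them easy to exercise under Valgrind.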
Performance
- Optimized tokenization and thread pooling, cutting average query latency from 30 s to 0.8 s.
- Implemented efficient memory management and I/O optimizations for handling 15K+ pages.
- Bounded memory and streaming writes during indexing to handle larger inputs gracefully.
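One way to picture the bounded-memory, streaming idea is a tokenizer that reads a page one character at a time and writes each normalized token straight to an output stream, so memory use stays constant regardless of page size. The sketch below is an assumption-laden illustration rather than the project's implementation: `stream_tokens`, the `MAX_WORD` cap, and the one-token-per-line output format are all hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

#define MAX_WORD 64 /* hypothetical cap on token length; longer runs are truncated */

/* Read text from `in` and write each lowercased alphabetic token of at least
 * `min_len` characters to `out`, one per line. Only a fixed-size buffer is
 * used, so memory does not grow with input size.
 * Returns the number of tokens written, or -1 on a write error. */
long stream_tokens(FILE *in, FILE *out, size_t min_len) {
    char word[MAX_WORD + 1];
    size_t len = 0;
    long written = 0;

    for (;;) {
        int c = fgetc(in);
        if (c != EOF && isalpha(c)) {
            /* Still inside a word: keep the character if there is room. */
            if (len < MAX_WORD) word[len++] = (char)tolower(c);
            continue;
        }
        /* Word boundary (or end of input): flush the pending token. */
        if (len > 0 && len >= min_len) {
            word[len] = '\0';
            if (fprintf(out, "%s\n", word) < 0) return -1;
            written++;
        }
        len = 0;
        if (c == EOF) break;
    }
    return written;
}
```

For example, calling `stream_tokens(page_file, tokens_file, 3)` on the text `The Quick, brown fox! a` writes `the`, `quick`, `brown`, and `fox` and skips the one-letter word. Here `page_file` and `tokens_file` stand for any open `FILE *` streams.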
Project Log
No log entries yet. I’ll share stories, insights, and progress notes here.