Custom MapReduce Engine

A Python implementation of the MapReduce programming model from scratch, applied to large-scale document processing.

data engineering, systems
Overview

Built entirely without Hadoop or Spark, this engine implements the core MapReduce paradigm in Python, including the shuffle, sort, and reduce phases, and benchmarks it against MongoDB aggregation pipelines on real datasets.
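
To make the phases concrete, here is a minimal single-process sketch of how map, shuffle, sort, and reduce might fit together for a word count over documents. It is an illustration under assumptions, not the engine's actual code: the names map_phase, shuffle_and_sort, reduce_phase, run_job, word_count_mapper, and sum_reducer are hypothetical, and the two sample documents are made up for the example.

```python
from collections import defaultdict


def map_phase(documents, mapper):
    """Apply the mapper to every document and collect (key, value) pairs."""
    pairs = []
    for doc in documents:
        pairs.extend(mapper(doc))
    return pairs


def shuffle_and_sort(pairs):
    """Group values by key (shuffle), then order the groups by key (sort)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups}


def run_job(documents, mapper, reducer):
    """Run the phases in order: map, shuffle and sort, reduce."""
    return reduce_phase(shuffle_and_sort(map_phase(documents, mapper)), reducer)


# Hypothetical word-count job used only for illustration.
def word_count_mapper(doc):
    """Emit (word, 1) for every whitespace-separated word, lowercased."""
    return [(word.lower(), 1) for word in doc.split()]


def sum_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


counts = run_job(["the quick fox", "The lazy dog"], word_count_mapper, sum_reducer)
print(counts)
```

Keeping each phase as a separate function means the mapper and reducer can be swapped per job while the shuffle and sort logic stays fixed. A real engine would likely also partition the work across processes, which this sketch leaves out.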