MapReduce Server


March 2017




  • Implement a MapReduce server in Python
  • Multi-process, multi-threaded
  • Fault tolerance
  • The map and reduce algorithms themselves are not implemented in detail

Implementation Design

Master

  • Three threads: main master thread, heartbeat-listening thread, fault-tolerance thread
  • Fault tolerance: a worker that misses 5 pings in a row (10 seconds) is marked dead and never used again. Its remaining tasks are reassigned to the other live workers.
  • Master states: ready, map, group, reduce
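A minimal sketch of the master-side liveness tracking, assuming workers ping every 2 seconds so that 5 missed pings equal the 10-second timeout. The class `WorkerRegistry`, its methods, and the round-robin reassignment are hypothetical illustrations, not the original implementation. The heartbeat-listening thread would call `heartbeat()`, and the fault-tolerance thread would run `fault_tolerance_loop()`. Because scanning for dead workers and moving their tasks touches several entries at once, this sketch guards that step with a lock. Timestamps can be passed in explicitly so the logic is easy to test.

```python
import threading
import time

HEARTBEAT_INTERVAL = 2.0  # assumed seconds between pings (5 pings = 10 s)
MISSED_PINGS_LIMIT = 5    # a worker silent for this many intervals is dead


class WorkerRegistry:
    """Hypothetical master-side table of worker liveness and assigned tasks."""

    def __init__(self, timeout=HEARTBEAT_INTERVAL * MISSED_PINGS_LIMIT):
        self.timeout = timeout
        self.last_seen = {}  # worker_id -> time of last heartbeat
        self.status = {}     # worker_id -> "ready" | "busy" | "dead"
        self.tasks = {}      # worker_id -> unfinished tasks
        self.pending = []    # tasks waiting for any live worker
        self.lock = threading.Lock()

    def _now(self, now):
        return time.monotonic() if now is None else now

    def register(self, worker_id, now=None):
        with self.lock:
            self.last_seen[worker_id] = self._now(now)
            self.status[worker_id] = "ready"
            self.tasks[worker_id] = []

    def heartbeat(self, worker_id, now=None):
        """Called by the heartbeat-listening thread for each ping received."""
        with self.lock:
            # Pings from unknown or already-dead workers are ignored.
            if self.status.get(worker_id) not in (None, "dead"):
                self.last_seen[worker_id] = self._now(now)

    def assign(self, worker_id, task):
        with self.lock:
            self.tasks[worker_id].append(task)
            self.status[worker_id] = "busy"

    def check_dead(self, now=None):
        """Mark silent workers dead and hand their tasks to live workers."""
        now = self._now(now)
        with self.lock:
            newly_dead = [w for w, s in self.status.items()
                          if s != "dead" and now - self.last_seen[w] > self.timeout]
            orphaned = []
            for w in newly_dead:
                # Mark instead of deleting, so no thread sees the entry vanish.
                self.status[w] = "dead"
                orphaned.extend(self.tasks[w])
                self.tasks[w] = []
            alive = [w for w, s in self.status.items() if s != "dead"]
            for i, task in enumerate(orphaned):
                if alive:
                    target = alive[i % len(alive)]  # round-robin over live workers
                    self.tasks[target].append(task)
                    self.status[target] = "busy"
                else:
                    self.pending.append(task)
            return newly_dead


def fault_tolerance_loop(registry, stop_event, period=1.0):
    """Body of the fault-tolerance thread: rescan until stop_event is set."""
    while not stop_event.wait(period):
        registry.check_dead()
```

For example, if `w1` holds two map tasks and goes silent while `w2` keeps pinging, `check_dead(now=10.5)` returns `["w1"]` and both tasks move to `w2`.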

Worker

  • Two threads: main worker thread, heartbeat-sending thread
  • Worker states: ready, busy, dead
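The worker's heartbeat-sending thread can be sketched as below. This assumes the pings travel over UDP as small JSON messages every 2 seconds; the message fields (`message_type`, `worker_id`) and the function names are hypothetical, not taken from the original code. Each ping opens a fresh socket rather than holding one open.

```python
import json
import socket
import threading


def send_heartbeats(worker_id, master_addr, stop_event, interval=2.0):
    """Body of the heartbeat thread: send one ping every `interval` seconds."""
    message = json.dumps({"message_type": "heartbeat",
                          "worker_id": worker_id}).encode("utf-8")
    while not stop_event.is_set():
        # A fresh UDP socket per ping; it is closed when the block exits.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(message, master_addr)
        # wait() returns early once stop_event is set, so shutdown is prompt.
        stop_event.wait(interval)


def start_heartbeat_thread(worker_id, master_addr, interval=2.0):
    """Start the sender as a daemon thread; set the returned event to stop it."""
    stop_event = threading.Event()
    thread = threading.Thread(target=send_heartbeats,
                              args=(worker_id, master_addr, stop_event, interval),
                              daemon=True)
    thread.start()
    return thread, stop_event
```

Running the sender as a daemon thread means it never blocks the worker process from exiting, while the main worker thread stays free to execute tasks.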

Notes

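  • Individual operations on CPython's built-in data structures (e.g. list.append, dict item assignment) are atomic because of the GIL, so we did not add explicit lock() and unlock() calls. Multi-step check-then-act sequences would still need a lock.
  • Dead workers are marked dead instead of being removed from the data structure. This avoids concurrency problems such as another thread iterating over it while it changes (a topic covered in EECS482).
  • Sockets are not reused
  • Grouper intermediate files are saved in the folder **var/job-x/grouper-worker-output**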
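A small sketch of how that folder layout could be produced, assuming the `x` in `job-x` is a numeric job ID and that the group step sorts the mapped lines so identical keys end up adjacent. `grouper_output_dir`, `write_grouper_file`, and the `groupedNN` file names are hypothetical.

```python
import os


def grouper_output_dir(base_dir, job_id):
    """Return (and create) the folder holding one job's grouper output.

    Assumes "job-x" means job-<job_id> under base_dir/var.
    """
    path = os.path.join(base_dir, "var", f"job-{job_id}", "grouper-worker-output")
    os.makedirs(path, exist_ok=True)
    return path


def write_grouper_file(base_dir, job_id, worker_index, lines):
    """Write one grouper worker's sorted lines; the file name is hypothetical."""
    path = os.path.join(grouper_output_dir(base_dir, job_id),
                        f"grouped{worker_index:02d}")
    with open(path, "w", encoding="utf-8") as f:
        # Sorting places equal keys next to each other for the reduce stage.
        for line in sorted(lines):
            f.write(line.rstrip("\n") + "\n")
    return path
```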

Related Code

Contact me if you would like to see the actual implementation, and I can add you to the private GitHub repo.
