MapReduce Server

Time

March 2017

Summary

  • Implement a MapReduce server in Python
  • Multi-process and multi-threaded
  • Fault tolerance
  • The map and reduce algorithms themselves are not implemented in detail

Implementation Design

Master

  • Three threads: main master thread, heartbeat-listening thread, fault-tolerance thread
  • Fault tolerance: a worker that misses 5 pings (10 seconds) is considered dead and is never used again; its remaining tasks are reassigned to the other live workers.
  • Master states: ready, map, group, reduce
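The master's failure detection can be sketched as follows. This is a minimal sketch, not the original code. `WorkerTable`, `reap`, and `fault_tolerance_loop` are hypothetical names. The 2-second heartbeat interval is inferred from 5 pings spanning 10 seconds, and round-robin reassignment is an assumed policy. The sketch holds a lock inside `reap` because marking workers dead and moving their tasks is a multi-step update, not a single atomic operation.

```python
import threading
import time

HEARTBEAT_INTERVAL = 2.0  # assumed ping period: 5 pings x 2 s = 10 s
MISSED_PINGS_LIMIT = 5
DEAD_TIMEOUT = HEARTBEAT_INTERVAL * MISSED_PINGS_LIMIT


class WorkerTable:
    """Hypothetical master-side record of workers, heartbeats, and tasks."""

    def __init__(self):
        self._lock = threading.Lock()
        self.last_seen = {}  # worker_id -> time of last heartbeat
        self.status = {}     # worker_id -> "ready" | "busy" | "dead"
        self.tasks = {}      # worker_id -> unfinished tasks
        self.pending = []    # tasks waiting for any live worker

    def register(self, worker_id, now):
        with self._lock:
            self.last_seen[worker_id] = now
            self.status[worker_id] = "ready"
            self.tasks[worker_id] = []

    def heartbeat(self, worker_id, now):
        with self._lock:
            # Dead workers are never used again, so their late pings are ignored.
            if self.status.get(worker_id) not in (None, "dead"):
                self.last_seen[worker_id] = now

    def reap(self, now):
        """Mark silent workers dead, reassign their tasks, return the moved tasks."""
        with self._lock:
            orphaned = []
            for wid, seen in self.last_seen.items():
                if self.status[wid] != "dead" and now - seen >= DEAD_TIMEOUT:
                    self.status[wid] = "dead"  # flagged, not deleted
                    orphaned.extend(self.tasks[wid])
                    self.tasks[wid] = []
            alive = [w for w, s in self.status.items() if s != "dead"]
            if alive:
                # Round-robin the orphaned tasks over the live workers.
                for i, task in enumerate(orphaned):
                    self.tasks[alive[i % len(alive)]].append(task)
            else:
                self.pending.extend(orphaned)
            return orphaned


def fault_tolerance_loop(table, stop_event):
    """Body of the fault-tolerance thread: check once per heartbeat interval."""
    while not stop_event.wait(HEARTBEAT_INTERVAL):
        table.reap(time.monotonic())
```

Passing `now` explicitly keeps `reap` deterministic in tests. Callers of `register` and `heartbeat` would need to use the same clock (`time.monotonic()`) as `fault_tolerance_loop`. Dead workers stay in every dict, so a heartbeat from a worker already marked dead is ignored.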

Worker

  • Two threads: main worker thread, heartbeat-sending thread
  • Worker states: ready, busy, dead
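A worker's heartbeat-sending thread might look like the sketch below. Several details are assumptions for illustration and may differ from the original repository: the UDP transport, the JSON message fields (`message_type`, `worker_id`), the 2-second interval, and the function names. The sketch opens a new socket for every ping, in line with the note about not reusing sockets.

```python
import json
import socket
import threading


def send_heartbeats(worker_id, master_addr, stop_event, interval=2.0):
    """Hypothetical heartbeat loop: one small JSON datagram per interval."""
    message = json.dumps(
        {"message_type": "heartbeat", "worker_id": worker_id}
    ).encode("utf-8")
    while not stop_event.is_set():
        # Fresh socket for each ping; nothing is reused between sends.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(message, master_addr)
        stop_event.wait(interval)  # returns early once stop_event is set


def start_heartbeat_thread(worker_id, master_addr):
    """Start the sender as a daemon thread; set the event to stop it."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=send_heartbeats,
        args=(worker_id, master_addr, stop_event),
        daemon=True,
    )
    thread.start()
    return thread, stop_event
```

Making the thread a daemon means it will not keep the worker process alive on its own. The stop event lets the main worker thread end it cleanly.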

Notes

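  • Most built-in Python data structures are thread-safe for single operations (e.g., list.append() or setting a dict key, which are atomic in CPython because of the GIL), so explicit lock() and unlock() calls were not needed for those; compound check-then-update sequences are not atomic and still need a lock.
  • Dead workers are not removed from the shared data structures, to avoid concurrency problems (of the kind addressed in EECS482).
  • Sockets are not reused.
  • Intermediate files from the grouper are saved in the folder var/job-x/grouper-worker-output.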
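The data-structure notes above can be illustrated with a small sketch. Flagging a worker as `"dead"` only changes the value under an existing key, so another thread iterating over the same dict is unaffected. Deleting a key mid-iteration, by contrast, raises `RuntimeError`. The helpers `mark_dead`, `live_workers`, and `grouper_output_dir` are hypothetical. Reading `job-x` as `job-<job id>` is also an assumption.

```python
import os
import tempfile


def mark_dead(status, worker_id):
    """Flag a worker as dead in place; the dict keeps all of its keys."""
    status[worker_id] = "dead"


def live_workers(status):
    """Workers that may still receive tasks."""
    return [wid for wid, state in status.items() if state != "dead"]


def grouper_output_dir(base_dir, job_id):
    """Hypothetical helper: create <base_dir>/var/job-<job_id>/grouper-worker-output."""
    path = os.path.join(base_dir, "var", f"job-{job_id}", "grouper-worker-output")
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    status = {"w0": "ready", "w1": "busy", "w2": "ready"}
    for _ in status:
        mark_dead(status, "w1")  # changes a value, not the dict's size
    print(live_workers(status))  # ['w0', 'w2']

    try:
        for _ in status:
            del status["w1"]  # changes the size mid-iteration
    except RuntimeError as exc:
        print("RuntimeError:", exc)

    with tempfile.TemporaryDirectory() as base:
        print(grouper_output_dir(base, 0))
```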

Related Code

Contact me if you would like to see the actual implementation; I can add you to the private GitHub repository.