This a very popular for parallel processing of large data sets

  • originally designed by google
  • Two key functions: map() & reduce()

The map() function generates key-value pairs for intermediate partitions of large data sets that are to be processed in parallel

The reduce() function then uses the generate key pairs to merge the partitions together.