This a very popular for parallel processing of large data sets
- originally designed by google
- Two key functions:
map()&reduce()
The map() function generates key-value pairs for intermediate partitions of large data sets that are to be processed in parallel
The reduce() function then uses the generate key pairs to merge the partitions together.