

Asked By : Beached Whale
Answered By : Sean Easter
[…] it says that the Map() function emits <“hello”, 1> every time it sees “hello”, whereas the reduce function counts the number of times “hello” occurs
Not quite. It appears the mapper reads each file, counts the number of times a word appears in it, and outputs a single (word, count) pair per file, rather than one pair per occurrence of the word. The reduce step then sums these counts. (“hello”, 1) indicates that “hello” appeared once in a given file, (“hello”, 3) indicates three appearances in a file, and so on. In the example for the reduce step, it appears that four files were mapped and that “hello” appeared 3 times in the first, 5 in the second, etc.
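To make the difference concrete, here is a minimal single-process sketch that contrasts the two mapper styles: the per-occurrence mapper described in the question and the per-file mapper described in this answer. The file contents and the function names (`map_per_occurrence`, `map_per_file`, `reduce_sum`, `run`) are hypothetical and chosen only for illustration. The two toy files contain “hello” 3 and 5 times, matching the first two counts mentioned above. Either mapper feeds the same summing reducer and yields the same total; the per-file mapper simply emits fewer pairs.

```python
from collections import Counter

# Hypothetical toy input: each string stands in for the contents of one file.
FILES = [
    "hello world hello hello",            # "hello" appears 3 times
    "hello hello hello hello hello bye",  # "hello" appears 5 times
]

def map_per_occurrence(text):
    """Question's reading: emit ("hello", 1) every time "hello" is seen."""
    return [(word, 1) for word in text.split() if word == "hello"]

def map_per_file(text):
    """Answer's reading: count locally, then emit one ("hello", n) pair per file."""
    n = text.split().count("hello")
    return [("hello", n)] if n > 0 else []

def reduce_sum(pairs):
    """Reducer: sum all counts emitted for each word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

def run(mapper, files):
    """Apply the mapper to every file, then reduce all emitted pairs."""
    pairs = [pair for text in files for pair in mapper(text)]
    return pairs, reduce_sum(pairs)
```

With these inputs, `run(map_per_file, FILES)` emits `[("hello", 3), ("hello", 5)]`, while `run(map_per_occurrence, FILES)` emits eight `("hello", 1)` pairs; both reduce to `{"hello": 8}`. Pre-aggregating inside the mapper resembles what Hadoop calls a combiner: it cuts down the data sent to the reducers without changing the final answer, because addition is associative and commutative.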
Also, why do you need MapReduce to do this?
Via Wikipedia, MapReduce is “for processing parallelizable problems across huge datasets using a large number of computers[.]” That is, if your task is to count the number of times “hello” appears in four small documents, you likely don’t need MapReduce. But if your task is to count the appearances of every word in a large set of documents, then the only way to finish in a practically useful time may be to distribute the work across multiple processors.
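The reason word counting distributes well can be sketched in a single process. The following is a simulation under stated assumptions, not how any particular framework is implemented: all function names (`map_words`, `partition`, `reduce_counts`, `word_count`) and the `num_reducers` parameter are hypothetical. The map phase treats every document independently, and a shuffle step routes each word to one reducer by hashing it, so reducers own disjoint sets of words and never need to coordinate. In a real cluster, a framework such as Hadoop would run these phases on separate machines; here they run sequentially. `zlib.crc32` is used for partitioning because Python’s built-in `hash()` for strings is randomized per process and would not route keys consistently across machines.

```python
import zlib
from collections import Counter, defaultdict

def map_words(text):
    """Map: emit one (word, count) pair per distinct word in one document."""
    return list(Counter(text.lower().split()).items())

def partition(word, num_reducers):
    """Shuffle: pick a reducer with a stable hash, so every pair
    for the same word is sent to the same reducer."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers

def reduce_counts(pairs):
    """Reduce: sum the counts for each word within one partition."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals

def word_count(documents, num_reducers=4):
    # Map phase: documents are independent, so in a cluster each
    # map_words call could run on a different machine.
    mapped = [pair for doc in documents for pair in map_words(doc)]

    # Shuffle phase: group pairs into partitions keyed by reducer index.
    partitions = defaultdict(list)
    for word, count in mapped:
        partitions[partition(word, num_reducers)].append((word, count))

    # Reduce phase: partitions share no words, so each could also run
    # on its own machine; merging the results cannot overwrite anything.
    result = {}
    for pairs in partitions.values():
        result.update(reduce_counts(pairs))
    return result
```

For example, `word_count(["hello world", "Hello again world", "bye"])` returns the same counts as a single sequential `Counter` over all the text: `hello` and `world` twice, `again` and `bye` once. The number of reducers changes only how the work is split, not the result.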
Best Answer from Computer Science Stack Exchange
Question Source : http://cs.stackexchange.com/questions/33483