Python: a script for running the Hadoop example

A consolidated script for this note:

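# Remove the previous local copy of the results
rm -r /home/hduser/python/wordcount/out/pyoutput
# Remove the previous output directory in HDFS (the job fails if it already exists)
hadoop fs -rm -R /user/hduser/pyoutput
# Run the streaming job; bin/hadoop is a relative path, so run this from the Hadoop install directory
bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -file /home/hduser/python/wordcount/mapper.py -mapper "python /home/hduser/python/wordcount/mapper.py" -file /home/hduser/python/wordcount/reducer.py -reducer "python /home/hduser/python/wordcount/reducer.py" -input /user/hduser/myinput/* -output /user/hduser/pyoutput
# Copy the results from HDFS to the local filesystem
hadoop fs -copyToLocal /user/hduser/pyoutput /home/hduser/python/wordcount/out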

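Setting the number of reducers

To set the number of reducers, add the -jobconf mapred.reduce.tasks option to the streaming command. A value of 0 makes the job map-only: the reducer is not run, and the mapper output is written directly to the output directory. For example: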

-jobconf mapred.reduce.tasks=0

# The full script with this option added:
rm -r /home/hduser/python/wordcount/out/pyoutput
hadoop fs -rm -R /user/hduser/pyoutput
bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -file /home/hduser/python/wordcount/mapper.py -mapper "python /home/hduser/python/wordcount/mapper.py" -file /home/hduser/python/wordcount/reducer.py -reducer "python /home/hduser/python/wordcount/reducer.py" -jobconf mapred.reduce.tasks=0 -input /user/hduser/myinput/* -output /user/hduser/pyoutput
hadoop fs -copyToLocal /user/hduser/pyoutput /home/hduser/python/wordcount/out
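
The two scripts above differ only in the reducer count, so the streaming command can be generated from a parameter. Below is a minimal sketch, assuming the same paths as in this note. The function name build_streaming_cmd and the variables STREAMING_JAR and SRC are hypothetical and introduced only for illustration. The function only prints the command, so it can be inspected before it is run.

```shell
#!/bin/sh
# Hypothetical helper for this note: prints the streaming command
# for a given number of reducers. Paths are the ones used above.
STREAMING_JAR=/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar
SRC=/home/hduser/python/wordcount

build_streaming_cmd() {
    reducers="$1"   # value passed to mapred.reduce.tasks
    # echo joins its arguments with single spaces, giving a one-line command
    echo "bin/hadoop jar $STREAMING_JAR" \
        "-file $SRC/mapper.py -mapper \"python $SRC/mapper.py\"" \
        "-file $SRC/reducer.py -reducer \"python $SRC/reducer.py\"" \
        "-jobconf mapred.reduce.tasks=$reducers" \
        "-input /user/hduser/myinput/* -output /user/hduser/pyoutput"
}

# Usage (commented out so that sourcing this file does not start a job):
# build_streaming_cmd 2              # print the command for two reducers
# eval "$(build_streaming_cmd 2)"    # run it from the Hadoop install directory
```

Printing first and running through eval second keeps the sketch safe to try on a machine without Hadoop. The cleanup and copyToLocal steps from the scripts above still have to be run before and after the job.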