Running hadoop-streaming on gnupack + CDH4 failed with an OutOfMemoryError.
$ hadoop jar "C:\tool\hadoop\hadoop-2.0.0-cdh4.2.0\share\hadoop\tools\lib\hadoop-streaming-2.0.0-cdh4.2.0.jar" -mapper cat -reducer cat -input input -output output
cygpath: can't convert empty path
13/03/24 23:21:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/24 23:21:56 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
13/03/24 23:21:56 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
13/03/24 23:21:56 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
13/03/24 23:21:57 INFO mapred.FileInputFormat: Total input paths to process : 2
13/03/24 23:21:57 INFO mapreduce.JobSubmitter: number of splits:2
13/03/24 23:21:57 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/03/24 23:21:57 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/03/24 23:21:57 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
13/03/24 23:21:57 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/03/24 23:21:57 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/03/24 23:21:57 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/03/24 23:21:57 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/03/24 23:21:57 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/03/24 23:21:57 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
13/03/24 23:21:57 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/03/24 23:21:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local874722513_0001
13/03/24 23:21:57 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
13/03/24 23:21:57 INFO mapreduce.Job: Running job: job_local874722513_0001
13/03/24 23:21:57 INFO mapred.LocalJobRunner: OutputCommitter set in config null
13/03/24 23:21:57 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
13/03/24 23:21:57 INFO mapred.LocalJobRunner: Waiting for map tasks
13/03/24 23:21:57 INFO mapred.LocalJobRunner: Starting task: attempt_local874722513_0001_m_000000_0
13/03/24 23:21:57 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/03/24 23:21:58 INFO mapred.MapTask: Processing split: file:/C:/tool/gnupack/home/sample/input/in1.txt:0+31
13/03/24 23:21:58 INFO mapred.MapTask: numReduceTasks: 1
13/03/24 23:21:58 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/03/24 23:21:58 INFO mapred.LocalJobRunner: Starting task: attempt_local874722513_0001_m_000001_0
13/03/24 23:21:58 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/03/24 23:21:58 INFO mapred.MapTask: Processing split: file:/C:/tool/gnupack/home/sample/input/in2.txt:0+18
13/03/24 23:21:58 INFO mapred.MapTask: numReduceTasks: 1
13/03/24 23:21:58 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/03/24 23:21:58 INFO mapred.LocalJobRunner: Map task executor complete.
13/03/24 23:21:58 WARN mapred.LocalJobRunner: job_local874722513_0001
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:399)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:389)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
13/03/24 23:21:58 INFO mapreduce.Job: Job job_local874722513_0001 running in uber mode : false
13/03/24 23:21:58 INFO mapreduce.Job: map 0% reduce 0%
13/03/24 23:21:58 INFO mapreduce.Job: Job job_local874722513_0001 failed with state FAILED due to: NA
13/03/24 23:21:58 INFO mapreduce.Job: Counters: 0
13/03/24 23:21:58 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
$
Some reports said that adding the following to mapred-site.xml fixes it:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
That did not seem to apply to my case, and the error remained.
In the end, what worked was the following setting:
<configuration>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>1</value>
  </property>
</configuration>
cf.
http://stackoverflow.com/questions/12896300/mapreduce-jobs-in-hive-0-8-1-cdh4-0-1-failed
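The stack trace fails in MapTask$MapOutputBuffer.init, which is where the map-side sort buffer sized by mapreduce.task.io.sort.mb gets allocated, so shrinking that buffer is what removes the pressure on the heap. The same property can probably also be passed per job on the command line with the generic -D option instead of editing the XML file. A minimal sketch, assuming the jar path from the command at the top of this post; the STREAMING_JAR and SORT_MB variable names are only for illustration, and I have not checked this form on this exact setup:

```shell
#!/bin/sh
# Per-job override of the sort buffer size (sketch, not verified on gnupack).
# STREAMING_JAR and SORT_MB are illustrative names, not Hadoop settings.
STREAMING_JAR='C:\tool\hadoop\hadoop-2.0.0-cdh4.2.0\share\hadoop\tools\lib\hadoop-streaming-2.0.0-cdh4.2.0.jar'
SORT_MB=1

# Generic options such as -D have to come before the streaming options
# (-mapper, -reducer, -input, -output).
set -- jar "$STREAMING_JAR" \
  -D "mapreduce.task.io.sort.mb=${SORT_MB}" \
  -mapper cat -reducer cat -input input -output output

if command -v hadoop >/dev/null 2>&1; then
  hadoop "$@"
else
  # hadoop is not on PATH: only show the command that would be run.
  # printf is used because some echo implementations mangle backslashes.
  printf '%s\n' "hadoop $*"
fi
```

This keeps the tiny 1 MB buffer limited to the jobs that need it, which matters because such a small buffer means more spills to disk on larger inputs.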
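As for the "cygpath: can't convert empty path" message printed right after running the command, it can be dealt with by adding the -i option to cygpath in the cygwin path translation block:
# cygwin path translation
if $cygwin; then
  HADOOP_PREFIX=`cygpath -w "$HADOOP_PREFIX"`
  HADOOP_LOG_DIR=`cygpath -w "$HADOOP_LOG_DIR"`
  # added the -i option
  JAVA_LIBRARY_PATH=`cygpath -i -w "$JAVA_LIBRARY_PATH"`
fi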
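The message presumably appears because cygpath is handed an empty string, most likely an unset JAVA_LIBRARY_PATH given which line needed the change. An alternative sketch, which is not what this post did, is to skip the conversion when the value is empty instead of silencing the error. The to_windows_path helper name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: convert a path with cygpath only when it is non-empty,
# so cygpath is never called with an empty argument.
to_windows_path() {
  if [ -n "$1" ]; then
    cygpath -w "$1"
  fi
}

# Simulate the case assumed above: the variable is empty.
JAVA_LIBRARY_PATH=""
JAVA_LIBRARY_PATH=`to_windows_path "$JAVA_LIBRARY_PATH"`
```

With an empty value the helper prints nothing and cygpath is not invoked, so the variable simply stays empty.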