2014-03-24 107 views

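Mahout random forest classifier example: ArrayIndexOutOfBoundsException

When I try to run the random forest example, I get a java.lang.ArrayIndexOutOfBoundsException: 100 error. The 100 here is tied to the number of trees. The map phase completes at 100% while reduce stays at 0%. I am using hadoop-1.2.1 and mahout-distribution-0.7. I also tried mahout-distribution-0.9 and got the same error.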

Has anyone had any luck running this example?


What makes you think the '100' in the exception corresponds to the number of trees? Can you post more of the stack trace?

Answer


Found the problem. If Hadoop is run with mapred.job.tracker = local, PartialBuilder cannot obtain the number of map tasks from mapred.map.tasks. As a result, it computes the wrong number of trees per map task.
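The arithmetic behind this can be sketched roughly as follows. This is not Mahout's code: the function names, the even-split rule with the remainder going to the first partition, and the figure of 4 actual map tasks are all assumptions made for illustration. The sketch only shows why a wrong map-task count would push the number of trees built past the requested 100, which would be consistent with an out-of-bounds index of exactly 100 if the results are collected into a container sized for the requested number of trees (also an assumption).

```python
def trees_for_partition(reported_maps, num_trees, partition):
    """Trees one mapper decides to build (assumed rule: even split,
    with the first partition also taking the remainder)."""
    per_map = num_trees // reported_maps
    if partition == 0:
        per_map += num_trees - per_map * reported_maps
    return per_map


def total_trees_built(actual_maps, reported_maps, num_trees):
    """Total trees when `actual_maps` mappers really run but each one
    believes there are `reported_maps` mappers in the job."""
    return sum(
        trees_for_partition(reported_maps, num_trees, p)
        for p in range(actual_maps)
    )


def first_overflow_index(total_built, capacity):
    """First index that no longer fits in a container of size `capacity`,
    or None if everything fits."""
    return capacity if total_built > capacity else None


if __name__ == "__main__":
    num_trees = 100      # the value passed with -t
    actual_maps = 4      # hypothetical number of splits / map tasks

    # Correct case: every mapper knows the real number of map tasks.
    ok = total_trees_built(actual_maps, actual_maps, num_trees)
    print(ok, first_overflow_index(ok, num_trees))

    # Faulty case: every mapper believes it is the only map task.
    bad = total_trees_built(actual_maps, 1, num_trees)
    print(bad, first_overflow_index(bad, num_trees))
```

In the faulty case every mapper builds the full forest on its own, so the combined output is a multiple of the requested tree count instead of being equal to it.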

Solution: when running the random forest job on a local Hadoop, do not use the "-p" parameter.

Details:

[email protected]:~/mahout/data/> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d testdata/KDDTrain+.arff -ds testdata/KDDTrain+.info -sl 5 -t 100 -o nsl-forest 
Warning: $HADOOP_HOME is deprecated. 

14/08/07 11:25:18 INFO mapreduce.BuildForest: InMem Mapred implementation 
14/08/07 11:25:18 INFO mapreduce.BuildForest: Building the forest... 
14/08/07 11:25:18 INFO util.NativeCodeLoader: Loaded the native-hadoop library 
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Creating KDDTrain+.info in /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata-work-5026960219142699303 with rwxr-xr-x 
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.info as /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata/KDDTrain+.info 
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.info as /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata/KDDTrain+.info 
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Creating KDDTrain+.arff in /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata-work-5750487161401524172 with rwxr-xr-x 
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.arff as /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata/KDDTrain+.arff 
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.arff as /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata/KDDTrain+.arff 
14/08/07 11:25:19 INFO mapred.JobClient: Running job: job_local966281240_0001 
14/08/07 11:25:19 INFO mapred.LocalJobRunner: Waiting for map tasks 
14/08/07 11:25:19 INFO mapred.LocalJobRunner: Starting task: attempt_local966281240_0001_m_000000_0 
14/08/07 11:25:19 INFO util.ProcessTree: setsid exited with exit code 0 
14/08/07 11:25:19 INFO mapred.Task: Using ResourceCalculatorPlugin : [email protected] 
14/08/07 11:25:19 INFO mapred.MapTask: Processing split: [firstId:0, nbTrees:100, seed:null] 
14/08/07 11:25:19 INFO inmem.InMemMapper: Loading the data... 
14/08/07 11:25:20 INFO mapred.JobClient: map 0% reduce 0% 
14/08/07 11:25:21 INFO inmem.InMemMapper: Data loaded : 125973 instances 
14/08/07 11:25:25 INFO mapred.LocalJobRunner: 
14/08/07 11:25:26 INFO mapred.JobClient: map 1% reduce 0% 

... 

14/08/07 11:27:59 INFO mapred.JobClient: map 98% reduce 0% 
14/08/07 11:28:00 INFO mapred.Task: Task:attempt_local966281240_0001_m_000000_0 is done. And is in the process of commiting 
14/08/07 11:28:00 INFO mapred.LocalJobRunner: 
14/08/07 11:28:00 INFO mapred.Task: Task attempt_local966281240_0001_m_000000_0 is allowed to commit now 
14/08/07 11:28:00 INFO output.FileOutputCommitter: Saved output of task 'attempt_local966281240_0001_m_000000_0' to file:/home/martin/Programmieren/mahout/data/cut/nsl-forest 
14/08/07 11:28:00 INFO mapred.LocalJobRunner: 
14/08/07 11:28:00 INFO mapred.Task: Task 'attempt_local966281240_0001_m_000000_0' done. 
14/08/07 11:28:00 INFO mapred.LocalJobRunner: Finishing task: attempt_local966281240_0001_m_000000_0 
14/08/07 11:28:00 INFO mapred.LocalJobRunner: Map task executor complete. 
14/08/07 11:28:00 INFO mapred.JobClient: map 99% reduce 0% 
14/08/07 11:28:00 INFO mapred.JobClient: Job complete: job_local966281240_0001 
14/08/07 11:28:00 INFO mapred.JobClient: Counters: 12 
14/08/07 11:28:00 INFO mapred.JobClient: File Output Format Counters 
14/08/07 11:28:00 INFO mapred.JobClient:  Bytes Written=2353226 
14/08/07 11:28:00 INFO mapred.JobClient: File Input Format Counters 
14/08/07 11:28:00 INFO mapred.JobClient:  Bytes Read=0 
14/08/07 11:28:00 INFO mapred.JobClient: FileSystemCounters 
14/08/07 11:28:00 INFO mapred.JobClient:  FILE_BYTES_READ=61962918 
14/08/07 11:28:00 INFO mapred.JobClient:  FILE_BYTES_WRITTEN=45667235 
14/08/07 11:28:00 INFO mapred.JobClient: Map-Reduce Framework 
14/08/07 11:28:00 INFO mapred.JobClient:  Map input records=100 
14/08/07 11:28:00 INFO mapred.JobClient:  Physical memory (bytes) snapshot=0 
14/08/07 11:28:00 INFO mapred.JobClient:  Spilled Records=0 
14/08/07 11:28:00 INFO mapred.JobClient:  Total committed heap usage (bytes)=132120576 
14/08/07 11:28:00 INFO mapred.JobClient:  CPU time spent (ms)=0 
14/08/07 11:28:00 INFO mapred.JobClient:  Virtual memory (bytes) snapshot=0 
14/08/07 11:28:00 INFO mapred.JobClient:  SPLIT_RAW_BYTES=90 
14/08/07 11:28:00 INFO mapred.JobClient:  Map output records=100 
14/08/07 11:28:00 INFO common.HadoopUtil: Deleting file:/home/martin/Programmieren/mahout/data/cut/nsl-forest 
14/08/07 11:28:00 INFO mapreduce.BuildForest: Build Time: 0h 2m 41s 702 
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest num Nodes: 130056 
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest mean num Nodes: 1300 
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest mean max Depth: 19 
14/08/07 11:28:00 INFO mapreduce.BuildForest: Storing the forest in: nsl-forest/forest.seq