2013-03-08

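Hi everyone, I am trying to run clusterdump on the output of the k-means clustering algorithm, and it is not working. Any ideas? This is a Mahout instance running on a pseudo-distributed Hadoop cluster. The cluster dump does not seem to read its input.

Also, is there any tool or other means of visualizing the output of clusterdump, or the output of k-means itself?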

[[email protected] bin]$ ./mahout clusterdump -dt sequencefile -d /home/186946/reuters-vectors/dictionary.file-0-i reuters-fkmeans-clusters/clusters-3 -o /home/186946/clusters.txt -b 10 -n 10 
Running on hadoop, using HADOOP_HOME=/home/186946/hadoop-0.20.2-cdh3u5 
No HADOOP_CONF_DIR set, using /home/186946/hadoop-0.20.2-cdh3u5/src/conf 
MAHOUT-JOB: /home/186946/mahout-0.5-cdh3u5/mahout-examples-0.5-cdh3u5-job.jar 
MAHOUT-JOB: /home/186946/mahout-0.5-cdh3u5/mahout-examples-0.5-cdh3u5-job.jar 
13/03/08 17:26:11 ERROR common.AbstractJob: Unexpected reuters-fkmeans-clusters/clusters-3 while processing Job-Specific Options: 
usage: <command> [Generic Options] [Job-Specific Options] 
Generic Options: 
-archives <paths>    comma separated archives to be unarchived 
           on the compute machines. 
-conf <configuration file>  specify an application configuration file 
-D <property=value>   use value for given property 
-files <paths>     comma separated files to be copied to the 
           map reduce cluster 
-fs <local|namenode:port>  specify a namenode 
-jt <local|jobtracker:port> specify a job tracker 
-libjars <paths>    comma separated jar files to include in 
           the classpath. 
-tokenCacheFile <tokensFile> name of the file with the tokens 
Unexpected reuters-fkmeans-clusters/clusters-3 while processing Job-Specific  
Options:                   
Usage:                   
[--seqFileDir <seqFileDir> --output <output> --substring <substring>   
--numWords <numWords> --pointsDir <pointsDir> --dictionary <dictionary>   
--dictionaryType <dictionaryType> --help --tempDir <tempDir> --startPhase  
<startPhase> --endPhase <endPhase>]            
Job-Specific Options:               
    --seqFileDir (-s) seqFileDir    The directory containing Sequence  
              Files for the Clusters    
    --output (-o) output      Optional output directory. Default 
              is to output to the console.   
    --substring (-b) substring    The number of chars of the   
              asFormatString() to print    
    --numWords (-n) numWords     The number of top terms to print  
    --pointsDir (-p) pointsDir    The directory containing points  
              sequence files mapping input vectors 
              to their cluster. If specified,  
              then the program will output the  
              points associated with a cluster  
    --dictionary (-d) dictionary    The dictionary file     
    --dictionaryType (-dt) dictionaryType The dictionary file type    
              (text|sequencefile)     
    --help (-h)        Print out help      
    --tempDir tempDir      Intermediate output directory   
    --startPhase startPhase     First phase to run     
    --endPhase endPhase      Last phase to run      
13/03/08 17:26:11 INFO driver.MahoutDriver: Program took 133 ms 

Thanks

Answer

mahout clusterdump \
-d output/vectors/dictionary.file-0 \
-dt sequencefile \
-i output/clusters/clusters-2-final/part-00000 \
-n 20 \
-b 100 \
-o cdump.txt \
-p output/clusters/clusteredPoints/

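Just copy and paste all of the lines above into a text editor, then carefully replace my values for -d, -dt, -i and -p with your own.

P.S. The paths are on HDFS.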
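
One more thing worth checking, as a hedged sketch rather than a confirmed fix: the usage text printed in the question lists --seqFileDir (-s) for the cluster directory and does not list -i, and in the command as posted "-i" appears to be joined to the dictionary path ("dictionary.file-0-i"), which would explain why the next token is reported as unexpected. The script below only assembles and prints a command using the option names from that usage text, with the paths copied from the question; the function name build_clusterdump_cmd is made up for illustration, and whether -s or -i applies depends on your Mahout version.

```shell
#!/bin/sh
# Paths copied from the question; adjust them to your own layout.
DICT=/home/186946/reuters-vectors/dictionary.file-0
CLUSTERS=reuters-fkmeans-clusters/clusters-3
OUT=/home/186946/clusters.txt

# Hypothetical helper: assembles the command line using the option names
# shown in the question's usage text (-s, -d, -dt, -o, -b, -n).
# Assumption: this build expects -s for the cluster directory, not -i.
build_clusterdump_cmd() {
    printf '%s' "./mahout clusterdump -s $CLUSTERS -d $DICT -dt sequencefile -o $OUT -b 10 -n 10"
}

# Dry run: print the command so the spacing between options can be checked
# before it is actually executed.
CMD=$(build_clusterdump_cmd)
echo "$CMD"
```

If the printed line looks right, run it from the Mahout bin directory. Printing first makes it easy to spot two arguments that have run together.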