
Help me determine whether the defined mapper executes, and if it does not execute, for what reason. I write the paths read from the database into a text file on the local file system of the node where the mapper executes. Here is the code I use to check the mapper's execution:

package org.myorg;

import java.io.*;
import java.util.*;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class ParallelIndexation {

    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, LongWritable> {
        private final static LongWritable zero = new LongWritable(0);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, LongWritable> output, Reporter reporter)
                throws IOException {

            // Read the number of computers from a file on the local file
            // system of the node executing this map task.
            int countComputers;
            FileInputStream fstream = new FileInputStream(
                    "/export/hadoop-1.0.1/bin/countcomputers.txt");
            BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
            String result = br.readLine();
            countComputers = Integer.parseInt(result);
            br.close(); // was input.close(), which referred to an undeclared variable
            fstream.close();

            Connection con = null;
            Statement st = null;
            ResultSet rs = null;
            String url = "jdbc:postgresql://192.168.1.8:5432/NexentaSearch";
            String user = "postgres";
            String password = "valter89";
            ArrayList<String> paths = new ArrayList<String>();
            try {
                con = DriverManager.getConnection(url, user, password);
                st = con.createStatement();
                rs = st.executeQuery("select path from tasks order by id");
                while (rs.next()) {
                    paths.add(rs.getString(1));
                }
                // Dump the paths read from the database to a local file as
                // evidence that this mapper actually ran on the node.
                PrintWriter zzz = null;
                try {
                    zzz = new PrintWriter(new FileOutputStream(
                            "/export/hadoop-1.0.1/bin/readwaysfromdatabase.txt"));
                } catch (FileNotFoundException e) {
                    System.out.println("Error");
                    System.exit(0);
                }
                for (int i = 0; i < paths.size(); i++) {
                    zzz.println("paths[i]=" + paths.get(i) + "\n");
                }
                zzz.close();
            } catch (SQLException e) {
                System.out.println("Connection Failed! Check output console");
                e.printStackTrace();
            }
            // Note: this fragment emits nothing to the OutputCollector.
        }
    }
}
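Incidentally, a less intrusive way to confirm whether map() is ever invoked is to increment a custom counter through the Reporter: counters show up in the JobClient console output even when the task writes no local files. A minimal sketch against the same old mapred API (the group and counter names here are arbitrary):

public void map(LongWritable key, Text value,
        OutputCollector<Text, LongWritable> output, Reporter reporter)
        throws IOException {
    // Shows up in the job counters as "ParallelIndexation / MAP_CALLS";
    // if the counter is missing or zero, the mapper never ran.
    reporter.incrCounter("ParallelIndexation", "MAP_CALLS", 1);
    // ... rest of the map logic ...
}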

However, the /export/hadoop-1.0.1/bin/readwaysfromdatabase.txt file was not created on either of the slave nodes. Does it follow from this that the mapper was never executed? I also include the output of the job run:

args[0]=/export/hadoop-1.0.1/bin/input 
13/04/22 14:00:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
13/04/22 14:00:53 INFO mapred.FileInputFormat: Total input paths to process : 0 
13/04/22 14:00:54 INFO mapred.JobClient: Running job: job_201304221331_0003 
13/04/22 14:00:55 INFO mapred.JobClient: map 0% reduce 0% 
13/04/22 14:01:12 INFO mapred.JobClient: map 0% reduce 100% 
13/04/22 14:01:17 INFO mapred.JobClient: Job complete: job_201304221331_0003 
13/04/22 14:01:17 INFO mapred.JobClient: Counters: 15 
13/04/22 14:01:17 INFO mapred.JobClient: Job Counters 
13/04/22 14:01:17 INFO mapred.JobClient:  Launched reduce tasks=1 
13/04/22 14:01:17 INFO mapred.JobClient:  SLOTS_MILLIS_MAPS=9079 
13/04/22 14:01:17 INFO mapred.JobClient:  Total time spent by all reduces waiting after reserving slots (ms)=0 
13/04/22 14:01:17 INFO mapred.JobClient:  Total time spent by all maps waiting after reserving slots (ms)=0 
13/04/22 14:01:17 INFO mapred.JobClient:  SLOTS_MILLIS_REDUCES=7983 
13/04/22 14:01:17 INFO mapred.JobClient: File Output Format Counters 
13/04/22 14:01:17 INFO mapred.JobClient:  Bytes Written=0 
13/04/22 14:01:17 INFO mapred.JobClient: FileSystemCounters 
13/04/22 14:01:17 INFO mapred.JobClient:  FILE_BYTES_WRITTEN=21536 
13/04/22 14:01:17 INFO mapred.JobClient: Map-Reduce Framework 
13/04/22 14:01:17 INFO mapred.JobClient:  Reduce input groups=0 
13/04/22 14:01:17 INFO mapred.JobClient:  Combine output records=0 
13/04/22 14:01:17 INFO mapred.JobClient:  Reduce shuffle bytes=0 
13/04/22 14:01:17 INFO mapred.JobClient:  Reduce output records=0 
13/04/22 14:01:17 INFO mapred.JobClient:  Spilled Records=0 
13/04/22 14:01:17 INFO mapred.JobClient:  Total committed heap usage (bytes)=16252928 
13/04/22 14:01:17 INFO mapred.JobClient:  Combine input records=0 
13/04/22 14:01:17 INFO mapred.JobClient:  Reduce input records=0 

I also include the output from a successful execution of the program on a single virtual machine:

12/10/28 10:41:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
12/10/28 10:41:14 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
12/10/28 10:41:14 INFO mapred.FileInputFormat: Total input paths to process : 1 
12/10/28 10:41:15 INFO mapred.JobClient: Running job: job_local_0001 
12/10/28 10:41:15 INFO mapred.Task: Using ResourceCalculatorPlugin : null 
12/10/28 10:41:15 INFO mapred.MapTask: numReduceTasks: 1 
12/10/28 10:41:15 INFO mapred.MapTask: io.sort.mb = 100 
12/10/28 10:41:15 INFO mapred.MapTask: data buffer = 79691776/99614720 
12/10/28 10:41:15 INFO mapred.MapTask: record buffer = 262144/327680 
12/10/28 10:41:15 INFO mapred.MapTask: Starting flush of map output 
12/10/28 10:41:16 INFO mapred.JobClient: map 0% reduce 0% 
12/10/28 10:41:17 INFO mapred.MapTask: Finished spill 0 
12/10/28 10:41:17 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting 
12/10/28 10:41:18 INFO mapred.LocalJobRunner: file:/export/hadoop-1.0.1/bin/input/paths.txt:0+156 
12/10/28 10:41:18 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done. 
12/10/28 10:41:18 INFO mapred.Task: Using ResourceCalculatorPlugin : null 
12/10/28 10:41:18 INFO mapred.LocalJobRunner: 
12/10/28 10:41:18 INFO mapred.Merger: Merging 1 sorted segments 
12/10/28 10:41:18 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 199 bytes 
12/10/28 10:41:18 INFO mapred.LocalJobRunner: 
12/10/28 10:41:19 INFO mapred.JobClient: map 100% reduce 0% 
12/10/28 10:41:19 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting 
12/10/28 10:41:19 INFO mapred.LocalJobRunner: 
12/10/28 10:41:19 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now 
12/10/28 10:41:19 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/export/hadoop-1.0.1/bin/output 
12/10/28 10:41:21 INFO mapred.LocalJobRunner: reduce > reduce 
12/10/28 10:41:21 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done. 
12/10/28 10:41:22 INFO mapred.JobClient: map 100% reduce 100% 
12/10/28 10:41:22 INFO mapred.JobClient: Job complete: job_local_0001 
12/10/28 10:41:22 INFO mapred.JobClient: Counters: 18 
12/10/28 10:41:22 INFO mapred.JobClient: File Input Format Counters 
12/10/28 10:41:22 INFO mapred.JobClient:  Bytes Read=156 
12/10/28 10:41:22 INFO mapred.JobClient: File Output Format Counters 
12/10/28 10:41:22 INFO mapred.JobClient:  Bytes Written=177 
12/10/28 10:41:22 INFO mapred.JobClient: FileSystemCounters 
12/10/28 10:41:22 INFO mapred.JobClient:  FILE_BYTES_READ=9573 
12/10/28 10:41:22 INFO mapred.JobClient:  FILE_BYTES_WRITTEN=73931 
12/10/28 10:41:22 INFO mapred.JobClient: Map-Reduce Framework 
12/10/28 10:41:22 INFO mapred.JobClient:  Reduce input groups=4 
12/10/28 10:41:22 INFO mapred.JobClient:  Map output materialized bytes=203 
12/10/28 10:41:22 INFO mapred.JobClient:  Combine output records=4 
12/10/28 10:41:22 INFO mapred.JobClient:  Map input records=1 
12/10/28 10:41:22 INFO mapred.JobClient:  Reduce shuffle bytes=0 
12/10/28 10:41:22 INFO mapred.JobClient:  Reduce output records=4 
12/10/28 10:41:22 INFO mapred.JobClient:  Spilled Records=8 
12/10/28 10:41:22 INFO mapred.JobClient:  Map output bytes=189 
12/10/28 10:41:22 INFO mapred.JobClient:  Total committed heap usage (bytes)=321527808 
12/10/28 10:41:22 INFO mapred.JobClient:  Map input bytes=156 
12/10/28 10:41:22 INFO mapred.JobClient:  Combine input records=0 
12/10/28 10:41:22 INFO mapred.JobClient:  Map output records=4 
12/10/28 10:41:22 INFO mapred.JobClient:  SPLIT_RAW_BYTES=98 
12/10/28 10:41:22 INFO mapred.JobClient:  Reduce input records=0 

@ChrisWhite I run the program with the help of the command

./hadoop jar /export/hadoop-1.0.1/bin/ParallelIndexation.jar org.myorg.ParallelIndexation /export/hadoop-1.0.1/bin/input /export/hadoop-1.0.1/bin/output -D mapred.map.tasks=1 1> resultofexecute.txt 2&>1 

My cluster has 4 nodes: one master, one for the secondarynamenode, and 2 slaves.


Please don't use DataInputStream to read a text file. You don't need it, so please remove it. – 2013-04-24 07:41:39


@PeterLawrey Why? – user2306966 2013-04-24 08:24:10


DataInputStream is redundant here, but that bad example gets copied on Stack Overflow thirty times a month. It pains me because it is wrong, has always been wrong, and sometimes leads to bugs, all of which is entirely avoidable. – 2013-04-24 08:45:37
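For reference, the widely copied pattern being criticized and its straightforward replacement (a sketch reusing the file name from the question):

// Widely copied but redundant: DataInputStream adds nothing when reading text.
BufferedReader bad = new BufferedReader(new InputStreamReader(
        new DataInputStream(new FileInputStream("/export/hadoop-1.0.1/bin/countcomputers.txt"))));

// Equivalent and simpler: wrap the FileInputStream directly.
BufferedReader good = new BufferedReader(new InputStreamReader(
        new FileInputStream("/export/hadoop-1.0.1/bin/countcomputers.txt")));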

Answer


How many map tasks were scheduled for your job, and how big is your cluster? If, say, your job runs only 4 map tasks on a cluster of 32 nodes, then most likely 28 of the 32 nodes will not have any output (because no map tasks ran on those nodes).

You can see how many map tasks make up the job, and where those tasks were scheduled to run, through the JobTracker web UI.

What is odd is that the dump of your first run doesn't show any map tasks being launched for the job, only a reduce task:

13/04/22 14:01:17 INFO mapred.JobClient:  Launched reduce tasks=1 

and there are no counters for map input/output records either, so another odd aspect is how you are running the job - can you share the full command line you used to launch it, and possibly the driver code that configures and runs the job?
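For comparison, here is what a driver along the lines the log's warning suggests might look like - a hypothetical sketch (the class name ParallelIndexationDriver is mine; the actual driver was not posted) that implements Tool, so that GenericOptionsParser consumes the -D mapred.map.tasks=1 option:

package org.myorg;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ParallelIndexationDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // getConf() already carries any -D options parsed by ToolRunner.
        JobConf conf = new JobConf(getConf(), ParallelIndexation.class);
        conf.setJobName("parallel-indexation");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);
        conf.setMapperClass(ParallelIndexation.Map.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new ParallelIndexationDriver(), args));
    }
}

Note that with a Tool-based driver the generic options must precede the positional arguments, e.g. ./hadoop jar ParallelIndexation.jar org.myorg.ParallelIndexationDriver -D mapred.map.tasks=1 /export/hadoop-1.0.1/bin/input /export/hadoop-1.0.1/bin/output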


I run the program using the command ./hadoop jar /export/hadoop-1.0.1/bin/ParallelIndexation.jar org.myorg.ParallelIndexation /export/hadoop-1.0.1/bin/input /export/hadoop-1.0.1/bin/output -D mapred.map.tasks=1 1> resultofexecute.txt 2&>1 @ChrisWhite – user2306966 2013-04-22 11:52:49


I have 4 nodes in the cluster: one master node, one for the secondarynamenode, and 2 slave nodes. – user2306966 2013-04-22 11:57:06