BufferedReader and BufferedWriter for reading and writing HDFS files

I am trying to read an HDFS file line by line, and then create an HDFS file and write to it line by line. The code I use looks like this:

    Path FileToRead = new Path(inputPath);
    FileSystem hdfs = FileToRead.getFileSystem(new Configuration());
    FSDataInputStream fis = hdfs.open(FileToRead);
    BufferedReader reader = new BufferedReader(new InputStreamReader(fis));

    // Each line holds a 1-based column index followed by 10 values;
    // MyMatrix is declared elsewhere.
    String line = reader.readLine();
    while (line != null) {
        String[] lineElem = line.split(",");
        for (int i = 0; i < 10; i++) {
            MyMatrix[i][Integer.parseInt(lineElem[0]) - 1] = Double.parseDouble(lineElem[i + 1]);
        }
        line = reader.readLine();
    }

    reader.close(); // also closes the underlying fis

    Path FileToWrite = new Path(outputPath + "/V");
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream fileOut = fs.create(FileToWrite);
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(fileOut));
    writer.write("check");
    writer.close(); // flushes the buffer and closes fileOut

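When I run this code, the file V is not created in outputPath. However, if I replace the reading part with the writing part, the file is created and "check" is written to it. Can anyone please help me understand how to use them correctly, so that I can first read the whole file and then write to a file line by line?

I have also tried other code that reads from one file and writes to another; the file is created, but nothing is written to it!

I use something like this: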
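The read-everything-then-write pattern described above can be sketched with plain java.io types, independent of Hadoop. This is a minimal sketch, not a definitive implementation: the class and method names (MatrixIo, readMatrix, writeMatrix) are hypothetical, and the line format (a 1-based column index followed by one value per matrix row) is an assumption inferred from the loop in the question. On HDFS, the Reader and Writer would presumably wrap the streams returned by hdfs.open(...) and fs.create(...), as in the code above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of the original code.
class MatrixIo {

    // Reads lines of the form "<1-based column>,<v1>,...,<vRows>" into a rows x cols matrix.
    static double[][] readMatrix(Reader in, int rows, int cols) throws IOException {
        double[][] m = new double[rows][cols];
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] elem = line.split(",");
                int col = Integer.parseInt(elem[0].trim()) - 1;
                for (int i = 0; i < rows; i++) {
                    m[i][col] = Double.parseDouble(elem[i + 1].trim());
                }
            }
        }
        return m;
    }

    // Writes one matrix row per line, values separated by commas.
    static void writeMatrix(double[][] m, Writer out) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(out)) {
            for (double[] row : m) {
                StringBuilder sb = new StringBuilder();
                for (int j = 0; j < row.length; j++) {
                    if (j > 0) {
                        sb.append(',');
                    }
                    sb.append(row[j]);
                }
                writer.write(sb.toString());
                writer.newLine();
            }
        } // closing the writer flushes any buffered text
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the HDFS input and output streams.
        double[][] m = readMatrix(new StringReader("1,1.0,2.0\n2,3.0,4.0\n"), 2, 2);
        StringWriter out = new StringWriter();
        writeMatrix(m, out);
        System.out.print(out);
    }
}
```

Keeping the parsing and writing behind Reader and Writer parameters means the same code can be exercised locally with in-memory strings before pointing it at HDFS streams.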

hadoop jar main.jar program2.Main input output

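Then in my first job, I read from args[0] and write to a file at args[1] + "/NewV" using MapReduce classes, and it works. In my other class (non-MapReduce) I use args[1] + "/NewV" as the input path and output + "/V_0" as the output path (I pass these strings to the constructor). Here is the code of that class: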

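import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Init_V {

    String inputPath, outputPath;

    public Init_V(String inputPath, String outputPath) throws Exception {

        this.inputPath = inputPath;
        this.outputPath = outputPath;

        FileSystem fs = FileSystem.get(new Configuration());
        Path FileToWrite = new Path(outputPath + "/V.txt");
        Path FileToRead = new Path(inputPath);

        // The reader is opened first, so no empty output file is left behind
        // if the input cannot be opened. try-with-resources closes (and so
        // flushes) both streams even when copying fails.
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(fs.open(FileToRead)));
             BufferedWriter output = new BufferedWriter(
                 new OutputStreamWriter(fs.create(FileToWrite, true)))) {
            String data;
            while ((data = reader.readLine()) != null) {
                output.write(data);
                output.newLine(); // readLine() strips the line terminator
            }
        }
        // No empty catch block: any exception now propagates to the caller
        // instead of being silently swallowed.
    }
}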
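Two details matter in a copy loop of this kind: readLine() returns each line without its terminator, so one has to be written back explicitly, and an exception that is caught and ignored leaves no trace of why the output is empty. The following is a minimal sketch using only java.io; LineCopy and copyLines are hypothetical names, and in-memory strings stand in for the HDFS streams.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of the original code.
class LineCopy {

    // Copies in to out line by line and returns the number of lines copied.
    // IOException is declared rather than caught, so failures stay visible.
    static int copyLines(Reader in, Writer out) throws IOException {
        int count = 0;
        // try-with-resources closes (and therefore flushes) both streams,
        // even when an exception is thrown part-way through.
        try (BufferedReader reader = new BufferedReader(in);
             BufferedWriter writer = new BufferedWriter(out)) {
            String data;
            while ((data = reader.readLine()) != null) {
                writer.write(data);
                writer.newLine(); // readLine() drops the terminator, so add one back
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        int n = copyLines(new StringReader("a,1\nb,2\n"), out);
        System.out.println(n + " lines copied");
        System.out.print(out);
    }
}
```

Letting the exception propagate (or at least logging it) is usually the quickest way to find out why a file was created but left empty.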

Source

2013-05-12 Eileen Jr

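Does the file exist on the local path? Perhaps you are passing the output path with a local context ('file:///path/to/file'), or the default file system is the local one. Can you share the command line you use to launch the program, and the value of 'fs.default.name' in '$HADOOP_CONF/core-site.xml'? – 2013-05-12 21:13:43

Actually, the file I am trying to read was created by another job in HDFS. It is located in the output directory in HDFS. After reading this file, I can create the file to write to in the same output directory (in HDFS). I copied it to my local machine with 'hadoop fs -get output output', and when I check the newly created file, it is empty! – 2013-05-12 21:24:45

I think you need to understand how Hadoop normally works. In Hadoop, many things are done by the system: you just give the input and output paths, and if the paths are valid, Hadoop opens and creates them. Check the following example: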
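The point raised in the first comment, that a path may or may not name its file system explicitly, can be illustrated with java.net.URI. This is only a stand-in sketch: it does not call Hadoop, PathScheme and describe are hypothetical names, and the host name and port in the example are placeholders. The underlying idea is that a path with no scheme is resolved against the configured default file system ('fs.default.name'), which is the local file system when no cluster configuration is picked up.

```java
import java.net.URI;

// Hypothetical helper for illustration only.
class PathScheme {

    // Describes which file system a path string names explicitly, if any.
    static String describe(String path) {
        String scheme = URI.create(path).getScheme();
        if (scheme == null) {
            return "no scheme: resolved against the configured default file system";
        }
        return "explicit scheme: " + scheme;
    }

    public static void main(String[] args) {
        // Example paths; the host name and port are placeholders.
        String[] paths = {
            "output/V",
            "file:///path/to/file",
            "hdfs://namenode:8020/user/me/output/V"
        };
        for (String p : paths) {
            System.out.println(p + " -> " + describe(p));
        }
    }
}
```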

public int run(String[] args) throws Exception {

    // Two arguments are expected: the input path and the output path
    if (args.length != 2) {
        System.err.println("Usage: MapReduce <input path> <output path>");
        ToolRunner.printGenericCommandUsage(System.err);
        return -1;
    }
    Job job = new Job(getConf()); // keeps the generic options parsed by ToolRunner
    job.setJarByClass(MyClass.class);
    job.setNumReduceTasks(5);
    job.setJobName("myclass");
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    return job.waitForCompletion(true) ? 0 : 1;
}


/* ----------------------main---------------------*/
public static void main(String[] args) throws Exception {

    int exitCode = ToolRunner.run(new MyClass(), args);
    System.exit(exitCode);
}

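As you can see here, you only need to initialize the variables; the reading and writing are done by Hadoop.

Also, in your Mapper class you call context.write(key, value) inside map, and in your Reducer class you do the same to write your output.

If you use BufferedWriter/BufferedReader, it will write to the local file system rather than HDFS. To see the files in HDFS you should run hadoop fs -ls <path>; with the plain ls command you are looking at files in the local file system.

Edit: in order to use read/write, you should know the following. Suppose you have N machines in your Hadoop cluster. When you want to read, you will not know which mapper is doing the reading, and the same goes for writing. So all mappers and reducers must have access to those paths, so that no exceptions are thrown.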

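I don't know whether you can use any other class, but you can use two methods for your specific purpose: setup and cleanup. These methods are called only once in each map and reduce worker. So if you want to read and write files, you can use them. Reading and writing are the same as in ordinary Java code. For example, suppose you want to see something for each key and write it to a txt file. You can do the following:

// in the reducer
BufferedWriter bw ..;

void setup(...){
    bw = new ....;              // open the writer once per task
}

void reduce(...){
    while(iter.hasNext()){ ....;
    }
    bw.write(key.toString() + ...);
}

void cleanup(...){
    bw.close();                 // flush and close once per task
}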

Source

2013-05-12 19:39:52 smttsp

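Thank you for your answer. But the problem is that I want to use HDFS files in a class that is not necessarily a Mapper or Reducer class. With the second piece of code, the output file is created in HDFS, but nothing is written to it! Is there any way to read and write HDFS files in a class other than a Mapper or Reducer? – 2013-05-12 19:54:44

Hadoop's working style is: 'for each key (generally a line)' 'do the map operation', and 'in the reducer, for each key, collect the elements with the same key and write them to HDFS'. I don't know whether there is another way to use MapReduce, but this way your data is automatically sorted by key. That is another advantage of MR. – smttsp 2013-05-12 20:19:50

I think I did not ask my question properly. I am actually trying to read from a file and write to a file without using a Mapper or Reducer class. I need to read from the file once and store the data (I use a matrix), and after processing the data write the results to another file. But I want to do all of this in an ordinary class, not a Mapper or Reducer. I have to use one file and want access to all of its lines at once, and I am not sure whether using DistributedCache would solve my problem. If I wanted to use a Mapper and Reducer, I would probably need 2 jobs, which would take a long time. – 2013-05-12 20:32:37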
