How do I add a line number to each line?

Suppose this is my data:

‘Maps‘ and ‘Reduces‘ are two phases of solving a query in HDFS. 
‘Map’ is responsible to read data from input location. 
it will generate a key value pair. 
that is, an intermediate output in local machine. 
’Reducer’ is responsible to process the intermediate. 
output received from the mapper and generate the final output.

I want to add a number to each line, like the output below:

1,‘Maps‘ and ‘Reduces‘ are two phases of solving a query in HDFS. 
2,‘Map’ is responsible to read data from input location. 
3,it will generate a key value pair. 
4,that is, an intermediate output in local machine. 
5,’Reducer’ is responsible to process the intermediate. 
6,output received from the mapper and generate the final output.

and save the result to a file.

I have tried:

import org.apache.spark.{SparkConf, SparkContext}

object DS_E5 {
  def main(args: Array[String]): Unit = {
    var i = 0
    val conf = new SparkConf().setAppName("prep").setMaster("local")
    val sc = new SparkContext(conf)
    val sample1 = sc.textFile("data.txt")
    for (sample <- sample1) {
      i = i + 1
      val ss = sample.map(l => (i, sample))
      println(ss)
    }
  }
}

but its output looks like this:

Vector((1,‘Maps‘ and ‘Reduces‘ are two phases of solving a query in HDFS.)) 
...

How can I edit my code to produce the output I want?

2015-07-03 AHAD

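The text in the question also appears verbatim here: http://bigdataanalyticsnews.com/hadoop-interview-questions-mapreduce/ – Madoc

zipWithIndex is what you need here. It maps an RDD[T] to an RDD[(T, Long)] by adding the index as the second element of each pair.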

sample1 
    .zipWithIndex() 
    .map { case (line, i) => i.toString + ", " + line }

or, using string interpolation (see @DanielC.Sobral's comment):

sample1 
    .zipWithIndex() 
    .map { case (line, i) => s"$i, $line" }

2015-07-03 19:27:07 zero323

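You may need `i + 1` if you want to count from 1 – jwvh

Thanks @zero323, this works, but there are still parentheses, as in (1, line), and I want to remove them. – AHAD

I'm not sure I understand. Is your output an RDD[String]? – zero323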

By calling val sample1 = sc.textFile("data.txt") you create a new RDD.

If you only need to print the output, you can try the following code:

sample1.zipWithIndex().foreach(f => println(f._2 + ", " + f._1))

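Basically, this code does the following:

Calling .zipWithIndex() returns a new RDD[(T, Long)], where (T, Long) is a tuple, T is the element type of the previous RDD (java.lang.String, I believe), and Long is the index of the element in the RDD.
You have performed a transformation, so now you need an action. foreach suits this case well: it applies a statement to each element of the current RDD, so we simply call println on the formatted string.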
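
Neither answer above covers counting from 1, dropping the space after the comma, or saving to a file, so here is a minimal sketch of those pieces. The names LineNumbering, format and numberLines, and the output directory "numbered", are made up for illustration. The formatting logic is exercised on a plain Scala Seq so it can be checked without a cluster; the RDD usage is shown only as a comment and assumes the sample1 RDD from the question.

```scala
object LineNumbering {
  // Format one (line, zero-based index) pair as "n,line", counting from 1.
  def format(line: String, index: Long): String = s"${index + 1},$line"

  // Local stand-in for the RDD pipeline: Seq.zipWithIndex yields Int indices,
  // whereas RDD.zipWithIndex yields Long, hence the toLong conversion.
  def numberLines(lines: Seq[String]): Seq[String] =
    lines.zipWithIndex.map { case (line, i) => format(line, i.toLong) }

  // On the RDD from the question, the same function could be used like this
  // ("numbered" is a hypothetical output directory):
  //   sample1.zipWithIndex()
  //     .map { case (line, i) => LineNumbering.format(line, i) }
  //     .saveAsTextFile("numbered")
}
```

Note that saveAsTextFile writes a directory of part files rather than a single file, so with more than one partition the numbered lines end up spread across several files.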

2015-07-03 19:41:46 SuppieRK
