How to group by key with custom logic in Cloud Dataflow

I am trying to implement a GroupByKey based on a custom object in a Cloud Dataflow pipeline.

public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());
    List<KV<Student, StudentValues>> studentList = new ArrayList<>();
    studentList.add(KV.of(new Student("pawan", 10, "govt"),
        new StudentValues("V1", 123, "govt")));
    studentList.add(KV.of(new Student("pawan", 13223, "word"),
        new StudentValues("V2", 456, "govt")));

    PCollection<KV<Student, StudentValues>> pc =
        pipeline.apply(Create.of(studentList));
    PCollection<KV<Student, Iterable<StudentValues>>> groupedWords =
        pc.apply(GroupByKey.<Student, StudentValues>create());
}

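I just want to group the records of the PCollection by the Student object.

I have overridden the equals method of my custom class, but every time it is called I receive the same Student instance to compare against inside equals. I expected it to compare the first Student key with the second.

What am I doing wrong here?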


2017-04-05 Pavan Tiwari

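Why do you think you are doing something wrong? The key of each element is serialized (using the AvroCoder you specified), and GroupByKey groups together all elements whose keys have the same serialized representation. After that, it does not need to compare the Student objects to make sure that values with the same key have been grouped together.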
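To make the point about serialized keys concrete, here is a minimal plain-Java sketch. It does not use the Dataflow API; it only imitates the idea of grouping on the encoded bytes of a key. The class name GroupBySerializedKey, the Student field names, and both encoder methods are hypothetical stand-ins, and the hand-written byte layout is an assumption in place of whatever AvroCoder would actually produce. The second encoder illustrates one possible way to get "custom logic": derive a key that contains only the fields you want to group on.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class GroupBySerializedKey {

    // Hypothetical stand-in for the question's Student key (field names assumed).
    // It does not override equals(); the grouping below never compares Student objects.
    static final class Student {
        final String name;
        final int id;
        final String school;

        Student(String name, int id, String school) {
            this.name = name;
            this.id = id;
            this.school = school;
        }
    }

    // Assumed encoding of the whole key: every field contributes to the bytes.
    static ByteBuffer encodeFull(Student s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s.name);
            out.writeInt(s.id);
            out.writeUTF(s.school);
            out.flush();
            return ByteBuffer.wrap(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed "custom logic": only the name contributes to the encoded key.
    static ByteBuffer encodeNameOnly(Student s) {
        return ByteBuffer.wrap(s.name.getBytes(StandardCharsets.UTF_8));
    }

    // Groups values by the encoded bytes of their key (ByteBuffer compares by content).
    static Map<ByteBuffer, List<String>> group(
            List<Map.Entry<Student, String>> input,
            Function<Student, ByteBuffer> keyEncoder) {
        Map<ByteBuffer, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<Student, String> kv : input) {
            groups.computeIfAbsent(keyEncoder.apply(kv.getKey()), k -> new ArrayList<>())
                  .add(kv.getValue());
        }
        return groups;
    }

    // The two keys from the question, with simplified string values.
    static List<Map.Entry<Student, String>> sampleInput() {
        List<Map.Entry<Student, String>> input = new ArrayList<>();
        input.add(new AbstractMap.SimpleEntry<>(new Student("pawan", 10, "govt"), "V1"));
        input.add(new AbstractMap.SimpleEntry<>(new Student("pawan", 13223, "word"), "V2"));
        return input;
    }

    public static void main(String[] args) {
        // Full key: the encoded bytes differ, so there are two groups.
        System.out.println(group(sampleInput(), GroupBySerializedKey::encodeFull).values());
        // Name-only key: the encoded bytes match, so there is one group.
        System.out.println(group(sampleInput(), GroupBySerializedKey::encodeNameOnly).values());
    }
}
```

With the two records from the question, the full-key encoding yields two groups because the number and the third field differ, while the name-only encoding yields a single group containing both values. In a real pipeline the equivalent step would be to re-key each element (for example by name) before applying GroupByKey, rather than relying on equals.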


2017-04-05 18:50:18
