2015-03-13 109 views
0

我使用Spark笛卡爾函數來生成一個列表N對值。DStream的笛卡爾

我然後這些值映射到每個用戶之間產生一距離度量:

val cartesianUsers: org.apache.spark.rdd.RDD[(distance.classes.User, distance.classes.User)] = users.cartesian(users) 
cartesianUsers.map(m => manDistance(m._1, m._2)) 

這按預期工作。

使用星火流媒體庫中創建一個DSTREAM然後映射了它:

val customReceiverStream: ReceiverInputDStream[String] = ssc.receiverStream.... 
customReceiverStream.foreachRDD(m => { 
    println("size is " + m) 
}) 

我可以內customReceiverStream.foreachRDD使用笛卡爾功能,但根據文檔http://spark.apache.org/docs/1.2.0/streaming-programming-guide.htm這不是它的預期用途:

foreachRDD(func)應用函數的最通用的輸出運算符,func, to each RDD generated from the stream. This function should push the data in each RDD to a external system, like saving the RDD to files, or writing it over the network to a database. Note that the function func is executed in the driver process running the streaming application, and will usually have RDD actions in it that will force the computation of the streaming RDDs.

如何計算DStream的笛卡爾?也許我誤解了DStreams的使用?

回答