Transforming an org.apache.spark.rdd.RDD[String] into a parallelized collection

I have a CSV file in my HDFS with collections of products, like:
[56]
[85,66,73]
[57]
[8,16]
[25,96,22,17]
[83,61]
I am trying to apply the association rules algorithm to my code. For that, I need to run this:
scala> val data = sc.textFile("/user/cloudera/data")
data: org.apache.spark.rdd.RDD[String] = /user/cloudera/data MapPartitionsRDD[294] at textFile at <console>:38
scala> val distData = sc.parallelize(data)
But when I run this, I get the following error:
<console>:40: error: type mismatch;
found : org.apache.spark.rdd.RDD[String]
required: Seq[?]
Error occurred in an application involving default arguments.
val distData = sc.parallelize(data)
How can I transform this RDD[String] into the parallelized collection I need?

Thank you very much!
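A minimal sketch of one way around the error, assuming the goal is to feed the data to MLlib's FP-Growth (the usual association-rules implementation in Spark): `sc.textFile` already returns a distributed RDD, so calling `sc.parallelize` on it again is unnecessary and is what causes the type mismatch, since `parallelize` only accepts a local `Seq`. What FP-Growth actually needs is an `RDD[Array[String]]`, which you can get by parsing each bracketed line. The `minSupport` value below is a hypothetical parameter chosen for illustration, not something from the question:

```scala
import org.apache.spark.mllib.fpm.FPGrowth

// textFile already gives an RDD[String]; no parallelize needed.
val data = sc.textFile("/user/cloudera/data")

// Strip the surrounding brackets and split on commas to get
// the RDD[Array[String]] of transactions that FPGrowth expects.
val transactions = data.map { line =>
  line.stripPrefix("[").stripSuffix("]").split(",")
}

// Hypothetical support threshold, purely for illustration.
val model = new FPGrowth()
  .setMinSupport(0.2)
  .run(transactions)

model.freqItemsets.collect().foreach { itemset =>
  println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
}
```

In other words, `parallelize` is only for turning a local in-memory collection (e.g. `Seq(1, 2, 3)`) into an RDD; data read from HDFS is already distributed.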