Once you have an rdd, there are many ways to create a DataFrame. One of them is the .toDF function, which requires sqlContext.implicits to be imported, as in:
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder().appName("udf testings")
  .master("local")
  .config("", "")
  .getOrCreate()
val sc = sparkSession.sparkContext
val sqlContext = sparkSession.sqlContext
import sqlContext.implicits._
After that, read the fpgrowth text file and convert it to an rdd:
import org.apache.spark.rdd.RDD

val data = sc.textFile("path to sample_fpgrowth.txt that you have used")
val transactions: RDD[Array[String]] = data.map(s => s.trim.split(' '))
The code I used is from Frequent Pattern Mining - RDD-based API, i.e. what was provided in the question:
import org.apache.spark.mllib.fpm.FPGrowth

val fpg = new FPGrowth()
  .setMinSupport(0.2)
  .setNumPartitions(10)
val model = fpg.run(transactions)
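Before building DataFrames, you can sanity-check the result by printing the frequent itemsets straight from the RDD (a minimal sketch along the lines of the MLlib guide, using the same model variable as above):

```scala
// Collect the frequent itemsets to the driver and print each one
// (fine here because the sample dataset is tiny).
model.freqItemsets.collect().foreach { itemset =>
  println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
}
```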
The next step is to call the .toDF function. For the first DataFrame:
model.freqItemsets
  .map(itemset => (itemset.items.mkString("[", ",", "]"), itemset.freq))
  .toDF("items", "freq")
  .show(false)
This results in:
+---------+----+
|items |freq|
+---------+----+
|[z] |5 |
|[x] |4 |
|[x,z] |3 |
|[y] |3 |
|[y,x] |3 |
|[y,x,z] |3 |
|[y,z] |3 |
|[r] |3 |
|[r,x] |2 |
|[r,z] |2 |
|[s] |3 |
|[s,y] |2 |
|[s,y,x] |2 |
|[s,y,x,z]|2 |
|[s,y,z] |2 |
|[s,x] |3 |
|[s,x,z] |2 |
|[s,z] |2 |
|[t] |3 |
|[t,y] |3 |
+---------+----+
only showing top 20 rows
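As noted above, .toDF is not the only way to get a DataFrame from an rdd. A sketch of the same first DataFrame built with sqlContext.createDataFrame and an explicit schema instead of the implicits import (variable names follow the code above):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Turn each frequent itemset into a Row of (items-as-string, frequency)
val rows = model.freqItemsets.map(itemset =>
  Row(itemset.items.mkString("[", ",", "]"), itemset.freq))

// Declare the column names and types explicitly
val schema = StructType(Seq(
  StructField("items", StringType, nullable = false),
  StructField("freq", LongType, nullable = false)))

val freqDF = sqlContext.createDataFrame(rows, schema)
freqDF.show(false)
```

This avoids import sqlContext.implicits._ entirely, at the cost of spelling out the schema by hand.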
And for the second DataFrame:
val minConfidence = 0.8
model.generateAssociationRules(minConfidence)
  .map(rule => (rule.antecedent.mkString("[", ",", "]"),
                rule.consequent.mkString("[", ",", "]"),
                rule.confidence))
  .toDF("antecedent", "consequent", "confidence")
  .show(false)
which results in:
+----------+----------+----------+
|antecedent|consequent|confidence|
+----------+----------+----------+
|[t,s,y] |[x] |1.0 |
|[t,s,y] |[z] |1.0 |
|[y,x,z] |[t] |1.0 |
|[y] |[x] |1.0 |
|[y] |[z] |1.0 |
|[y] |[t] |1.0 |
|[p] |[r] |1.0 |
|[p] |[z] |1.0 |
|[q,t,z] |[y] |1.0 |
|[q,t,z] |[x] |1.0 |
|[q,y] |[x] |1.0 |
|[q,y] |[z] |1.0 |
|[q,y] |[t] |1.0 |
|[t,s,x] |[y] |1.0 |
|[t,s,x] |[z] |1.0 |
|[q,t,y,z] |[x] |1.0 |
|[q,t,x,z] |[y] |1.0 |
|[q,x] |[y] |1.0 |
|[q,x] |[t] |1.0 |
|[q,x] |[z] |1.0 |
+----------+----------+----------+
only showing top 20 rows
I hope this is what you need.
Possible duplicate of [How to convert rdd object to dataframe in spark](https://stackoverflow.com/questions/29383578/how-to-convert-rdd-object-to-dataframe-in-spark) – stefanobaghino
I tried it, but I got errors, probably because I am new to Scala. Could you explain in detail, using the example given in my question? –
@zero323 Could you help me with my question? –