Spark MLlib - Frequent Pattern Mining - Association Rules - not getting the expected results

I have the following dataset:
[A,D]
[C,A,B]
[A]
[A,E,D]
[B,D]
I am trying to extract some association rules from it using frequent pattern mining in Spark MLlib. For that I have the following code:
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

val transactions = sc.textFile("/user/cloudera/teste")
val freqItemsets = transactions
  .repartition(10)
  .map(_.split(","))
  .flatMap { xs =>
    (xs.combinations(1) ++ xs.combinations(2) ++ xs.combinations(3) ++ xs.combinations(4) ++ xs.combinations(5))
      .filter(_.nonEmpty)
      .map(x => (x.toList, 1L))
  }
  .reduceByKey(_ + _)
  .map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }

val ar = new AssociationRules().setMinConfidence(0.8)
val results = ar.run(freqItemsets)
results.collect().foreach { rule =>
  println("[" + rule.antecedent.mkString(",")
    + "=>"
    + rule.consequent.mkString(",") + "]," + rule.confidence)
}
But all the extracted rules have a confidence of 1:
[[C=>A],1.0
[[C=>B]],1.0
[A,B]=>[C],1.0
[E=>D]],1.0
[E=>[A],1.0
[A=>B]],1.0
[A=>[C],1.0
[[C,A=>B]],1.0
[[A=>D]],1.0
[E,D]=>[A],1.0
[[A,E=>D]],1.0
[[C,B]=>A],1.0
[[B=>D]],1.0
[B]=>A],1.0
[B]=>[C],1.0
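I really don't understand what the problem in my code is... Does anyone know what mistake I am making in calculating the confidence?
Thanks a lot!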
Hi @Anony-Mousse, thanks for your reply :) But why does the value always come out equal to 1? Rules whose real confidence is below 0.8 should not appear, right?
You have set '.setMinConfidence(0.8)', so rules with a confidence below 0.8 are filtered out of the results.
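To make the point about confidence concrete, here is a minimal plain-Scala sketch (no Spark) that recomputes confidence by hand as count(X and Y together) / count(X). It rests on an assumption: that the input file literally contains the square brackets shown in the dataset, which the printed rules such as [[C=>A] suggest. Under that assumption, splitting only on "," leaves "[A", "A" and "[A]" as three different items. The names ConfidenceSketch, supportCounts and confidence are illustrative and not part of MLlib.

```scala
object ConfidenceSketch {
  // Raw lines, ASSUMING the file contains the brackets shown in the question.
  val rawLines: Seq[String] = Seq("[A,D]", "[C,A,B]", "[A]", "[A,E,D]", "[B,D]")

  // Count how many transactions contain each non-empty itemset.
  def supportCounts(transactions: Seq[Set[String]]): Map[Set[String], Long] =
    transactions
      .flatMap(t => t.subsets().filter(_.nonEmpty).toList)
      .groupBy(identity)
      .map { case (itemset, occurrences) => itemset -> occurrences.size.toLong }

  // confidence(X => Y) = count(X union Y) / count(X)
  def confidence(counts: Map[Set[String], Long],
                 antecedent: Set[String],
                 consequent: Set[String]): Double =
    counts.getOrElse(antecedent ++ consequent, 0L).toDouble / counts(antecedent)

  // Splitting on "," only, as in the question: brackets stay glued to the items.
  val withBrackets: Seq[Set[String]] = rawLines.map(_.split(",").toSet)

  // Stripping the brackets first, so "[A", "A" and "[A]" all become "A".
  val cleaned: Seq[Set[String]] =
    rawLines.map(_.replaceAll("[\\[\\]]", "").split(",").map(_.trim).toSet)

  def main(args: Array[String]): Unit = {
    val dirty = supportCounts(withBrackets)
    val clean = supportCounts(cleaned)
    println(confidence(dirty, Set("[A"), Set("D]"))) // 1.0: "[A" occurs twice, both times with "D]"
    println(confidence(clean, Set("A"), Set("D")))   // 0.5: A occurs 4 times, twice with D
    println(confidence(clean, Set("E"), Set("D")))   // 1.0: E occurs once, together with D
  }
}
```

If the brackets really are in the file, the same cleanup could go before the split in the Spark job, for example .map(_.replaceAll("[\\[\\]]", "").split(",")). With clean items, a rule such as A=>D has confidence 0.5 and is dropped by the 0.8 threshold, so every rule that survives has a confidence of at least 0.8 rather than all of them being exactly 1.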