Spark: drop duplicate array structs by the last item of the array struct in a DataFrame
So my table looks like this:
customer_1|place|customer_2|item |count
-------------------------------------------------
a | NY | b |(2010,304,310)| 34
a | NY | b |(2024,201,310)| 21
a | NY | b |(2010,304,312)| 76
c | NY | x |(2010,304,310)| 11
a | NY | b |(453,131,235) | 10
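I tried the following, but it does not remove the duplicates, because the full array is still part of the struct (as it should be, since I need it in the final result).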
val df = df_one.withColumn("vs", struct(col("item").getItem(size(col("item"))-1), col("item"), col("count")))
  .groupBy(col("customer_1"), col("place"), col("customer_2"))
  .agg(max("vs").alias("vs"))
  .select(col("customer_1"), col("place"), col("customer_2"), col("vs.item"), col("vs.count"))
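I want to group by the customer_1, place and customer_2 columns and return only the array structs whose last item (index -1) is unique, keeping the one with the highest count. Any ideas?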
Expected output:
customer_1|place|customer_2|item |count
-------------------------------------------------
a | NY | b |(2010,304,312)| 76
a | NY | b |(2010,304,310)| 34
a | NY | b |(453,131,235) | 10
c | NY | x |(2010,304,310)| 11
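
One possible approach, sketched here for spark-shell and not tested against the real data: make the last element of item part of the grouping key, and put count first in the struct so that max picks the row with the highest count inside each group. This assumes item is an array column (the sample rows are rebuilt below as Seq[Int]) and Spark 2.4 or later for element_at. The names last_item and result are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, element_at, max, struct}

// In spark-shell this returns the existing session; elsewhere it starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("dedup-by-last-item").getOrCreate()
import spark.implicits._

// Sample rows copied from the table in the question (assumption: item is array<int>).
val df_one = Seq(
  ("a", "NY", "b", Seq(2010, 304, 310), 34),
  ("a", "NY", "b", Seq(2024, 201, 310), 21),
  ("a", "NY", "b", Seq(2010, 304, 312), 76),
  ("c", "NY", "x", Seq(2010, 304, 310), 11),
  ("a", "NY", "b", Seq(453, 131, 235), 10)
).toDF("customer_1", "place", "customer_2", "item", "count")

val result = df_one
  // element_at with a negative index counts from the end, so -1 is the last element.
  .withColumn("last_item", element_at(col("item"), -1))
  // The last element is part of the key, so each distinct last element keeps one row.
  .groupBy(col("customer_1"), col("place"), col("customer_2"), col("last_item"))
  // Structs compare field by field, so count must come first to act as the sort key.
  .agg(max(struct(col("count"), col("item"))).alias("vs"))
  .select(col("customer_1"), col("place"), col("customer_2"), col("vs.item"), col("vs.count"))

result.show(false)
```

The attempt in the question groups only by the three customer/place columns and orders the struct by the last element first, so max returns a single row per group chosen by the largest last element instead of by count. Row order in the result is not guaranteed; add an orderBy if the order of the expected output matters.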