Transpose in Spark
I have a DataFrame that looks like this:
+---------+-------------+--------------------+--------+
|       ID|      reg_num|             reg_typ|reg_code|
+---------+-------------+--------------------+--------+
|523528690| 134886307000|   Chamber of Commer|   14246|
|523528690|  2015/369956|   Government Gazett|   14225|
|523528690|    997253630|    Tax Registration|   14259|
|523528691|    997253633|             Tax Doc|   14250|
|523528691|    997253634|            Tax File|   14251|
|523528691|    997253635|            Tax Data|   14252|
|523528691|    997253636|         Tax Monitor|   14253|
+---------+-------------+--------------------+--------+
Now I am trying to produce output in the following format:
+---------+-------------+--------------------+--------+-------------+-------------+-------------+-------------+
|       ID|      reg_num|             reg_typ|reg_code|        reg_1|        reg_2|        reg_3|        reg_4|
+---------+-------------+--------------------+--------+-------------+-------------+-------------+-------------+
|523528690| 134886307000|   Chamber of Commer|   14246| 134886307000|  2015/369956|    997253630|         null|
|523528690|  2015/369956|   Government Gazett|   14225| 134886307000|  2015/369956|    997253630|         null|
|523528690|    997253630|    Tax Registration|   14259| 134886307000|  2015/369956|    997253630|         null|
|523528691|    997253633|             Tax Doc|   14250|    997253633|    997253634|    997253635|    997253636|
|523528691|    997253634|            Tax File|   14251|    997253633|    997253634|    997253635|    997253636|
|523528691|    997253635|            Tax Data|   14252|    997253633|    997253634|    997253635|    997253636|
|523528691|    997253636|         Tax Monitor|   14253|    997253633|    997253634|    997253635|    997253636|
+---------+-------------+--------------------+--------+-------------+-------------+-------------+-------------+
I have seen predefined functions such as pivot, but they do not seem to fit my case.
I am using Spark version 1.6 and Scala version 2.10.5.
Any help is appreciated!
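For illustration, here is a minimal sketch of one possible approach. It is not taken from an answer in this thread. It numbers the rows within each ID using a window function, pivots the numbered rows into reg_1 ... reg_4, and joins the result back onto the original rows. The helper name addRegColumns, the choice of reg_num (compared as a string) as the ordering column, and the fixed limit of four slots are all assumptions. As far as I know, window functions on Spark 1.6 also require a HiveContext rather than a plain SQLContext.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{concat, first, lit, row_number}

// Hypothetical helper: appends reg_1..reg_4 to every row of `df`.
def addRegColumns(df: DataFrame): DataFrame = {
  // Assumption: rows within an ID are ordered by reg_num (string comparison).
  val byId = Window.partitionBy("ID").orderBy("reg_num")

  // Label each row with its slot name: reg_1, reg_2, ...
  val slotted = df.withColumn(
    "slot", concat(lit("reg_"), row_number().over(byId).cast("string")))

  // One row per ID, one column per slot; a slot with no matching row stays null.
  val wide = slotted
    .groupBy("ID")
    .pivot("slot", Seq("reg_1", "reg_2", "reg_3", "reg_4"))
    .agg(first("reg_num"))

  // Attach the wide columns to the original rows.
  df.join(wide, "ID")
}
```

On the sample data, string ordering of reg_num should reproduce the slot order shown in the desired output. An ID with more than four rows would lose the extra values, because only four pivot values are listed.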
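@eliasah The solution solved the problem and works as required. Thanks :) – Svk
Glad to hear it! – eliasah
@eliasah Just one question: when I ran this on a large dataset, the reg_1 ... reg_4 columns did not follow the order of the original DataFrame, because the first reg_num did not correspond to reg_1. Is that because the window function uses an order by clause? – Svk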
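On the ordering question: a DataFrame has no guaranteed row order, so numbering inside a window follows whatever column the window's orderBy names. The sketch below shows one way to approximate the source order, assuming an index is captured with monotonically_increasing_id before any shuffle. The helper name withPositionInId and the column names row_idx and pos are hypothetical. The index only mirrors the original order as long as the partition layout at that point still reflects it.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{monotonically_increasing_id, row_number}

// Hypothetical helper: adds `pos`, the 1-based position of each row within its ID.
def withPositionInId(df: DataFrame): DataFrame = {
  // Assumption: called right after loading, before joins or repartitioning,
  // so the generated ids still increase in the source row order.
  val indexed = df.withColumn("row_idx", monotonically_increasing_id())

  // Order explicitly by the captured index instead of by a data column.
  val byId = Window.partitionBy("ID").orderBy("row_idx")

  indexed.withColumn("pos", row_number().over(byId))
}
```

The resulting pos column can then be used to name the reg_1 ... reg_4 slots, so that the first reg_num seen for an ID lands in reg_1.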