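Apache Spark: counting records per group when the group key is null

When I count the records in each group, the group whose key is null is reported as having 0 records, which is incorrect.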
Input DataFrame:
+--------+
|    Name|
+--------+
|  Andrei|
|  Andrei|
|    null|
|    null|
|Grigorii|
+--------+
Code:
// Group by Name, then count the values of the Name column within each group
Dataset<Row> df = inputDf.groupBy("Name")
        .agg(functions.count("Name").as("Name_count"));
Actual DataFrame:
+--------+----------+
|    Name|Name_count|
+--------+----------+
|    null|         0|
|  Andrei|         2|
|Grigorii|         1|
+--------+----------+
Expected DataFrame:
+--------+----------+
|    Name|Name_count|
+--------+----------+
|    null|         2|
|  Andrei|         2|
|Grigorii|         1|
+--------+----------+
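
A likely explanation for the difference: counting a column follows SQL COUNT(column) semantics and skips null values, so the null group still exists but every value in it is ignored. Counting rows instead (in Spark, for example, functions.count(functions.lit(1)) inside agg, or groupBy("Name").count()) should produce the expected 2. The sketch below is not Spark code. It is a plain-Java model of the two semantics, and the class and method names (NullGroupCount, countNonNull, countRows) are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullGroupCount {

    // Models count("Name"): a group is created for every key, including null,
    // but only non-null values contribute to the count.
    public static Map<String, Long> countNonNull(List<String> names) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String name : names) {
            counts.merge(name, name == null ? 0L : 1L, Long::sum);
        }
        return counts;
    }

    // Models counting rows per group: every row adds 1, whatever its value.
    public static Map<String, Long> countRows(List<String> names) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String name : names) {
            counts.merge(name, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same values as the input DataFrame in the question
        List<String> names = Arrays.asList("Andrei", "Andrei", null, null, "Grigorii");
        System.out.println(countNonNull(names)); // {Andrei=2, null=0, Grigorii=1}
        System.out.println(countRows(names));    // {Andrei=2, null=2, Grigorii=1}
    }
}
```

The first map matches the actual DataFrame above and the second matches the expected one, which suggests the aggregate is the issue, not the grouping.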