
Group by dayofweek and count values in a DataFrame with Spark SQL

I have loaded a DataFrame. It looks like this:

uber_converted.show() 

+--------------------+--------------------+-------------------+----------+---------+--------------------+
|dispatching_base_num|         pickup_date|affiliated_base_num|locationID|  borough|                zone|
+--------------------+--------------------+-------------------+----------+---------+--------------------+
|              B02765|2015-05-08 19:05:...|             B02764|       262|Manhattan|      Yorkville East|
|              B02765|2015-05-08 19:06:...|             B00013|       234|Manhattan|            Union Sq|
|              B02765|2015-05-08 19:06:...|             B02765|       107|Manhattan|            Gramercy|
|              B02765|2015-05-08 19:06:...|             B02765|       137|Manhattan|            Kips Bay|
|              B02765|2015-05-08 19:06:...|             B02765|       220|    Bronx|Spuyten Duyvil/Ki...|
|              B02765|2015-05-08 19:06:...|             B02765|       138|   Queens|   LaGuardia Airport|
|              B02765|2015-05-08 19:06:...|             B02749|       143|Manhattan| Lincoln Square West|
|              B02765|2015-05-08 19:06:...|             B02765|       244|Manhattan|Washington Height...|
|              B02765|2015-05-08 19:06:...|             B02617|       262|Manhattan|      Yorkville East|
|              B02765|2015-05-08 19:06:...|             B02765|       144|Manhattan| Little Italy/NoLiTa|
|              B02765|2015-05-08 19:06:...|             B00381|       209|Manhattan|             Seaport|
|              B02765|2015-05-08 19:06:...|             B02765|       234|Manhattan|            Union Sq|
|              B02765|2015-05-08 19:06:...|             B02765|       163|Manhattan|       Midtown North|
|              B02765|2015-05-08 19:06:...|             B02765|       181| Brooklyn|          Park Slope|
|              B02765|2015-05-08 19:06:...|             B02765|       116|Manhattan|    Hamilton Heights|
|              B02765|2015-05-08 19:06:...|             B02765|       236|Manhattan|Upper East Side N...|
|              B02765|2015-05-08 19:06:...|             B02765|       140|Manhattan|     Lenox Hill East|
|              B02765|2015-05-08 19:07:...|             B02765|       162|Manhattan|        Midtown East|
|              B02765|2015-05-08 19:07:...|             B02788|       263|Manhattan|      Yorkville West|
|              B02765|2015-05-08 19:07:...|             B02765|       181| Brooklyn|          Park Slope|
+--------------------+--------------------+-------------------+----------+---------+--------------------+

I need to group by day of week and count the rows, based on the pickup_date field. The result should look like this:

dayofweek count 
1   -> 234 (Monday) 
2   -> 343 (Tuesday) 

and so on.

Any help would be much appreciated!

Answer


You can use date_format:

from pyspark.sql.functions import date_format

# "u" is the SimpleDateFormat day-of-week pattern (1 = Monday ... 7 = Sunday)
uber_converted.groupBy(
    date_format(uber_converted["pickup_date"], "u").alias("dayofweek")
).count()
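
Note that this relies on the legacy SimpleDateFormat patterns; Spark 3.x switched to a new datetime formatter that reportedly rejects "u", so on newer versions the built-in dayofweek function is a safer bet. A minimal sketch, assuming Spark 2.3+ (beware that dayofweek numbers days 1 = Sunday through 7 = Saturday, not 1 = Monday):

from pyspark.sql.functions import dayofweek

# dayofweek() extracts the day number directly, no format pattern needed
# (1 = Sunday ... 7 = Saturday in this numbering).
(uber_converted
    .groupBy(dayofweek(uber_converted["pickup_date"]).alias("dayofweek"))
    .count()
    .orderBy("dayofweek")
    .show())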

Great answer, thanks. Can you tell me what other patterns I can use besides "u"? – UserCode


No worries, here is the [list of patterns](https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.htm) – UserCode
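
A small, untested sketch of what a few of those patterns produce when applied to the pickup timestamp (pattern meanings taken from the SimpleDateFormat docs linked above; column aliases are illustrative):

from pyspark.sql.functions import date_format

# Each column applies a different SimpleDateFormat pattern to pickup_date:
uber_converted.select(
    date_format("pickup_date", "u").alias("dow_number"),    # 1 = Monday ... 7 = Sunday
    date_format("pickup_date", "EEEE").alias("dow_name"),   # full day name, e.g. "Friday"
    date_format("pickup_date", "yyyy-MM").alias("month"),   # year and month
    date_format("pickup_date", "HH").alias("hour"),         # hour of day, 00-23
).show(3)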