
Group by dayofweek and count values in a DataFrame with Spark SQL

I have loaded a DataFrame. It looks like this:

uber_converted.show() 

+--------------------+--------------------+-------------------+----------+---------+--------------------+
|dispatching_base_num|         pickup_date|affiliated_base_num|locationID|  borough|                zone|
+--------------------+--------------------+-------------------+----------+---------+--------------------+
|              B02765|2015-05-08 19:05:...|             B02764|       262|Manhattan|      Yorkville East|
|              B02765|2015-05-08 19:06:...|             B00013|       234|Manhattan|            Union Sq|
|              B02765|2015-05-08 19:06:...|             B02765|       107|Manhattan|            Gramercy|
|              B02765|2015-05-08 19:06:...|             B02765|       137|Manhattan|            Kips Bay|
|              B02765|2015-05-08 19:06:...|             B02765|       220|    Bronx|Spuyten Duyvil/Ki...|
|              B02765|2015-05-08 19:06:...|             B02765|       138|   Queens|   LaGuardia Airport|
|              B02765|2015-05-08 19:06:...|             B02749|       143|Manhattan| Lincoln Square West|
|              B02765|2015-05-08 19:06:...|             B02765|       244|Manhattan|Washington Height...|
|              B02765|2015-05-08 19:06:...|             B02617|       262|Manhattan|      Yorkville East|
|              B02765|2015-05-08 19:06:...|             B02765|       144|Manhattan| Little Italy/NoLiTa|
|              B02765|2015-05-08 19:06:...|             B00381|       209|Manhattan|             Seaport|
|              B02765|2015-05-08 19:06:...|             B02765|       234|Manhattan|            Union Sq|
|              B02765|2015-05-08 19:06:...|             B02765|       163|Manhattan|       Midtown North|
|              B02765|2015-05-08 19:06:...|             B02765|       181| Brooklyn|          Park Slope|
|              B02765|2015-05-08 19:06:...|             B02765|       116|Manhattan|    Hamilton Heights|
|              B02765|2015-05-08 19:06:...|             B02765|       236|Manhattan|Upper East Side N...|
|              B02765|2015-05-08 19:06:...|             B02765|       140|Manhattan|     Lenox Hill East|
|              B02765|2015-05-08 19:07:...|             B02765|       162|Manhattan|        Midtown East|
|              B02765|2015-05-08 19:07:...|             B02788|       263|Manhattan|      Yorkville West|
|              B02765|2015-05-08 19:07:...|             B02765|       181| Brooklyn|          Park Slope|
+--------------------+--------------------+-------------------+----------+---------+--------------------+

I need to group by day of week and count the rows, based on the pickup_date field. The result should look like this:

dayofweek count 
1   -> 234 (Monday) 
2   -> 343 (Tuesday) 

and so on.

Any help would be much appreciated!

Answer


You can use date_format:

from pyspark.sql.functions import date_format

# "u" is the SimpleDateFormat day-of-week pattern (1 = Monday ... 7 = Sunday)
uber_converted.groupBy(
    date_format(uber_converted["pickup_date"], "u").alias("dayofweek")
).count()
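
Note that this relies on the legacy SimpleDateFormat patterns; Spark 3.x switched to a new datetime formatter that reportedly rejects "u", so on newer versions the built-in dayofweek function is a safer bet. A minimal sketch, assuming Spark 2.3+ (beware that dayofweek numbers days 1 = Sunday through 7 = Saturday, not 1 = Monday):

from pyspark.sql.functions import dayofweek

# dayofweek() extracts the day number directly, no format pattern needed
# (1 = Sunday ... 7 = Saturday in this numbering).
(uber_converted
    .groupBy(dayofweek(uber_converted["pickup_date"]).alias("dayofweek"))
    .count()
    .orderBy("dayofweek")
    .show())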

Great answer, thanks. Can you tell me what other patterns I can use besides "u"? – UserCode


No worries, here is the [list of patterns](https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.htm) – UserCode
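
A small, untested sketch of what a few of those patterns produce when applied to the pickup timestamp (pattern meanings taken from the SimpleDateFormat docs linked above; column aliases are illustrative):

from pyspark.sql.functions import date_format

# Each column applies a different SimpleDateFormat pattern to pickup_date:
uber_converted.select(
    date_format("pickup_date", "u").alias("dow_number"),    # 1 = Monday ... 7 = Sunday
    date_format("pickup_date", "EEEE").alias("dow_name"),   # full day name, e.g. "Friday"
    date_format("pickup_date", "yyyy-MM").alias("month"),   # year and month
    date_format("pickup_date", "HH").alias("hour"),         # hour of day, 00-23
).show(3)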