2017-07-31 121 views
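I want to retrieve only the rows of the Spark DataFrame df whose datetime is later than 2017-Jul-10 08:35. How can I do this? In other words, how do I get the rows whose datetime is greater than a specific datetime?

I know how to extract the rows that match a specific date, for example 2017-Jul-10, but I don't know how to do a comparison, i.e. greater than 2017-Jul-10 08:35.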

df = df.filter(df("p_datetime") === "2017-Jul-10") 

Answer

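p_datetime is in a custom date format, so you need to convert it to a proper date format before you can compare it.

Here is a simple example that reproduces your problem: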

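import org.apache.spark.sql.functions._
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

// Sample data: dates stored as strings in the custom yyyy-MMM-dd format
val df = Seq(
    ("2017-Jul-10", "0.26"),
    ("2017-Jul-9", "0.81"),
    ("2015-Jul-8", "0.24"),
    ("2015-Jul-11", "null"),
    ("2015-Jul-12", "null"),
    ("2015-Jul-15", "0.13")
).toDF("datetime", "value")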
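As a quick illustration of why the conversion is needed, here is a minimal sketch in plain Scala (java.time only, no Spark). It compares two of the sample values first as raw strings and then as parsed dates. The object name WhyConvert and the helper parse are hypothetical names used only for this illustration.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

object WhyConvert {
  // "d" accepts one- or two-digit days; Locale.ENGLISH lets "Jul" parse
  // regardless of the JVM's default locale.
  val fmt: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MMM-d", Locale.ENGLISH)

  // Hypothetical helper: parse the custom date string into a LocalDate.
  def parse(s: String): LocalDate = LocalDate.parse(s, fmt)

  def main(args: Array[String]): Unit = {
    val a = "2017-Jul-9"
    val b = "2017-Jul-10"
    // Raw strings compare character by character, so '9' beats '1'.
    println(a > b)                      // prints true
    // Parsed dates compare chronologically.
    println(parse(a).isAfter(parse(b))) // prints false
  }
}
```

The raw strings sort lexicographically, which puts July 9 after July 10, so any greater-than filter on the unconverted column would give misleading results.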


val df1 = df.withColumn("datetime", from_unixtime(unix_timestamp($"datetime", "yyyy-MMM-dd"))) 

df1.filter($"datetime".gt(lit("2017-07-10"))).show // greater than 
df1.filter($"datetime" > (lit("2017-07-10"))).show 

Output:

+-------------------+-----+
|           datetime|value|
+-------------------+-----+ 
|2017-07-10 00:00:00| 0.26| 
+-------------------+-----+ 

df1.filter($"datetime".lt(lit("2017-07-10"))).show //less than 
df1.filter($"datetime" < (lit("2017-07-10"))).show 

Output:

+-------------------+-----+
|           datetime|value|
+-------------------+-----+ 
|2017-07-09 00:00:00| 0.81| 
|2015-07-08 00:00:00| 0.24| 
|2015-07-11 00:00:00| null| 
|2015-07-12 00:00:00| null| 
|2015-07-15 00:00:00| 0.13| 
+-------------------+-----+ 

df1.filter($"datetime".geq(lit("2017-07-10"))).show // greater than or equal to
df1.filter($"datetime" >= (lit("2017-07-10"))).show

Output:

+-------------------+-----+
|           datetime|value|
+-------------------+-----+ 
|2017-07-10 00:00:00| 0.26| 
+-------------------+-----+ 

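Edit: you can also do the comparison by simply casting to a timestamp:

import org.apache.spark.sql.types.TimestampType

val df1 = df.withColumn("datetime", unix_timestamp($"datetime", "yyyy-MMM-dd").cast(TimestampType)) // cast to timestamp

df1.filter($"datetime" >= (lit("2017-07-10").cast(TimestampType))).show
// cast "2017-07-10" to timestamp as well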
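The question asks about a cutoff that includes a time of day (2017-Jul-10 08:35), so here is a rough sketch of how the same idea might extend to that case. It assumes p_datetime holds strings shaped like "2017-Jul-10 08:35" (pattern yyyy-MMM-dd HH:mm), which the question does not confirm. The object name, the helper laterThan, and the sample rows are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, unix_timestamp}
import org.apache.spark.sql.types.TimestampType

object DatetimeFilterSketch {
  // Assumed layout of p_datetime; adjust to the real data.
  val Pattern = "yyyy-MMM-dd HH:mm"

  // Hypothetical helper: keep rows whose p_datetime is strictly later than cutoff.
  // Both sides are parsed with the same pattern and compared as timestamps.
  def laterThan(df: DataFrame, cutoff: String): DataFrame = {
    val ts = unix_timestamp(col("p_datetime"), Pattern).cast(TimestampType)
    val limit = unix_timestamp(lit(cutoff), Pattern).cast(TimestampType)
    df.filter(ts > limit)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("datetime-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows, for illustration only
    val df = Seq(
      ("2017-Jul-10 08:34", "0.26"),
      ("2017-Jul-10 08:36", "0.81"),
      ("2017-Jul-11 00:00", "0.24")
    ).toDF("p_datetime", "value")

    laterThan(df, "2017-Jul-10 08:35").show()
    spark.stop()
  }
}
```

Parsing the cutoff with the same pattern as the column avoids having to rewrite it by hand into yyyy-MM-dd HH:mm:ss form.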

Hope this helps with comparing timestamps.