2017-09-24 70 views
0

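Python DataFrame - trouble understanding and decoding an error

I am completely new to Python. I want to create a new column named Arrival Delay from the actual and estimated arrival dates and times. I am trying to do this with a pandas DataFrame. The code I tried is below.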

for i in range(0, df_new.shape[0]):
    if df_new["ACT_ARRIVAL_DATE"][i] == df_new["ARRIVAL_ETA_DATE"][i]:
        if df_new["ACT_ARRIVAL_TIME"][i] > df_new["ARRIVAL_ETA_TIME"][i]:
            df_new['Arrival Delay'][i] = df_new["ACT_ARRIVAL_TIME"][i] - df_new["ARRIVAL_ETA_TIME"][i]
        else:
            df_new['Arrival Delay'][i] = 0
    elif df_new["ACT_ARRIVAL_DATE"][i] > df_new["ARRIVAL_ETA_DATE"][i]:
        if df_new["ACT_ARRIVAL_TIME"][i] > df_new["ARRIVAL_ETA_TIME"][i]:
            df_new['Arrival Delay'][i] = 24 + (df_new["ACT_ARRIVAL_TIME"][i] - df_new["ARRIVAL_ETA_TIME"][i])
    else:
        df_new['Arrival Delay'][i] = 24

But I get the following error.

ValueError        Traceback (most recent call last) 
<ipython-input-60-8dfb865ac5c2> in <module>() 
    1 for i in range(0,df_new.shape[0]): 
----> 2  if df_new["ACT_ARRIVAL_DATE"][i] == df_new["ARRIVAL_ETA_DATE"][i]: 
    3   if df_new[ACT_ARRIVAL_TIME[i]] > df_new[ARRIVAL_ETA_TIME[i]]: 
    4    df_new['Arrival Delay'] = df_new[ACT_ARRIVAL_TIME[i]] - df_new[ARRIVAL_ETA_TIME[i]] 
    5   else: 

C:\Users\3016205\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
951   raise ValueError("The truth value of a {0} is ambiguous. "
952       "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 953       .format(self.__class__.__name__))
954
955  __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Please help me with this.

Note: the variables are of dtype datetime64[ns].

+0

Even in R you would not iterate with `if` assignments; you would use the vectorized `ifelse()` instead. – Parfait

Answers

1

A line like this

df_new["ACT_ARRIVAL_DATE"][i] 

should be written like this, so that the row label and the column are resolved in a single lookup

df_new.loc[i,"ACT_ARRIVAL_DATE"] 
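As for why the comparison itself raises, one possible cause (an assumption, since the question does not show the index of df_new) is that the index contains repeated labels. In that case a lookup by label returns a Series instead of a single value, and `if` cannot reduce it to one truth value. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data: note the repeated index label 0.
dates = pd.Series(
    pd.to_datetime(["2017-09-01", "2017-09-02", "2017-09-03"]),
    index=[0, 0, 1],
)

unique_lookup = dates[1]      # label 1 is unique: a single Timestamp
duplicate_lookup = dates[0]   # label 0 occurs twice: a two-element Series

try:
    if duplicate_lookup == duplicate_lookup:
        pass
    raised = False
except ValueError:
    # "The truth value of a Series is ambiguous. ..."
    raised = True

# Renumbering the rows 0..n-1 makes every label unique again.
clean = dates.reset_index(drop=True)
```

If this is what is happening, calling `df_new.reset_index(drop=True)` before the loop should make each lookup return a scalar.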

You should not need a loop at all, but a pandas for loop would look like this

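for index, row in df_new.iterrows():
    if row["ACT_ARRIVAL_DATE"] == row["ARRIVAL_ETA_DATE"]:
        if row["ACT_ARRIVAL_TIME"] > row["ARRIVAL_ETA_TIME"]:
            # Late on the same day: store the time difference.
            df_new.loc[index, 'Arrival Delay'] = row["ACT_ARRIVAL_TIME"] - row["ARRIVAL_ETA_TIME"]
        else:
            # On time or early: no delay, as in the question's code.
            df_new.loc[index, 'Arrival Delay'] = 0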

To avoid the for loop, you can use boolean indexing

df_new.loc[(df_new.ACT_ARRIVAL_DATE == df_new.ARRIVAL_ETA_DATE) & (df_new.ACT_ARRIVAL_TIME > df_new.ARRIVAL_ETA_TIME), 'Arrival Delay'] = df_new.ACT_ARRIVAL_TIME - df_new.ARRIVAL_ETA_TIME

Then just build on this for the rest of the possibilities.
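Building this out for the remaining cases might look like the sketch below. It assumes the time columns hold datetime64 values on a shared dummy date, uses made-up sample rows, and swaps the bare 24 from the question for pd.Timedelta(hours=24) so that every value in the column is a timedelta. The mask names are only illustrative.

```python
import pandas as pd

# Hypothetical sample data; times sit on a shared dummy date.
df_new = pd.DataFrame({
    "ACT_ARRIVAL_DATE": pd.to_datetime(["2017-09-01", "2017-09-01", "2017-09-02"]),
    "ARRIVAL_ETA_DATE": pd.to_datetime(["2017-09-01", "2017-09-01", "2017-09-01"]),
    "ACT_ARRIVAL_TIME": pd.to_datetime(
        ["1900-01-01 10:30", "1900-01-01 09:00", "1900-01-01 11:00"]),
    "ARRIVAL_ETA_TIME": pd.to_datetime(
        ["1900-01-01 10:00", "1900-01-01 09:30", "1900-01-01 10:00"]),
})

# One boolean mask per condition used in the question.
same_day = df_new.ACT_ARRIVAL_DATE == df_new.ARRIVAL_ETA_DATE
next_day = df_new.ACT_ARRIVAL_DATE > df_new.ARRIVAL_ETA_DATE
later = df_new.ACT_ARRIVAL_TIME > df_new.ARRIVAL_ETA_TIME
diff = df_new.ACT_ARRIVAL_TIME - df_new.ARRIVAL_ETA_TIME

# Start from "no delay", then overwrite the rows matching each case.
df_new["Arrival Delay"] = pd.Timedelta(0)
df_new.loc[same_day & later, "Arrival Delay"] = diff
df_new.loc[next_day & later, "Arrival Delay"] = diff + pd.Timedelta(hours=24)
df_new.loc[next_day & ~later, "Arrival Delay"] = pd.Timedelta(hours=24)
```

Assigning a Series on the right-hand side works because pandas aligns it with the selected rows by index, so only the masked rows receive their own difference.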

0

Consider nested np.where() calls, similar to R's ifelse()

import numpy as np

# Note: the bare 24 assumes numeric hour values. With datetime64 columns the
# difference is a timedelta, and recent pandas versions will not add an integer to it.
df_new["Arrival Delay"] = np.where((df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]) & (df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]),
            df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"],
            np.where((df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]) & (df_new["ACT_ARRIVAL_TIME"] <= df_new["ARRIVAL_ETA_TIME"]), 0,
              np.where((df_new["ACT_ARRIVAL_DATE"] > df_new["ARRIVAL_ETA_DATE"]) & (df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]),
                 24 + df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"], 24)))
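Since the question says the columns are datetime64[ns], here is a hedged variant of the same nested np.where() idea that keeps every branch as a timedelta. It is a sketch on made-up rows, not a drop-in fix: the helper names (same_day, next_day, later, diff, zero, one_day) are illustrative, and it keeps the rule that anything not matched by an earlier branch falls through to a 24-hour delay.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; times sit on a shared dummy date.
df_new = pd.DataFrame({
    "ACT_ARRIVAL_DATE": pd.to_datetime(["2017-09-01", "2017-09-01", "2017-09-02"]),
    "ARRIVAL_ETA_DATE": pd.to_datetime(["2017-09-01", "2017-09-01", "2017-09-01"]),
    "ACT_ARRIVAL_TIME": pd.to_datetime(
        ["1900-01-01 10:30", "1900-01-01 09:00", "1900-01-01 11:00"]),
    "ARRIVAL_ETA_TIME": pd.to_datetime(
        ["1900-01-01 10:00", "1900-01-01 09:30", "1900-01-01 10:00"]),
})

# Plain NumPy arrays keep the dtypes inside np.where() predictable.
same_day = (df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]).to_numpy()
next_day = (df_new["ACT_ARRIVAL_DATE"] > df_new["ARRIVAL_ETA_DATE"]).to_numpy()
later = (df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]).to_numpy()
diff = (df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"]).to_numpy()

zero = np.timedelta64(0, "ns")
one_day = np.timedelta64(24, "h")

df_new["Arrival Delay"] = np.where(
    same_day & later, diff,                    # late on the same day
    np.where(same_day, zero,                   # same day, on time or early
             np.where(next_day & later,
                      diff + one_day,          # a day late, plus the time gap
                      one_day)))               # every other case
```

To get the delay in hours as a plain number, `df_new["Arrival Delay"].dt.total_seconds() / 3600` converts the resulting timedelta column.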