Python - How to edit a CSV based on another CSV using pandas

I hope someone can help.

I'm in a situation where I need to delete rows from one CSV file based on another CSV. Consider a simple example:

Time    Some Column 
4/25/2016 06:20:00 A 
4/25/2016 06:20:01 B 
4/25/2016 06:20:02 B 
4/25/2016 06:20:03 B 
4/25/2016 06:20:04 A 
4/25/2016 06:20:05 A

Then I have another file:

Time    Block 
4/25/2016 06:20:00 Block B for 10 seconds

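I should be able to read the second file into my program and have it delete any rows with 'B' in 'Some Column' for the 10 seconds after 06:20:00. In effect, I need some function that looks at the first and second CSV files and produces this for me: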

Time     Some Column 
4/25/2016 06:20:00  A 
4/25/2016 06:20:04  A 
4/25/2016 06:20:05  A

Note that the CSV I'm working with has over 3 million rows, so using something as slow as openpyxl isn't really an option. Any ideas?

Source

2017-07-26 MathsIsHard

Take a look at the top-scoring answer to [this question](https://stackoverflow.com/questions/13851535/how-to-delete-rows-from-a-pandas-dataframe-based-on-a-conditional-expression).
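The linked answer removes rows with a boolean mask. A minimal sketch of that idea applied to this question, assuming a single block whose value ('B') and window (10 seconds starting at 06:20:00) are hardcoded instead of being read from the second file; the names `block_start`, `block_end`, and `blocked` are only illustrative:

```python
import pandas as pd

# Small stand-in for the first CSV from the question.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "4/25/2016 06:20:00", "4/25/2016 06:20:01", "4/25/2016 06:20:02",
        "4/25/2016 06:20:03", "4/25/2016 06:20:04", "4/25/2016 06:20:05",
    ]),
    "Some Column": ["A", "B", "B", "B", "A", "A"],
})

# Assumption: the block is hardcoded here, not parsed from the second CSV.
block_start = pd.Timestamp("2016-04-25 06:20:00")
block_end = block_start + pd.Timedelta(seconds=10)

# True for rows that fall inside the window and carry the blocked value.
blocked = df["Time"].between(block_start, block_end) & (df["Some Column"] == "B")

# Keep every row that is not blocked.
result = df[~blocked].reset_index(drop=True)
print(result)
```

The mask is vectorized, so it should scale to a few million rows, but it handles one block at a time; with many blocks in the second file, a join-based approach is likely a better fit.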

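One way to do this is to use pd.merge_asof to handle the 10-second interval. Filter file1 down to only the 'B' rows, merge them with file2 on Time using a tolerance equal to pd.Timedelta(10, unit='s'), then drop from file1 the records that merge_asof matched.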

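import pandas as pd
from io import StringIO

# Columns are separated by two or more spaces, so the single space inside
# the timestamp and inside "Some Column" is not treated as a delimiter.
csv_file1 = StringIO("""Time  Some Column
4/25/2016 06:20:00  A
4/25/2016 06:20:01  B
4/25/2016 06:20:02  B
4/25/2016 06:20:03  B
4/25/2016 06:20:04  A
4/25/2016 06:20:05  A""")

csv_file2 = StringIO("""Time  Block
4/25/2016 06:20:00  Block B for 10 seconds""")

df1 = pd.read_csv(csv_file1, sep=r'\s\s+', index_col='Time', engine='python', parse_dates=True)
df2 = pd.read_csv(csv_file2, sep=r'\s\s+', index_col='Time', engine='python', parse_dates=True)

# merge_asof is a left join: 'B' rows with no block in the preceding
# 10 seconds get NaN in 'Block', so dropna keeps only the blocked rows.
blocked = pd.merge_asof(df1[df1['Some Column'] == 'B'],
                        df2,
                        right_index=True,
                        left_index=True,
                        tolerance=pd.Timedelta(10, unit='s')).dropna(subset=['Block'])

# Remove the blocked timestamps from the original frame.
df_out = df1.drop(blocked.index)

print(df_out.reset_index())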

Output:

                 Time Some Column
0 2016-04-25 06:20:00           A
1 2016-04-25 06:20:04           A
2 2016-04-25 06:20:05           A
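The example hardcodes both 'B' and the 10-second tolerance. If the second file can hold several blocks with different values and durations, those could be parsed out of the Block text instead. The following is only a sketch under assumptions: the text always has the form "Block <value> for <n> seconds", the frames `events` and `blocks` stand in for the two CSVs, and the extra 06:20:30 row is hypothetical, added to show a 'B' that falls outside the window.

```python
import pandas as pd

# Stand-ins for the two CSVs, with 'Time' already parsed.
# The last event (06:20:30, 'B') is hypothetical and not in the question.
events = pd.DataFrame({
    "Time": pd.to_datetime([
        "2016-04-25 06:20:00", "2016-04-25 06:20:01", "2016-04-25 06:20:02",
        "2016-04-25 06:20:03", "2016-04-25 06:20:04", "2016-04-25 06:20:05",
        "2016-04-25 06:20:30",
    ]),
    "Some Column": ["A", "B", "B", "B", "A", "A", "B"],
})
blocks = pd.DataFrame({
    "Time": pd.to_datetime(["2016-04-25 06:20:00"]),
    "Block": ["Block B for 10 seconds"],
})

# Assumed text format: "Block <value> for <n> seconds".
parsed = blocks["Block"].str.extract(r"Block (?P<value>\S+) for (?P<seconds>\d+) seconds")
blocks = blocks.assign(
    value=parsed["value"],
    block_end=blocks["Time"] + pd.to_timedelta(parsed["seconds"].astype(int), unit="s"),
)

# For each event, find the latest block with the same value that
# started at or before the event.
merged = pd.merge_asof(
    events.sort_values("Time"),
    blocks.sort_values("Time")[["Time", "value", "block_end"]],
    on="Time",
    left_by="Some Column",
    right_by="value",
)

# An event is blocked if a matching block exists and has not expired yet.
is_blocked = merged["block_end"].notna() & (merged["Time"] <= merged["block_end"])
result = merged.loc[~is_blocked, ["Time", "Some Column"]].reset_index(drop=True)
print(result)
```

Here the three 'B' rows inside the window are dropped while the 06:20:30 'B' row survives, because each block's own end time is compared against the event instead of relying on one global tolerance.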

Source

2017-07-26 13:05:53

Thank you very much – MathsIsHard

@MathsIsHard You're welcome.
