DataFrame.drop_duplicates and DataFrame.drop do not delete rows

I have read a CSV into a pandas DataFrame with five columns. Some rows have duplicate values only in the second column, and I want to remove those rows from the DataFrame, but neither drop nor drop_duplicates is working.

Here is my implementation:

import pandas as pd

# Read CSV
df = pd.read_csv(data_path, header=0, names=['a', 'b', 'c', 'd', 'e'])

print(df.b)

dropRows = []
# Sanitize the data to get rid of duplicates
for indx, val in enumerate(df.b):  # for all the values
    if indx == 0:  # skip the first index
        continue

    if val == df.b[indx - 1]:  # this is a duplicate rtc value
        dropRows.append(indx)

print(dropRows)

df.drop(dropRows)  # this doesn't work
df.drop_duplicates('b')  # this doesn't work either

print(df.b)

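When I print the Series df.b before and after, both have the same length, and I can clearly see that the duplicates are still there. Is there something wrong with my implementation?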

Source

2014-09-06 user3123955

drop and drop_duplicates create a new DataFrame, so what you want is something like this: df = df.drop_duplicates('b') – 2014-09-06 01:47:07

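By default, drop, and in fact most pandas operations, return a copy. Some of these functions accept the parameter inplace=True, which performs the operation on the original df and does not return a copy – EdChum 2014-09-06 07:06:26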

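I believe the API was designed this way to ensure that the original data in memory is not accidentally overwritten. It helps if one thinks about it that way. – ericmjl 2014-09-07 09:15:57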
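
To illustrate the point made in the comments, here is a minimal sketch using a small made-up DataFrame (the data is hypothetical; only the column names follow the question). It shows that calling drop_duplicates without assigning the result leaves the original frame unchanged, and that with inplace=True the call modifies the frame and returns None.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the question.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 10, 20, 20]})

# The call returns a new DataFrame; df itself is not modified.
result = df.drop_duplicates(subset="b")
original_len = len(df)      # still the full frame
deduped_len = len(result)   # duplicates in "b" removed

# Rebinding the name keeps the de-duplicated frame.
df = df.drop_duplicates(subset="b")

# With inplace=True the frame is modified and the call returns None.
other = pd.DataFrame({"a": [1, 2, 3], "b": [5, 5, 6]})
ret = other.drop_duplicates(subset="b", inplace=True)
```

Assigning the result of an inplace=True call back to the variable (df = df.drop(..., inplace=True)) would replace the frame with None, so the two styles should not be mixed.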

As stated in the comments, drop and drop_duplicates create a new DataFrame unless the inplace argument is provided. Any one of these options will work:

df = df.drop(dropRows)
df = df.drop_duplicates('b')
df.drop(dropRows, inplace=True)
df.drop_duplicates('b', inplace=True)

Source

2014-09-07 08:01:47 Korem
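
One thing worth noting when choosing between the options: the loop in the question only flags a row when its b value equals the value in the row directly before it, whereas drop_duplicates('b') removes every later repeat of a value, wherever it occurs. The two can give different results. The sketch below uses made-up data and, as an assumption about what the asker wants, replaces the loop with a comparison against Series.shift.

```python
import pandas as pd

# Hypothetical data in which the value 7 recurs after a different value.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [7, 7, 8, 7, 7]})

# Keep a row only if its b differs from the previous row's b
# (the first row is compared against NaN, so it is always kept).
consecutive = df[df["b"] != df["b"].shift()]

# Keep only the first occurrence of each b value in the whole column.
global_dedup = df.drop_duplicates(subset="b")
```

Here consecutive keeps the rows with a equal to 1, 3 and 4, while global_dedup keeps only the rows with a equal to 1 and 3.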
