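Pandas: identify all rows in column 1 where duplicates occur in columns 2, 3

I want to remove duplicate entries from a pandas DataFrame in Python. The DataFrame consists of the vertically concatenated contents of multiple *.csv files. Here is the DataFrame: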
print(df)
file TestA TestB
One.csv 7513 -643.1
One.csv 15347 NaN
One.csv NaN 22.7
One.csv 46321 NaN
One.csv NaN 156.1
One.csv 2477 52.7
Two.csv 417 1473.5
Two.csv 7513 -643.1
Two.csv 15347 NaN
Two.csv NaN 22.7
Two.csv 46321 NaN
Two.csv NaN 156.1
Three.csv -4341 NaN
Three.csv 34473 437
Three.csv 1349 NaN
Four.csv 17 NaN
Four.csv 107 NaN
Four.csv -931 44536
Four.csv 6285 NaN
Four.csv 119 34722
I would like to do something like the following:
print("Rows %s of %s are duplicated in rows %s of %s. Rows from %s will now be removed from the DataFrame." % ([1,2,3,4,5], 'One.csv', [2,3,4,5,6], 'Two.csv', 'One.csv'))
I would like the print statement to produce something like this:
Rows [1,2,3,4,5] of One.csv are duplicated in rows [2,3,4,5,6] of Two.csv. Rows from One.csv will now be removed from the DataFrame.
I am not sure how to identify the rows and pass them to the print statement.
Is there a way to identify the duplicated rows by their row number within each file named in column 1 (FileName)?
EDIT: To create the DataFrame df, select the DataFrame shown here and copy it to the clipboard. Then run this:
import pandas as pd
df = pd.read_clipboard()
print(df)
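Hi, I have added to the original post what I would like the 'print' output to be. I would like to find a list of the duplicated row numbers for the 2 '*.csv' file names in the 'FileName' column. For 'One.csv' I would like a list that says '[1,2,3,4,5]', and for 'Two.csv' I would like a list that says '[2,3,4,5,6]'. – 2015-04-03 15:58:12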
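
One possible approach, offered as a minimal sketch rather than an accepted answer: number the rows within each file, merge the frame with itself on the two value columns, and keep only the pairs whose rows come from different files. The sketch assumes the first column is named file (as in the printed frame, not FileName), that row numbers are 1-based within each file, and that the copy in the earlier file is the one to drop. The names DATA, pairs, messages, to_drop and deduped are made up for illustration. It relies on pandas matching NaN keys with each other in merge.

```python
import io
import pandas as pd

# The frame from the question, inlined so the sketch does not need the clipboard.
DATA = """file TestA TestB
One.csv 7513 -643.1
One.csv 15347 NaN
One.csv NaN 22.7
One.csv 46321 NaN
One.csv NaN 156.1
One.csv 2477 52.7
Two.csv 417 1473.5
Two.csv 7513 -643.1
Two.csv 15347 NaN
Two.csv NaN 22.7
Two.csv 46321 NaN
Two.csv NaN 156.1
Three.csv -4341 NaN
Three.csv 34473 437
Three.csv 1349 NaN
Four.csv 17 NaN
Four.csv 107 NaN
Four.csv -931 44536
Four.csv 6285 NaN
Four.csv 119 34722
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")

# Assumption: row numbers are 1-based and counted within each file.
df["row"] = df.groupby("file", sort=False).cumcount() + 1

# Self-merge on the value columns; merge treats NaN keys as equal to each other.
flat = df.reset_index()
pairs = flat.merge(flat, on=["TestA", "TestB"], suffixes=("_first", "_later"))

# Keep each cross-file match once, with the earlier row on the "_first" side.
pairs = pairs[(pairs["index_first"] < pairs["index_later"])
              & (pairs["file_first"] != pairs["file_later"])]
pairs = pairs.sort_values(["index_first", "index_later"])

messages = []
to_drop = []
for (first, later), grp in pairs.groupby(["file_first", "file_later"], sort=False):
    messages.append(
        "Rows %s of %s are duplicated in rows %s of %s. "
        "Rows from %s will now be removed from the DataFrame."
        % (grp["row_first"].tolist(), first, grp["row_later"].tolist(), later, first)
    )
    to_drop.extend(grp["index_first"].tolist())

for message in messages:
    print(message)

# Drop the earlier copies and the helper column.
deduped = df.drop(index=to_drop).drop(columns="row")
print(deduped)
```

Python prints lists with a space after each comma, so the lists appear as [1, 2, 3, 4, 5] rather than [1,2,3,4,5]. To keep the later copy's file intact and drop from it instead, collect index_later in to_drop.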