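Pandas: converting a column to integer does not work

I am trying to remove from dfA the items that appear in dfB.

The problem is that dfA has object dtype, so I want to convert it to int. Here is the code: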

import pandas as pd

dfA = pd.read_excel('small_file.xlsx', header=None)
dfB = pd.read_csv('large_file.csv', header=None)

dfA = dfA.convert_objects(convert_numeric=True)
dfA[0] = pd.to_numeric(dfA[0], errors='coerce')
dfA = dfA.dropna()

# converting to int
dfA[0] = dfA[0].astype(int)  # this line raises the error

# keep the rows of dfA whose value is not in dfB
df_output = dfA[~dfA[0].isin(dfB[0])]

Here is what dfA looks like:

   0 
0  2293365227 
1  3045897298 
2  8162414592 
3  9312969810 
...   ...

And dfB:

   0 
0   2030000000 
1   2030156119 
2   2030389149 
...   ...

I get this error:

ValueError: invalid literal for long() with base 10: 'Goulding'

2017-09-25 VincFort

With an error like this, it always pays to look closely at the data.

Use:

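dfA.loc[dfA[0].str.contains('Goulding', na=False)]

to find the index where this value occurs and see what is going on there. (na=False keeps the mask boolean when the column also holds non-string values.) Then write a function that filters out the bad data and apply it to the Series. If you hit another error, rinse and repeat.

Example: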

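import re

def replace_str(x):
    # Return the first run of digits in x, or None when there is none
    match = re.search(r'\d+', str(x))
    return match.group(0) if match else None

dfA[0] = dfA[0].apply(replace_str)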
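
To list every non-numeric entry at once, rather than searching for one known string, something along these lines may help. This is only a sketch: the small frame below is made-up sample data standing in for dfA, and the names as_number and bad_rows are illustrative.

```python
import pandas as pd

# Hypothetical sample standing in for dfA; the real data comes from small_file.xlsx
dfA = pd.DataFrame({0: [2293365227, 'Goulding', '3045897298', None]})

# Coerce to numbers; anything that cannot be parsed becomes NaN
as_number = pd.to_numeric(dfA[0], errors='coerce')

# Rows that originally held a value but failed to parse
bad_rows = dfA[as_number.isna() & dfA[0].notna()]
print(bad_rows)
```

Inspecting bad_rows first makes it easier to decide whether those rows should be dropped or repaired.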

2017-09-26 01:08:05

It looks like there is some value, possibly the string 'Goulding', that cannot be converted to int.

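You can use to_numeric, which returns NaN wherever a value is problematic, before converting the whole column to integer:

dfA[0] = pd.to_numeric(dfA[0], errors='coerce')

The rows that become NaN then have to be dropped (or filled) before calling astype(int), because NaN cannot be stored in a plain integer column.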
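
Put together, the whole clean-up might look roughly like the sketch below. The two small frames are invented stand-ins for the files in the question, and spelling out 'int64' is my own assumption: values such as 8162414592 do not fit in a 32-bit integer, which plain int can map to on some platforms.

```python
import pandas as pd

# Hypothetical stand-ins for small_file.xlsx and large_file.csv
dfA = pd.DataFrame({0: [2293365227, 'Goulding', 2030156119, 8162414592]})
dfB = pd.DataFrame({0: [2030000000, 2030156119, 2030389149]})

# Non-numeric entries become NaN, then those rows are dropped
dfA[0] = pd.to_numeric(dfA[0], errors='coerce')
dfA = dfA.dropna(subset=[0])

# int64 is spelled out because these values exceed the 32-bit range
dfA[0] = dfA[0].astype('int64')

# Keep only the rows of dfA whose value is absent from dfB
df_output = dfA[~dfA[0].isin(dfB[0])]
print(df_output)
```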

If you can also share the Excel file, I can take a closer look.

2017-09-25 22:24:14
