Prevent pandas from removing spaces in text columns

I am trying to load a CSV file into a pandas DataFrame. The CSV is semicolon-delimited, and values in text columns are enclosed in double quotes.

The file in question: https://www.dropbox.com/s/1xv391gebjzmmco/file_01.csv?dl=0

In one of the text columns ('TYTUL') I have the following value:

"00 307 1457 212"

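I specified the column as str, but when I print the result or export it to Excel, I get the value with its spaces removed instead of the original string.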

How can I prevent pandas from removing the spaces?

Here is my code:

import pandas 

df = pandas.read_csv(r'file_01.csv' 
        ,sep = ';' 
        ,quotechar = '"' 
        ,names = ['DATA_OPERACJI' 
           ,'DATA_KSIEGOWANIA' 
           ,'OPIS_OPERACJI' 
           ,'TYTUL' 
           ,'NADAWCA_ODBIORCA' 
           ,'NUMER_KONTA' 
           ,'KWOTA' 
           ,'SALDO_PO_OPERACJI' 
           ,'KOLUMNA_9'] 
        ,usecols = [0,1,2,3,4,5,6,7] 
        ,skiprows = 38 
        ,skipfooter = 3 
        ,encoding = 'cp1250' 
        ,thousands = ' ' 
        ,decimal = ',' 
        ,parse_dates = [0,1] 
        ,converters = {'OPIS_OPERACJI': str 
            ,'TYTUL': str 
            ,'NADAWCA_ODBIORCA': str 
            ,'NUMER_KONTA': str} 
        ,engine = 'python' 
        ) 

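# Collapse runs of spaces and trim leading/trailing ones; assign the result back
# instead of using inplace=True on an attribute-accessed column.
df['TYTUL'] = df['TYTUL'].replace([' +', '^ +', ' +$'], [' ', '', ''], regex=True)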

print(df.TYTUL)

I also came up with a workaround (the lines marked with a #workaround comment), but I would like to know whether there is a better way.

import pandas 

df = pandas.read_csv(r'file_01.csv' 
        ,sep = ';' 
        ,quotechar = '?' #workaround 
        ,names = ['DATA_OPERACJI' 
           ,'DATA_KSIEGOWANIA' 
           ,'OPIS_OPERACJI' 
           ,'TYTUL' 
           ,'NADAWCA_ODBIORCA' 
           ,'NUMER_KONTA' 
           ,'KWOTA' 
           ,'SALDO_PO_OPERACJI' 
           ,'KOLUMNA_9'] 
        ,usecols = [0,1,2,3,4,5,6,7] 
        ,skiprows = 38 
        ,skipfooter = 3 
        ,encoding = 'cp1250' 
        ,thousands = ' ' 
        ,decimal = ',' 
        ,parse_dates = [0,1] 
        ,converters = {'OPIS_OPERACJI': str 
            ,'TYTUL': str 
            ,'NADAWCA_ODBIORCA': str 
            ,'NUMER_KONTA': str} 
        ,engine = 'python' 
        ) 

# Collapse runs of spaces and trim leading/trailing ones.
df['TYTUL'] = df['TYTUL'].replace([' +', '^ +', ' +$'], [' ', '', ''], regex=True)

# workaround: strip the double quotes that quotechar='?' left in the values
df['TYTUL'] = df['TYTUL'].replace(['^"', '"$'], ['', ''], regex=True)

print(df.TYTUL)

2017-03-04 antoni

Remove this line from your read_csv call:

,thousands = ' '

I tested your read_csv code, and without that option the output is correct:

'00 307 1457 212'
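A minimal sketch of the suggestion above, using a small in-memory sample instead of file_01.csv. The two sample rows and the reduced column list are assumptions made for illustration; only the column names and the value "00 307 1457 212" come from the question. The text columns are read as str and no `thousands` argument is passed, so the spaces inside the quoted value should be left alone.

```python
import io

import pandas as pd

# Hypothetical sample rows imitating the layout described in the question
# (semicolon-separated, text values in double quotes, comma as decimal mark).
sample = (
    '"transfer";"00 307 1457 212";"1 234,56"\n'
    '"transfer";"00 307 1457 213";"-12,50"\n'
)

df = pd.read_csv(
    io.StringIO(sample),
    sep=";",
    quotechar='"',
    names=["OPIS_OPERACJI", "TYTUL", "KWOTA"],
    dtype={"OPIS_OPERACJI": str, "TYTUL": str},  # keep text columns as text
    decimal=",",
    # note: no thousands=' ' here, so spaces are not treated as digit grouping
)

print(df["TYTUL"].tolist())
print(df.dtypes)
```

Printing `df.dtypes` is a quick way to check what happened to the numeric columns after dropping `thousands`.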

2017-03-04 17:52:14 Kun

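This solves the problem I described, but it creates another one: columns that used to be float64 are now object. Running df.dtypes gives, before: KWOTA float64, SALDO_PO_OPERACJI float64; after: KWOTA object, SALDO_PO_OPERACJI object. – antoni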

You can add these two columns to the converters option to turn them into floats, or change the data type afterwards with df.KWOTA.astype(float). – Kun
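Both suggestions from the comment above could look roughly like this. It is a sketch under assumptions: `to_float` is a hypothetical helper (not part of pandas), the sample rows are invented, and the amounts are assumed to use a space for digit grouping and a comma as the decimal mark. Note that a plain `astype(float)` is likely to fail on a string such as "1 234,56", so the second variant removes the spaces and swaps the comma first.

```python
import io

import pandas as pd


def to_float(text):
    """Hypothetical helper: turn a string like '1 234,56' into 1234.56."""
    cleaned = text.replace("\u00a0", "").replace(" ", "").replace(",", ".")
    # Assumption: an empty cell should become NaN rather than raise.
    return float(cleaned) if cleaned else float("nan")


# Invented sample rows; only the column names come from the question.
sample = (
    '"00 307 1457 212";"1 234,56";"10 000,00"\n'
    '"00 307 1457 213";"-12,50";"9 987,50"\n'
)
names = ["TYTUL", "KWOTA", "SALDO_PO_OPERACJI"]

# Variant 1: convert the numeric columns while reading, via converters.
df = pd.read_csv(
    io.StringIO(sample),
    sep=";",
    quotechar='"',
    names=names,
    converters={"TYTUL": str, "KWOTA": to_float, "SALDO_PO_OPERACJI": to_float},
)

# Variant 2: read everything as text, then clean up and cast afterwards.
raw = pd.read_csv(io.StringIO(sample), sep=";", quotechar='"', names=names, dtype=str)
for col in ["KWOTA", "SALDO_PO_OPERACJI"]:
    raw[col] = (
        raw[col]
        .str.replace(" ", "", regex=False)   # drop the grouping spaces
        .str.replace(",", ".", regex=False)  # comma decimal mark -> dot
        .astype(float)
    )

print(df.dtypes)
print(raw.dtypes)
```

In both variants TYTUL stays untouched text, while KWOTA and SALDO_PO_OPERACJI end up as floats again.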
