2016-11-08 40 views
1

Regular expression for na_values in pandas.read_csv

I want to read a file like this:

1891, 91920, 7,  628,249, 59,51.0, 0.026, 0.028, NaN, NaN, NaN, NaN, NaN, 0.156, 0.071, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 21,43.8, 0.005, 0.619, NaN,45.6, 0.048, 0.053, NaN, NaN, NaN, NaN, NaN, -0.180, 0.088, 20, 0.012, 1.107, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,  NaN,  NaN,  NaN 
1891, 91920, 16,  628,135, 22,41.2, 0.093, 0.087, NaN, NaN, NaN, NaN, NaN, 0.416, 0.212, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 20,23.3, 0.021, 2.023, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,  NaN,  NaN,  NaN 
1891, 91920, 3,  628, 28, 39,47.0, 0.041, 0.044, NaN, NaN, NaN, NaN, NaN, -0.006, 0.064, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 21,37.5, 0.009, 0.964, NaN,45.3, 0.054, 0.055, NaN, NaN, NaN, NaN, NaN, -0.838, 0.228, 20, 0.013, 1.193, NaN,51.8, 0.025, 0.026, NaN, NaN, NaN, NaN, NaN, -0.021, 0.054, 21, 0.005, 0.540, NaN,  NaN,  NaN,  NaN 
1891, 91920, 6,  628,276, 20,40.0, 0.118, 0.101, NaN, NaN, NaN, NaN, NaN, -0.767, 0.558, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 20,26.7, 0.032, 2.982, NaN,41.0, 0.088, 0.089, NaN, NaN, NaN, NaN, NaN, -0.141, 0.233, 20, 0.024, 2.074, NaN,46.2, 0.053, 0.049, NaN, NaN, NaN, NaN, NaN, 0.080, 0.034, 21, 0.012, 1.187, NaN,  NaN,  NaN,  NaN 

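I am having trouble reading it because of the NaN values. If the file were a plain CSV (comma-separated), I would have no problem, but it has spaces after the commas. When I read it using: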

df = pd.read_csv(file,index_col=None, header=None) 

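the columns containing NaN are clearly read as strings because of the spaces. If the spaces all had the same width, my problem would be easy. I could use: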

df = pd.read_csv(file,index_col=None, header=None, na_values = " NaN") 

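and the problem would be solved, but the columns have different amounts of whitespace. Some have 4 spaces before NaN, others have 6, and so on.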

So my question is: is there a way to specify na_values with a regular expression, something like na_values = "\s+ NaN"?

+1

Why not use a regular expression *separator*, such as `sep=",\s+"`? – BrenBarn

+1

Alternatively, you can use the `delim_whitespace=True` or `skipinitialspace=True` parameter – MaxU

+0

@BrenBarn skipinitialspace=True works fine, thanks. But sep=",\s+" does not work – nandhos

Answer

0

Try this:

df = pd.read_csv(file, engine='python', index_col=None, sep=r',\s*', header=None) 

Setting the parsing engine to python avoids the warning you get when you use a regular expression as the separator.
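
As a rough illustration of the two suggestions in this thread, here is a minimal sketch. The `sample` text is a hypothetical, shortened stand-in for the real file, and it assumes there are no trailing spaces at the end of a line. It also suggests why `,\s+` may have failed: some fields in the data, such as `628,249`, have no space after the comma, so a pattern that requires at least one whitespace character does not split there, while `,\s*` does.

```python
import io

import pandas as pd

# Hypothetical sample in the same shape as the file in the question: commas
# followed by zero, one, or several spaces (no trailing spaces, by assumption).
sample = (
    "1891, 7,  628,249, 0.026, NaN,  NaN\n"
    "1891, 16,  628,135, 0.093,    NaN, 0.212\n"
)

# Option 1: keep the default parser and skip the spaces after each comma.
df_skip = pd.read_csv(io.StringIO(sample), header=None, skipinitialspace=True)

# Option 2: regex separator with the python engine. \s* also matches the
# "628,249" case, where no whitespace follows the comma.
df_regex = pd.read_csv(
    io.StringIO(sample), header=None, sep=r",\s*", engine="python"
)

# Inspect the inferred column types and the number of missing values.
print(df_skip.dtypes)
print(df_skip.isna().sum())
```

With either option the fields reach the parser as a bare `NaN`, so no custom `na_values` entry is needed for this sample.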