1
na_values表達我想讀這樣的文件中使用pandas.read_csv
經常使用pandas.read_csv
1891, 91920, 7, 628,249, 59,51.0, 0.026, 0.028, NaN, NaN, NaN, NaN, NaN, 0.156, 0.071, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 21,43.8, 0.005, 0.619, NaN,45.6, 0.048, 0.053, NaN, NaN, NaN, NaN, NaN, -0.180, 0.088, 20, 0.012, 1.107, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
1891, 91920, 16, 628,135, 22,41.2, 0.093, 0.087, NaN, NaN, NaN, NaN, NaN, 0.416, 0.212, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 20,23.3, 0.021, 2.023, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
1891, 91920, 3, 628, 28, 39,47.0, 0.041, 0.044, NaN, NaN, NaN, NaN, NaN, -0.006, 0.064, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 21,37.5, 0.009, 0.964, NaN,45.3, 0.054, 0.055, NaN, NaN, NaN, NaN, NaN, -0.838, 0.228, 20, 0.013, 1.193, NaN,51.8, 0.025, 0.026, NaN, NaN, NaN, NaN, NaN, -0.021, 0.054, 21, 0.005, 0.540, NaN, NaN, NaN, NaN
1891, 91920, 6, 628,276, 20,40.0, 0.118, 0.101, NaN, NaN, NaN, NaN, NaN, -0.767, 0.558, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 20,26.7, 0.032, 2.982, NaN,41.0, 0.088, 0.089, NaN, NaN, NaN, NaN, NaN, -0.141, 0.233, 20, 0.024, 2.074, NaN,46.2, 0.053, 0.049, NaN, NaN, NaN, NaN, NaN, 0.080, 0.034, 21, 0.012, 1.187, NaN, NaN, NaN, NaN
我想讀它,因爲NaN值的問題。如果文件是一個csv文件(昏迷分離),我沒有問題,但它有空格。當我讀到它時使用:
df = pd.read_csv(file,index_col=None, header=None)
很明顯,帶有NaN的列被讀爲字符串,因爲空格。如果空間具有相同的維度,我的問題很容易。我可以使用:
df = pd.read_csv(file,index_col=None, header=None, na_values = " NaN")
並解決了問題,但有不同的空格的列。其中一些在NaN之前有4個空間,其他的有6個,等等。
所以,我的問題是:是否有一個正則表達式指定na_values
類似na_values = "\s+ NaN"
?
爲什麼不使用正則表達式*分隔符*,比如'sep =「,\ s +」'? – BrenBarn
或者,您可以使用'delim_whitespace = True'或'skipinitialspace = True'參數 – MaxU
@BrenBam skipinitialspace = True正常工作,謝謝。但是sep =「,\ s +」不起作用 – nandhos