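Pandas.read_csv: Error tokenizing data
I'm having trouble with Pandas.read_csv. I want to read this text file (see below). When I copy the data into Excel > Text to Columns > delimited by "Space", I get exactly the output I'm looking for.
I've tried a number of different approaches. I thought a regex matching multiple spaces would do the trick, but I couldn't get it to work.
I tried this code: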
petrelTxt = pd.read_csv(petrelfile, sep = ' ', header = None)
and it gives me this error:
CParserError: Error tokenizing data. C error: Expected 6 fields in line 2, saw 17
When I tried changing it to sep = '\s+', it got further down the file, but it still doesn't work.
petrelTxt = pd.read_csv(petrelfile, sep = '\s+', header = None)
CParserError: Error tokenizing data. C error: Expected 5 fields in line 3, saw 6
Here is the original txt file:
# WELL TRACE FROM PETREL
# WELL NAME: ZZ-0113
# WELL HEAD X-COORDINATE: 9999999.00000000 (m)
# WELL HEAD Y-COORDINATE: 9999999.00000000 (m)
# WELL KB: 159.00000000 (ft)
# WELL TYPE: OIL
# MD AND TVD ARE REFERENCED (=0) AT KB AND INCREASE DOWNWARDS
# ANGLES ARE GIVEN IN DEGREES
# XYZ TRACE IS GIVEN IN COORDINATE SYSTEM WGS_1924_UTM_Zone_42N
# AZIMUTH REFERENCE TRUE NORTH
# DX DY ARE GIVEN IN GRID NORTH IN m-UNITS
# DEPTH (Z, TVD) GIVEN IN ft-UNITS
#======================================================================================================================================
MD X Y Z TVD DX DY AZIM INCL DLS
#======================================================================================================================================
0.0000000000 999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000
132.00000000 999999.08032 9999999.9116 27.000774702 131.99922530 0.0803153923 -0.088388779 139.08870069 0.3400000000 0.2575757504
221.00000000 999999.19115 9999999.8017 -61.99775149 220.99775149 0.1911487882 -0.198290891 132.93870069 0.3200000000 0.0456726104
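This is not a valid CSV file because it contains comment lines. As I recall, `read_csv` has an option to skip some rows. – furas
Read the documentation: [read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html). It has options to skip rows and to recognize comments; you need to use `comment='#'`. – furas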
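To illustrate the suggestion in the comments: the tokenizer errors appear to come from the `#` header lines, which split into a different number of whitespace-separated tokens on each line (5 on line 1, 6 on line 3, matching the second error message). A minimal sketch follows, assuming every non-data line starts with `#` as in the sample above. The `sample` string is a shortened stand-in for the real file, and `io.StringIO` is used only so the snippet runs on its own; with the real file you would pass `petrelfile` instead.

```python
import io

import pandas as pd

# Shortened stand-in for the Petrel export shown in the question
# (assumption: all non-data lines begin with '#').
sample = """\
# WELL TRACE FROM PETREL
# WELL NAME: ZZ-0113
# WELL HEAD X-COORDINATE: 9999999.00000000 (m)
#======================
MD X Y Z TVD DX DY AZIM INCL DLS
#======================
0.0000000000 999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000
132.00000000 999999.08032 9999999.9116 27.000774702 131.99922530 0.0803153923 -0.088388779 139.08870069 0.3400000000 0.2575757504
"""

petrelTxt = pd.read_csv(
    io.StringIO(sample),  # replace with petrelfile for the real data
    sep=r"\s+",           # one or more whitespace characters between fields
    comment="#",          # lines starting with '#' are skipped entirely
)

print(petrelTxt.columns.tolist())
print(petrelTxt.shape)
```

Because the commented lines are dropped before the header is inferred, the `MD X Y ...` line becomes the column names, so `header = None` is no longer needed here.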