Parsing data

I have data in a file a.dat that looks like this:

01/Jul/2016 00:05:09  8438.2 
01/Jul/2016 00:05:19  8422.4 g

I would like to parse it into three columns: a timestamp, a float, and a string (either empty or g).

I have tried:

df=pd.read_csv('a.dat',sep='  | ',engine='python')

which ended up with 4 columns: date, time, float and g.

df=pd.read_csv('a.dat',sep='  | (g)',engine='python')

which gives 5 columns, with NaN in columns 1 and 4.
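
One way to see where the extra columns come from is to apply the same two patterns with `re.split` from the standard library. This is only a sketch: it assumes the python engine splits each stripped line in roughly this way, which the post itself does not state. The sample line is copied from a.dat.

```python
import re

# Second sample line from a.dat, without trailing whitespace.
line = "01/Jul/2016 00:05:19  8422.4 g"

# First attempt: split on two spaces, or else on a single space.
fields_a = re.split("  | ", line)

# Second attempt: the capturing group (g) is itself returned as a field,
# and it is None whenever the two-space alternative is the one that matched.
fields_b = re.split("  | (g)", line)

print(fields_a)
print(fields_b)
```

With this line, the first pattern yields four fields. The second yields five, of which one is `None` and one is an empty string, which would be consistent with the NaN columns reported above. The second pattern also cannot match the single space between date and time, so those two stay joined.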

Is there a better way to create the DataFrame without any post-processing?

Source

2016-07-25 Chenming Zhang

You can use read_csv:

import pandas as pd
import io

temp=u'''01/Jul/2016 00:05:09  8438.2 
01/Jul/2016 00:05:19  8422.4 g'''
# after testing, replace io.StringIO(temp) with the filename
# note: the nested-list form of parse_dates is deprecated in pandas 2.2+
df = pd.read_csv(io.StringIO(temp),
                 sep=r'\s+',    # raw string: any run of whitespace
                 names=['date','time','float','string'],
                 parse_dates=[['date','time']])
print(df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g

Or:

import pandas as pd
import io

temp=u'''01/Jul/2016 00:05:09  8438.2 
01/Jul/2016 00:05:19  8422.4 g'''
# after testing, replace io.StringIO(temp) with the filename
# note: delim_whitespace is deprecated in pandas 2.2+ in favor of sep=r'\s+'
df = pd.read_csv(io.StringIO(temp),
                 delim_whitespace=True,
                 names=['date','time','float','string'],
                 parse_dates=[['date','time']])
print(df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g
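
On recent pandas releases, where `delim_whitespace` and the nested-list form of `parse_dates` are deprecated, a similar result can probably be had by reading the four whitespace-separated fields as plain columns and combining the first two explicitly. This is a sketch under that assumption, not part of the original answer. The `format` string is inferred from the two sample lines, and the approach does add one small post-processing step.

```python
import io
import pandas as pd

temp = """01/Jul/2016 00:05:09  8438.2
01/Jul/2016 00:05:19  8422.4 g"""

# Read four whitespace-separated fields; the shorter first row
# gets NaN in the 'string' column.
df = pd.read_csv(io.StringIO(temp), sep=r"\s+",
                 names=["date", "time", "float", "string"])

# Combine date and time explicitly (format assumed from the sample data).
date_time = pd.to_datetime(df["date"] + " " + df["time"],
                           format="%d/%b/%Y %H:%M:%S")

# Replace the two text columns with the single datetime column.
df = df.drop(columns=["date", "time"])
df.insert(0, "date_time", date_time)
print(df)
```

Giving an explicit `format` avoids relying on pandas guessing the `01/Jul/2016` layout.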

A solution with read_fwf:

import pandas as pd
import io

temp=u'''01/Jul/2016 00:05:09  8438.2 
01/Jul/2016 00:05:19  8422.4 g'''
# after testing, replace io.StringIO(temp) with the filename
# column boundaries are inferred from the data
df = pd.read_fwf(io.StringIO(temp),
                 names=['date','time','float','string'],
                 parse_dates=[['date','time']])
print(df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g

You can also specify the column widths:

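# the keyword is widths (not fwidths); one width per named column,
# counted from the sample lines: date (12), time (8), float (8), string (2)
df = pd.read_fwf(io.StringIO(temp),
                 widths=[12, 8, 8, 2],
                 names=['date','time','float','string'],
                 parse_dates=[['date','time']])
print(df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g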
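
To make the idea of fixed column widths concrete, the sketch below slices one sample line by hand using only the standard library. The widths 12, 8, 8 and 2 are an assumption counted from the two sample lines, not values confirmed for the full a.dat, and the slicing only illustrates the concept rather than how `read_fwf` is implemented.

```python
from itertools import accumulate

# Second sample line from a.dat.
line = "01/Jul/2016 00:05:19  8422.4 g"

# Assumed widths: date (12), time (8), float (8), string (2).
widths = [12, 8, 8, 2]

# Turn widths into (start, end) character offsets.
ends = list(accumulate(widths))
starts = [0] + ends[:-1]

# Cut the line at those offsets and trim the padding.
fields = [line[s:e].strip() for s, e in zip(starts, ends)]
print(fields)
```

If real rows in the file are wider than the samples (for example a longer float), the widths would need to be adjusted, which is one reason whitespace splitting may be the more forgiving choice here.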

Source

2016-07-25 06:49:56 jezrael
