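You can use the vectorised str method split to split the string, then convert each component and add them up. Note that the result is the total time in seconds (for example 2 × 60 + 28 + 0.51 = 148.51), even though the column is named Time(mins):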
In [108]:
df['Time(mins)'] = df['Time'].str.split(':').str[0].astype(float) * 60 \
+ df['Time'].str.split(':').str[1].str.split('.').str[0].astype(float) \
+ df['Time'].str.split('.').str[-1].astype(float)/100
df
Out[108]:
Year Winner Sire Time Time(mins)
0 2016 Creator Tapit 2:28.51 148.51
1 2015 Pharoah Pioneerof 2:26.65 146.65
2 2014 Tonalist Tapit 2:28.52 148.52
3 2013 Palace Curlin 2:30.70 150.70
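The split expression above yields total seconds. A simpler sketch, assuming the M:SS.hh format shown, splits only once with expand=True and keeps the seconds field as a single float. Unlike the str.split('.') step, this also handles a one-digit fractional part. The Time(secs) column name is my own choice, not from the original:

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({
    'Year': [2016, 2015, 2014, 2013],
    'Winner': ['Creator', 'Pharoah', 'Tonalist', 'Palace'],
    'Sire': ['Tapit', 'Pioneerof', 'Tapit', 'Curlin'],
    'Time': ['2:28.51', '2:26.65', '2:28.52', '2:30.70'],
})

# Split "M:SS.hh" once: column 0 holds minutes, column 1 holds seconds.
parts = df['Time'].str.split(':', expand=True).astype(float)

# Total seconds: minutes * 60 + seconds (fractional part kept as-is).
df['Time(secs)'] = parts[0] * 60 + parts[1]

# Actual minutes, if that is what you need.
df['Time(mins)'] = df['Time(secs)'] / 60
print(df)
```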
Thanks to @Jeff for the tip: to_timedelta can parse this if you first format the string as HH:MM:SS:
In [115]:
df['timedelta'] = pd.to_timedelta('00:0' + df['Time'])
df
Out[115]:
Year Winner Sire Time Time(mins) timedelta
0 2016 Creator Tapit 2:28.51 148.51 00:02:28.510000
1 2015 Pharoah Pioneerof 2:26.65 146.65 00:02:26.650000
2 2014 Tonalist Tapit 2:28.52 148.52 00:02:28.520000
3 2013 Palace Curlin 2:30.70 150.70 00:02:30.700000
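The '00:0' prefix assumes a single-digit minutes field; '12:01.00' would become '00:012:01.00', which is no longer HH:MM:SS. A sketch that zero-pads the minutes field instead, then uses .dt.total_seconds() to get plain floats back. The values 9:05.3 and 12:01.00 are made up to exercise the padding:

```python
import pandas as pd

# Made-up sample values, including a 1-digit fraction and 2-digit minutes.
times = pd.Series(['2:28.51', '2:26.65', '9:05.3', '12:01.00'])

# Split "M:SS.ff" into minutes and seconds strings.
parts = times.str.split(':', expand=True)

# Build "HH:MM:SS.ff": fixed hours, minutes zero-padded to two digits.
td = pd.to_timedelta('00:' + parts[0].str.zfill(2) + ':' + parts[1])

# Plain float seconds, if you need a number rather than a timedelta.
secs = td.dt.total_seconds()
print(td)
print(secs)
```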
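The to_timedelta result has a timedelta64 dtype, which IMO is more useful than a plain string because arithmetic operations will work on it: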
In [116]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 6 columns):
Year 4 non-null int64
Winner 4 non-null object
Sire 4 non-null object
Time 4 non-null object
Time(mins) 4 non-null float64
timedelta 4 non-null timedelta64[ns]
dtypes: float64(1), int64(1), object(3), timedelta64[ns](1)
memory usage: 272.0+ bytes
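A short sketch of the kind of arithmetic that works once the column is timedelta64, using the four times from the table above. The variable names are illustrative only:

```python
import pandas as pd

# Same four times as above, already in HH:MM:SS.ff form.
td = pd.to_timedelta(pd.Series(['00:02:28.51', '00:02:26.65',
                                '00:02:28.52', '00:02:30.70']))

spread = td.max() - td.min()   # slowest minus fastest, a Timedelta
mean = td.mean()               # average winning time, also a Timedelta
behind = td - td.min()         # per-row margin behind the fastest time

print(spread)
print(mean)
print(behind.dt.total_seconds())
```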
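If you want to do the conversion while reading the file, you can define a custom conversion function and pass it to read_csv through the converters argument: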
In [131]:
import io
import pandas as pd
t="""Year Winner Sire Time
2016 Creator Tapit 2:28.51
2015 Pharoah Pioneerof 2:26.65
2014 Tonalist Tapit 2:28.52
2013 Palace Curlin 2:30.70"""
def func(x):
    # "M:SS.hh" -> total seconds, e.g. "2:28.51" -> 148.51
    minutes, seconds = x.split(':')
    return float(minutes) * 60 + float(seconds)
df = pd.read_csv(io.StringIO(t), sep=r'\s+', converters={'Time': func})
df
Out[131]:
Year Winner Sire Time
0 2016 Creator Tapit 148.51
1 2015 Pharoah Pioneerof 146.65
2 2014 Tonalist Tapit 148.52
3 2013 Palace Curlin 150.70
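If you would rather get timedelta values at read time instead of float seconds, a converter can build a pd.Timedelta from each cell. This is a sketch: to_td is a hypothetical helper, and depending on the pandas version the column may come back as object dtype holding Timedelta objects, so the final to_timedelta call normalises it to timedelta64:

```python
import io
import pandas as pd

t = """Year Winner Sire Time
2016 Creator Tapit 2:28.51
2015 Pharoah Pioneerof 2:26.65
2014 Tonalist Tapit 2:28.52
2013 Palace Curlin 2:30.70"""

def to_td(x):
    # Hypothetical helper: "M:SS.hh" -> Timedelta, minutes padded to 2 digits.
    mins, secs = x.split(':')
    return pd.Timedelta('00:' + mins.zfill(2) + ':' + secs)

# sep=r'\s+' splits on runs of whitespace.
df = pd.read_csv(io.StringIO(t), sep=r'\s+', converters={'Time': to_td})

# Normalise the column to timedelta64 regardless of how read_csv stored it.
df['Time'] = pd.to_timedelta(df['Time'])
print(df)
```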
to_timedelta will parse this (it may need a leading 0) – Jeff
@Jeff thanks for the suggestion, I've updated the answer – EdChum
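Actually, I want the times converted while importing my csv; I don't want to convert them after the import. Is there a way to do this? –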