2017-03-16
1

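Pandas DataFrame: converting an object column to datetime minutes

I have a CSV of Belmont Stakes results that looks like the following pandas DataFrame: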

Year  Winner    Sire       Time
2016  Creator   Tapit      2:28.51
2015  Pharoah   Pioneerof  2:26.65
2014  Tonalist  Tapit      2:28.52
2013  Palace    Curlin     2:30.70

The "Time" column has object dtype. I want to import my CSV so that it looks like this:

Year  Winner    Sire       Time(mins)
2016  Creator   Tapit      148.51
2015  Pharoah   Pioneerof  146.65
2014  Tonalist  Tapit      148.52
2013  Palace    Curlin     150.70

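More precisely, I want my Time column converted to minutes. I don't want to convert the column after importing; I want the data converted during the import.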

Answers

1

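You can use the vectorised str methods to split the string, then convert each component and sum them. Note that the result (148.51 for 2:28.51) is the total time in seconds, even though the question labels the column Time(mins):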

In [108]: 
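# Parentheses replace the backslash continuations, which break if followed by a space
df['Time(mins)'] = (
    df['Time'].str.split(':').str[0].astype(float) * 60                      # minutes -> seconds
    + df['Time'].str.split(':').str[1].str.split('.').str[0].astype(float)   # whole seconds
    + df['Time'].str.split('.').str[-1].astype(float) / 100                  # hundredths of a second
)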
df 

Out[108]: 
   Year    Winner       Sire     Time  Time(mins)
0  2016   Creator      Tapit  2:28.51      148.51
1  2015   Pharoah  Pioneerof  2:26.65      146.65
2  2014  Tonalist      Tapit  2:28.52      148.52
3  2013    Palace     Curlin  2:30.70      150.70
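
The same split-and-sum arithmetic can probably be written more compactly by splitting once on ':' with expand=True, so that minutes and seconds land in separate columns. This is only a sketch: the variable name parts is made up for illustration, and it assumes every value has the form M:SS.ss.

```python
import pandas as pd

# Sample times copied from the question
df = pd.DataFrame({"Time": ["2:28.51", "2:26.65", "2:28.52", "2:30.70"]})

# One split on ':' gives column 0 (minutes) and column 1 (seconds with fraction)
parts = df["Time"].str.split(":", expand=True).astype(float)

# Total seconds = minutes * 60 + seconds; round to hide float noise
df["Time(mins)"] = (parts[0] * 60 + parts[1]).round(2)
print(df)
```

Because float() already understands "28.51", there is no need to split on '.' separately in this variant.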

Thanks to @Jeff for the tip: you can use to_timedelta to parse this if you first format the string as HH:MM:SS:

In [115]: 
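# Prefix '00:0' so '2:28.51' becomes '00:02:28.51'; no unit argument, since recent pandas rejects unit= for string input
df['timedelta'] = pd.to_timedelta('00:0' + df['Time'])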
df 

Out[115]: 
   Year    Winner       Sire     Time  Time(mins)        timedelta
0  2016   Creator      Tapit  2:28.51      148.51  00:02:28.510000
1  2015   Pharoah  Pioneerof  2:26.65      146.65  00:02:26.650000
2  2014  Tonalist      Tapit  2:28.52      148.52  00:02:28.520000
3  2013    Palace     Curlin  2:30.70      150.70  00:02:30.700000

This gives you a timedelta dtype, which IMO is more useful than just a string because arithmetic operations will work:

In [116]: 
df.info() 

<class 'pandas.core.frame.DataFrame'> 
RangeIndex: 4 entries, 0 to 3 
Data columns (total 6 columns): 
Year   4 non-null int64 
Winner  4 non-null object 
Sire   4 non-null object 
Time   4 non-null object 
Time(mins) 4 non-null float64 
timedelta  4 non-null timedelta64[ns] 
dtypes: float64(1), int64(1), object(3), timedelta64[ns](1) 
memory usage: 272.0+ bytes 
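
As a rough illustration of the arithmetic that a timedelta column allows, the sketch below converts the parsed values to total seconds and computes the spread between the slowest and fastest times. The names times, td and gap are made up for this example, and the '00:0' prefix assumes single-digit minutes as in the question's data.

```python
import pandas as pd

# Sample times copied from the question
times = pd.Series(["2:28.51", "2:26.65", "2:28.52", "2:30.70"], name="Time")

# '2:28.51' -> '00:02:28.51' so that it parses as HH:MM:SS.ff
td = pd.to_timedelta("00:0" + times)

# Total duration of each race as a float number of seconds
total_seconds = td.dt.total_seconds()

# Timedeltas subtract directly: spread between slowest and fastest time
gap = td.max() - td.min()

print(total_seconds.tolist())
print(gap)
```

Dividing total_seconds by 60 would give true minutes, if that is what is needed.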

If you want to do this while reading the file, you can define a custom conversion function and pass it to the converters argument of read_csv:

In [131]: 
import io 
import pandas as pd 

t="""Year Winner Sire   Time 
2016 Creator Tapit  2:28.51 
2015 Pharoah Pioneerof 2:26.65 
2014 Tonalist Tapit  2:28.52 
2013 Palace Curlin  2:30.70"""

def func(x):
    # 'M:SS.ss' -> total seconds, e.g. '2:28.51' -> 148.51
    minutes, seconds = x.split(':')
    return float(minutes) * 60 + float(seconds)

# sep=r'\s+' splits on whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(io.StringIO(t), sep=r'\s+', converters={'Time': func})
df 

Out[131]: 
   Year    Winner       Sire    Time
0  2016   Creator      Tapit  148.51
1  2015   Pharoah  Pioneerof  146.65
2  2014  Tonalist      Tapit  148.52
3  2013    Palace     Curlin  150.70
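
Since the question mentions a CSV file, here is a hedged variant of the same converters idea for comma-separated input. The CSV text and the function name to_seconds are assumptions made for illustration; the real file's delimiter and layout are not shown in the question.

```python
import io
import pandas as pd

# Hypothetical comma-separated version of the question's data
csv_text = """Year,Winner,Sire,Time
2016,Creator,Tapit,2:28.51
2015,Pharoah,Pioneerof,2:26.65
2014,Tonalist,Tapit,2:28.52
2013,Palace,Curlin,2:30.70
"""

def to_seconds(text):
    # Converters receive the raw cell as a string, e.g. '2:28.51'
    minutes, seconds = text.strip().split(":")
    # Build a Timedelta from the parts and return its length in seconds
    return pd.Timedelta(minutes=int(minutes), seconds=float(seconds)).total_seconds()

# The converter runs on every cell of 'Time' while the file is being parsed
df = pd.read_csv(io.StringIO(csv_text), converters={"Time": to_seconds})
print(df)
```

The conversion happens inside read_csv, so no separate post-import step is required.
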
+1

to_timedelta will parse this (it might need a leading 0) – Jeff

+0

@Jeff thanks for the suggestion, I've updated the answer – EdChum

+0

Actually, I want to import my csv this way. I don't want to convert after importing. Is there a way to do that? –

0

I'm not sure, but this might work:

df.Time = df.Time.astype(str).apply(lambda x: x.split(':')) 
df.Time = df.Time.apply(lambda x: int(x[0]) * 60 + float(x[1])) 
0

You can use the apply method to transform the time:

import pandas as pd 
df = pd.DataFrame({"Year":[2016,2017], 
        "Time":["2:28.51", "2:26.65"], 
        "Winner":["Creator","Tapit"]}) 
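def format_time(s):
    # '2:28.51' -> ['2', '28', '51'] -> minutes, seconds, hundredths as floats
    minutes, seconds, hundredths = map(float, s.replace(".", ":").split(":"))
    # Hundredths are divided by 100 (not 60) to give total seconds
    return round(minutes * 60. + seconds + hundredths / 100., 2)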
df["Time"] = df["Time"].apply(format_time)