2017-10-29 141 views
-1

Pandas DataFrame: add columns based on a string

I have the following columns in a pandas DataFrame:

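In the 'Stats' column, the individual stats are separated by spaces. I want to create a new column for each stat. The problem is that not every row has every type of stat; for example, row 2 has no 'trey'. How can I accomplish this feat?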

I tried this, but it just added a new column after each space:

nba_2017_revised4 = nba_2017_revised3.join(nba_2017_revised3['Stats'].str.split(' ', 7, expand=True).rename(columns={0:'Points', 1:'Rebounds', 2:'Assists', 3:'Steals', 4:'Turnovers', 5:'3_Pointers', 6:'FG_Attempts', 7:'FT_Attempts'})) 
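
A minimal sketch of why a positional split misaligns the stats, using two of the 'Stats' strings from the data in this question. The names `stats` and `by_position` are made up for illustration. Because `str.split(..., expand=True)` assigns tokens to columns purely by position, a row that is missing a stat shifts every later token one column to the left.

```python
import pandas as pd

# Two 'Stats' strings copied from the sample data; the second has no 'trey' and no 'ft'.
stats = pd.Series([
    "8pt 1rb 4as 2to 1trey 3-6fg 1-2ft",
    "2pt 2rb 7as 1to 1-7fg",
])

# Tokens land in columns 0..6 by position, not by stat name.
by_position = stats.str.split(pat=" ", expand=True)
print(by_position)
```

In the printed frame, column 4 holds '1trey' for the first row but '1-7fg' for the second, so renaming the columns by position cannot give consistent stat columns.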


 
Date First Last Stats Minutes DKP Team Opp DRPM 
0 20170412.0 Ron Baker 8pt 1rb 4as 2to 1trey 3-6fg 1-2ft 29.350000 14.75 nyk phi -0.56 
1 20170409.0 Ron Baker 11pt 8rb 8as 1st 2to 1trey 5-12fg 38.100000 34.50 nyk tor -0.56 
2 20170407.0 Ron Baker 2pt 2rb 7as 1to 1-7fg 30.500000 14.50 nyk mem -0.56 
3 20170406.0 Ron Baker 12pt 2rb 2as 2to 5-9fg 2-2ft 27.166667 16.50 nyk was -0.56 
4 20170404.0 Ron Baker 9pt 4rb 6as 2st 4to 1trey 4-7fg 0-1ft 37.300000 25.50 nyk chi -0.56 

Thanks.

+4

No images, please; add the data as text. How are we supposed to copy the data to try our solutions? – Dark

+0

What is the expected output? – Dark

+0

nba_2017_revised4 = nba_2017_revised3.join(nba_2017_revised3['Stats'].str.split(' ', 7, expand=True).rename(columns={0:'Points', 1:'Rebounds', 2:'Assists', 3:'Steals', 4:'Turnovers', 5:'3_Pointers', 6:'FG_Attempts', 7:'FT_Attempts'})) –

Answers

1

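I would use a regular expression to parse each piece of the split, taking the trailing run of letters as the column name and whatever comes before it as the value.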

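import re
import pandas as pd

# Split one token such as '3-6fg' into (value, name), here ('3-6', 'fg').
pat = lambda x: re.match(r'^(.+?)([a-z]+)$', x).groups()
# Build one Series per row: the values are the data, the stat names are the index.
prs = lambda s: pd.Series(*zip(*[pat(x) for x in s.split()]))

# Replace the 'Stats' column with one column per stat name.
df.drop('Stats', axis=1).join(df.Stats.apply(prs))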

     Date First Last Minutes DKP Team Opp DRPM as fg ft pt rb st to trey 
0 20170412 Ron Baker 29.350000 14.75 nyk phi -0.56 4 3-6 1-2 8 1 NaN 2 1 
1 20170409 Ron Baker 38.100000 34.50 nyk tor -0.56 8 5-12 NaN 11 8 1 2 1 
2 20170407 Ron Baker 30.500000 14.50 nyk mem -0.56 7 1-7 NaN 2 2 NaN 1 NaN 
3 20170406 Ron Baker 27.166667 16.50 nyk was -0.56 2 5-9 2-2 12 2 NaN 2 NaN 
4 20170404 Ron Baker 37.300000 25.50 nyk chi -0.56 6 4-7 0-1 9 4 2 4 1 
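
For anyone who wants to try the approach without the original data file, here is a self-contained sketch of the same idea. It assumes a small frame rebuilt from three of the 'Stats' strings in the question. The helper name `parse_stats` and the variable names are hypothetical, and it builds each row from a dict instead of the `zip` trick, so it is a variation on the answer, not the answer's exact code.

```python
import re
import pandas as pd

# Three rows rebuilt from the sample data in the question.
df = pd.DataFrame({
    "Last": ["Baker"] * 3,
    "Stats": [
        "8pt 1rb 4as 2to 1trey 3-6fg 1-2ft",
        "11pt 8rb 8as 1st 2to 1trey 5-12fg",
        "2pt 2rb 7as 1to 1-7fg",
    ],
})

def parse_stats(s):
    """Hypothetical helper: map each stat name to its value for one row."""
    # Each token like '5-12fg' is split into ('5-12', 'fg').
    pairs = [re.match(r"^(.+?)([a-z]+)$", tok).groups() for tok in s.split()]
    return pd.Series({name: value for value, name in pairs}, dtype=object)

# apply() returns one Series per row; pandas aligns them by stat name,
# leaving NaN where a row has no such stat.
stats = df["Stats"].apply(parse_stats)
result = df.drop(columns="Stats").join(stats)
print(result)
```

The parsed values stay as strings (for example '5-12' for field goals), so any numeric conversion would be a separate step.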
+0

I'm happy to see your result, piRSquared, but I get the following error when I try the code: AttributeError: 'float' object has no attribute 'split' –

+0

@GilO'Brien That's because some of the values in the 'Stats' column are `np.nan`, which is a `float`. You should fill the NA values with `''`. Try this: `df.drop('Stats', axis=1).join(df.Stats.fillna('').apply(prs))` – piRSquared
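
A small sketch of the fix described in this comment, assuming a two-row frame in which the second 'Stats' value is missing. The helper `parse_stats` and the variable names are hypothetical. Filling NaN with an empty string means `split()` yields no tokens, so the row simply ends up with NaN in every stat column.

```python
import re
import numpy as np
import pandas as pd

# One real 'Stats' string from the question plus a missing value.
df = pd.DataFrame({
    "Last": ["Baker", "Baker"],
    "Stats": ["12pt 2rb 2as 2to 5-9fg 2-2ft", np.nan],
})

def parse_stats(s):
    """Hypothetical helper: map each stat name to its value for one row."""
    pairs = [re.match(r"^(.+?)([a-z]+)$", tok).groups() for tok in s.split()]
    return pd.Series({name: value for value, name in pairs}, dtype=object)

# Without fillna(''), the NaN row would raise
# AttributeError: 'float' object has no attribute 'split'.
filled = df["Stats"].fillna("")
result = df.drop(columns="Stats").join(filled.apply(parse_stats))
print(result)
```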

+1

It worked! I owe you a beer! –