2017-06-01 195 views
2

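Python: Using Lambda to split a string field into 3 separate fields

I have a Python DataFrame that contains a column named "SEGMENT". I want to split that column into three columns. Please see my desired output, highlighted in yellow.

[image: desired output]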

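Below is the code I have tried so far. Unfortunately, I can't even get the first replace statement to work: the ':' is not replaced by '-'. Any help is greatly appreciated!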

df_stack_ranking['CURRENT_AUM_SEGMENT'] = df_stack_ranking['CURRENT_AUM_SEGMENT'].replace(':', '-') 

s = df_stack_ranking['CURRENT_AUM_SEGMENT'].str.split(' ').apply(Series, 1).stack() 

s.index = s.index.droplevel(-1) 

s.name = 'SEGMENT' 

df_stack_ranking.join(s.apply(lambda x: Series(x.split(':')))) 

Answers

2

Setup

df = pd.DataFrame({'SEGMENT': {0: 'Hight:33-48', 1: 'Hight:33-48', 2: 'Very Hight:80-88'}}) 

df 
Out[17]: 
            SEGMENT
0       Hight:33-48
1       Hight:33-48
2  Very Hight:80-88

Solution

Use str.split to break the column into 3 parts, and pass expand=True so the parts come back as a new DataFrame.

(df.SEGMENT.str.split(':|-', expand=True)
   .rename(columns=dict(zip(range(3),
       ['SEGMENT', 'SEGMENT RANGE LOW', 'SEGMENT RANGE HIGH']))))
Out[13]: 
      SEGMENT SEGMENT RANGE LOW SEGMENT RANGE HIGH
0       Hight                33                 48
1       Hight                33                 48
2  Very Hight                80                 88
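
Note that the pieces produced by str.split are still strings. A minimal sketch, assuming the two range bounds should end up as integers (the names parts and out are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'SEGMENT': ['Hight:33-48', 'Hight:33-48', 'Very Hight:80-88']})

# Split on either ':' or '-' and spread the pieces over three columns.
parts = df['SEGMENT'].str.split(':|-', expand=True)
parts.columns = ['SEGMENT', 'SEGMENT RANGE LOW', 'SEGMENT RANGE HIGH']

# The split results are strings, so convert the two bounds to integers.
out = parts.astype({'SEGMENT RANGE LOW': int, 'SEGMENT RANGE HIGH': int})
print(out.dtypes)
```

If some rows might not contain valid numbers, pd.to_numeric with errors='coerce' is probably a safer choice than astype.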
0
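columns = ['SEGMENT', 'SEGMENT RANGE LOW', 'SEGMENT RANGE HIGH']
# Turn the colon into a hyphen so a single split on '-' yields all three parts.
df['temp'] = df['SEGMENT'].str.replace(':', '-').str.split('-')
for i, c in enumerate(columns):
    # Take the i-th part of each list and strip any surrounding whitespace.
    df[c] = df['temp'].apply(lambda parts: parts[i].strip())
del df['temp']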

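Replace the colon with a hyphen, then split on the hyphen to get a list holding the values for the 3 columns. Then assign the values to each of the 3 columns and delete the temporary column.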
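
As for why the replace call in the question appears to do nothing: as far as I know, Series.replace matches whole cell values by default, while Series.str.replace substitutes inside each string. A small sketch on made-up data (the series s is an assumption, not the asker's actual frame):

```python
import pandas as pd

s = pd.Series(['High: 33 - 48', 'Very High: 80 - 88'])

# Series.replace looks for cells that are exactly ':'; none are, so nothing changes.
whole_value = s.replace(':', '-')

# Series.str.replace substitutes inside every string.
substring = s.str.replace(':', '-')

print(whole_value.tolist())
print(substring.tolist())
```

Series.replace can also do substring replacement if you pass regex=True, but str.replace states the intent more directly.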

0

I would use str.extract with a regular expression:

(df.SEGMENT.str.extract(r'([A-Za-z ]+):(\d+)-(\d+)', expand=True)
   .rename(columns={0: 'SEGMENT', 1: 'SEGMENT RANGE LOW', 2: 'SEGMENT RANGE HIGH'}))

     SEGMENT SEGMENT RANGE LOW SEGMENT RANGE HIGH
0       High                33                 48
1       High                33                 48
2  Very High                80                 88
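
One thing to watch with str.extract: rows that do not match the pattern come back as NaN in every column instead of raising an error. A minimal sketch, with a deliberately malformed third row added purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({'SEGMENT': ['High:33-48', 'Very High:80-88', 'Unknown']})

pattern = r'([A-Za-z ]+):(\d+)-(\d+)'
out = df['SEGMENT'].str.extract(pattern, expand=True)
out.columns = ['SEGMENT', 'SEGMENT RANGE LOW', 'SEGMENT RANGE HIGH']

# The row 'Unknown' has no ':' in it, so all three extracted fields are missing.
print(out[out['SEGMENT'].isna()])
```

Checking for such rows before converting the bounds to numbers can save some confusion later.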
2

Use str.split with the regex :\s*|\s*-\s* (the | means "or", and \s* means zero or more whitespace characters):

df = pd.DataFrame({'SEGMENT': ['Hight: 33 - 48', 'Hight: 33 - 48', 'Very Hight: 80 - 88']}) 

cols = ['SEGMENT','SEGMENT RANGE LOW','SEGMENT RANGE HIGH'] 
df[cols] = df['SEGMENT'].str.split(r':\s*|\s*-\s*', expand=True)
print(df)
      SEGMENT SEGMENT RANGE LOW SEGMENT RANGE HIGH
0       Hight                33                 48
1       Hight                33                 48
2  Very Hight                80                 88
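
To see what the pattern :\s*|\s*-\s* does on a single value, here is a small sketch using the standard re module. It is only meant to illustrate the regex; pandas applies the same kind of split to every row:

```python
import re

pattern = r':\s*|\s*-\s*'

# With spaces around the separators.
spaced = re.split(pattern, 'Very Hight: 80 - 88')
# Without spaces around the separators.
compact = re.split(pattern, 'Very Hight:80-88')

print(spaced)   # ['Very Hight', '80', '88']
print(compact)  # ['Very Hight', '80', '88']
```

The space inside 'Very Hight' survives because the pattern only consumes whitespace that sits next to a ':' or a '-'.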

Solution with str.extract:

cols = ['SEGMENT','SEGMENT RANGE LOW','SEGMENT RANGE HIGH'] 
df[cols] = df['SEGMENT'].str.extract(r'([A-Za-z\s]+):\s*(\d+)\s*-\s*(\d+)', expand=True)
print(df)
      SEGMENT SEGMENT RANGE LOW SEGMENT RANGE HIGH
0       Hight                33                 48
1       Hight                33                 48
2  Very Hight                80                 88
+0

Naming the columns works perfectly! Thank you so much :) – PineNuts0

+0

Glad I could help ;) – jezrael

2

Because I like using a regular expression with str.extract:

# [^:]+? captures everything up to the colon, so multi-word names such as 'Very High' stay intact.
regex = r'\s*(?P<SEGMENT>[^:]+?)\s*:\s*(?P<SEGMENT_RANGE_LOW>\S+)\s*-\s*(?P<SEGMENT_RANGE_HIGH>\S+)\s*'
df.SEGMENT.str.extract(regex, expand=True)

     SEGMENT SEGMENT_RANGE_LOW SEGMENT_RANGE_HIGH
0       High                33                 48
1       High                33                 48
2  Very High                80                 88

Setup

df = pd.DataFrame({'SEGMENT': ['High: 33 - 48', 'High: 33 - 48', 'Very High: 80 - 88']})
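
Since named groups become the column names, the extracted frame can be joined straight back onto the original one. A sketch under the assumption that shorter, hypothetical group names (NAME, LOW, HIGH) are acceptable; they are chosen here so they do not collide with the existing SEGMENT column:

```python
import pandas as pd

df = pd.DataFrame({'SEGMENT': ['High: 33 - 48', 'Very High: 80 - 88']})

# Each named group turns into a column of the extracted frame.
regex = r'(?P<NAME>[^:]+?)\s*:\s*(?P<LOW>\d+)\s*-\s*(?P<HIGH>\d+)'
extracted = df['SEGMENT'].str.extract(regex, expand=True)

# join aligns on the index, so every row keeps its original SEGMENT value.
result = df.join(extracted)
print(result.columns.tolist())
```

df.join raises an error on overlapping column names unless a suffix is given, which is why the group names differ from SEGMENT here.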