Get the sum of people for overlapping age ranges in a pandas DataFrame

    target_value        title  people     start  end   twitter_map
0   AGE_13_TO_17     13 to 17       1        13   17  AGE_13_TO_17
1   AGE_13_TO_24     13 to 24     NaN        13   24           NaN
2   AGE_13_TO_34     13 to 34     NaN        13   34           NaN
3   AGE_13_TO_49     13 to 49     NaN        13   49           NaN
4   AGE_13_TO_54     13 to 54     NaN        13   54           NaN
5    AGE_OVER_13  Age Over 13     NaN        13    -           NaN
6   AGE_18_TO_24     18 to 24       7        18   24  AGE_18_TO_24
7   AGE_18_TO_54     18 to 54     NaN        18   54           NaN
8    AGE_OVER_18  Age Over 18     NaN        18    -           NaN
9   AGE_21_TO_34     21 to 34     NaN        21   34           NaN
10  AGE_21_TO_49     21 to 49     NaN        21   49           NaN
11  AGE_21_TO_54     21 to 54     NaN        21   54           NaN
12  AGE_25_TO_34     25 to 34      34        25   34  AGE_25_TO_34
13  AGE_25_TO_49     25 to 49     NaN        25   49           NaN
14   AGE_OVER_25  Age Over 25     NaN        25    -           NaN
15  AGE_35_TO_44     35 to 44      15        35   44  AGE_35_TO_44
16   AGE_OVER_35  Age Over 35     NaN        35    -           NaN
17  AGE_45_TO_54     45 to 54       1        45   54  AGE_45_TO_54
18   AGE_OVER_50  Age Over 50     NaN        50    -           NaN
19  AGE_55_TO_64     55 to 64       3        55   64  AGE_55_TO_64
20   AGE_OVER_65          65+       6        65    -   AGE_OVER_65
21          None     All Ages     NaN  All Ages    -           NaN

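So I have the DataFrame shown above, which contains start and end values for each age range. Some of the age ranges overlap, though. I need to fill in the people column with the correct value for every row, based on the rows where a value is already given.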

Expected output for the first two rows:

   target_value     title  people  start  end   twitter_map
0  AGE_13_TO_17  13 to 17       1     13   17  AGE_13_TO_17
1  AGE_13_TO_24  13 to 24       8     13   24           NaN

Source

2016-11-25 Immad Imtiaz

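The first three columns have been joined with the last three columns. –

What is the expected output? –

I gave an example in the first two rows... I hope that explains it. –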

I'll work with a simplified example:

people start end 
    1 13 17 
    NaN 13 24 
    NaN 13 34 
    NaN 13 - 
    7 18 24 
    NaN 18 - 
    34 25 34

First, replace `-` with infinity and convert everything to float:

import numpy as np 
df = df.replace({'-': np.inf}).astype(float)
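To try this step in isolation, here is a minimal sketch. It assumes the simplified sample is typed in by hand, with `-` stored as a string in an otherwise numeric `end` column; how the real data is loaded is not shown in the question.

```python
import numpy as np
import pandas as pd

# Simplified sample from above; '-' marks an open-ended age range.
# (Hand-built frame: an assumption, the question does not show how df is loaded.)
df = pd.DataFrame({
    "people": [1, np.nan, np.nan, np.nan, 7, np.nan, 34],
    "start": [13, 13, 13, 13, 18, 18, 25],
    "end": [17, 24, 34, "-", 24, "-", 34],
})

# Replace the '-' placeholder with infinity, then make every column float.
df = df.replace({"-": np.inf}).astype(float)

print(df.dtypes)
print(df)
```

After the conversion, an open-ended range such as "13 and over" has `end == inf`, so it compares as larger than any finite upper bound.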

Then select the rows where `people` is given; these will be the input:

df_input = df.dropna()

Now define the following function:

def func(row): 
    return df_input.loc[ 
      (df_input['start'] >= row['start']) & (df_input['end'] <= row['end']), 
      'people' 
     ].sum()

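For each row of the DataFrame, it sums all the input numbers whose age range falls entirely within the range defined by that row (this is where infinity is useful).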
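To make the containment test concrete, the sketch below applies the same mask to two ranges from the simplified sample. The helper name `people_in_range` is hypothetical and is used only for illustration; the three input rows are typed in directly.

```python
import numpy as np
import pandas as pd

# The three rows of the simplified sample whose 'people' value is known.
df_input = pd.DataFrame({
    "people": [1.0, 7.0, 34.0],
    "start": [13.0, 18.0, 25.0],
    "end": [17.0, 24.0, 34.0],
})

def people_in_range(start, end):
    """Sum 'people' over the known ranges that lie fully inside [start, end].

    Hypothetical helper: same mask as func above, with explicit bounds.
    """
    mask = (df_input["start"] >= start) & (df_input["end"] <= end)
    return df_input.loc[mask, "people"].sum()

total_13_24 = people_in_range(13, 24)      # 13-17 and 18-24 qualify: 1 + 7
total_18_up = people_in_range(18, np.inf)  # 18-24 and 25-34 qualify: 7 + 34

print(total_13_24)  # 8.0
print(total_18_up)  # 41.0
```

Because every finite `end` is less than or equal to infinity, an open-ended row picks up all known ranges that start at or after its own start.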

Finally, apply the function:

In [36]: df.apply(func, axis=1) 
Out[36]: 
0  1.0 
1  8.0 
2 42.0 
3 42.0 
4  7.0 
5 41.0 
6 34.0
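If the row-wise `apply` turns out to be slow on a larger frame, one possible alternative (not part of the answer above) is to build the containment mask for all rows at once with NumPy broadcasting. The sketch below assumes the `-` placeholders have already been replaced by infinity; the column name `people_filled` is made up for the example.

```python
import numpy as np
import pandas as pd

# Simplified sample with '-' already replaced by infinity (assumption).
df = pd.DataFrame({
    "people": [1, np.nan, np.nan, np.nan, 7, np.nan, 34],
    "start": [13, 13, 13, 13, 18, 18, 25],
    "end": [17, 24, 34, np.inf, 24, np.inf, 34],
}, dtype=float)

# Rows with a known 'people' value are the input ranges.
known = df.dropna()

# contained[i, j] is True when known range j lies fully inside range i.
contained = (
    (known["start"].to_numpy()[None, :] >= df["start"].to_numpy()[:, None])
    & (known["end"].to_numpy()[None, :] <= df["end"].to_numpy()[:, None])
)

# The matrix-vector product adds up the known counts selected by each mask row.
df["people_filled"] = contained @ known["people"].to_numpy()

print(df)
```

This builds a mask with one row per range and one column per known range, so it trades memory for speed; for a table as small as the one in the question, the `apply` version is likely just as practical.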

Source

2016-11-25 09:58:35 IanS

Thanks! For once, I was faster ;) – IanS

Yes, I know what you mean... – jezrael
