Pandas: groupby on two columns

This is obviously simple, but as a pandas newbie I'm stuck.

I have a CSV file with 3 columns: state, bene_1_count, and bene_2_count.

I want to calculate the ratio of 'bene_1_count' to 'bene_2_count' for each state.

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
      'bene_1_count': [np.random.randint(10000, 99999)
        for _ in range(12)],
      'bene_2_count': [np.random.randint(10000, 99999)
        for _ in range(12)]})

I tried the following, but it gives me an error: "No objects to concatenate"

df['ratio'] = df.groupby(['state']).agg(df['bene_1_count']/df['bene_2_count'])

I can't figure out how to "reach" up to the state-level groups to take the ratio of the columns.

I need the ratio of the columns with respect to state; that is, I want my output to look like this:

State  ratio
CA
WA
CO
AZ


2017-02-04 Sanjeev

Alternatively, you can create a custom function that takes a DataFrame. groupby returns sub-DataFrames, and apply then runs the custom function on each sub-DataFrame.

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
      'bene_1_count': [np.random.randint(10000, 99999)
        for _ in range(12)],
      'bene_2_count': [np.random.randint(10000, 99999)
        for _ in range(12)]})

def divide_two_cols(df_sub):
    # Ratio of the group's summed counts
    return df_sub['bene_1_count'].sum()/float(df_sub['bene_2_count'].sum())

df.groupby('state').apply(divide_two_cols)
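To see what groupby actually hands to the custom function, the groups can also be iterated directly. The sketch below uses a small hypothetical sample (fixed values instead of random ones) and builds the per-state ratios with an explicit loop; it is meant to be roughly equivalent to the apply call, not a replacement for it.

```python
import pandas as pd

# Hypothetical fixed sample with the question's columns
df = pd.DataFrame({
    'state': ['CA', 'WA', 'CA', 'WA'],
    'bene_1_count': [10, 30, 20, 30],
    'bene_2_count': [20, 40, 40, 40],
})

def divide_two_cols(df_sub):
    return df_sub['bene_1_count'].sum() / float(df_sub['bene_2_count'].sum())

# Iterating a groupby yields (key, sub-DataFrame) pairs; this loop does by
# hand roughly what groupby(...).apply(divide_two_cols) does
ratios = {}
for state, df_sub in df.groupby('state'):
    ratios[state] = divide_two_cols(df_sub)

# The apply version for comparison; selecting only the count columns keeps
# the grouping column out of each sub-DataFrame
via_apply = df.groupby('state')[['bene_1_count', 'bene_2_count']].apply(divide_two_cols)
```

With this data, CA gives 30/60 and WA gives 60/80, and both approaches return the same values keyed by state.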

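Now suppose you want to divide each row's bene_1_count by its group's total bene_2_count (e.g., the total for AZ) while keeping all the original columns. Just adjust the function above (change the calculation and return the whole sub-DataFrame):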

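def divide_two_cols(df_sub):
    # Copy first: pandas advises against mutating the group passed to apply
    df_sub = df_sub.copy()
    # Each row's bene_1_count divided by the group's total bene_2_count
    df_sub['divs'] = df_sub['bene_1_count']/float(df_sub['bene_2_count'].sum())
    return df_sub

df.groupby('state').apply(divide_two_cols)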
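If only the new column is needed, the same per-row division can be sketched without apply by using transform('sum'), which returns each state's total aligned to the original rows. The sample values here are hypothetical.

```python
import pandas as pd

# Hypothetical fixed sample with the question's columns
df = pd.DataFrame({
    'state': ['CA', 'WA', 'CA', 'WA'],
    'bene_1_count': [10, 30, 20, 30],
    'bene_2_count': [20, 40, 40, 40],
})

# transform('sum') gives every row its state's total bene_2_count,
# indexed like df, so the division below is row-wise
group_b2_total = df.groupby('state')['bene_2_count'].transform('sum')

# New column; all original columns and the row order are kept
df['divs'] = df['bene_1_count'] / group_b2_total
```

Because transform keeps df's index, the result assigns straight back as a column without the group keys being added to the index.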


2017-02-25 01:51:10 ansonw

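I believe you first need to sum the counts per state before taking the ratio. You can use apply to access the other columns of df, then store the results in a dictionary and map them back onto the corresponding states in the original DataFrame.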

import pandas as pd 
import numpy as np 
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3, 
      'bene_1_count': [np.random.randint(10000, 99999) 
         for _ in range(12)], 
      'bene_2_count': [np.random.randint(10000, 99999) 
         for _ in range(12)]}) 

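# Sum both counts per state, then take the ratio; to_dict() gives {state: ratio}
ratios = df.groupby('state').apply(lambda x: x['bene_1_count'].sum()/
            x['bene_2_count'].sum().astype(float)).to_dict()

# Look up each row's state in the dict to attach its ratio
df['ratio'] = df['state'].map(ratios)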
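A variant of the same sum-first idea without the lambda or the dictionary: sum both columns per state, divide the totals, and map the resulting Series onto the state column. This is a sketch on hypothetical fixed data; Series.map accepting another Series (matched on its index) is standard pandas behavior.

```python
import pandas as pd

# Hypothetical fixed sample with the question's columns
df = pd.DataFrame({
    'state': ['CA', 'WA', 'CA', 'WA'],
    'bene_1_count': [10, 30, 20, 30],
    'bene_2_count': [20, 40, 40, 40],
})

# Sum first, then divide: one ratio per state (index = state)
totals = df.groupby('state')[['bene_1_count', 'bene_2_count']].sum()
ratio_by_state = totals['bene_1_count'] / totals['bene_2_count']

# map() matches df['state'] against the Series index, so to_dict() is optional
df['ratio'] = df['state'].map(ratio_by_state)

# The same values as a State/ratio table like the one the question asks for
summary = ratio_by_state.rename('ratio').reset_index()
```

Here `df` keeps one row per record with its state's ratio, while `summary` has one row per state.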


2017-02-04 23:59:57 rtk22

Thanks.. it's working... but it returns a Series, and I want to add the calculated ratio to the DataFrame as a column, like df['ratio'].. – Sanjeev

I updated my post to add the ratio back to the original DataFrame. Is this the result you're looking for? – rtk22

Awesome... it's working.. – Sanjeev
