2012-03-26

Grouping a data frame by the quartiles of one of its variables

I have the swiss dataset that ships with R, a data frame of the following form:

               Fertility Agriculture Examination Education Catholic Infant.Mortality
Courtelary          80.2        17.0          15        12     9.96             22.2
Delemont            83.1        45.1           6         9    84.84             22.2
Franches-Mnt        92.5        39.7           5         5    93.40             20.2
...
V. De Geneve        35.0         1.2          37        53    42.34             18.0
Rive Droite         44.7        46.6          16        29    50.43             18.2
Rive Gauche         42.8        27.7          22        29    58.33             19.3

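I'd like to know whether there is a simple way to split the data into four groups, one for each quartile of the Education variable, and then get the corresponding Infant.Mortality of every province in each group, so that I end up with something like this: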

Group1stQ         Group1stQ         Group1stQ         Group1stQ

<Mortality for    <Mortality for    <Mortality for    <Mortality for
1st province      1st province      1st province      1st province
in this group>    in this group>    in this group>    in this group>

<Mortality for    <Mortality for    <Mortality for    <Mortality for
2nd province      2nd province      2nd province      2nd province
in this group>    in this group>    in this group>    in this group>

<Mortality for    <Mortality for    <Mortality for    <Mortality for
3rd province      3rd province      3rd province      3rd province
in this group>    in this group>    in this group>    in this group>
    .                 .                 .                 .
    .                 .                 .                 .
    .                 .                 .                 .

Thanks in advance for your help!


To clarify: do you want the _average_ infant mortality for each quantile? – MattLBeck 2012-03-26 11:30:12


Sorry... I'll edit the question... that's not what I need... I got confused myself... I'm really sorry – Throoze 2012-03-26 11:32:52


I assume you mean 'Group1stQ Group2ndQ Group3rdQ Group4thQ' as the columns? Is there more than one row per location? – MattLBeck 2012-03-26 11:47:17

Answer


How about:

> swiss$qEdu <- cut (swiss$Education, 
        breaks = quantile (swiss$Education, c (0, .25, .5, .75, 1)), 
        include.lowest = TRUE) 

> aggregate (swiss$Infant.Mortality, list (qEdu = swiss$qEdu), FUN = mean) 
    qEdu  x 
1 [1,6] 19.31429 
2 (6,8] 21.93636 
3 (8,12] 19.38182 
4 (12,53] 19.30909 

(I'm not sure what your numbers are; they don't match the means I get.)
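To double-check the grouping itself (which cut points quantile() chose, how many provinces land in each quartile, and the per-group means as a plain named vector), a quick sketch could look like this. The variable names brks, qEdu and m are illustrative only, and tapply() is just an alternative to the aggregate() call above, not part of the original answer:

```r
# quantile() with its default probs (0, .25, .5, .75, 1) gives the break points
brks <- quantile(swiss$Education)
print(brks)

# Same grouping as in the answer, stored in a plain vector instead of a column
qEdu <- cut(swiss$Education, breaks = brks, include.lowest = TRUE)

# Number of provinces that fall into each quartile group
print(table(qEdu))

# Mean Infant.Mortality per group as a named vector (one entry per level)
m <- tapply(swiss$Infant.Mortality, qEdu, mean)
print(m)
```

Because the quartile groups are built with include.lowest = TRUE, the minimum Education value (1) falls into the first group instead of becoming NA.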

(That was before the edit...)

(2nd edit:) If you want the Infant.Mortality of every province belonging to each Education quartile, use list() as the aggregation function:

> aggregate (swiss$Infant.Mortality, list (qEdu = swiss$qEdu), FUN = list) 
    qEdu                     x 
1 [1,6] 20.2, 24.5, 18.7, 21.2, 22.4, 15.3, 21.0, 18.0, 15.1, 19.8, 18.3, 19.4, 20.2, 16.3 
2 (6,8]     20.3, 26.6, 23.6, 24.9, 21.0, 19.1, 20.0, 23.8, 22.5, 20.0, 19.5 
3 (8,12]     22.2, 22.2, 16.5, 22.7, 20.0, 18.0, 16.7, 16.3, 17.8, 20.3, 20.5 
4 (12,53]     20.6, 24.4, 20.2, 10.8, 20.9, 18.1, 18.9, 23.0, 18.0, 18.2, 19.3 
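Since FUN = list keeps every value instead of summarising, the x column of the result is a list column holding one numeric vector per quartile. A minimal sketch of pulling the values out again, assuming the illustrative names qEdu, res and top:

```r
qEdu <- cut(swiss$Education,
            breaks = quantile(swiss$Education),
            include.lowest = TRUE)

# FUN = list wraps each group's values, so column x becomes a list
res <- aggregate(swiss$Infant.Mortality, list(qEdu = qEdu), FUN = list)
str(res$x)

# Values for a single quartile group, looked up by its label
top <- res$x[[which(res$qEdu == "(12,53]")]]
print(top)
```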

Or:

> Infant.Mortality <- lapply (levels (swiss$qEdu), function (x) swiss$Infant.Mortality [swiss$qEdu == x]) 
> names (Infant.Mortality) <- levels (swiss$qEdu) 
> Infant.Mortality 
$`[1,6]` 
[1] 20.2 24.5 18.7 21.2 22.4 15.3 21.0 18.0 15.1 19.8 18.3 19.4 20.2 16.3 

$`(6,8]` 
[1] 20.3 26.6 23.6 24.9 21.0 19.1 20.0 23.8 22.5 20.0 19.5 

$`(8,12]` 
[1] 22.2 22.2 16.5 22.7 20.0 18.0 16.7 16.3 17.8 20.3 20.5 

$`(12,53]` 
[1] 20.6 24.4 20.2 10.8 20.9 18.1 18.9 23.0 18.0 18.2 19.3 
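The lapply()/names() pair can also be written with split(), which returns a list named by the factor levels. To get closer to the one-column-per-quartile layout sketched in the question, the shorter groups have to be padded, because the groups differ in size (14 vs. 11). The NA padding and the names by.quartile, n and wide are assumptions for illustration; note that values in the same row of different columns belong to unrelated provinces.

```r
qEdu <- cut(swiss$Education,
            breaks = quantile(swiss$Education),
            include.lowest = TRUE)

# split() builds the same named list as the lapply()/names() version
by.quartile <- split(swiss$Infant.Mortality, qEdu)

# Pad every group with NA up to the size of the largest one,
# then bind them side by side: one column per quartile
n <- max(lengths(by.quartile))
wide <- sapply(by.quartile, function(v) c(v, rep(NA, n - length(v))))
print(wide)
```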

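Please read the edit and the other comments... I don't need the averages; I need the data grouped as I describe in the updated question... I'm really sorry... thanks for your help! – Throoze 2012-03-26 12:16:29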


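In the aggregate() approach, is 'x' (the second column) the list of mortality values for the corresponding quantile? If so, that's exactly what I need! Thank you so much! =) – Throoze 2012-03-26 12:49:09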


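Yes. I aggregate Infant.Mortality by the factor that encodes the quartiles. Instead of computing a summary value, I use the 'list' function to collect all of the values. – cbeleites 2012-03-26 13:09:21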