2012-07-16 58 views
2

A loop that works with individual values in R

Here is my small dataset.

Indvidual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J") 
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA) 
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA) 
mydf <- data.frame (Indvidual, Parent1, Parent2) 

   Indvidual Parent1 Parent2
1          A    <NA>    <NA>
2          B    <NA>    <NA>
3          C       A       B
4          D       A       C
5          E       C       D
6          F       C       D
7          G       C       D
8          H       E    <NA>
9          I       A       D
10         J    <NA>    <NA>

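Consider only the individuals that have one or two known parents. I need to derive a score for each individual from the scores of its parents.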

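The rule: if either parent (the name in the Parent1 or Parent2 column) is known (not NA), the individual gets 1 extra point on top of its parents' score. If both parents are known, the parent with the higher score is the one that counts.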

Here is an example:

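Individual "A" has both parents unknown, so it gets score 0.
Individual "C" has both parents known (i.e. A and B),
so it gets 0 (the maximum of its parents' scores)

plus 1 (because at least one of its parents is known), for a total of 1.

So the expected output for the data frame above (with explanations) is: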

   Indvidual Parent1 Parent2 Scores Explanation
1          A    <NA>    <NA>      0 0 (max of parent scores, NA) + 0 (neither parent known)
2          B    <NA>    <NA>      0 0 (max of parent scores, NA) + 0 (neither parent known)
3          C       A       B      1 0 (max of parent scores) + 1 (either parent known)
4          D       A       C      2 1 (max of parent scores) + 1 (either parent known)
5          E       C       D      3 2 (max of parent scores) + 1 (either parent known)
6          F       C       D      3 2 (max of parent scores) + 1 (either parent known)
7          G       C       D      3 2 (max of parent scores) + 1 (either parent known)
8          H       E    <NA>      4 3 (max of parent scores) + 1 (either parent known)
9          I       A       D      3 2 (max of parent scores) + 1 (either parent known)
10         J    <NA>    <NA>      0 0 (max of parent scores, NA) + 0 (neither parent known)

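Explanation: as the loop proceeds, it uses the scores that have already been calculated. "Max of parent scores" means the larger of the two parents' scores.

Edit: in response to Chase's question

For example: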

Individual C has two parents, A and B, whose scores were calculated as 0 and 0
(rows 1 and 2 of the Scores column), so max(c(0, 0)) is 0.

Individual E has parents C and D, whose scores in the Scores column (rows 3 and 4)
are 1 and 2, respectively. So max(c(1, 2)) is 2.
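
The rule in these two examples can be written as a small helper. This is only a sketch: `parent_score_rule` is a hypothetical name that does not appear in the question, and it assumes the parents' scores are already known, with NA standing for an unknown parent.

```r
# Hypothetical helper: score of one individual from its parents' scores.
# NA means that parent is unknown.
parent_score_rule <- function(p1_score, p2_score) {
    parent_scores <- c(p1_score, p2_score)
    # Neither parent known: the score is 0
    if (all(is.na(parent_scores))) return(0)
    # At least one parent known: larger known parent score, plus 1
    max(parent_scores, na.rm = TRUE) + 1
}

parent_score_rule(0, 0)    # individual C (parents A and B)
parent_score_rule(1, 2)    # individual E (parents C and D)
parent_score_rule(3, NA)   # individual H (parent E only)
parent_score_rule(NA, NA)  # individual A (no known parents)
```

The four calls return 1, 3, 4 and 0, matching rows 3, 5, 8 and 1 of the expected output.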
+0

Can you explain what you mean by "max of parent scores"? At first I thought this was what you needed, but I don't think that's the case: 'rowSums(!is.na(mydf[, -1]))' – Chase 2012-07-16 12:34:55

+0

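Thanks Chase, see my recent edit and whether it makes sense... The idea is that as we go down the rows we calculate each individual's score, and if that individual happens to be a parent, its score is then used to calculate the score of its son/daughter. – SHRram 2012-07-16 12:54:11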

+0

Ah, I see now, those who are "individuals" can also be parents... OK, I'll think about this. It's clearer now, thanks. – Chase 2012-07-16 13:10:41

Answers

1
Individual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J") 
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA) 
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA) 
mydf <- data.frame (Individual, Parent1, Parent2, stringsAsFactors = FALSE) 

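# Individuals with both parents unknown start at 0; everyone else is NA for now
mydf$Scores <- NA
mydf$Scores[rowSums(is.na(mydf[, c("Parent1", "Parent2")])) == 2] <- 0
while (any(is.na(mydf$Scores))) {
    # Individuals whose score is already known
    KnownScores <- mydf[!is.na(mydf$Scores), c(1, 4)]
    # Unscored individuals whose parents are each either scored or NA
    ToCalculate <- mydf[
        mydf$Parent1 %in% c(KnownScores$Individual, NA) &
        mydf$Parent2 %in% c(KnownScores$Individual, NA) &
        is.na(mydf$Scores),
        -4]
    # Stop instead of looping forever, e.g. when a parent is not listed as an individual
    if (nrow(ToCalculate) == 0) stop("some scores cannot be resolved; check the parent columns")
    # Look up both parents' scores, take the larger one and add 1
    ToCalculate$Score <- apply(
        merge(
            merge(ToCalculate, KnownScores,
                  by.x = "Parent1", by.y = "Individual", all.x = TRUE),
            KnownScores,
            by.x = "Parent2", by.y = "Individual", all.x = TRUE
        )[, 4:5],
        1, max, na.rm = TRUE) + 1
    # Write the newly calculated scores back into mydf
    mydf <- merge(mydf, ToCalculate[, c(1, 4)], all.x = TRUE)
    mydf$Scores[!is.na(mydf$Score)] <- mydf$Score[!is.na(mydf$Score)]
    mydf$Score <- NULL
}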
+0

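How long is this loop expected to take to finish? I waited about 15 minutes and then stopped the run... the loop doesn't stop... I'm afraid of what will happen with my big dataset... I'm using RGui 64-bit... is this expected? – SHRram 2012-07-16 13:57:14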

+0

In fact, the loop doesn't seem to end... I waited 30 minutes. – SHRram 2012-07-16 14:11:10

+0

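Did you copy and paste my code with the mydf data.frame, or did you use another data.frame? I get the result immediately. If you are using your own data, there may be a problem with the data, e.g. parents that are not included among the individuals. Run the loop by hand and check whether any ToCalculate$Score becomes NA. – Thierry 2012-07-16 14:18:51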
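
One way to check for the data problem mentioned in this comment is to list the parent names that never appear in the Individual column. The following is a minimal sketch on a deliberately broken toy data frame; `checkdf` and `missing_parents` are hypothetical names, not part of the answer.

```r
# Toy data with one deliberate error: parent "Z" is not listed as an individual
checkdf <- data.frame(
    Individual = c("A", "B", "C", "D"),
    Parent1    = c(NA, NA, "A", "Z"),
    Parent2    = c(NA, NA, "B", "C"),
    stringsAsFactors = FALSE
)

# Collect all known parent names (drop the NAs)
parents <- c(checkdf$Parent1, checkdf$Parent2)
parents <- parents[!is.na(parents)]

# Parent names that do not occur in the Individual column
missing_parents <- setdiff(parents, checkdf$Individual)
missing_parents
```

Here `missing_parents` is "Z". With the loop above, a row whose parent is missing in this way never enters ToCalculate, so its score stays NA.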

2

An example using plyr and recursion

library(plyr) 
Indvidual <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J") 
Parent1 <- c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA) 
Parent2 <- c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA) 
mydf <- data.frame (Indvidual, Parent1, Parent2) 
scor.fun <- function(x, mydf) {
    P1 <- as.character(x$Parent1)
    P2 <- as.character(x$Parent2)
    # 1 if at least one parent is known, 0 if both are NA
    score <- as.numeric(!(is.na(P1) && is.na(P2)))
    # Recursively score each known parent; an unknown parent counts as 0
    s1 <- if (is.na(P1)) 0 else scor.fun(subset(mydf, Indvidual == P1), mydf)[1]
    s2 <- if (is.na(P2)) 0 else scor.fun(subset(mydf, Indvidual == P2), mydf)[1]
    # Explanation holds the larger of the two parent scores
    Explanation <- max(s1, s2)
    c(score + Explanation, Explanation)
}

adply(mydf,1,scor.fun,mydf) 

Recursion is probably not the best idea on a large data frame.
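
One way to avoid the recursion is to keep the scores in a named vector and fill it in passes, scoring an individual as soon as all of its known parents have a score. This is only a sketch under stated assumptions: every listed parent also appears as an individual, the pedigree has no cycles, and `ped`, `scores` and `changed` are hypothetical names that do not come from the answers above.

```r
ped <- data.frame(
    Indvidual = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
    Parent1   = c(NA, NA, "A", "A", "C", "C", "C", "E", "A", NA),
    Parent2   = c(NA, NA, "B", "C", "D", "D", "D", NA, "D", NA),
    stringsAsFactors = FALSE
)

# Named vector of scores, NA until an individual has been scored
scores <- setNames(rep(NA_real_, nrow(ped)), ped$Indvidual)

repeat {
    changed <- FALSE
    for (i in seq_len(nrow(ped))) {
        id <- ped$Indvidual[i]
        if (!is.na(scores[id])) next            # already scored
        parents <- c(ped$Parent1[i], ped$Parent2[i])
        parents <- parents[!is.na(parents)]
        if (length(parents) == 0) {
            scores[id] <- 0                     # no known parent
            changed <- TRUE
        } else if (!anyNA(scores[parents])) {
            # all known parents are scored: larger parent score, plus 1
            scores[id] <- max(scores[parents]) + 1
            changed <- TRUE
        }
    }
    # Stop when everything is scored or when a full pass made no progress
    if (!anyNA(scores) || !changed) break
}

ped$Scores <- unname(scores[ped$Indvidual])
ped
```

Each individual is scored once, so no parent score is recomputed. If the assumptions do not hold, the loop stops after a pass without progress and the affected individuals keep NA in `Scores`.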