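It's good that you already have your rolling data table; that makes calculating the quantiles easier.
Step 1: Rolling means by participant, condition, and location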
Individ <- data.frame(
  Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill",
                  "Harry", "Harry", "Harry", "Harry", "Harry", "Harry", "Harry", "Harry", "Paul", "Paul", "Paul", "Paul"),
  Time = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
  Condition = c("Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr",
                "Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr"),
  Location = c("Home", "Home", "Home", "Home", "Away", "Away", "Away", "Away", "Home", "Home", "Home", "Home",
               "Home", "Home", "Home", "Home", "Away", "Away", "Away", "Away", "Home", "Home", "Home", "Home"),
  Power = c(400, 250, 180, 500, 300, 450, 600, 512, 300, 500, 450, 200, 402, 210, 130, 520, 310, 451, 608, 582, 390, 570, NA, NA))
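library(dplyr)
library(zoo)

# For each summary function and each window width (2 to 4 observations),
# compute a right-aligned rolling statistic within each participant and
# append it to Individ as a new column.
for (summaryFunction in c("mean")) {
  for (i in seq(2, 4, by = 1)) {
    tempColumn <- Individ %>%
      group_by(Participant) %>%
      transmute(rollapply(Power,
                          width = i,
                          FUN = summaryFunction,
                          align = "right",
                          fill = NA,
                          na.rm = TRUE))
    # Column 1 is the grouping variable; column 2 holds the rolling values.
    colnames(tempColumn)[2] <- paste("Rolling", summaryFunction, as.character(i), sep = ".")
    Individ <- bind_cols(Individ, tempColumn[2])
  }
}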
Individ
   Participant  Time Condition Location Power Rolling.mean.2 Rolling.mean.3 Rolling.mean.4
        (fctr) (dbl)    (fctr)   (fctr) (dbl)          (dbl)          (dbl)          (dbl)
 1        Bill     1   Placebo     Home   400             NA             NA             NA
 2        Bill     2   Placebo     Home   250            325             NA             NA
 3        Bill     3   Placebo     Home   180            215       276.6667             NA
 4        Bill     4   Placebo     Home   500            340       310.0000          332.5
 5        Bill     1      Expr     Away   300            400       326.6667          307.5
 6        Bill     2      Expr     Away   450            375       416.6667          357.5
 7        Bill     3      Expr     Away   600            525       450.0000          462.5
 8        Bill     4      Expr     Away   512            556       520.6667          465.5
 9        Bill     1      Expr     Home   300            406       470.6667          465.5
10        Bill     2      Expr     Home   500            400       437.3333          478.0
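To see how the right-aligned window produces the values above, here is a minimal sketch on Bill's first five Power readings (the vector name `power` is only for illustration). Because the loop groups by Participant alone, the fifth value averages the last Placebo reading with the first Expr reading.

```r
library(zoo)

# Bill's first five Power readings: four Placebo/Home, then the first Expr/Away
power <- c(400, 250, 180, 500, 300)

# Width-2 window ending at each observation; the first position has no
# complete window, so fill = NA puts an NA there.
roll2 <- rollapply(power, width = 2, FUN = mean, align = "right", fill = NA)
print(roll2)
```

The result is NA, 325, 215, 340, 400, which matches Rolling.mean.2 in rows 1 to 5 of the output.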
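That gives you all 7 or 8 columns (this dataset includes Location, so it also answers your other question) in the new Individ dataset. Here is what I did after that to solve your problem. I'm 100% sure there is a cleaner and more efficient way to do this, but the logic is here and it should produce sensible output.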
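One possibly cleaner variant, offered only as a sketch: compute the rolling columns in a single `mutate()` instead of a loop. The helper `roll_mean` and the data frame `toy` are hypothetical names, and grouping by Condition as well as Participant is an assumption that differs from the loop above: it makes each window restart at a condition change rather than run across it.

```r
library(dplyr)
library(zoo)

# Toy data: Bill's first eight readings (hypothetical subset for illustration)
toy <- data.frame(
  Participant = rep("Bill", 8),
  Condition   = rep(c("Placebo", "Expr"), each = 4),
  Power       = c(400, 250, 180, 500, 300, 450, 600, 512)
)

# Hypothetical helper: right-aligned rolling mean of width k, NA where the
# window is incomplete
roll_mean <- function(x, k) {
  rollapply(x, width = k, FUN = mean, align = "right", fill = NA, na.rm = TRUE)
}

# Assumption: windows restart within each Participant x Condition block
toy_rolled <- toy %>%
  group_by(Participant, Condition) %>%
  mutate(Rolling.mean.2 = roll_mean(Power, 2),
         Rolling.mean.3 = roll_mean(Power, 3)) %>%
  ungroup()

print(toy_rolled)
```

With this grouping the first Expr row gets NA for Rolling.mean.2 instead of 400, so whether to use it depends on whether a window should be allowed to span two conditions.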
Step 2: Get the quantiles by group
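# plyr is loaded after dplyr, so plyr::summarise masks dplyr::summarise from here on.
library(plyr)

# Replace the NAs left by incomplete windows with 0 so quantile() does not fail.
# These zeros take part in the quantile calculation and pull the thresholds down.
Individ[is.na(Individ)] <- 0

# 95th percentile of each rolling mean per Participant x Condition x Location group
Top_percentiles <- ddply(Individ,
                         c("Participant", "Condition", "Location"),
                         summarise,
                         Power2 = quantile(Rolling.mean.2, .95),
                         Power3 = quantile(Rolling.mean.3, .95),
                         Power4 = quantile(Rolling.mean.4, .95)
)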
Top_percentiles
  Participant Condition Location  Power2   Power3  Power4
1        Bill      Expr     Away 551.350 510.0667 465.050
2        Bill      Expr     Home 464.650 465.6667 476.125
3        Bill   Placebo     Home 337.750 305.0000 282.625
4       Harry      Expr     Away 585.175 533.4000 485.425
5       Harry   Placebo     Home 322.150 280.7667 268.175
6        Paul      Expr     Home 556.500 556.5000 408.000
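As a check on where these numbers come from, the sketch below reproduces two Power2 thresholds by hand. It relies on `quantile()` using its default type 7 linear interpolation; the variable names are illustrative. It also shows the alternative of dropping incomplete windows with `na.rm = TRUE` rather than replacing them with 0, which is not what the code above does.

```r
# Rolling.mean.2 values for Bill / Expr / Away, taken from the Step 1 output
rm2 <- c(400, 375, 525, 556)

# With n = 4 and p = 0.95 the type 7 position is (n - 1) * p + 1 = 3.85,
# i.e. 85% of the way from the 3rd to the 4th sorted value.
threshold <- quantile(rm2, probs = 0.95)
by_hand   <- 525 + 0.85 * (556 - 525)

# Bill / Placebo / Home: the first window is incomplete.
with_zero  <- quantile(c(0, 325, 215, 340), 0.95)                  # NA replaced by 0
with_na_rm <- quantile(c(NA, 325, 215, 340), 0.95, na.rm = TRUE)   # NA dropped

print(c(threshold, by_hand, with_zero, with_na_rm))
```

Here `threshold` and `by_hand` both give 551.35, `with_zero` gives 337.75 as in the table, and `with_na_rm` gives 338.5, so the zero fill lowers the threshold slightly for groups with incomplete windows.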
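These are the top-5% thresholds for each group and the corresponding rolling mean.
The only thing left to do is find the observations in the dataset that are at or above each threshold.
Step 3: Match the thresholds to the rolling mean columns in the original dataset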
Something like this is roughly what I was fiddling around with.
# Build one key per row from the three grouping columns, then look up the
# row of Top_percentiles that belongs to the same group.
idx <- match(paste(Individ$Participant, Individ$Condition, Individ$Location),
             paste(Top_percentiles$Participant, Top_percentiles$Condition, Top_percentiles$Location))
Individ$Power2 <- Top_percentiles$Power2[idx]
Individ$Power3 <- Top_percentiles$Power3[idx]
Individ$Power4 <- Top_percentiles$Power4[idx]
Individ
   Participant  Time Condition Location Power Rolling.mean.2 Rolling.mean.3 Rolling.mean.4  Power2   Power3
        (fctr) (dbl)    (fctr)   (fctr) (dbl)          (dbl)          (dbl)          (dbl)   (dbl)    (dbl)
 1        Bill     1   Placebo     Home   400              0         0.0000            0.0 337.750 305.0000
 2        Bill     2   Placebo     Home   250            325         0.0000            0.0 337.750 305.0000
 3        Bill     3   Placebo     Home   180            215       276.6667            0.0 337.750 305.0000
 4        Bill     4   Placebo     Home   500            340       310.0000          332.5 337.750 305.0000
 5        Bill     1      Expr     Away   300            400       326.6667          307.5 551.350 510.0667
 6        Bill     2      Expr     Away   450            375       416.6667          357.5 551.350 510.0667
 7        Bill     3      Expr     Away   600            525       450.0000          462.5 551.350 510.0667
 8        Bill     4      Expr     Away   512            556       520.6667          465.5 551.350 510.0667
 9        Bill     1      Expr     Home   300            406       470.6667          465.5 464.650 465.6667
10        Bill     2      Expr     Home   500            400       437.3333          478.0 464.650 465.6667
My idea was to match the quantile columns onto the Individ dataset, so that every row carries the thresholds of its own group.
Step 4: Filter the dataset
This should get you the rows you are after.
Option 1: Three separate datasets
top_percentile_2sec <- Individ %>% filter(Rolling.mean.2 >= Power2)
top_percentile_3sec <- Individ %>% filter(Rolling.mean.3 >= Power3)
top_percentile_4sec <- Individ %>% filter(Rolling.mean.4 >= Power4)
Option 2: One large combined dataset
top_percentile_all_times <- Individ %>% filter(Rolling.mean.2 >= Power2 | Rolling.mean.3 >= Power3 | Rolling.mean.4 >= Power4)
top_percentile_all_times
  Participant  Time Condition Location Power Rolling.mean.2 Rolling.mean.3 Rolling.mean.4  Power2   Power3
       (fctr) (dbl)    (fctr)   (fctr) (dbl)          (dbl)          (dbl)          (dbl)   (dbl)    (dbl)
1        Bill     4   Placebo     Home   500            340       310.0000         332.50 337.750 305.0000
2        Bill     4      Expr     Away   512            556       520.6667         465.50 551.350 510.0667
3        Bill     1      Expr     Home   300            406       470.6667         465.50 464.650 465.6667
4        Bill     2      Expr     Home   500            400       437.3333         478.00 464.650 465.6667
5        Bill     3      Expr     Home   450            475       416.6667         440.50 464.650 465.6667
6       Harry     4   Placebo     Home   520            325       286.6667         315.50 322.150 280.7667
7       Harry     4      Expr     Away   582            595       547.0000         487.75 585.175 533.4000
8        Paul     3      Expr     Home     0            570       480.0000           0.00 556.500 556.5000
9        Paul     4      Expr     Home     0              0       570.0000         480.00 556.500 556.5000
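Steps 2 to 4 can also be collapsed into one grouped filter, which may be closer to "all occurrences above the 95th percentile per participant". This is only a sketch on made-up toy data: `toy`, `above`, `counts` and `n_above` are illustrative names, and it uses `na.rm = TRUE` rather than the zero fill from Step 2.

```r
library(dplyr)

# Toy rolling means for two participants (values from the Expr / Away groups)
toy <- data.frame(
  Participant    = rep(c("Bill", "Harry"), each = 4),
  Rolling.mean.2 = c(400, 375, 525, 556, 415, 380.5, 529.5, 595)
)

# Inside a grouped filter, quantile() is evaluated once per group, so each
# row is compared with the 95th percentile of its own participant.
above <- toy %>%
  group_by(Participant) %>%
  filter(Rolling.mean.2 >= quantile(Rolling.mean.2, 0.95, na.rm = TRUE)) %>%
  ungroup()

# Number of observations at or above the threshold per participant
counts <- above %>% count(Participant, name = "n_above")

print(above)
print(counts)
```

With only four observations per group, the 95th percentile sits just below the maximum, so exactly one row per participant survives here; longer series would keep roughly the top 5% of rows.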
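Here is a link that helped me a great deal:
how to calculate 95th percentile of values with grouping variable in R or Excel
Does this solve your problem from the other post?
Does this help? http://stackoverflow.com/questions/19608618/r-percentile-calculations-on-subsets-of-data – 2016-02-29 05:54:09
Yes, it is helpful, thank you for the link. However, how can I get all occurrences for each participant that are greater than the 95th percentile? I don't know about the other quantiles. – user2716568
If I understand your question correctly, with 'dplyr' you could use 'df%>%group_by(Participant)%>%filter(between(Power, ,1,na.rm = TRUE)))' – alistaire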