R for loop fails when applying the max function

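Let me start by saying that I am new to R and still trying to get the basics. I am currently working with a large data frame (called "ppl") that I need to edit in order to filter out some rows. Each row belongs to a group and is characterized by an intensity (into) value and a sample value.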

 mz rt  into sample tracker  sn grp 
100.0153 126 2.762664  3 11908 7.522655 0 
100.0171 127 2.972048  2 5308 7.718521 0 
100.0788 272 30.217969  2 5309 19.024807 1 
100.0796 272 17.277916  3 11910 7.297716 1 
101.0042 128 37.557324  3 11916 27.991320 2 
101.0043 128 39.676014  2 5316 28.234918 2

So, the first question is: "How can I select the sample with the highest intensity from each group?" I tried a for loop:

for (i in ppl$grp) { 
temp<-ppl[ppl$grp == i,] 
sel<-rbind(sel,temp[max(temp$into),]) 
}

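The thing is, it works for the $grp == 0 case, but the following iterations return rows of NAs. In addition, the filtered data frame (called "sel") should also store the sample values of the rows that were removed. It should look like this: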

 mz rt  into sample tracker  sn grp 
100.0171 127 2.972048 c(2,3) 5308 7.718521 0 
100.0788 272 30.217969 c(2,3) 5309 19.024807 1 
101.0043 128 39.676014 c(2,3) 5316 28.234918 2

To get this I would use the following approach:

lev<-factor(ppl$grp) 
samp<-ppl$sample 
samp2<-split(samp,lev) 
sel$sample<-samp2

Any hints? I cannot test it because I have not solved the previous problem yet.

Thank you very much.

2016-09-19 AeonRed

A base R option using ave is

ppl[with(ppl, ave(into, grp, FUN = max)==into),]

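If the 'sample' column in the expected output should hold the unique elements within each 'grp', then after grouping by 'grp', update 'sample' to the pasted unique elements of 'sample', arrange by 'into' in descending order, and slice the first row.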

library(dplyr) 
ppl %>% 
    group_by(grp) %>% 
    mutate(sample = toString(sort(unique(sample)))) %>% 
    arrange(desc(into)) %>% 
    slice(1L) 
#  mz rt  into sample tracker  sn grp 
#  <dbl> <int>  <dbl> <chr> <int>  <dbl> <int> 
#1 100.0171 127 2.972048 2, 3 5308 7.718521  0 
#2 100.0788 272 30.217969 2, 3 5309 19.024807  1 
#3 101.0043 128 39.676014 2, 3 5316 28.234918  2

2016-09-20 02:58:56 akrun

Not sure if I follow your question, but maybe this will get you started.

library(dplyr) 
ppl %>% group_by(grp) %>% filter(into == max(into))

2016-09-19 18:28:12 user51855

A data.table alternative:

library(data.table) 
setkey(setDT(ppl),grp) 
ppl <- ppl[ppl[,into==max(into),by=grp]$V1,] 
##   mz rt  into sample tracker  sn grp 
##1: 100.0171 127 2.972048  2 5308 7.718521 0 
##2: 100.0788 272 30.217969  2 5309 19.024807 1 
##3: 101.0043 128 39.676014  2 5316 28.234918 2

2016-09-19 21:03:00 aichao

I don't know why this code would work:

for (i in ppl$grp) { 
    temp<-ppl[ppl$grp == i,] 
    sel<-rbind(sel,temp[max(temp$into),]) 
}

max(temp$into) returns the maximum value, which in most cases will not be an integer, so it cannot serve as a row index.
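To illustrate the point: max() returns a value, while row indexing needs a position. The sketch below uses a hand-typed copy of the group 1 rows from the question (the temp data frame here is reconstructed for illustration only) and compares indexing by max() with indexing by which.max(). It may also explain why group 0 appeared to work: its maximum, 2.972048, is truncated to 2, and row 2 of that group happens to be the row with the maximum.

```r
# Rows of group 1 from the question, typed in by hand for illustration
temp <- data.frame(
  mz   = c(100.0788, 100.0796),
  into = c(30.217969, 17.277916),
  grp  = c(1L, 1L)
)

# max() gives the largest value; used as a row index it is truncated to 30,
# and row 30 does not exist, so a single row of NAs comes back
bad <- temp[max(temp$into), ]

# which.max() gives the position of the largest value, which is a valid row index
good <- temp[which.max(temp$into), ]
```

Swapping max() for which.max() inside the original loop would likely be the smallest change that removes the NA rows.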

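Also, building a data.frame with rbind on every iteration of a for loop is not good practice (in any language). It requires a fair amount of type checking and array growth, which can be very expensive.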
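One possible way to avoid growing the result inside the loop is to split the data by group, pick one row per piece, and bind everything in a single call. This is only a sketch on a hand-typed subset of the question's data, not the method of any answer in this thread. Note also that for (i in ppl$grp) visits the group value of every row, so a group with two rows is processed twice, whereas split() handles each group once.

```r
# First two groups of the question's data, typed in by hand for illustration
ppl <- data.frame(
  mz   = c(100.0153, 100.0171, 100.0788, 100.0796),
  into = c(2.762664, 2.972048, 30.217969, 17.277916),
  grp  = c(0L, 0L, 1L, 1L)
)

# Split once by group and keep the row with the largest 'into' from each piece
pieces <- lapply(split(ppl, ppl$grp), function(d) d[which.max(d$into), ])

# Bind all pieces in one call instead of growing a data frame step by step
sel <- do.call(rbind, pieces)
```

Here sel ends up with one row per group, and the row names are the group labels.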

In addition, max will return NA when the group contains any NAs.
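A small illustration of the NA behaviour, using a made-up vector x rather than the question's data:

```r
# Made-up intensities with one missing value
x <- c(2.76, NA, 2.97)

with_na    <- max(x)                # NA, because one element is missing
without_na <- max(x, na.rm = TRUE)  # 2.97, missing values are dropped first
```

One caveat: if every value in a group is NA, max(x, na.rm = TRUE) returns -Inf with a warning, so such groups may need separate handling.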

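There is also the question of what you want to do about ties. Do you want only one result or all of them? akrun's code will give you all of them.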

This code writes a new column holding the group maximum:

ppl$grpmax <- ave(ppl$into, ppl$grp, FUN=function(x) { max(x, na.rm=TRUE) })

Then you can select all the rows in a group whose value equals the maximum with

pplmax <- subset(ppl, into == grpmax)

If you want just one row per group, you can then remove the duplicates:

pplmax[!duplicated(pplmax$grp),]

2016-09-20 05:46:16
