How do I classify each instance of a data frame of normalized vectors into quantiles (e.g. 0, 0.25, 0.5, 0.75, 1)?

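I have a data frame with 20 variables and 400k instances. All variables are normalized to mean 0 and standard deviation 1. I want to write a function that classifies each instance of each variable into a quantile.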

Let's say we have a normalized vector
a <- c(0.2132821, -1.5136988, 0.6450274, 1.5085178, 0.2132821, 1.5085178, 0.6450274)

And the quantiles for this vector are
quant.a <- c(-1.5136988, -1.0819535, 0.2132821, 1.0767726, 1.5085178)

where -1.5136988 is 0% 
     -1.0819535 is 25% 
     0.2132821 is 50% 
     1.0767726 is 75% 
     1.5085178 is 100% (all are elements in vector 'quant.a') 

Now, I want to classify each element of vector 'a' as follows 
new.a <- c(0.5, 0, 0.75, 1, 0.5, 1, 0.75) 

You can use the following code to work through the example, as I cannot share the actual data.

# Generate random data 
set.seed(99) 

# All variables are integers from 1 to 8 (floor of a uniform draw on [1, 9))
a <- floor(runif(500, min = 1, max = 9)) 
b <- floor(runif(500, min = 1, max = 9)) 
c <- floor(runif(500, min = 1, max = 9)) 

# store variables as dataframe 
x <- data.frame(cbind(a,b,c)) 

# Scale variables
scaled.dat <- data.frame(scale(x)) 

# check that we get mean of 0 and sd of 1 
colMeans(scaled.dat) 
apply(scaled.dat, 2, sd) 

# generate quantiles for each variable
quantiles <- data.frame(apply(scaled.dat,2,quantile))

Thanks in advance.

Source

2017-09-22 Nikhil

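library(dplyr)
# funs() is deprecated as of dplyr 0.8.0; there, mutate_all(~ ntile(., 4) / 4) is the equivalent
yourdataframe %>%
    mutate_all(funs(ntile(., 4) / 4))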

Source

2017-09-23 02:13:39 Brian

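Hey Brian! This is a good way to find the i-th quantile (quantile 1, quantile 2, and so on), but I was looking for output of the form (0, .25, .5, .75, 1). Thanks anyway. Always good to learn something new. – Nikhil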

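@Nikhil, you just need to include the '/4' to get that representation. If you want more explicit labels, you could use 'percent_rank' instead and then call 'mutate_all(funs(cut(., 0:4/4)))' afterwards. – Brian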

Thank you so much! It works well! – Nikhil
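Brian's suggestion of `percent_rank` followed by `cut` can be tried without dplyr. The sketch below is a minimal illustration, not code from the thread: `pct_rank` is a hypothetical base-R stand-in that assumes `percent_rank` is the minimum rank rescaled to [0, 1] and that there are no missing values. It also shows why `include.lowest = TRUE` may be needed, since `cut` leaves the lowest break out of the first interval by default.

```r
# Hypothetical base-R stand-in for dplyr::percent_rank():
# minimum rank rescaled to [0, 1]. Assumes x has no missing values.
pct_rank <- function(x) (rank(x, ties.method = "min") - 1) / (length(x) - 1)

# Example vector from the question
a <- c(0.2132821, -1.5136988, 0.6450274, 1.5085178,
       0.2132821, 1.5085178, 0.6450274)
pr <- pct_rank(a)

# cut() builds right-closed intervals (0, 0.25], (0.25, 0.5], ...
# By default the lowest break is excluded, so a percent rank of 0 becomes NA.
binned_default <- cut(pr, 0:4 / 4)
binned_closed  <- cut(pr, 0:4 / 4, include.lowest = TRUE)

# Upper edge of each bin as a number in {0.25, 0.5, 0.75, 1}
upper_edge <- (1:4 / 4)[as.integer(binned_closed)]
```

With these bins the smallest value falls in the first bin (upper edge 0.25) rather than receiving the label 0 asked for in the question.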

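a <- c(0.2132821, -1.5136988, 0.6450274, 1.5085178, 0.2132821, 1.5085178, 0.6450274)

# 0%, 25%, 50%, 75% and 100% cut points of a
quant.a <- quantile(a)

# Index of the interval [quant.a[i], quant.a[i + 1]) that each element falls into
aux_matrix <- findInterval(a, quant.a)

# Map the interval indices to the labels requested in the question
new.a <- ifelse(aux_matrix == 1 | aux_matrix == 0, 0,
         ifelse(aux_matrix == 2, 0.5,
         ifelse(aux_matrix == 3, 0.75, 1)))

print(new.a)
# [1] 0.50 0.00 0.75 1.00 0.50 1.00 0.75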
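The nested `ifelse` hard-codes the mapping for five cut points. A more general version might look like the sketch below, which labels each value with the probability of the smallest cut point that is greater than or equal to it. `label_by_quantile` and its arguments are hypothetical names introduced for illustration, and the small data frame `df` is made-up data; only `findInterval` and `quantile` are base R functions.

```r
# Hypothetical helper (not from the thread): label each value with the
# probability of the smallest quantile cut point that is >= the value.
label_by_quantile <- function(x, probs = seq(0, 1, 0.25),
                              cuts = quantile(x, probs = probs, names = FALSE)) {
  # With left.open = TRUE the intervals are (cuts[i], cuts[i + 1]], so a value
  # equal to a cut point is assigned to that cut point; the minimum gets 0.
  idx <- findInterval(x, cuts, left.open = TRUE) + 1
  probs[idx]
}

# The vector and the quantile values exactly as given in the question
a <- c(0.2132821, -1.5136988, 0.6450274, 1.5085178,
       0.2132821, 1.5085178, 0.6450274)
quant.a <- c(-1.5136988, -1.0819535, 0.2132821, 1.0767726, 1.5085178)

new.a <- label_by_quantile(a, cuts = quant.a)
print(new.a)
# [1] 0.50 0.00 0.75 1.00 0.50 1.00 0.75

# Column-wise on a data frame (made-up data), cut points computed per column
df <- data.frame(u = c(-1.2, -0.3, 0.1, 0.4, 1.0),
                 v = c(2, 1, 1, 3, 5))
labelled <- as.data.frame(lapply(df, label_by_quantile))
```

Because `findInterval` is vectorised, the same call should scale to the 20 columns and 400k rows mentioned in the question, for example `as.data.frame(lapply(scaled.dat, label_by_quantile))`.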

Source

2017-09-22 18:58:19

'temp = findInterval(a, quant.a); as.numeric(gsub("%", "", names(quant.a)[ifelse(temp == 1, 1, pmin(length(quant.a), temp + 1))]))/100' –

Thanks, Alexey! It works perfectly, just the way I intended. – Nikhil

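Hey d.b! I tried your way. It is still off by one quantile: for example, it gives .5 on my data frame where I expected .25. I will try to figure out what is causing this. Thanks for your help. – Nikhil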
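One possible explanation for the off-by-one result (an assumption, not confirmed in the thread) is how `findInterval` treats values that are exactly equal to a cut point. By default its intervals are closed on the left, so such a value is counted as already past that cut point and the `temp + 1` step labels it with the next quantile. The sketch below compares the one-liner from the comment with a variant using `left.open = TRUE`; the names `closed_left` and `open_left` are introduced here for illustration only.

```r
# Example vector from the question; quantile(a) puts the 25% cut point
# exactly on the value 0.2132821, which occurs twice in a.
a <- c(0.2132821, -1.5136988, 0.6450274, 1.5085178,
       0.2132821, 1.5085178, 0.6450274)
quant.a <- quantile(a)

# One-liner from the comment: default intervals are [cut_i, cut_{i+1})
temp <- findInterval(a, quant.a)
idx  <- ifelse(temp == 1, 1, pmin(length(quant.a), temp + 1))
closed_left <- as.numeric(gsub("%", "", names(quant.a)[idx])) / 100

# Variant: left.open = TRUE makes the intervals (cut_i, cut_{i+1}],
# so a value sitting on a cut point keeps that cut point's label.
temp2 <- findInterval(a, quant.a, left.open = TRUE)
open_left <- as.numeric(gsub("%", "", names(quant.a)[temp2 + 1])) / 100

# Label given to the first element, which equals the 25% cut point
closed_left[1]  # 0.5
open_left[1]    # 0.25
```

Data with many repeated values, such as the integer-valued example columns in the question, will have many observations sitting exactly on a cut point, which is where the two variants disagree.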
