2016-02-05 75 views
2

How to calculate the mode (statistics) of every set of 10 numbers in a large data set

If I have 1223455567 1777666666, I want the output to be 5 and 6. How can I do this in R?

I know how to find the mean of every 10 data points, but what I want is the mode.

Here is what I tried for the mean:

mean10 <- aggregate(level, list(rep(1:(nrow(level) %/% n+1),each = n, len = nrow(level))), mean)[-1];

and I have a mode function, as follows:

MODE <- function(dataframe){ 
    DF <- as.data.frame(dataframe) 

    MODE2 <- function(x){  
    if (is.numeric(x) == FALSE){ 
     df <- as.data.frame(table(x)) 
     df <- df[order(df$Freq), ]   
     m <- max(df$Freq)   
     MODE1 <- as.vector(as.character(subset(df, Freq == m)[, 1])) 

     if (sum(df$Freq)/length(df$Freq)==1){ 
      warning("No Mode: Frequency of all values is 1", call. = FALSE) 
     }else{ 
      return(MODE1) 
     } 

    }else{ 
     df <- as.data.frame(table(x)) 
     df <- df[order(df$Freq), ]   
     m <- max(df$Freq)   
     MODE1 <- as.vector(as.numeric(as.character(subset(df, Freq == m)[, 1]))) 

     if (sum(df$Freq)/length(df$Freq)==1){ 
      warning("No Mode: Frequency of all values is 1", call. = FALSE) 
     }else{ 
      return(MODE1) 
     } 
    } 
} 

return(as.vector(lapply(DF, MODE2))) 
} 
+0

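By "every 10 data points", do you mean one mode for rows 1:10, one mode for rows 2:11, one mode for rows 3:12, ...? Or do you mean one mode for rows 1:10, one mode for rows 11:20, one mode for rows 21:30, ...? – MichaelChirico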

+0

1:10, 11:20, 21:30, like that. –

Answers

0

You can use the zoo package to compute a moving mode:

library(zoo) 

# sample data 
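d <- data.frame(x = sample(1:3, 100, TRUE))

# mode function (handles ties by choosing the first value in the table);
# names() recovers the value itself, because which.max() returns a position
my_mode <- function(x) as.numeric(names(which.max(table(x))))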

# add moving mode as new variable 
transform(d, moving_mode = rollapply(x, 10, FUN = my_mode, fill = NA)) 
0

You can always convert to character and see which character has the highest count in a table. For example:

> which.max(table(strsplit(as.character(1777666666),""))) 
6 
2 
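The same idea can be applied to both numbers from the question at once. This is only a minimal sketch: `digit_mode` is a hypothetical helper name, the inputs are passed as strings to sidestep any numeric formatting, and ties are resolved by taking the first digit with the top count.

```r
# Hypothetical helper (not from the answer): most frequent digit in one string.
digit_mode <- function(s) {
  digits <- strsplit(as.character(s), "")[[1]]   # split into single characters
  counts <- table(digits)                        # frequency of each digit
  as.numeric(names(counts)[which.max(counts)])   # first digit with the top count
}

res <- vapply(c("1223455567", "1777666666"), digit_mode, numeric(1),
              USE.NAMES = FALSE)
print(res)  # 5 6
```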
2

This should work:

Mode <- function(x) { 
    y <- unique(x) 
    y[which.max(tabulate(match(x, y)))] 
} 

library(zoo) 
x<- c(1,2,2,3,4,5,5,5,6,7,1,7,7,7,6,6,6,6,6,6) 
rollapply(data = x, width = 10, FUN = Mode, by = 10) 
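For non-overlapping blocks, the grouping can also be sketched in base R without zoo. The block index built with integer division and the use of `tapply` are assumptions of this sketch, not part of the answer; `Mode` is repeated so the block runs on its own.

```r
# Mode function as in the answer: first value with the highest count.
Mode <- function(x) {
  y <- unique(x)
  y[which.max(tabulate(match(x, y)))]
}

x <- c(1,2,2,3,4,5,5,5,6,7,1,7,7,7,6,6,6,6,6,6)

# Block index: 1 for elements 1..10, 2 for elements 11..20, and so on.
block <- (seq_along(x) - 1) %/% 10 + 1

# One mode per block of 10.
block_modes <- tapply(x, block, Mode)
print(block_modes)
```

If the length of `x` is not a multiple of 10, the last block is simply shorter; whether that partial block should be kept is a choice the question does not settle.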
1

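Given that you're not after a rolling mode but really a grouped mode, none of the other answers is accurate. Doing this in the case you have in mind is actually a lot easier; I'll use data.table: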

#fixed cost: set-up of 'data.table' 
library(data.table) 
setDT(DF) 

Now to solve:

#this works on a single column; 
# the rep(...) bit is about creating the 
# sequence (1, ..., 1, 2, ..., 2, ...) 
# of integers each repeated 10 times. 
# Here, .N will give the frequency -- i.e., 
# this first step is basically running 'table' for every 10 rows 
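# ('each = 10' repeats each group index 10 times; nrow(DF) supplies the
#  row count when the grouping expression is evaluated)
DF[ , .N, by = .(col1, grp = rep(1:(nrow(DF) %/% 10 + 1), each = 10, length.out = nrow(DF)))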
    #by going in descending order on frequency, we can simply 
    # extract the first element of each 'grp' to get the mode. 
    # (this glosses over the issue of ties, but you haven't given 
    # any guidance to that end) 
    ][order(-N), .SD[1L], by = grp] 
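Since ties are left open above, one possible way to handle them is to return every value that shares the highest count in a block. The following is a base R sketch under that assumption; `all_modes` is a hypothetical helper and the sample vector is made up for illustration.

```r
# Hypothetical helper: all values tied for the highest count in x.
all_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# Made-up sample: in rows 1:10 the values 1 and 2 tie; in rows 11:20 the mode is 9.
col1 <- c(1, 1, 2, 2, 3, 4, 5, 6, 7, 8,
          9, 9, 9, 1, 2, 3, 4, 5, 6, 7)

# Group index: 1 for rows 1:10, 2 for rows 11:20, ...
grp <- (seq_along(col1) - 1) %/% 10 + 1

# simplify = FALSE keeps one vector of modes per block, whatever its length.
modes_by_block <- tapply(col1, grp, all_modes, simplify = FALSE)
print(modes_by_block)
```

Returning a list keeps tied blocks visible instead of silently picking one value; picking the first element of each entry would reproduce the "choose one" behaviour.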