2017-02-04
2

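Equal-frequency and equal-width binning in R

Given a dataset, I want to split it into 4 bins using both equal-frequency discretization and equal-width binning, as described here. However, I want to do this in R.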

Dataset:

0, 4, 12, 16, 16, 18, 24, 26, 28 

I tried to write some code for equal-width binning, but it only produces a histogram.

bins<-4; 
minimumVal<-min(dataset) 
maximumVal<-max(dataset) 
width=(maximumVal-minimumVal)/bins; 
edges = minimumVal:width:maximumVal; 
hist(dataset, breaks = "Sturges", freq = TRUE, xlim = range(edges)) 

I am new to R, so any help producing these two kinds of binning in R would be much appreciated.

Answers

3

For equal-width binning, I suggest using the classInt package:

dataset <- c(0, 4, 12, 16, 16, 18, 24, 26, 28) 

library(classInt) 
classIntervals(dataset, 4) 
x <- classIntervals(dataset, 4, style = 'equal') 

To use the breaks, you can inspect x$brks
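Once you have the break points, base R's cut() can assign each value to its bin. The following is a minimal sketch that does not use classInt: it computes the equal-width edges directly with seq(), which for this dataset should match what style = 'equal' returns in x$brks. The names edges and equal_width are only illustrative.

```r
# Equal-width binning in base R (a sketch; variable names are illustrative)
dataset <- c(0, 4, 12, 16, 16, 18, 24, 26, 28)
bins <- 4

# bins + 1 equally spaced edges from min to max: 0, 7, 14, 21, 28
edges <- seq(min(dataset), max(dataset), length.out = bins + 1)

# include.lowest = TRUE keeps the minimum value in the first bin;
# without it the smallest value would become NA
equal_width <- cut(dataset, breaks = edges, include.lowest = TRUE)

# Number of values per bin: 2, 1, 3, 3
table(equal_width)
```

Using length.out = bins + 1 rather than a step size guarantees that the last edge is exactly the maximum.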

As for equal-frequency binning, you can use the same package with the option style = 'quantile'

classIntervals(dataset, 4, style = 'quantile') 

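It will not produce bins of exactly equal size, for two reasons: the repeated value in dataset (16) cannot be split across bins, and the dataset has 9 elements, so it cannot be divided into 4 bins with strictly the same number of elements. I doubt this is a problem, because the reference you provided says,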

"...each group contains approximately the same number of values."
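To see what "approximately the same number" means for this dataset, here is a minimal base R sketch that uses quantile() for the break points instead of classInt. It relies on quantile()'s default method, and I have not checked that classIntervals produces identical breaks, so treat it as an illustration only. The names edges and equal_freq are made up for the example.

```r
# Equal-frequency binning in base R (a sketch; variable names are illustrative)
dataset <- c(0, 4, 12, 16, 16, 18, 24, 26, 28)
bins <- 4

# Quartile boundaries as break points: 0, 12, 16, 24, 28
edges <- quantile(dataset, probs = seq(0, 1, length.out = bins + 1))

# include.lowest = TRUE keeps the minimum value in the first bin
equal_freq <- cut(dataset, breaks = edges, include.lowest = TRUE)

# Number of values per bin: 3, 2, 2, 2
table(equal_freq)
```

With 9 values in 4 bins, one bin necessarily holds an extra value. If the data contained enough ties to make two quantiles equal, cut() would stop with a "breaks are not unique" error, and the duplicated edges would have to be removed first (for example with unique()).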

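Since you did not specify exactly which method you are looking for, I suggest looking at this post for an alternative approach, which for your example would be: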

library(Hmisc) 
table(cut2(dataset, m = length(dataset)/4)) 

In addition, the post linked above offers other options and some relevant discussion of these methods.

+0

classIntervals works perfectly for both types of binning. Thank you!

0

For equal-width binning, you can try the following:

set.seed(1) 
dataset <- runif(100, 0, 10) # some random data 
bins<-4 
minimumVal<-min(dataset) 
maximumVal<-max(dataset) 
width=(maximumVal-minimumVal)/bins; 
# include.lowest = TRUE keeps the minimum value in the first bin instead of returning NA
cut(dataset, breaks=seq(minimumVal, maximumVal, width), include.lowest = TRUE)

#[1] (2.58,5.03] (2.58,5.03] (5.03,7.47] (7.47,9.92] [0.134,2.58] (7.47,9.92] (7.47,9.92] (5.03,7.47] (5.03,7.47] [0.134,2.58] [0.134,2.58] [0.134,2.58]
#[13] (5.03,7.47] (2.58,5.03] (7.47,9.92] (2.58,5.03] (5.03,7.47] (7.47,9.92] (2.58,5.03] (7.47,9.92] (7.47,9.92] [0.134,2.58] (5.03,7.47] [0.134,2.58]
#[25] (2.58,5.03] (2.58,5.03] [0.134,2.58] (2.58,5.03] (7.47,9.92] (2.58,5.03] (2.58,5.03] (5.03,7.47] (2.58,5.03] [0.134,2.58] (7.47,9.92] (5.03,7.47]
#[37] (7.47,9.92] [0.134,2.58] (5.03,7.47] (2.58,5.03] (7.47,9.92] (5.03,7.47] (7.47,9.92] (5.03,7.47] (5.03,7.47] (7.47,9.92] [0.134,2.58] (2.58,5.03]
#[49] (5.03,7.47] (5.03,7.47] (2.58,5.03] (7.47,9.92] (2.58,5.03] [0.134,2.58] [0.134,2.58] [0.134,2.58] (2.58,5.03] (5.03,7.47] (5.03,7.47] (2.58,5.03]
#[61] (7.47,9.92] (2.58,5.03] (2.58,5.03] (2.58,5.03] (5.03,7.47] [0.134,2.58] (2.58,5.03] (7.47,9.92] [0.134,2.58] (7.47,9.92] (2.58,5.03] (7.47,9.92]
#[73] (2.58,5.03] (2.58,5.03] (2.58,5.03] (7.47,9.92] (7.47,9.92] (2.58,5.03] (7.47,9.92] (7.47,9.92] (2.58,5.03] (5.03,7.47] (2.58,5.03] (2.58,5.03]
#[85] (7.47,9.92] [0.134,2.58] (5.03,7.47] [0.134,2.58] [0.134,2.58] [0.134,2.58] [0.134,2.58] [0.134,2.58] (5.03,7.47] (7.47,9.92] (7.47,9.92] (7.47,9.92]
#[97] (2.58,5.03] (2.58,5.03] (7.47,9.92] (5.03,7.47]
#Levels: [0.134,2.58] (2.58,5.03] (5.03,7.47] (7.47,9.92]

#plot frequencies in the bins 
barplot(table(cut(dataset, breaks=seq(minimumVal, maximumVal, width), include.lowest = TRUE)))
