How to find the center of clusters of numbers? Statistics problem?

Here is my problem. Say I have a set of numbers:

5, 7, 7, 8, 8, 8, 7, 20, 23, 23, 24, 24, 24, 25

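In the set above there are two "clusters" of numbers, and I want to write a program that finds the centers of those clusters. Could you call them attractors, as in fractal theory?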

So the program would, I guess, work out that the set can be split into two groups:

A - 5, 7, 7, 8, 8, 8, 7

B - 20, 23, 23, 24, 24, 24, 25

Group A could then have its mean calculated, group B could have its mean calculated, and I would have the centers of the two attractors.

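Maybe this is an easy problem for a good maths/statistics person? Can anyone point me in the right direction? I could have anywhere from 1 to 5 "attractors/clusters".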

2010-01-08 Phil

What is the maximum deviation allowed between members of the same cluster? Thanks – 2010-01-08 11:36:58

Would k-means clustering work for you? http://en.wikipedia.org/wiki/K-means_clustering – 2010-01-08 11:39:48

For example, k-means clustering in R gives the following:

R> x <- c(5, 7, 7, 8, 8, 8, 7, 20, 23, 23, 24, 24, 24, 25) 
R> kmeans(as.matrix(x), centers=2) 
K-means clustering with 2 clusters of sizes 7, 7 

Cluster means: 
    [,1] 
1 23.286 
2 7.143 

Clustering vector: 
[1] 2 2 2 2 2 2 2 1 1 1 1 1 1 1 

Within cluster sum of squares by cluster: 
[1] 15.429 6.857 

Available components: 
[1] "cluster" "centers" "withinss" "size"

2010-01-08 12:38:15 rcs

Something like this?

public class Cluster {
    public static void main(String[] args) {
        // Start a new cluster whenever two neighbouring values differ by at least maxDist.
        // Assumes the values are ordered so that each cluster is contiguous in the array.
        int maxDist = 5;
        char cluster = 'A';
        int[] values = { 5, 7, 7, 8, 8, 8, 7, 20, 23, 23, 24, 24, 24, 25 };
        int prev = values[0];
        System.out.print(cluster + " - " + prev + " ");
        for (int i = 1; i < values.length; i++) {
            if (Math.abs(prev - values[i]) >= maxDist) {
                // The gap is large enough: move on to the next cluster letter.
                System.out.print("\n" + ++cluster + " - ");
            }
            System.out.print(values[i] + " ");
            prev = values[i];
        }
    }
}

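Edit: If the clusters are not too close to each other, as in your example values, an approach like this will work. k-means requires a known k (the number of clusters), which your question does not mention. Once the clusters are separated, you can easily find each "center" as the mean.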
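
To make the last step concrete, here is a minimal sketch that splits on gaps and then averages each group. The class name `GapClusterCenters`, the method `centers`, and the decision to sort a copy of the input first are assumptions made for illustration; they are not part of the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GapClusterCenters {
    // Sorts a copy of the values (an added assumption), starts a new group
    // wherever two neighbours are at least maxGap apart, and returns the
    // mean of each group in ascending order.
    static List<Double> centers(int[] values, int maxGap) {
        List<Double> result = new ArrayList<>();
        if (values.length == 0) {
            return result;
        }
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        long sum = sorted[0];
        int count = 1;
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] - sorted[i - 1] >= maxGap) {
                // Close the current group and record its mean.
                result.add((double) sum / count);
                sum = 0;
                count = 0;
            }
            sum += sorted[i];
            count++;
        }
        result.add((double) sum / count); // mean of the final group
        return result;
    }

    public static void main(String[] args) {
        int[] values = { 5, 7, 7, 8, 8, 8, 7, 20, 23, 23, 24, 24, 24, 25 };
        System.out.println(centers(values, 5));
    }
}
```

With the example values and a gap of 5 this yields two centers, roughly 7.14 and 23.29. The result depends entirely on the chosen gap, so it only suits data where the clusters are clearly separated.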

2010-01-08 11:56:32 stacker

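Plot the probability density (think histogram) with some smoothing factor, then find the peaks (the cluster centers) and the troughs (the divisions between clusters).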
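
One way to try this without plotting is to evaluate a kernel density estimate on a grid and look for local maxima and minima. The sketch below assumes a Gaussian kernel, a hand-picked bandwidth, and a fixed grid step; the names `DensityPeaks`, `density`, and `localExtrema` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DensityPeaks {
    // Gaussian kernel density estimate at position x.
    static double density(double[] data, double x, double bandwidth) {
        double sum = 0;
        for (double d : data) {
            double u = (x - d) / bandwidth;
            sum += Math.exp(-0.5 * u * u);
        }
        return sum / (data.length * bandwidth * Math.sqrt(2 * Math.PI));
    }

    // Evaluates the density on a regular grid and returns the grid positions
    // that are strict local maxima (maxima == true) or minima (maxima == false).
    static List<Double> localExtrema(double[] data, double bandwidth, double step, boolean maxima) {
        double min = data[0];
        double max = data[0];
        for (double d : data) {
            min = Math.min(min, d);
            max = Math.max(max, d);
        }
        // Extend the grid three bandwidths beyond the data on both sides.
        double lo = min - 3 * bandwidth;
        double hi = max + 3 * bandwidth;
        int n = (int) Math.ceil((hi - lo) / step) + 1;
        double[] f = new double[n];
        for (int i = 0; i < n; i++) {
            f[i] = density(data, lo + i * step, bandwidth);
        }
        List<Double> found = new ArrayList<>();
        for (int i = 1; i < n - 1; i++) {
            boolean isMax = f[i] > f[i - 1] && f[i] > f[i + 1];
            boolean isMin = f[i] < f[i - 1] && f[i] < f[i + 1];
            if (maxima ? isMax : isMin) {
                found.add(lo + i * step);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        double[] data = { 5, 7, 7, 8, 8, 8, 7, 20, 23, 23, 24, 24, 24, 25 };
        System.out.println("peaks:   " + localExtrema(data, 2.0, 0.1, true));
        System.out.println("troughs: " + localExtrema(data, 2.0, 0.1, false));
    }
}
```

For the example data a bandwidth of 2.0 should give one peak per cluster and a single trough between them. A much smaller bandwidth tends to produce spurious peaks and a much larger one merges clusters, so the smoothing factor has to be tuned to the data.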

2010-01-09 04:16:52 wroscoe

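There are many good approaches to this problem, and the one you should ultimately use depends on the kind of data you are dealing with (e.g. how it is distributed, the dimensionality of the data points, whether clusters may overlap, how robust you need to be to outliers, etc.).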

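As already mentioned, the first thing to try is k-means clustering. You may also want to look at a simple variant called k-medoids (also known as Partitioning Around Medoids, PAM), which is more robust to outliers than k-means.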
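
For one-dimensional data, k-means is short enough to write by hand. The following is a minimal sketch of Lloyd's algorithm; the class name `KMeans1D`, the method `cluster`, and the deterministic starting centers (values spread evenly through the sorted data) are assumptions for illustration, and real implementations usually use random or k-means++ starts.

```java
import java.util.Arrays;

public class KMeans1D {
    // Runs Lloyd's algorithm on 1-D data and returns the k cluster centers.
    static double[] cluster(double[] data, int k, int maxIterations) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        double[] centers = new double[k];
        // Assumed initialisation: pick k values spread evenly through the sorted data.
        for (int j = 0; j < k; j++) {
            centers[j] = sorted[(int) ((j + 0.5) * sorted.length / k)];
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sum = new double[k];
            int[] count = new int[k];
            // Assignment step: each point joins its nearest center.
            for (double x : data) {
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (Math.abs(x - centers[j]) < Math.abs(x - centers[best])) {
                        best = j;
                    }
                }
                sum[best] += x;
                count[best]++;
            }
            // Update step: move each center to the mean of its points.
            boolean changed = false;
            for (int j = 0; j < k; j++) {
                if (count[j] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                double updated = sum[j] / count[j];
                if (updated != centers[j]) {
                    changed = true;
                }
                centers[j] = updated;
            }
            if (!changed) {
                break; // converged
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[] data = { 5, 7, 7, 8, 8, 8, 7, 20, 23, 23, 24, 24, 24, 25 };
        System.out.println(Arrays.toString(cluster(data, 2, 100)));
    }
}
```

On the example data with k = 2 this converges to centers of roughly 7.14 and 23.29. Swapping the mean in the update step for the member that minimises the total distance to the others would turn this into a k-medoids-style variant.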

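One thing to note about both k-means and k-medoids is the parameter k (the number of clusters). If you do not know the number of clusters a priori, there are a variety of techniques for choosing it automatically (cross-validation, silhouette scores, etc.); see Cluster Analysis and Finite Mixture Models for a more comprehensive list of cluster analysis implementations in R.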
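
As an illustration of the silhouette idea, the sketch below runs a compact 1-D k-means for each candidate k and keeps the k with the highest mean silhouette. All names (`SilhouetteChooseK`, `kmeansLabels`, `meanSilhouette`, `chooseK`) and the deterministic initialisation are assumptions. The silhouette is undefined for a single cluster, so the search starts at k = 2.

```java
import java.util.Arrays;

public class SilhouetteChooseK {
    // Compact 1-D k-means (Lloyd's algorithm); returns one cluster label per point.
    static int[] kmeansLabels(double[] data, int k) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        double[] centers = new double[k];
        for (int j = 0; j < k; j++) {
            // Assumed initialisation: values spread evenly through the sorted data.
            centers[j] = sorted[(int) ((j + 0.5) * sorted.length / k)];
        }
        int[] labels = new int[data.length];
        for (int iter = 0; iter < 100; iter++) {
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (Math.abs(data[i] - centers[j]) < Math.abs(data[i] - centers[best])) {
                        best = j;
                    }
                }
                labels[i] = best;
                sum[best] += data[i];
                count[best]++;
            }
            boolean changed = false;
            for (int j = 0; j < k; j++) {
                if (count[j] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                double updated = sum[j] / count[j];
                changed |= updated != centers[j];
                centers[j] = updated;
            }
            if (!changed) {
                break;
            }
        }
        return labels;
    }

    // Mean silhouette: for each point, a = mean distance to its own cluster,
    // b = smallest mean distance to any other cluster, s = (b - a) / max(a, b).
    static double meanSilhouette(double[] data, int[] labels, int k) {
        double total = 0;
        for (int i = 0; i < data.length; i++) {
            double[] distSum = new double[k];
            int[] count = new int[k];
            for (int j = 0; j < data.length; j++) {
                if (j != i) {
                    distSum[labels[j]] += Math.abs(data[i] - data[j]);
                    count[labels[j]]++;
                }
            }
            int own = labels[i];
            if (count[own] == 0) {
                continue; // a singleton cluster contributes a silhouette of 0
            }
            double a = distSum[own] / count[own];
            double b = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                if (c != own && count[c] > 0) {
                    b = Math.min(b, distSum[c] / count[c]);
                }
            }
            double m = Math.max(a, b);
            if (b != Double.POSITIVE_INFINITY && m > 0) {
                total += (b - a) / m;
            }
        }
        return total / data.length;
    }

    // Tries k = 2..maxK and returns the k with the highest mean silhouette.
    static int chooseK(double[] data, int maxK) {
        int bestK = 2;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int k = 2; k <= maxK; k++) {
            double score = meanSilhouette(data, kmeansLabels(data, k), k);
            if (score > bestScore) {
                bestScore = score;
                bestK = k;
            }
        }
        return bestK;
    }

    public static void main(String[] args) {
        double[] data = { 5, 7, 7, 8, 8, 8, 7, 20, 23, 23, 24, 24, 24, 25 };
        System.out.println("chosen k = " + chooseK(data, 5));
    }
}
```

For the example data and a maximum of 5 clusters, this picks k = 2. Because k = 1 cannot be scored this way, a separate check (for instance on the overall spread) would be needed to detect the single-cluster case.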

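My personal favorite clustering technique is the Gaussian mixture model (GMM). I typically use a good GMM implementation through an R package called MCLUST, which uses the Bayesian Information Criterion to identify the number of clusters automatically.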
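
To show what fitting a GMM involves, here is a rough sketch of expectation-maximisation for a one-dimensional mixture with a fixed number of components. It is not how MCLUST works internally and it leaves out the BIC-based choice of k. The names `GaussianMixture1D` and `fit`, the variance floor, and the starting values are all assumptions.

```java
import java.util.Arrays;

public class GaussianMixture1D {
    // Assumed floor that stops a component collapsing onto a single point.
    static final double MIN_VARIANCE = 1e-6;

    // Fits k Gaussian components with EM and returns {weights, means, variances}.
    static double[][] fit(double[] data, int k, int iterations) {
        int n = data.length;
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        double mean = Arrays.stream(data).average().orElse(0);
        double variance = 0;
        for (double x : data) {
            variance += (x - mean) * (x - mean) / n;
        }
        double[] weight = new double[k];
        double[] mu = new double[k];
        double[] sigma2 = new double[k];
        for (int j = 0; j < k; j++) {
            // Assumed start: equal weights, means spread through the sorted data,
            // and every component as wide as the whole data set.
            weight[j] = 1.0 / k;
            mu[j] = sorted[(int) ((j + 0.5) * n / k)];
            sigma2[j] = Math.max(variance, MIN_VARIANCE);
        }
        double[][] resp = new double[n][k];
        for (int it = 0; it < iterations; it++) {
            // E-step: responsibility of each component for each point.
            for (int i = 0; i < n; i++) {
                double total = 0;
                for (int j = 0; j < k; j++) {
                    double d = data[i] - mu[j];
                    resp[i][j] = weight[j] * Math.exp(-0.5 * d * d / sigma2[j])
                            / Math.sqrt(2 * Math.PI * sigma2[j]);
                    total += resp[i][j];
                }
                for (int j = 0; j < k; j++) {
                    resp[i][j] = total > 0 ? resp[i][j] / total : 1.0 / k;
                }
            }
            // M-step: re-estimate weight, mean and variance of each component.
            for (int j = 0; j < k; j++) {
                double nj = 0;
                double sum = 0;
                for (int i = 0; i < n; i++) {
                    nj += resp[i][j];
                    sum += resp[i][j] * data[i];
                }
                if (nj < 1e-12) {
                    continue; // component owns no data; leave it unchanged
                }
                mu[j] = sum / nj;
                double squares = 0;
                for (int i = 0; i < n; i++) {
                    squares += resp[i][j] * (data[i] - mu[j]) * (data[i] - mu[j]);
                }
                sigma2[j] = Math.max(squares / nj, MIN_VARIANCE);
                weight[j] = nj / n;
            }
        }
        return new double[][] { weight, mu, sigma2 };
    }

    public static void main(String[] args) {
        double[] data = { 5, 7, 7, 8, 8, 8, 7, 20, 23, 23, 24, 24, 24, 25 };
        double[][] model = fit(data, 2, 200);
        System.out.println("weights:   " + Arrays.toString(model[0]));
        System.out.println("means:     " + Arrays.toString(model[1]));
        System.out.println("variances: " + Arrays.toString(model[2]));
    }
}
```

On the example data with two components, the fitted means should land close to the two group averages (about 7.14 and 23.29), since the groups barely overlap. Unlike k-means, each component also gets its own variance and weight, which is what allows clusters of different widths.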

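Once you have chosen a method for identifying cluster membership (i.e. which data points are grouped together into a cluster), you can average the members of each cluster or do whatever else you like with the data.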

2010-01-16 06:34:38 awesomo

+1 for GMM: a generalization of k-means. – 2010-01-18 06:17:21
