2016-01-20

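Function to calculate the average linkage distance between two groups of points

In hclust, you can specify method = "average" to use average linkage for clustering.

In my case, I have two fixed clusters, and I want to compute the average linkage between them.

Is there a function for this in R? hclust appears to use Fortran code to do it.

Sample data: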

df <- structure(list(lon = c(106.0081819, 106.0621591, 106.0787142, 
105.9581624, 105.9982149, 105.9455287, 106.0726373, 106.12575, 
106.1110501, 106.060344, 106.0635147, 105.9575665, 105.9494248, 
106.0475363, 105.9564829, 105.9964291, 106.1037006, 105.9964291, 
106.1639749, 106.1110501), lat = c(21.1400879, 21.1766814, 21.1738006, 
21.202957, 21.1244525, 21.1101074, 21.1861204, 21.163438, 21.121444, 
21.169068, 21.1815923, 21.1085185, 21.0994022, 21.1688445, 21.1158848, 
21.1122605, 21.1988765, 21.1122605, 21.0178933, 21.121444), group = c("domestic", 
"foreign", "domestic", "domestic", "foreign", "domestic", "domestic", 
"foreign", "domestic", "domestic", "domestic", "domestic", "domestic", 
"domestic", "foreign", "domestic", "domestic", "foreign", "domestic", 
"domestic")), .Names = c("lon", "lat", "group"), class = c("tbl_df", 
"data.frame"), row.names = c(NA, -20L)) 

Answer


Perhaps:

d <- dist(df[, 1:2]) 
idx <- as.matrix(expand.grid(which(df$group=="domestic"), which(df$group=="foreign"))) 
mean(as.matrix(d)[idx]) 
# [1] 0.09028491 

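This applies if average linkage means the mean of the distances (here: Euclidean) between each point in cluster 1 and each point in cluster 2.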
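For reference, that definition can be wrapped in a small helper. This is only a minimal sketch under the same assumption (Euclidean distance, every cross-cluster pair weighted equally). The name `avg_linkage` is hypothetical and not part of base R, and the toy points are made up for illustration rather than taken from the question.

```r
# Hypothetical helper: mean pairwise Euclidean distance between two point sets.
# `a` and `b` are numeric matrices (or data frames) with one point per row.
avg_linkage <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  # Distances among all rows of rbind(a, b); keep only the a-versus-b block.
  d <- as.matrix(dist(rbind(a, b)))
  ia <- seq_len(nrow(a))
  ib <- nrow(a) + seq_len(nrow(b))
  mean(d[ia, ib])
}

# Toy data (not from the question): two points per cluster.
a <- rbind(c(0, 0), c(0, 1))
b <- rbind(c(3, 0), c(3, 1))
avg_linkage(a, b)
```

With the question's data, something like `avg_linkage(df[df$group == "domestic", 1:2], df[df$group == "foreign", 1:2])` should give the same figure as the snippet above, since both average the same set of pairwise distances.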

I came up with the same solution, although using `as.matrix(d)[which(df$group == "domestic"), which(df$group == "foreign")]` instead. Is that the same as the `expand.grid` version? – Heisenberg

It should be. Sometimes I overcomplicate things. – lukeA
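A quick way to check that the two indexing styles from these comments agree is to run both on a small example. The sketch below uses made-up points and labels (`pts` and `grp` are illustrative names, not from the question) and compares the two means.

```r
# Toy data (not from the question): five points with hypothetical labels.
pts <- data.frame(x = c(0, 1, 4, 5, 9), y = c(0, 2, 1, 3, 0))
grp <- c("domestic", "foreign", "domestic", "domestic", "foreign")

m <- as.matrix(dist(pts))
i <- which(grp == "domestic")
j <- which(grp == "foreign")

# Variant 1: matrix indexing with explicit (row, column) pairs.
idx <- as.matrix(expand.grid(i, j))
v1 <- mean(m[idx])

# Variant 2: take the domestic-by-foreign block directly.
v2 <- mean(m[i, j])

all.equal(v1, v2)
```

Both variants select the same cross-group entries of the distance matrix, so the block form `m[i, j]` is simply the shorter way to write it.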
