1

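How do I interpret the results of the R kmeans function?

I have a large dataset containing descriptions of 81,432 images. The descriptions are produced by an image descriptor that generates a vector with 127 positions for each image, so I have a matrix with 81,432 rows and 127 columns.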

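I am running kmeans from R, but I just don't know how to interpret the results. I have set the number of clusters and the algorithm runs, but then what? I would like to plot the elbow rule, but I don't even know how to do that.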

+1

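Please read carefully [how to create a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Include some sample data and describe exactly what you want the plot to look like. If you are only looking for visualization suggestions, then this really isn't a programming question and may be better suited to [stats.se] than to Stack Overflow. – MrFlick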

+0

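Thanks for the explanation, @MrFlick. Actually, I don't really know what kind of visualization I'm looking for (maybe something like a scatter plot). I have also posted this question on Cross Validated. –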

Answers

0

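To plot the elbow rule (which is about how close the points are to their centroids), we have to use tot.withinss, the total within-cluster sum of squares, computed for each candidate number of clusters.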

This answer is about using R.
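
A minimal sketch of that elbow plot, assuming simulated data in place of the real 81,432 x 127 descriptor matrix. The names `x`, `ks`, and `wss` and the choices `nstart = 10` and `iter.max = 50` are illustrative assumptions, not part of the original answer:

```r
# Elbow plot built from tot.withinss; simulated data stands in for the real matrix.
set.seed(42)

# Three well-separated Gaussian blobs in 5 dimensions (an assumption for illustration)
x <- rbind(
  matrix(rnorm(100 * 5, mean = 0),  ncol = 5),
  matrix(rnorm(100 * 5, mean = 5),  ncol = 5),
  matrix(rnorm(100 * 5, mean = 10), ncol = 5)
)

ks <- 1:10
# For each k, fit k-means and keep the total within-cluster sum of squares.
# nstart > 1 reruns the algorithm from several random starts and keeps the best fit.
wss <- sapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})

# The "elbow" is the k after which the curve flattens out
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

With data like this the curve should drop sharply up to k = 3 and flatten afterwards. On real descriptors the bend is often much less distinct, so the plot is a guide rather than a rule.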

2

An example code snippet that uses k-means and principal component analysis to analyze and visualize a dataset:

# Only the packages this snippet actually calls are loaded
library(rpanel)        # rp.plot3d
library(rgl)           # 3D device that rp.plot3d draws on
library(scatterplot3d) # scatterplot3d
library(cluster)       # clusplot



# Read the data (point this at your own file)
mydata <- read.table(file="c:/data.mtx", header=TRUE, row.names=1, sep="")

# Look at the absolute correlations between the standardized variables
mydata.cor <- abs(cor(scale(mydata)))
mydata.cor[,1:2]

# Look at the first three variables in an interactive 3D plot before PCA
rp.plot3d(mydata[,1], mydata[,2], mydata[,3])

# Run the PCA on centered, unit-variance variables
mydata.pca <- prcomp(mydata, retx=TRUE, center=TRUE, scale.=TRUE)
summary(mydata.pca)
# 3D plot of the first three PCs
rp.plot3d(mydata.pca$x[,1], mydata.pca$x[,2], mydata.pca$x[,3])


#Eigenvalues of components for Kaiser Criterion 
mydata.pca$sdev ^2 


#scree test for determining optimal number of PCs (Elbow rule) 
par(mfrow=c(1,2)) 
screeplot(mydata.pca,main="Scree Plot",xlab="Components") 
screeplot(mydata.pca,type="line",main="Scree Plot") 

# Scores (the data projected onto the principal components)
scores <- mydata.pca$x
# Plot the scores on the first two PCs, with the axes
pdf("scores.pdf")
plot(scores[,1], scores[,2], xlab="Scores 1", ylab="Scores 2")
text(x=scores[,1], y=scores[,2], labels=row.names(scores), cex=0.4, col="blue")
abline(h=0, lty=2) # Draw the horizontal axis
abline(v=0, lty=2) # Draw the vertical axis
dev.off()

# Find a plausible number of clusters for k-means (elbow rule)
mydata.scaled <- scale(mydata) # standardize once instead of on every iteration
wss <- (nrow(mydata.scaled)-1)*sum(apply(mydata.scaled,2,var)) # k = 1: total sum of squares
for (i in 2:20) wss[i] <- kmeans(mydata.scaled, centers=i)$tot.withinss
plot(1:20, wss, type="b", xlab="Number of Clusters", ylab="Within groups sum of squares")

# Perform k-means on the first two PCs and visualize the result
km1 <- kmeans(scores[,1:2], algorithm="Hartigan-Wong", centers=4)
pdf("km.pdf")
plot(scores[,1:2], col=km1$cluster)
# One colour per centroid, matching the cluster colours above
points(km1$centers, col=1:4, pch=8, cex=2)
scatterplot3d(km1$centers, pch=20, highlight.3d=TRUE, type="h")
# Cluster means
aggregate(scores[,1:2], by=list(km1$cluster), FUN=mean)
# Append the cluster assignment to the scores
clustercounts <- data.frame(scores[,1:2], cluster=km1$cluster)
# Cluster plot against the first two principal components
clusplot(scores[,1:2], km1$cluster, color=TRUE, shade=TRUE, labels=2, lines=0, cex=0.2)
dev.off()
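
Since the snippet above depends on a local file, here is a reduced, self-contained sketch of the same PCA-then-k-means idea that runs on simulated data using base R only. Everything about the simulation (`n`, `p`, `grp`, `sim`, three groups, the mean shift of 4) is an assumption made for illustration and says nothing about the real image descriptors:

```r
# Reproducible stand-in for the workflow above: simulate, run PCA, cluster, inspect.
set.seed(1)
n <- 300   # rows (far fewer than the real 81432)
p <- 10    # columns (far fewer than the real 127)
grp <- rep(1:3, each = n / 3)

# Standard normal noise, with every column shifted by 4 * group index
sim <- matrix(rnorm(n * p), nrow = n) + 4 * grp
colnames(sim) <- paste0("V", seq_len(p))

# PCA on centered, unit-variance columns; keep the first two components
pca    <- prcomp(sim, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# k-means on the two-dimensional scores
km <- kmeans(scores, centers = 3, nstart = 10)

km$size                  # number of rows assigned to each cluster
km$centers               # cluster centroids in PC1/PC2 space
km$tot.withinss          # total within-cluster sum of squares (used for the elbow rule)
km$betweenss / km$totss  # share of the total sum of squares that lies between clusters
table(km$cluster, grp)   # cross-tabulate clusters against the simulated groups

# Scatter plot of the scores coloured by cluster, with centroids marked
plot(scores, col = km$cluster, pch = 20, xlab = "PC1", ylab = "PC2")
points(km$centers, col = seq_len(nrow(km$centers)), pch = 8, cex = 2)
```

The fields printed in the middle are the usual starting points for reading a kmeans result: `cluster` gives the assignment of each row, `centers` the centroids, and the sums of squares show how compact the clusters are relative to the overall spread.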
+0

This answer isn't helpful, because most of us probably don't have "c:/data.mtx" sitting on our machines –

+0

@SeñorO The question isn't helpful, because it doesn't include a reproducible dataset – C8H10N4O2

+1

@C8H10N4O2 OK, what do you want me to do about that? –