2016-11-25 70 views

k-means algorithm not working

I'm trying to implement the k-means algorithm in Python 3 using NumPy. My input data is a simple N×2 matrix of points:

[[1, 2], 
[3, 4], 
    ... 
[7, 13]] 

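For some reason, at every iteration none of my labels match the labels from the previous step: every single label changes. Does anyone see an obvious mistake in what I'm doing? I've added comments to my code so that people can follow the individual steps.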

import numpy as np

def kmeans(X, k):

    # Initialize by choosing k random data points as centroids
    num_features = X.shape[1]
    centroids = X[np.random.randint(X.shape[0], size=k), :]  # find k centroids
    iterations = 0
    old_labels, labels = [], []

    while not should_stop(old_labels, labels, iterations):
        iterations += 1

        # Start each cluster with its own centroid as the first member
        clusters = [[] for i in range(0, k)]
        for i in range(k):
            clusters[i].append(centroids[i])

        # Label points
        old_labels = labels
        labels = []
        for point in X:
            distances = [np.linalg.norm(point - centroid) for centroid in centroids]
            max_centroid = np.argmax(distances)
            labels.append(max_centroid)
            clusters[max_centroid].append(point)

        # Compute new centroids
        centroids = np.empty(shape=(0, num_features))
        for cluster in clusters:
            avgs = sum(cluster) / len(cluster)
            centroids = np.append(centroids, [avgs], axis=0)

    return labels

def should_stop(old_labels, labels, iterations):
    count = 0
    # Never stop before the first labeling pass
    if len(old_labels) == 0:
        return False
    # Count how many labels changed since the previous iteration
    for i in range(len(labels)):
        count += (old_labels[i] != labels[i])
    print(count)
    if old_labels == labels or iterations == 2000:
        return True
    return False

Answer

max_centroid = np.argmax(distances) 

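You want the centroid that minimizes the distance, not the one that maximizes it. Use np.argmin instead of np.argmax on that line.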
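For illustration, here is a minimal sketch of the loop with the nearest-centroid assignment, assuming Euclidean distance as in the question. The names assign_labels and kmeans_sketch are hypothetical and not from the original post. It also departs from the question's code in a few ways that are my own assumptions: initial centroids are sampled without replacement, a cluster's centroid is the mean of its points only, and an empty cluster keeps its previous centroid.

```python
import numpy as np

def assign_labels(X, centroids):
    """Return, for each row of X, the index of its nearest centroid."""
    # distances[i, j] is the Euclidean distance from point i to centroid j
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # argmin (not argmax) picks the closest centroid
    return np.argmin(distances, axis=1)

def kmeans_sketch(X, k, max_iter=2000, seed=0):
    """Hypothetical k-means sketch; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumption: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        new_labels = assign_labels(X, centroids)
        # Stop once no label changed since the previous iteration
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            # Assumption: an empty cluster keeps its previous centroid
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Calling kmeans_sketch(X, 2) on two well-separated groups of points should give every point in a group the same label, although which group receives label 0 depends on the random initialization.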


Ugh, thank you so much. – Apollo