2015-11-05

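I am trying to reduce the dimensionality of a dataset with PCA. I then assign each data point a "class/category" according to a criterion that depends on the number in the name of the file the data point came from, and plot all data points as a scatter plot. My problem is with showing additional information about the points in that scatter plot.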

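For each data point I keep some additional information in a separate list, and I want every data point to be pickable so that I can read that information in the terminal. When the scatter plot is drawn, the order seems to get mixed up, I assume because I plot it subset by subset: the indices in the received pick event no longer match the array that holds the additional information.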
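The mismatch can be reproduced without a GUI. This is a minimal sketch with hypothetical labels and names, assuming what the matplotlib pick-event documentation describes for scatter collections: `event.ind` indexes the points of the picked artist only. Concatenating the per-class subsets reorders the names globally, but each `scatter` call still numbers its own points from 0.

```python
import numpy as np

# Hypothetical stand-ins for the question's `y` (class per point) and `trainNames`.
y = np.array([0, 1, 0, 2, 1])
names = np.array(["p0", "p1", "p2", "p3", "p4"])

# One scatter call per class: each call receives only its own subset, so a pick
# on that artist reports indices 0..k-1 local to the subset.
subsets = {i: names[y == i] for i in (0, 1, 2)}

# The question's workaround: concatenate the subsets in plotting order.
reordered = np.concatenate([subsets[i] for i in (0, 1, 2)])

# Clicking the first point of the class-1 scatter yields event.ind == [0].
local_index = 0
print(reordered[local_index])   # p0: the first class-0 point, not the clicked one
print(subsets[1][local_index])  # p1: the point that was actually clicked
```

Only picks on the first plotted class happen to resolve correctly, because that subset sits at the start of the concatenated array.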

I tried to rearrange the information array while plotting, but somehow it still doesn't work. Here is my code:

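import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

targets = []
trainNames = []

# React to a click on a data point.
def onPick(event):
    indexes = event.ind  # indices of the picked points within event.artist
    xy = event.artist.get_offsets()  # coordinates of the picked artist's points (unused)
    for index in indexes:
        print(trainNames[index])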


# Load the additional information for each data point. It's stored in the
# same order as the data points in 'trainingfile.csv'.
with open("training-names.csv") as modelNamesFile:
    for line in modelNamesFile:
        fields = line.strip().split(",")

        # Save target for data point. It's the class of the object, separated
        # into "rectangular", "cylindrical", "irregular", depending on the
        # object's file number.
        objnum = int(fields[-1].split("/")[-1].split(".")[0])
        if objnum <= 16:
            objnum = 0
        elif objnum <= 34:
            objnum = 1
        else:
            objnum = 2
        targets.append(objnum)

        # Save name description for data point.
        sceneName = fields[0].split("/")[-1]
        modelName = fields[-1].split("/")[-1].split(".")[0]
        trainNames.append(sceneName + ", " + modelName)


target_names = ["rectangular", "cylindrical", "irregular"] 


# Load the actual data.
tData = []
with open("trainingfile.csv") as f:
    for line in f:
        datapoint = [float(feature) for feature in line.split(",")]
        tData.append(datapoint)
data = np.array(tData)

# Transform it into 2D with PCA.
y = np.array(targets)
X = np.delete(data, data.shape[1] - 1, 1)  # Strip the class column.
pipeline = Pipeline([('scaling', StandardScaler()), ('pca', PCA(n_components=2))])
X_reduced = pipeline.fit_transform(X)  # Fit on the features only, not on `data`.


# Create plot. 
trainNames = np.array(trainNames) 
tmpTrainNames = np.array([]) 
fig = plt.figure() 
for c, i, target_name in zip("rgb", [0, 1, 2], target_names): 
    plt.scatter(X_reduced[y == i, 0], X_reduced[y == i, 1], c=c, label=target_name, picker=True) 

    # Here I try to rearrange the additional information into the order in
    # which the points were plotted.
    tmpTrainNames = np.append(tmpTrainNames, trainNames[y == i]) 

trainNames = tmpTrainNames 

plt.legend() 
plt.xlabel('Feature 1') 
plt.ylabel('Feature 2') 
fig.canvas.mpl_connect('pick_event', onPick) 
plt.show() 

If this is too complicated, I can try to simplify it. Just let me know.

Answer


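Since I couldn't find a solution to the indexing problem, I solved it differently. Instead of assigning the classes {0, 1, 2} and then mapping them to colors with zip(), I assigned the color values directly as the classes and passed the whole class-target array as the color argument. This lets me plot everything in a single call and keeps the original order of the data points.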

# y is here the class target with color values, e.g. ['r', 'g',..., 'r'] 
plt.scatter(X_reduced[:,0], X_reduced[:,1], c=y, picker=True) 
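An alternative sketch that keeps one `scatter` call per class (and therefore the per-class legend entries): remember which names each artist draws, and resolve `event.ind` against the picked artist via `event.artist`. The helper names `make_pick_handler` and `plot_classes` are hypothetical; the sketch assumes, as in the question, that `y` holds integer targets 0, 1, 2 and that the names are in the same order as the rows of `X_reduced`.

```python
import numpy as np


def make_pick_handler(names_by_artist):
    """Build a pick callback that resolves event.ind per artist.

    names_by_artist maps each scatter artist to the names of the points it
    draws, in the order they were passed to that scatter call.
    """
    def on_pick(event):
        # event.ind indexes the picked artist's own points, not the full dataset.
        names = names_by_artist[event.artist]
        picked = [str(names[i]) for i in event.ind]
        for name in picked:
            print(name)
        return picked
    return on_pick


def plot_classes(X_reduced, y, names, target_names, colors="rgb"):
    """Draw one scatter call per class and keep a per-artist name lookup."""
    # Imported here so make_pick_handler can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    y = np.asarray(y)
    names = np.asarray(names)
    fig = plt.figure()
    names_by_artist = {}
    for i, (c, target_name) in enumerate(zip(colors, target_names)):
        mask = y == i
        artist = plt.scatter(X_reduced[mask, 0], X_reduced[mask, 1],
                             c=c, label=target_name, picker=True)
        # Same mask, same order: local index k in this artist is names[mask][k].
        names_by_artist[artist] = names[mask]
    plt.legend()
    fig.canvas.mpl_connect("pick_event", make_pick_handler(names_by_artist))
    return fig
```

With the question's variables this would be `fig = plot_classes(X_reduced, y, trainNames, target_names)` followed by `plt.show()`; the reordered global `trainNames` array is then no longer needed.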