2017-08-30 66 views

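Modifying a numpy array after converting from a pandas DataFrame

I have the code below. It is part of a simple movie recommender I am writing in Python so that I can reproduce the results I obtained in the Coursera Machine Learning course taught by Andrew Ng.

I want to modify the numpy.ndarray that I get after calling as_matrix() on a pandas DataFrame, and add a column vector to it, just as we can in MATLAB: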

Y = [ratings Y] 

Here is my Python code:

import numpy as np
import pandas as pd

dataFile='/filepath/' 

userItemRatings = pd.read_csv(dataFile, sep="\t", names=['userId', 'movieId', 'rating','timestamp']) 
movieInfoFile = '/filepath/' 
movieInfo = pd.read_csv(movieInfoFile, sep="|", names=['movieId','Title','Release Date','Video Release Date','IMDb URL','Unknown','Action','Adventure','Animation','Childrens','Comedy','Crime','Documentary','Drama','Fantasy','Film-Noir','Horror','Musical','Mystery','Romance','Sci-Fi','Thriller','War','Western'], encoding = "ISO-8859-1") 

userMovieMatrix=pd.merge(userItemRatings, movieInfo, left_on='movieId', right_on='movieId') 
userMovieSubMatrix = userMovieMatrix[['userId', 'movieId', 'rating','timestamp','Title']] 


Y = pd.pivot_table(userMovieSubMatrix, values='rating', index=['movieId'], columns=['userId']) 
Y.fillna(0,inplace=True) 
movies = Y.shape[0] 
users = Y.shape[1] +1 



ratings = np.zeros((1682, 1)) 

ratings[0] = 4 
ratings[6] = 3 
ratings[11] = 5 
ratings[53] = 4 
ratings[63] = 5 
ratings[65] = 3 
ratings[68] = 5 
ratings[97] = 2 
ratings[182] = 4 
ratings[225] = 5 
ratings[354] = 5 

features = 10 

theta = pd.DataFrame(np.random.rand(users,features))  # users x features: 944 x 10
X = pd.DataFrame(np.random.rand(movies,features))  # movies x features: 1682 x 10


X = X.as_matrix() 
theta = theta.as_matrix() 

Y = Y.as_matrix() 


"""want to insert a column vector into this Y to get a new Y of dimension 
    1682*944, but only seeing 1682*943 after the following statement 

""" 
np.insert(Y, 0, ratings, axis=1) 

R = Y.copy() 
R[R!=0] = 1 





Ymean = np.zeros((movies, 1)) 
Ynorm = np.zeros((movies, users)) 



for i in range(movies): 
    idx = np.where(R[i,:] == 1)[0] 
    Ymean[i] = Y[i,idx].mean() 
    Ynorm[i,idx] = Y[i,idx] - Ymean[i] 

print(type(Ymean), type(Ynorm), type(Y), Y.shape) 
Ynorm[np.isnan(Ynorm)] = 0. 
Ymean[np.isnan(Ymean)] = 0. 

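I have put an inline comment in the code, but my problem is this: when I create a fresh numpy array and call insert on it, it works fine. However, the numpy array I get after calling as_matrix() on the pandas DataFrame returned by pivot_table() does not change. Is there an alternative?

Answer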


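np.insert does not work in place. It returns a new array and leaves the original untouched, so you need to assign the output to a variable. Try: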

Y = np.insert(Y, 0, ratings, axis=1)
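
To make the point concrete, here is a minimal sketch on a toy table. The three-movie, two-user data and the names `frame`, `Y_inserted`, `Y_stacked` and `Y_wide` are made up for illustration. It uses `to_numpy()` in place of `as_matrix()`, since newer pandas releases no longer provide `as_matrix()`. It also passes the index as `[0]` rather than `0`: going by the examples in the NumPy documentation, a scalar index combined with a 2-D column of values inserts one column per row of `ratings`, which is probably not what is wanted here. `np.hstack` is shown as the closest analogue of MATLAB's `[ratings Y]`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings (3 movies x 2 users) standing in for the real data.
frame = pd.DataFrame({
    "movieId": [1, 1, 2, 3],
    "userId": [1, 2, 2, 1],
    "rating": [5, 3, 4, 2],
})
Y = pd.pivot_table(frame, values="rating", index="movieId", columns="userId")
Y = Y.fillna(0).to_numpy()  # to_numpy() replaces the older as_matrix()

# One rating per movie for the new user, stored as a column vector.
ratings = np.zeros((Y.shape[0], 1))
ratings[0] = 4

# np.insert returns a new array; without an assignment Y is unchanged.
np.insert(Y, [0], ratings, axis=1)
unchanged_shape = Y.shape

# Assigning the result is what gives the extra column.
# The list index [0] keeps `ratings` as a single inserted column.
Y_inserted = np.insert(Y, [0], ratings, axis=1)

# Closest analogue of MATLAB's  Y = [ratings Y]
Y_stacked = np.hstack([ratings, Y])

# Scalar index with a 2-D column: one inserted column per row of `ratings`.
Y_wide = np.insert(Y, 0, ratings, axis=1)

print(unchanged_shape, Y_inserted.shape, Y_stacked.shape, Y_wide.shape)
```

If `ratings` were kept one-dimensional, for example via `ratings.ravel()`, the scalar index `0` would also insert exactly one column.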