1
我有下面的代碼,我正在使用python編寫一個簡單的電影推薦器的一部分,這樣我就可以模擬我作爲coursera由Andrew NG教授的機器學習課程的一部分獲得的結果。在從熊貓數據框轉換後修改numpy數組
我想修改我呼籲大熊貓數據框as_matrix()後,即可獲取numpy.ndarray和列向量添加到它就像我們可以在MATLAB
Y = [ratings Y]
以下是我的Python代碼
dataFile='/filepath/'
userItemRatings = pd.read_csv(dataFile, sep="\t", names=['userId', 'movieId', 'rating','timestamp'])
movieInfoFile = '/filepath/'
movieInfo = pd.read_csv(movieInfoFile, sep="|", names=['movieId','Title','Release Date','Video Release Date','IMDb URL','Unknown','Action','Adventure','Animation','Childrens','Comedy','Crime','Documentary','Drama','Fantasy','Film-Noir','Horror','Musical','Mystery','Romance','Sci-Fi','Thriller','War','Western'], encoding = "ISO-8859-1")
userMovieMatrix=pd.merge(userItemRatings, movieInfo, left_on='movieId', right_on='movieId')
userMovieSubMatrix = userMovieMatrix[['userId', 'movieId', 'rating','timestamp','Title']]
Y = pd.pivot_table(userMovieSubMatrix, values='rating', index=['movieId'], columns=['userId'])
Y.fillna(0,inplace=True)
movies = Y.shape[0]
users = Y.shape[1] +1
ratings = np.zeros((1682, 1))
ratings[0] = 4
ratings[6] = 3
ratings[11] = 5
ratings[53] = 4
ratings[63] = 5
ratings[65] = 3
ratings[68] = 5
ratings[97] = 2
ratings[182] = 4
ratings[225] = 5
ratings[354] = 5
features = 10
theta = pd.DataFrame(np.random.rand(users,features))# users 943*3
X = pd.DataFrame(np.random.rand(movies,features))# movies 1682 * 3
X = X.as_matrix()
theta = theta.as_matrix()
Y = Y.as_matrix()
"""want to insert a column vector into this Y to get a new Y of dimension
1682*944, but only seeing 1682*943 after the following statement
"""
np.insert(Y, 0, ratings, axis=1)
R = Y.copy()
R[R!=0] = 1
Ymean = np.zeros((movies, 1))
Ynorm = np.zeros((movies, users))
for i in range(movies):
idx = np.where(R[i,:] == 1)[0]
Ymean[i] = Y[i,idx].mean()
Ynorm[i,idx] = Y[i,idx] - Ymean[i]
print(type(Ymean), type(Ynorm), type(Y), Y.shape)
Ynorm[np.isnan(Ynorm)] = 0.
Ymean[np.isnan(Ymean)] = 0.
插入了一個內嵌評論,但我的問題是當我創建一個新的numpy數組並調用insert時,它工作得很好。然而,在調用pivot_table()的pandas數據框上調用as_matrix()後,我得到的numpy數組無效。有其他選擇嗎?