2016-07-22 67 views

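Linear regression with tf-idf transformation

I have two data frames: the first contains more than 700 predictor columns, and the second contains a single column. The first is used as the predictors (all values are 0 or 1, and mostly 0 because the data is sparse); the second is the response for model training and testing. The first is named ser and the second star.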

I apply the TF-IDF transformation as follows:

from sklearn.feature_extraction.text import TfidfTransformer 
transformer = TfidfTransformer() 

A = transformer.fit_transform(ser) 

print(A)

(0, 302) 0.613133438876 
(0, 202) 0.789979358042 
(1, 556) 1.0 
(2, 556) 0.432375068194 
(2, 17) 0.901693850708 
(3, 556) 0.269567465847 
(3, 335) 0.671245025218 
(3, 256) 0.400099662956 
(3, 238) 0.562746618986 
(4, 556) 0.401348891903 
(4, 137) 0.915925251846 
(5, 641) 0.785485510985 
(5, 396) 0.618880046562 
(6, 317) 0.525163047715 
(6, 305) 0.851001629443 
... (more are cut) 

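Part of the output is shown above. Am I using the TF-IDF transformation correctly? With the code below, I get the error posted at the end of this post.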

star = pd.DataFrame({"star": star}) 
data = pd.concat([ser, star], axis = 1) 

from sklearn.linear_model import LinearRegression 

D = LinearRegression() 

Dfit = D.fit(ser, star, sample_weight = A) 
Dpred = D.predict(ser) 
Dscore = D.score(ser,star) 
print(Dscore) 

Error:

Traceback (most recent call last): 
File "categories_model.py", line 67, in <module> 
Dfit = D.fit(ser, star, sample_weight = A) 
File "/opt/conda/lib/python2.7/site-packages/sklearn/linear_model/base.py", line 434, in fit 
sample_weight=sample_weight) 
File "/opt/conda/lib/python2.7/site-packages/sklearn/linear_model/base.py", line 127, in center_data 
X_mean = np.average(X, axis=0, weights=sample_weight) 
File "/opt/conda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 937, in average 
"1D weights expected when shapes of a and weights differ.") 
TypeError: 1D weights expected when shapes of a and weights differ. 

Can anyone help me understand all this and how to improve the code? Thanks!

Answer


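The error comes from passing the transformed matrix in the wrong place. sample_weight expects a 1D array with one weight per row, but A is the 2D tf-idf matrix, which is why NumPy raises "1D weights expected". The tf-idf output belongs in the feature argument instead. This solved the problem: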

Dfit = D.fit(A, star)
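
For anyone who wants to try the fix end to end, here is a minimal sketch. The data is synthetic and only stands in for the question's ser and star (the shapes, the random seed and the 1-5 response range are assumptions, not the asker's real data). Note that predict and score should then be called on A as well, since the model was fitted on the tf-idf features rather than on the raw 0/1 frame.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins for the question's data:
# a sparse 0/1 predictor frame and a one-column response.
rng = np.random.RandomState(0)
ser = pd.DataFrame((rng.rand(50, 20) < 0.1).astype(int))
star = pd.Series(rng.randint(1, 6, size=50), name="star")

# tf-idf weights become the feature matrix (a SciPy sparse matrix).
A = TfidfTransformer().fit_transform(ser.to_numpy())

model = LinearRegression()
model.fit(A, star)            # features = A, no sample_weight
pred = model.predict(A)       # predict on the same representation
score = model.score(A, star)  # R^2 on the training data

print(A.shape, pred.shape, score)
```

If per-row weights are actually wanted, sample_weight would need to be a 1D array of length len(star), separate from the feature matrix.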