2016-11-18

R quanteda error in predict.textmodel_NB_fitted: not-yet-implemented method

I am trying to predict sentiment with a quanteda Naive Bayes (NB) model, using this code:

library(quanteda) 
X_train <-c("I love this sandwich.", 
      "This is an amazing place!", 
      "I feel very good about these beers.", 
      "This is my best work.", 
      "What an awesome view", 
      "I do not like this restaurant", 
      "I am tired of this stuff.", 
      "I can't deal with this", 
      "He is my sworn enemy!", 
      "this guy is horrible.") 

Y_train <- c(1,1,1,1,1,0,0,0,0,0) 

X_test <- c("The beer was good.", 
      "I do not enjoy my job", 
      "I ain't feeling dandy today.", 
      "I feel amazing! pos", 
      "Gary is a friend of mine.", 
      "I can't believ I'm doing this.", 
      "very sad about Iran", 
      "You're the only one who can see this cause no one else is following me this is for you because you're pretty awesome", 
      "ok thats it you win.", 
      "My horsie is moving on Saturday morning.", 
      "times by like a million", 
      "but i'm proud.", 
      "i want a hug)") 
Y_test <- c(1,0,0,1,1,0,0,1,1,0,1,1,1) 
dfm_mat <- dfm(X_train) 
tfidf_mat <- tfidf(dfm_mat, normalize = TRUE) 
model <- textmodel_NB(tfidf_mat, Y_train, distribution = "multinomial") 

predict(model, X_test) 

And I get the following error message:

Error in newdata %*% t(log(object$PwGc)) : not-yet-implemented method for <character> %*% <dgeMatrix> 
5.stop(gettextf("not-yet-implemented method for <%s> %%*%% <%s>", class(x), class(y)), domain = NA) 
4.newdata %*% t(log(object$PwGc)) 
3.newdata %*% t(log(object$PwGc)) 
2.predict.textmodel_NB_fitted(model, X_test) 
1.predict(model, X_test) 

Running: quanteda_0.9.8.5
Matrix_1.2-7.1
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.10

Does anyone have any idea?

Answer


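The problem here is that you are calling predict() on the fitted Naive Bayes model with a character vector as the new data, and the predict method is not defined for character vectors. As the traceback shows, predict.textmodel_NB_fitted() computes newdata %*% t(log(object$PwGc)), so newdata has to be a numeric matrix (a dfm), not raw text. That is what the error message is saying, although admittedly not in the clearest way.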

The solution is to predict on a dfm object, and specifically one whose features match those of the training dfm.

# this creates a test dfm, and matches its features to the training dfm 
dfm_test <- dfm_select(dfm(X_test), dfm_mat) 
## found 15 features from 36 supplied types in a dfm, padding 0s for another 21 

Then the predict() method works fine:

predict(model, dfm_test) 
## Predicted textmodel of type: Naive Bayes 
## 
##              lp(1)       lp(0)  Pr(1)  Pr(0) Predicted
## text1   -4.2419639  -4.3728368 0.5327 0.4673         1
## text2  -15.1799166 -14.8238632 0.4119 0.5881         0
## text3   -4.2637198  -4.2239433 0.4901 0.5099         0
## text4  -11.3125631 -11.5833225 0.5673 0.4327         1
## text5   -7.9101340  -7.7336472 0.4560 0.5440         0
## text6  -11.5324821 -11.2864767 0.4388 0.5612         0
## text7   -7.7907806  -8.0525264 0.5651 0.4349         1
## text8  -18.3944576 -18.5330895 0.5346 0.4654         1
## text9   -0.6931472  -0.6931472 0.5000 0.5000         1
## text10  -7.7792864  -7.7569503 0.4944 0.5056         0
## text11  -4.3754953  -4.2186861 0.4609 0.5391         0
## text12  -0.6931472  -0.6931472 0.5000 0.5000         1
## text13  -4.2637198  -4.2239433 0.4901 0.5099         0
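
For intuition about what matching the test features to the training dfm involves, here is a minimal base-R sketch. It is not quanteda's implementation: tokenize_simple() and count_matrix() are hypothetical helpers written for this illustration, and the tokenizer is deliberately crude. The point is only that the test matrix ends up with exactly the training columns, that terms unseen in training are dropped, and that training terms absent from a test text are padded with zeros.

```r
# Hypothetical helper (not quanteda API): lowercase a text and split it
# on anything that is not a letter or an apostrophe.
tokenize_simple <- function(txt) {
  toks <- strsplit(tolower(txt), "[^a-z']+")[[1]]
  toks[nzchar(toks)]
}

# Hypothetical helper: count tokens per text, restricted to a fixed vocabulary.
count_matrix <- function(texts, vocab) {
  m <- matrix(0, nrow = length(texts), ncol = length(vocab),
              dimnames = list(paste0("text", seq_along(texts)), vocab))
  for (i in seq_along(texts)) {
    toks <- tokenize_simple(texts[i])
    toks <- toks[toks %in% vocab]        # drop terms unseen in training
    if (length(toks) > 0) {
      counts <- table(toks)
      m[i, names(counts)] <- as.numeric(counts)
    }
  }
  m                                      # columns with no hits stay 0
}

train <- c("I love this sandwich.", "this guy is horrible.")
test  <- c("The beer was good.", "I love this")

# The vocabulary comes from the training texts only.
vocab     <- sort(unique(unlist(lapply(train, tokenize_simple))))
train_mat <- count_matrix(train, vocab)
test_mat  <- count_matrix(test, vocab)

print(test_mat)
```

Because test_mat has the same columns as train_mat, a product such as test_mat %*% t(weights) is well defined for any weight matrix with one column per training term, which is the shape requirement behind the original error.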

Thanks a lot, Ken. It seems obvious now that I have the solution :) – alEx
