PIG: Twitter sentiment analysis

I am trying to implement Twitter sentiment analysis. I need to get all the positive tweets and all the negative tweets and store them in separate text files.

sample.json

{"id": 252479809098223616, "created_at": "Wed Apr 12 08:23:20 +0000 2016", "text": "google is a good company", "user_id": 450990391}{"id": 252479809098223616, "created_at": "Wed Apr 12 08:23:20 +0000 2016", "text": "facebook is a bad company","user_id": 450990391}

dictionary.text contains the list of all positive and negative words:

weaksubj 1 bad  adj  n negative 
strongsubj 1 good adj  n positive

Pig script:

tweets = load 'new.json' using JsonLoader('id:chararray,text:chararray,user_id:chararray,created_at:chararray');
dictionary = load 'dictionary.text' AS (type:chararray,length:chararray,word:chararray,pos:chararray,stemmed:chararray,polarity:chararray);
words = foreach tweets generate FLATTEN(TOKENIZE(text)) AS word,id,text,user_id,created_at;
sentiment = join words by word left outer, dictionary by word;
senti2 = foreach sentiment generate words::id as id,words::created_at as created_at,words::text as text,words::user_id as user_id,dictionary::polarity as polarity;
res = FILTER senti2 BY polarity MATCHES '.*possitive.*';
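To see what each statement of the script produces, here is a minimal Python sketch that mirrors it on the sample data: tokenize, left outer join against the dictionary, then filter with a whole-string regex match. It is a simulation, not Pig: the tweets are trimmed to `id` and `text`, splitting on whitespace stands in for `TOKENIZE` (which also splits on some punctuation), and the dictionary is assumed to be parsed into six columns correctly. The names `TWEETS`, `DICTIONARY`, `load_dictionary` and `positive_rows` are hypothetical.

```python
import json
import re

# Hypothetical in-memory copies of the question's data (tweets trimmed to
# id/text/user_id, one JSON object per line; dictionary lines as posted).
TWEETS = [
    '{"id": 252479809098223616, "text": "google is a good company", "user_id": 450990391}',
    '{"id": 252479809098223616, "text": "facebook is a bad company", "user_id": 450990391}',
]
DICTIONARY = [
    "weaksubj 1 bad  adj  n negative ",
    "strongsubj 1 good adj  n positive",
]


def load_dictionary(lines):
    """word -> polarity, ASSUMING the six columns are split correctly."""
    table = {}
    for line in lines:
        fields = line.split()  # any run of whitespace; NOT what PigStorage does
        if len(fields) == 6:
            table[fields[2]] = fields[5]
    return table


def positive_rows(tweet_lines, dictionary, pattern):
    """Mirror of tokenize -> left outer join -> FILTER ... MATCHES."""
    rows = []
    for line in tweet_lines:
        tweet = json.loads(line)
        for word in tweet["text"].split():   # simplified stand-in for TOKENIZE
            polarity = dictionary.get(word)  # None plays the role of Pig's null
            rows.append((tweet["id"], tweet["text"], polarity))
    # The regex must match the whole polarity; a null never passes the filter
    return [r for r in rows if r[2] is not None and re.fullmatch(pattern, r[2])]


words = load_dictionary(DICTIONARY)
print(positive_rows(TWEETS, words, ".*possitive.*"))
# -> []
print(positive_rows(TWEETS, words, ".*positive.*"))
# -> [(252479809098223616, 'google is a good company', 'positive')]
```

Even with a correctly parsed dictionary, the misspelled pattern alone is enough to produce an empty result.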

DESCRIBE res:

res: {id: chararray,created_at: chararray,text: chararray,user_id: chararray,polarity: chararray}

However, when I DUMP res I don't see any output, even though the script runs fine without any errors.

What am I doing wrong here?

Please advise.

Mohan.V


2016-09-20 Bunny

I see 2 errors here.

Error 1 (line 2): when you DUMP dictionary, you will see each entire record in the first column, with the remaining columns empty.

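Solution: specify the appropriate delimiter with USING PigStorage(...). As currently written, the load and its dump look like this: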

dictionary = load 'dictionary.text' AS  (type:chararray,length:chararray,word:chararray,pos:chararray,stemmed:chararray,polarity:chararray); 

DUMP dictionary; 
(weaksubj 1 bad  adj  n negative,,,,,) 
(strongsubj 1 good adj  n positive,,,,,)
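Why the whole line lands in the first column: PigStorage's default field delimiter is a tab, and the dictionary lines contain only spaces. The Python sketch below is a rough, hypothetical model of PigStorage with an `AS` schema (`pig_storage_split` is not a Pig function): split on a single delimiter character, keep the declared number of fields, and pad missing ones with `None`. It also shows why a single-space delimiter is not enough while the file still contains doubled spaces. How Pig represents an empty field (empty string or null) is not modeled precisely here.

```python
def pig_storage_split(line, delimiter="\t", n_fields=6):
    """Rough model of PigStorage with an AS schema: split on ONE delimiter
    character, keep the first n_fields values, pad missing ones with None."""
    parts = line.split(delimiter)[:n_fields]
    return tuple(parts + [None] * (n_fields - len(parts)))


line = "weaksubj 1 bad  adj  n negative "  # as posted: no tabs, doubled spaces

print(pig_storage_split(line))
# -> ('weaksubj 1 bad  adj  n negative ', None, None, None, None, None)
print(pig_storage_split(line, " "))
# -> ('weaksubj', '1', 'bad', '', 'adj', '')   polarity ends up empty
print(pig_storage_split(" ".join(line.split()), " "))
# -> ('weaksubj', '1', 'bad', 'adj', 'n', 'negative')
```

So the delimiter passed to PigStorage and the actual spacing in the file have to agree.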

Error 2 (line 6): correct the spelling of "positive". Use something like:

res = FILTER senti2 BY UPPER(polarity) MATCHES '.*POSITIVE.*';
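Pig's MATCHES uses Java regular expressions and requires the pattern to match the entire value, which is why the pattern is wrapped in `.*`. A small Python model of that behavior (`pig_matches` is a hypothetical helper; `re.fullmatch` plays the role of Java's `String.matches`) shows that the misspelled pattern can never match, while the upper-cased comparison also tolerates "Positive":

```python
import re


def pig_matches(value, pattern):
    """Model of Pig's MATCHES: the regex has to cover the WHOLE value (like
    Java's String.matches), and a null value never passes a FILTER."""
    return value is not None and re.fullmatch(pattern, value) is not None


for polarity in ["positive", "Positive", "negative", None]:
    upper = polarity.upper() if polarity is not None else None  # treat UPPER(null) as null
    print(polarity,
          pig_matches(polarity, ".*possitive.*"),   # original, misspelled
          pig_matches(upper, ".*POSITIVE.*"))       # suggested fix
# positive False True
# Positive False True
# negative False False
# None False False
```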


2016-09-20 08:02:48

Thanks for your reply, @Sandesh. – Bunny

I tried what you suggested, but it still runs successfully with no output. – Bunny

I have edited the dictionary file to remove the extra spaces. – Bunny
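Instead of removing the extra spaces by hand, the dictionary can be rewritten as a tab-separated file, which PigStorage's default tab delimiter then splits into six columns without a USING clause. A minimal Python sketch, assuming every column is a single token without internal spaces (as in the two sample lines); the script and output file names in the comment are hypothetical:

```python
import sys


def normalize_line(line):
    """Collapse every run of spaces/tabs into one tab and drop trailing whitespace."""
    return "\t".join(line.split())


def normalize_file(src_path, dst_path):
    """Rewrite a whitespace-separated dictionary as a tab-separated file."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(normalize_line(line) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python normalize_dictionary.py dictionary.text dictionary.tsv
    normalize_file(sys.argv[1], sys.argv[2])
```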

I see a spelling mistake in:

res = FILTER senti2 BY polarity MATCHES '.*possitive.*';

Shouldn't it be '.*positive.*'?


2016-10-07 23:44:50 Deepti

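I would suggest using a custom UDF to solve your problem. For now, you can use elephant-bird-pig-4.1.jar and json-simple-1.1.1.jar. If you want to see examples, have a look at this Sentiment Analysis Tutorial. If you want code, you can refer to the code below and format yours following the tutorial and my code: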

REGISTER '/usr/local/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/usr/local/elephant-bird-pig-4.1.jar';
REGISTER '/usr/local/json-simple-1.1.1.jar';
load_tweets = LOAD '/user/new.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
extract_details = FOREACH load_tweets GENERATE myMap#'id' as id,myMap#'text' as text;
tokens = foreach extract_details generate id,text, FLATTEN(TOKENIZE(text)) As word;
-- assumes the dictionary columns are separated by single spaces (extra spaces removed)
dictionary = load '/user/dictionary.text' USING PigStorage(' ') AS (type:chararray,length:chararray,word:chararray,pos:chararray,stemmed:chararray,polarity:chararray);
-- inner join: words without a dictionary entry are dropped instead of producing nulls
word_rating = join tokens by word, dictionary by word using 'replicated';
describe word_rating;
-- this dictionary has no rating column, so map the polarity label to a numeric score
rating = foreach word_rating generate tokens::id as id,tokens::text as text, (dictionary::polarity == 'positive' ? 1 : (dictionary::polarity == 'negative' ? -1 : 0)) as rate;
word_group = group rating by (id,text);
avg_rate = foreach word_group generate group, AVG(rating.rate) as tweet_rating;
positive_tweets = filter avg_rate by tweet_rating>=0;
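The script above scores each matched word, averages the scores per (id, text) group, and keeps tweets whose average is at least 0. A minimal Python sketch of the same scoring, assuming positive = 1 and negative = -1 (`SCORES` is a hypothetical mapping, and splitting on whitespace stands in for `TOKENIZE`); it also returns the negative side, since the question asks for both sets:

```python
# Assumption: polarity labels map to numeric scores (positive = 1, negative = -1).
SCORES = {"positive": 1, "negative": -1}


def tweet_ratings(tweets, dictionary):
    """(id, text) -> average score of the tweet's dictionary words.
    Unmatched words are skipped; a tweet with no matched words gets None."""
    ratings = {}
    for tweet_id, text in tweets:
        scores = [SCORES[dictionary[w]] for w in text.split()
                  if dictionary.get(w) in SCORES]
        ratings[(tweet_id, text)] = sum(scores) / len(scores) if scores else None
    return ratings


def split_by_rating(ratings):
    """Same threshold as the filter above: an average >= 0 counts as positive."""
    positive = [key for key, r in ratings.items() if r is not None and r >= 0]
    negative = [key for key, r in ratings.items() if r is not None and r < 0]
    return positive, negative


tweets = [(252479809098223616, "google is a good company"),
          (252479809098223616, "facebook is a bad company")]
dictionary = {"good": "positive", "bad": "negative"}
positive, negative = split_by_rating(tweet_ratings(tweets, dictionary))
print(positive)  # -> [(252479809098223616, 'google is a good company')]
print(negative)  # -> [(252479809098223616, 'facebook is a bad company')]
```

Grouping on (id, text) rather than id alone matters for the sample data, where both tweets share the same id. In Pig itself, the two-way split could be expressed with a SPLIT ... INTO statement followed by one STORE per relation.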


2017-09-16 13:40:23
