2014-09-28 85 views
0

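pigscript error not calculation max

I'm having a problem with a Pig script, and I have tried many different approaches. Can anyone point out exactly what I'm doing wrong? It should be very simple: I'm trying to get the maximum after computing the average.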

a = LOAD 'default.books' using org.apache.hcatalog.pig.HCatLoader(); 
b = LOAD 'default.book_rating' using org.apache.hcatalog.pig.HCatLoader(); 

books_and_ratings = join a by isbn, b by isbn; 

by_isbn = GROUP books_and_ratings BY (a::isbn); 

DESCRIBE by_isbn; 

average_book_rating = FOREACH by_isbn 
     GENERATE books_and_ratings.book_title, books_and_ratings.a::isbn as isbn1, 
     books_and_ratings.book_author, books_and_ratings.publisher, 
     AVG(books_and_ratings.book_rating) as AVG_RATING; 

DESCRIBE average_book_rating; 

group_avg = GROUP average_book_rating ALL; 

DESCRIBE group_avg; 

max_avg_rating = FOREACH group_avg 
    GENERATE FLATTEN average_book_rating.a::book_title, isbn1, 
      average_book_rating.a::book_author, average_book_rating.a::publisher, MAX(AVG_RATING); 

dump max_avg_rating; 

Parse failure: mismatched input 'average_book_rating' expecting LEFT_PAREN

Are you getting an error, or is the maximum just not being computed correctly? – Eyal 2014-09-28 13:56:13

@eyal I'm actually getting an error.... – Hades 2014-09-28 20:24:43

The last statement, the one that computes max_avg_rating, is incorrect. Can you paste the exact error? – 2014-09-29 00:48:24

Answers

2

You can try something like this.

max_avg_rating = ORDER average_book_rating BY AVG_RATING DESC; 
top_most_rating = LIMIT max_avg_rating 1; 
dump top_most_rating; 
0

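After seeing Hades's latest comment ("there can be multiple books with the highest average rating"), I think you need a second group, after the first one that groups the ratings by ISBN, to get what you're after.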

Start like this:

grouped_rating = GROUP average_book_rating BY AVG_RATING;

Then you can use code like @Sivasakthi's:

ordered_avg_rating = ORDER grouped_rating BY group DESC;
top_most_rating = LIMIT ordered_avg_rating 1;
dump top_most_rating;

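This way, if several results tie for the highest rating, top_most_rating will hold a bag with all the information for the books that received that highest rating. Of course, if you don't want it as a bag, you can project it into a more convenient shape.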

UPDATE:

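Here is how I would change the code above. One change is not purely functional: I would average the ratings first and only then join in the book/author information. This should perform better, because otherwise you inflate the size of the ratings (and there are a lot of them) as they pass through the join.

So it would look like this: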

-- assume a: book_title, isbn, book_author, publisher (and maybe more, which we'll ignore)
a = LOAD 'default.books' using org.apache.hcatalog.pig.HCatLoader();

-- assume b: isbn, book_rating (and maybe more, which we'll ignore)
b = LOAD 'default.book_rating' using org.apache.hcatalog.pig.HCatLoader();

by_isbn = GROUP b BY isbn;

average_book_rating = FOREACH by_isbn GENERATE AVG(b.book_rating) AS AVG_RATING, group AS isbn;

group_avg = GROUP average_book_rating BY AVG_RATING;

ordered_avg_rating = ORDER group_avg BY group DESC;

top_most_rating = LIMIT ordered_avg_rating 1;

b = FOREACH top_most_rating GENERATE flatten(average_book_rating);

-- now add the book information

books_and_ratings = JOIN a BY isbn, b BY isbn;

books_and_ratings = FOREACH books_and_ratings GENERATE
    a::book_title AS title,
    a::isbn AS isbn,
    a::book_author AS author,
    a::publisher AS publisher,
    b::average_book_rating::AVG_RATING AS max_rating;

Hope this works for you.

Thanks for your answer. I edited my question... can you take a look and see what's wrong with it? – Hades 2014-10-05 14:47:27

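In your edited code (in your original description), you group by ALL rather than by AVG_RATING as in my answer. That means all rows end up grouped into a single bag. I'm still not sure what you are trying to do, but FLATTEN takes parentheses, and that is the immediate cause of the error you are getting. The code in my answer does give you a bag of all the books that received the highest average rating. – Eyal 2014-10-07 14:49:29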
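
To illustrate the point about parentheses, here is a minimal sketch. It assumes the `group_avg` relation from the question, so `average_book_rating` is the bag produced by `GROUP ... ALL`; the alias `flattened` is made up for illustration.

```pig
-- FLATTEN is written like a function call: its argument goes inside parentheses.
-- Writing FLATTEN average_book_rating... without them is what produces
-- "mismatched input 'average_book_rating' expecting LEFT_PAREN".
flattened = FOREACH group_avg GENERATE FLATTEN(average_book_rating);

-- Inspect the resulting schema before projecting individual fields.
DESCRIBE flattened;
```

This only addresses the parse error; it does not compute the maximum.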

I only want the highest average.... correct my code and I'll give you the bounty – Hades 2014-10-08 02:51:54
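
One possible way to get only the highest average, sketched here under assumptions rather than as a verified fix: collapse all the averages into one group, apply MAX to the projected column, and filter on the result. It assumes `average_book_rating` carries a numeric `AVG_RATING` field produced by `AVG(...) AS AVG_RATING`; the aliases `all_avgs`, `max_avg` and `top_books` are hypothetical.

```pig
-- Put every row of average_book_rating into a single bag.
all_avgs = GROUP average_book_rating ALL;

-- MAX expects a bag holding one numeric column, so project AVG_RATING out of the bag.
max_avg = FOREACH all_avgs GENERATE MAX(average_book_rating.AVG_RATING) AS max_rating;

-- max_avg has exactly one row, so it can be used as a scalar in the filter.
-- This keeps every book that ties for the highest average.
top_books = FILTER average_book_rating BY AVG_RATING == max_avg.max_rating;

DUMP top_books;
```

If only the number itself is needed, `DUMP max_avg;` would be enough and the FILTER step can be dropped.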
