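Hive with a Python transform function: "cannot recognize input near 'transform'" error
I have a Hive table that tracks the status of objects as they move through the stages of a process. The table looks like this: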
hive> desc journeys;
object_id string
journey_statuses array<string>
Here is a typical example of a record, produced using Hive 0.13's collect_list:
12345678 ["A","A","A","B","B","B","C","C","C","C","D"]
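The statuses within each record are ordered (if order didn't matter, I would have used collect_set). For each object_id, I'd like to abbreviate the journey so that it returns the journey statuses in the order in which they appeared, with consecutive repeats collapsed.
I wrote a quick Python script that reads from standard input: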
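#!/usr/bin/env python
# Python 2 script: reads one list literal per line from stdin.
import sys
import itertools

for line in sys.stdin:
    inputList = eval(line.strip())
    # Second iterator, advanced by one, so each status is paired with its successor.
    readahead = iter(inputList)
    next(readahead)
    result = []
    for id, (a, b) in enumerate(itertools.izip(inputList, readahead)):
        if id == 0:
            result.append(a)
        # Keep a status only when it differs from the one before it.
        if a != b:
            result.append(b)
    print result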
I plan to use this in a Hive transform call. It seems to work when run locally:
$ echo '["A","A","A","B","B","B","C","C","C","C","D"]' | python abbreviate_list.py
['A', 'B', 'C', 'D']
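For reference, the same collapsing of consecutive duplicates can be sketched more compactly with itertools.groupby. This is only a sketch under assumptions: the names abbreviate and abbreviate_line are hypothetical, ast.literal_eval is swapped in for eval, and the input is assumed to be a list literal like the one in the local test above. It says nothing about how Hive itself serializes an array column when streaming it to a script.

```python
import ast
import itertools


def abbreviate(statuses):
    """Collapse runs of consecutive duplicates, keeping order of appearance."""
    # groupby yields one (key, group) pair per run of equal adjacent items.
    return [key for key, _group in itertools.groupby(statuses)]


def abbreviate_line(line):
    # Assumption: the line is a list literal such as '["A","A","B"]'.
    return abbreviate(ast.literal_eval(line.strip()))


print(abbreviate_line('["A","A","A","B","B","B","C","C","C","C","D"]'))
```

Unlike the read-ahead version, this form also copes with empty and single-element lists, because it never has to advance a second iterator.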
When I add the file and try to run it inside Hive, though, it returns an error:
hive> add file abbreviateList.py;
Added resource: abbreviateList.py
hive> select
> object_id,
> transform(journey_statuses) using 'python abbreviateList.py' as journey_statuses_abbreviated
> from journeys;
NoViableAltException(... wall of Java error messages ...)
FAILED: ParseException line 3:2 cannot recognize input near 'transform' '(' 'journey_statuses' in select expression
Can you see what I'm doing wrong?