Apache Beam Python: split sentences into key, value pairs for each word

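I have data containing sentences and groups (labels), so I start with key, value pairs of the form (group, sentence). I want to split the sentences into words so that I end up with a (group, word) pair for every word in every sentence. How can I do this inside a pipeline? Consider this test example: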

import apache_beam as beam

test_input = [{'group': '1', 'sentence': 'This is a sentence'},
              {'group': '1', 'sentence': 'This is another sentence'},
              {'group': '2', 'sentence': 'Here is a third sentence'},
              {'group': '3', 'sentence': 'The last example'}]

test_transformation = (test_input
                       | 'split' >> beam.FlatMap(lambda x: (x["group"], x["sentence"].split()))
                       )

test_transformation

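The example above does not split each sentence into separate (group, word) pairs; instead, the whole list of words ends up paired with the group. How can I break this list down further? The output for the first row looks like this: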

['1',['This', 'is', 'a', 'sentence']]

whereas I want something more like

[('1', 'This'), ('1', 'is'), ('1', 'a'), ('1', 'sentence')]

This feels like something that should be doable, but I can't figure out how to get there.

Source

2017-10-09 reese0106

Since neither the problem nor the solution involves the Beam API, this seems to be more of a Python question than a Beam question. You can use a Python list comprehension:

>>> x = {'group': '1', 'sentence': 'This is a sentence'} 

>>> (x['group'], x['sentence'].split()) 
('1', ['This', 'is', 'a', 'sentence']) 

>>> [(x['group'], word) for word in x['sentence'].split()] 
[('1', 'This'), ('1', 'is'), ('1', 'a'), ('1', 'sentence')]

Source

2017-10-09 19:21:53 jkff
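
To see why the original lambda produces the group followed by the whole word list, it may help to mimic the flattening step in plain Python. The sketch below assumes only that FlatMap iterates over whatever the callable returns and emits each item as a separate element; flat_map is a hypothetical stand-in written for illustration and is not part of the Beam API.

```python
from itertools import chain


def flat_map(fn, elements):
    """Hypothetical stand-in for FlatMap: apply fn to each element, then flatten one level."""
    return list(chain.from_iterable(fn(e) for e in elements))


rows = [{'group': '1', 'sentence': 'This is a sentence'}]

# Returning a (group, words) tuple: the tuple itself is iterated over,
# so the group and the word list come out as two separate elements.
as_tuple = flat_map(lambda x: (x['group'], x['sentence'].split()), rows)

# Returning a list of (group, word) tuples: each pair becomes one element.
as_pairs = flat_map(lambda x: [(x['group'], w) for w in x['sentence'].split()], rows)

print(as_tuple)  # ['1', ['This', 'is', 'a', 'sentence']]
print(as_pairs)  # [('1', 'This'), ('1', 'is'), ('1', 'a'), ('1', 'sentence')]
```

The only difference between the two calls is the shape of the returned iterable, which is why wrapping the pairs in a list comprehension is enough to get the desired output.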

Your lambda needs to produce a list containing one output tuple for each word in the sentence. For example:

test_transformation = (test_input
                       | 'split' >> beam.FlatMap(lambda x: [(x["group"], word) for word in x["sentence"].split()])
                       )

Source

2017-10-09 19:21:23
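
If the lambda becomes hard to read, the same logic can be written as a named generator function. This is a minimal sketch: to_group_word_pairs is a hypothetical name, and the check below runs in plain Python without a pipeline. Since FlatMap takes a callable that returns an iterable, the function should be usable in place of the lambda, as in beam.FlatMap(to_group_word_pairs).

```python
def to_group_word_pairs(row):
    """Yield one (group, word) tuple per whitespace-separated word in the sentence."""
    for word in row['sentence'].split():
        yield (row['group'], word)


test_input = [{'group': '1', 'sentence': 'This is a sentence'},
              {'group': '1', 'sentence': 'This is another sentence'},
              {'group': '2', 'sentence': 'Here is a third sentence'},
              {'group': '3', 'sentence': 'The last example'}]

# Plain-Python check of the function: flatten the pairs from every row.
pairs = [pair for row in test_input for pair in to_group_word_pairs(row)]

print(pairs[:4])   # [('1', 'This'), ('1', 'is'), ('1', 'a'), ('1', 'sentence')]
print(pairs[-3:])  # [('3', 'The'), ('3', 'last'), ('3', 'example')]
```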
