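My question is similar to this question: I want to merge the POS tags of noun-phrase chunks. In spaCy, I can do part-of-speech tagging and noun-phrase identification separately, for example: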
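import spacy
nlp = spacy.load('en')  # on spaCy 3+, load a model package by name, e.g. 'en_core_web_sm'
# Adjacent string literals inside parentheses are joined into one sentence.
sentence = ('For instance , consider one simple phenomena : '
            'a question is typically followed by an answer , '
            'or some explicit statement of an inability or refusal to answer .')
token = nlp(sentence)
# Pair each token's text with its coarse-grained POS tag.
token_tag = [(word.text, word.pos_) for word in token]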
The output looks like this:
[('For', 'ADP'),
('instance', 'NOUN'),
(',', 'PUNCT'),
('consider', 'VERB'),
('one', 'NUM'),
('simple', 'ADJ'),
('phenomena', 'NOUN'),
...]
For noun phrases (chunks), I can use noun_chunks, which yields each chunk as a group of words:
[nc for nc in token.noun_chunks] # [instance, one simple phenomena, an answer, ...]
I'm wondering whether there is a way to group the POS tags based on noun_chunks, so that I get output like this:
[('For', 'ADP'),
('instance', 'NOUN'), # or NOUN_CHUNKS
(',', 'PUNCT'),
('one simple phenomena', 'NOUN_CHUNKS'),
...]
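
To make the target concrete, here is a minimal sketch of the merge I have in mind, written in plain Python so it runs without a model. The helper name merge_noun_chunks, the label argument, and the hand-written chunk_spans list are my own assumptions for illustration, not part of spaCy. Single-token chunks keep their original tag here, which is one of the two options shown above.

```python
def merge_noun_chunks(tagged, chunk_spans, label="NOUN_CHUNKS"):
    """Collapse multi-token spans of `tagged` into single (text, label) entries.

    tagged: list of (text, pos) tuples, one per token.
    chunk_spans: non-overlapping (start, end) token-index pairs, end exclusive.
    """
    ends_by_start = {start: end for start, end in chunk_spans}
    merged = []
    i = 0
    while i < len(tagged):
        end = ends_by_start.get(i)
        if end is not None and end - i > 1:
            # Multi-token chunk: join the token texts and relabel the group.
            merged.append((" ".join(text for text, _ in tagged[i:end]), label))
            i = end
        else:
            # Ordinary token, or a single-token chunk: keep its own POS tag.
            merged.append(tagged[i])
            i += 1
    return merged


# Hand-written stand-in for the first tokens of the spaCy output above.
token_tag = [("For", "ADP"), ("instance", "NOUN"), (",", "PUNCT"),
             ("consider", "VERB"), ("one", "NUM"), ("simple", "ADJ"),
             ("phenomena", "NOUN")]
# Assumed chunk positions: "instance" and "one simple phenomena".
chunk_spans = [(1, 2), (4, 7)]
result = merge_noun_chunks(token_tag, chunk_spans)
```

With a real Doc, the spans could presumably be built as [(nc.start, nc.end) for nc in token.noun_chunks], since each noun chunk is a Span with start and end token indices. What I don't know is whether spaCy already offers a more direct way to do this.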