NLP - Information extraction in Python (spaCy)

I am trying to extract this type of information from the paragraph structure below:
women_ran  men_ran  kids_ran  walked
1          2        1         3
2          4        3         1
3          6        5         2
text = [
    "On Tuesday, one women ran on the street while 2 men ran and 1 child ran on the sidewalk. Also, there were 3 people walking.",
    "One person was walking yesterday, but there were 2 women running as well as 4 men and 3 kids running.",
    "The other day, there were three women running and also 6 men and 5 kids running on the sidewalk. Also, there were 2 people walking in the park.",
]
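I am using Python's spaCy as my NLP library. I am fairly new to NLP work and would appreciate some guidance on the best way to extract this tabular information from sentences like these.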
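If it were simply a matter of identifying whether individuals are running or walking, I would just use sklearn to fit a classification model, but the information I need to extract is obviously more granular than that (I am trying to retrieve each subcategory and its value). Any guidance would be greatly appreciated.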
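For reference, a rough baseline for the table above can be sketched with the standard library alone. This is plain regex matching, not spaCy and not a classifier: it looks for a number (digit or word) directly followed by a known noun. The noun-to-column mapping, including the shortcut that "people"/"person" always means walking, is an assumption that happens to hold for these three paragraphs and would likely break on other text.

```python
import re

# Sample paragraphs copied from the question (typos included).
texts = [
    "On Tuesday, one women ran on the street while 2 men ran and 1 child ran on the sidewalk. Also, there were 3 people walking.",
    "One person was walking yesterday, but there were 2 women running as well as 4 men and 3 kids running.",
    "The other day, there were three women running and also 6 men and 5 kids running on the sidewalk. Also, there were 2 people walking in the park.",
]

# Number words to recognise; extend as needed.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

# Hypothetical mapping from noun to table column. Treating "people"/"person"
# as walkers is an assumption based only on the three sample paragraphs.
COLUMN_FOR_NOUN = {
    "women": "women_ran", "woman": "women_ran",
    "men": "men_ran", "man": "men_ran",
    "kids": "kids_ran", "kid": "kids_ran",
    "children": "kids_ran", "child": "kids_ran",
    "people": "walked", "person": "walked",
}
COLUMNS = ["women_ran", "men_ran", "kids_ran", "walked"]

# A number (digits or a known number word), whitespace, then a known noun.
PATTERN = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\s+(" + "|".join(COLUMN_FOR_NOUN) + r")\b",
    re.IGNORECASE,
)


def extract_counts(paragraph):
    """Return one table row (column name -> count) for a paragraph."""
    row = dict.fromkeys(COLUMNS, 0)
    for number, noun in PATTERN.findall(paragraph):
        value = int(number) if number.isdigit() else NUMBER_WORDS[number.lower()]
        row[COLUMN_FOR_NOUN[noun.lower()]] += value
    return row


rows = [extract_counts(paragraph) for paragraph in texts]
for row in rows:
    print([row[column] for column in COLUMNS])
```

Because the pattern ignores verbs entirely, it cannot tell "2 men ran" from "2 men walked"; that distinction is where a dependency parse becomes useful.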
I have never written an XPath query or a DOM selector. Can you explain? – kathystehl
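@kathystehl XPath specifies locations in an XML (or HTML) document, so an XPath query is a way to find particular elements in XML or HTML. See [wikipedia](https://en.wikipedia.org/wiki/XPath). A DOM selector is any CSS 'id' or 'class' on an element in an HTML document (the DOM is the tree data structure for an HTML/XML document that you work with in JavaScript), so you can filter elements by id and class. In NLP, a dependency parser converts unstructured text into an HTML-like tree data structure whose tokens can be queried much as you would with DOM selector filters and XPath queries. – hobs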
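To make the analogy in the comment above concrete, here is a minimal sketch that writes a dependency tree as XML and queries it with the limited XPath subset in Python's standard `xml.etree.ElementTree`. The tree for "2 men ran and 1 child ran" is hand-written for illustration and is an assumption, not actual spaCy output; the `tok` element and its `text`, `pos`, and `dep` attributes are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written dependency tree for "2 men ran and 1 child ran".
# This is an illustrative assumption, not real parser output: each token
# is nested under its syntactic head, much like elements in an HTML document.
TREE_XML = """
<tok text="ran" pos="VERB" dep="ROOT">
  <tok text="men" pos="NOUN" dep="nsubj">
    <tok text="2" pos="NUM" dep="nummod"/>
  </tok>
  <tok text="and" pos="CCONJ" dep="cc"/>
  <tok text="ran" pos="VERB" dep="conj">
    <tok text="child" pos="NOUN" dep="nsubj">
      <tok text="1" pos="NUM" dep="nummod"/>
    </tok>
  </tok>
</tok>
"""


def subject_counts(root):
    """Return (count, subject, verb) triples found in the tree."""
    triples = []
    # iter() walks the root and all of its descendants in document order.
    for verb in root.iter("tok"):
        if verb.get("pos") != "VERB":
            continue
        # XPath-style filter: direct children that are the verb's subject.
        for subject in verb.findall("tok[@dep='nsubj']"):
            # Numeric modifiers attached to that subject.
            for number in subject.findall("tok[@dep='nummod']"):
                triples.append((number.get("text"), subject.get("text"), verb.get("text")))
    return triples


root = ET.fromstring(TREE_XML.strip())
triples = subject_counts(root)
print(triples)  # [('2', 'men', 'ran'), ('1', 'child', 'ran')]
```

With a real parser the tree would come from the parse itself rather than from hand-written XML, but the query has the same shape: find a verb, then its subject, then the number modifying that subject.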