import glob

from bs4 import BeautifulSoup

# `test` is the author's own helper for the gr attribute; it is not shown in the question
paths = glob.glob("/home/anastasiya/PycharmProjects/bachelor/rutexts/*.xhtml")
with open("ruscorpus.txt", "a", encoding="utf-8") as out:
    for path in paths:
        print(path)
        with open(path, "r", encoding="windows-1251") as src:
            # Each line is parsed separately, and soup.w returns only the first <w> in that line
            for line in src:
                soup = BeautifulSoup(line, "lxml")
                if soup.w is not None:
                    out.write("{wort}\t{gr}\t{lex}\n".format(
                        lex=soup.w.ana.get("lex"),
                        gr=test(soup.w.ana.get("gr")),
                        wort=soup.w.contents[-1]))
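I'm trying to get some information from XML. The format looks like this: The program runs, but if there are two words in one `w` tag, it takes the first one and outputs the whole tag:
xml with BeautifulSoup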
Why are you reading your XML data line by line?
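A minimal sketch of what the comment suggests: parse the whole document once and loop over every `<w>` element, instead of parsing each line and taking only `soup.w`. It swaps BeautifulSoup for the standard-library `xml.etree.ElementTree` so it runs without extra packages, and it assumes an RNC-style layout `<w><ana lex="..." gr="..."/>word</w>`. The function name `extract_words`, the sample sentence, and the namespace handling are illustrative assumptions, and the author's `test()` post-processing of `gr` is left out. With BeautifulSoup the same idea would be `soup.find_all("w")` over the full file plus `w.get_text()` for the word.

```python
import xml.etree.ElementTree as ET


def local(tag):
    # Strip a "{namespace}" prefix, in case the .xhtml file declares a default namespace (assumption)
    return tag.rsplit("}", 1)[-1]


def extract_words(xml_text):
    """Return (word, gr, lex) for every <w> element in the document.

    Hypothetical helper; assumes <w><ana lex="..." gr="..."/>word</w>.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for w in root.iter():
        if local(w.tag) != "w":
            continue
        # Use the first <ana> child, like soup.w.ana in the question
        ana = next((child for child in w if local(child.tag) == "ana"), None)
        if ana is None:
            continue
        # itertext() yields only text (including the tail after <ana/>), never markup
        word = "".join(w.itertext()).strip()
        rows.append((word, ana.get("gr"), ana.get("lex")))
    return rows


sample = (
    '<se><w><ana lex="мама" gr="S,f,anim=sg,nom"/>мама</w> '
    '<w><ana lex="мыть" gr="V,ipf=praes,sg,3p"/>мыла</w></se>'
)
for word, gr, lex in extract_words(sample):
    print(f"{word}\t{gr}\t{lex}")
```

Because the word is taken from the text of the whole `<w>` element rather than `contents[-1]` (which can be a tag), no markup ends up in the output.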