How do I solve the problem of parsing an HTML file with Cyrillic characters?
I have some HTML files with span elements:
<html>
<body>
<span class="one">Text</span>some text</br>
<span class="two">Привет</span>Текст на русском</br>
</body>
</html>
To get "some text", I use:
# -*- coding:cp1251 -*-
import lxml
from lxml import html
filename = "t.html"
fread = open(filename, 'r')
source = fread.read()
tree = html.fromstring(source)
fread.close()
tags = tree.xpath('//span[@class="one" and text()="Text"]')  # this works
print "name: ",tags[0].text
print "value: ",tags[0].tail
tags = tree.xpath('//span[@class="two" and text()="Привет"]')  # this raises ValueError
print "name: ",tags[0].text
print "value: ",tags[0].tail
This program prints:
name: Text
value: some text
Traceback: ... in line `tags = tree.xpath('//span[@class="two" and text()="Привет"]')`
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes
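How can I solve this problem?
That doesn't work either; I'm trying to run it under Windows XP. – HammerSpb 2010-11-15 09:49:52
I did it on Linux. Hold on, I'll start my XP virtual machine and see whether I can reproduce it on XP. – 2010-11-15 09:56:12
Thanks Chris! Under XP this is an ANSI file. – HammerSpb 2010-11-15 10:03:05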
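Since the file turned out to be ANSI under XP, one possible way around the ValueError is to decode everything to Unicode before lxml sees it. The sketch below is not a confirmed fix from this thread. It assumes "ANSI" means cp1251 on this machine, uses an in-memory copy of the question's markup instead of reading t.html, and `find_tail` is a hypothetical helper name. The search text is passed as an XPath variable (`$label`), so it has to be a Unicode string, which is what the original byte-string literal was not.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

from lxml import html

# Stand-in for the bytes of t.html (assumption: the file is saved as cp1251).
RAW = (
    '<html><body>'
    '<span class="one">Text</span>some text</br>'
    '<span class="two">Привет</span>Текст на русском</br>'
    '</body></html>'
).encode('cp1251')


def find_tail(raw_bytes, css_class, label, encoding='cp1251'):
    """Return the text that follows the matching <span>, or None if absent."""
    # Decode first so lxml receives Unicode rather than cp1251 bytes.
    tree = html.fromstring(raw_bytes.decode(encoding))
    # Bind the values as XPath variables; both are Unicode strings here.
    tags = tree.xpath('//span[@class=$cls and text()=$label]',
                      cls=css_class, label=label)
    return tags[0].tail if tags else None


print("value:", find_tail(RAW, 'one', 'Text'))
print("value:", find_tail(RAW, 'two', 'Привет'))
```

With a real file, the same idea would be to read it in binary mode (`open(filename, 'rb').read()`) and pass the bytes to `find_tail`, changing `encoding` if the file is not actually cp1251.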