2011-03-25 106 views
2
Problem parsing an XHTML page using Python

Hello, I am trying to parse an XHTML page with Python, but I get this error:

**xml.parsers.expat.ExpatError: unbound prefix: line 6, column 0** 

[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] mod_wsgi (pid=9156): Exception occurred processing WSGI script '/home/hidura/webapps/karinapp/Suite/Gate.py'. 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] Traceback (most recent call last): 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] File "/home/hidura/webapps/karinapp/Suite/Gate.py", line 32, in application 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]  response = assistant(buildReq.extrctEnv(environ, location))#Here the assistant takes the parameters and begins the work 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] File "/home/hidura/webapps/karinapp/Suite/wsgi/Utilities/Assistant/Assistant.py", line 114, in __init__ 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]  self.websearch()#Finding the web. 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] File "/home/hidura/webapps/karinapp/Suite/wsgi/Utilities/Assistant/Assistant.py", line 364, in websearch 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]  websource = self.manage.string2parse(result[0][1])#Transforming the web page into a tree. 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] File "/home/hidura/webapps/karinapp/Suite/wsgi/Writer/tagsmanip.py", line 56, in string2parse 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]  self.doc = parseString(newData) 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] File "/usr/local/lib/python3.1/xml/dom/minidom.py", line 1937, in parseString 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]  return expatbuilder.parseString(string) 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] File "/usr/local/lib/python3.1/xml/dom/expatbuilder.py", line 940, in parseString 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]  return builder.parseString(string) 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] File "/usr/local/lib/python3.1/xml/dom/expatbuilder.py", line 223, in parseString 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]  parser.Parse(string, True) 
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] xml.parsers.expat.ExpatError: unbound prefix: line 6, column 0 

This is the code of the page:

<HTML xmlns:fb="http://www.facebook.com/2008/fbml"><HEAD><TITLE id="ttl">KarinApp(Karina application web maker)</TITLE><LINK id="css_front_1" type="text/css" href="http://www.karinapp.com/modules/front/css/main.css" rel="stylesheet"/><SCRIPT type="text/javascript" id="jQuery-front" src="/modules/general/scripts/jQuery.js"><!--empty--></SCRIPT><SCRIPT type="text/javascript" id="gnrlScrpt" src="/modules/general/scripts/general.js"><!--empty--></SCRIPT><SCRIPT type="text/javascript" id="ctchScrpt" src="/modules/general/scripts/Catcher.js"><!--empty--></SCRIPT><SCRIPT type="text/javascript" id="pdloadScr" src="/modules/general/scripts/loadPage.js"><!--empty--></SCRIPT><SCRIPT type="text/javascript" id="pdLoader">window.onload = function(){postLoad(); 
     } 
function __init__(){main();}</SCRIPT><LINK id="link1" href="/modules/front/css/jquery-ui-1.8.10.custom.css" type="text/css" rel="stylesheet"/><SCRIPT id="script5" src="/modules/front/scripts/ui/jquery.ui.core.js"><!--empty--></SCRIPT><SCRIPT id="script6" src="/modules/front/scripts/ui/jquery.ui.widget.js"><!--empty--></SCRIPT><SCRIPT id="script8" src="/modules/front/scripts/ui/jquery.ui.button.js"><!--empty--></SCRIPT><SCRIPT id="script10" src="/modules/front/scripts/main.js"><!--empty--></SCRIPT><SCRIPT id="script9"><!--empty--></SCRIPT><SCRIPT id="script11" type="text/javascript" src="http://connect.facebook.net/en_US/all.js#appId=150388711687556&amp;amp;xfbml=1"><!--empty--></SCRIPT></HEAD><BODY id="body"><IMG id="logo" father="@body" src="/modules/front/image/logo.png"/><DIV id="comments" father="@body"><!--Comment--><DIV id="fbK" father="@comments"><IFRAME src="http://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fpages%2FKarinapp%2F150388711687556&amp;width=295&amp;colorscheme=light&amp;show_faces=false&amp;stream=true&amp;header=false&amp;height=300" scrolling="no" frameborder="1" style="border:none; overflow:hidden; width:295px; height:300px;" allowtransparency="false">&amp;lt;!--empty--&amp;gt;</IFRAME> 

<LIKE-BOX href="http://www.facebook.com/pages/Karinapp/150388711687556" width="295" show_faces="false" stream="true" header="false"><!--empty--></LIKE-BOX></DIV></DIV><DIV id="head" father="@body"><!--Comment--></DIV><A id="fb" father="@body" href="http://www.facebook.com/karinapp#!/pages/Karinapp/150388711687556" border="0"><IMG src="/modules/front/image/fb.png" father="@fb"/></A><A id="tw" father="@body" href="http://www.twitter.com/#!/karinappm" border="0"><IMG src="/modules/front/image/tw.png" father="@tw"/></A><DIV id="div4" father="@body"><DIV id="fb-root"><!--empty--></DIV> 
<FB:LOGIN-BUTTON xmlns:fb="http://www.facebook.com/2008/fbml" show-faces="true" width="250" max-rows="1"/></DIV></BODY></HTML> 

Thanks in advance!

Answers

2

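The problem is that you declare fb as the namespace prefix, but the tag is FB:LOGIN-BUTTON. Prefixes are case-sensitive, so expat sees FB as unbound. The XHTML spec states that all HTML element and attribute names must be lowercase, because XML is case-sensitive.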
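
To see the case sensitivity in isolation, here is a minimal sketch using the same minidom `parseString` call that appears in the traceback. The two one-line documents are made up for illustration and are not the page from the question; only the case of the prefix differs between them.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Hypothetical minimal documents; only the case of the prefix differs.
NS_DECL = 'xmlns:fb="http://www.facebook.com/2008/fbml"'
UPPER = '<html %s><FB:login-button/></html>' % NS_DECL
LOWER = '<html %s><fb:login-button/></html>' % NS_DECL


def try_parse(text):
    """Return 'ok' if minidom parses the text, else the expat error message."""
    try:
        parseString(text)
        return 'ok'
    except ExpatError as exc:
        return str(exc)


upper_result = try_parse(UPPER)  # prefix FB was never declared
lower_result = try_parse(LOWER)  # prefix fb matches the declaration
print(upper_result)
print(lower_result)
```

The first call should report an unbound prefix, while the second parses cleanly, since `fb` is the only prefix that was declared.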

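I tried your document with the lxml XML parser, and in my test it converted the prefix to lowercase automatically. Maybe you could switch to a different parser: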

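import lxml.etree

# Read the page as bytes and parse it as XML.
with open('fb.xhtml', 'rb') as f:
    tree = lxml.etree.fromstring(f.read())
# Bind the lowercase prefix to the FBML namespace URI for the XPath query.
ns_map = {'fb': 'http://www.facebook.com/2008/fbml'}
print(tree.xpath('.//fb:LOGIN-BUTTON', namespaces=ns_map))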

Output:

[<Element {http://www.facebook.com/2008/fbml}LOGIN-BUTTON at 1011fa260>] 
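
If switching parsers is not an option, another possibility is to lowercase the element names before handing the markup to minidom. The following is only a rough sketch under stated assumptions: the helper `lowercase_tag_names` and the sample fragment are hypothetical, and the regular expression is a simplification that would also rewrite anything resembling a tag inside inline script text, so check it against your real pages.

```python
import re
from xml.dom.minidom import parseString

# Matches "<" or "</" followed by an element name, optionally with a prefix.
TAG_NAME = re.compile(r'(</?)([A-Za-z][\w.-]*(?::[\w.-]+)?)')


def lowercase_tag_names(markup):
    """Lowercase element names and their prefixes; attributes are untouched."""
    return TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), markup)


# Hypothetical fragment in the style of the page from the question.
sample = ('<HTML xmlns:fb="http://www.facebook.com/2008/fbml">'
          '<BODY><FB:LOGIN-BUTTON width="250"/></BODY></HTML>')

fixed = lowercase_tag_names(sample)
doc = parseString(fixed)  # parses now that the prefix matches the declaration
buttons = doc.getElementsByTagNameNS(
    'http://www.facebook.com/2008/fbml', 'login-button')
print(fixed)
print(len(buttons))
```

Generating lowercase tags at the source is the cleaner fix; this kind of preprocessing is only a stopgap.
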
+0

Thanks for your answer, man! The problem was the uppercase letters! – hidura 2011-03-25 18:16:06

-1

I think the problem is that http://www.facebook.com/2008/fbml returns a "page not found" error.

+0

Thanks, the problem was in the capital letters. – hidura 2011-03-25 18:16:53

+0

A namespace identifier is just that: an identifier. It does not need to refer to an existing page. – 2011-09-08 14:58:53
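
A small sketch of that point, assuming a made-up namespace URI (`FAKE_NS` is hypothetical and resolves to nothing): the parser treats the URI as an opaque string and never fetches it.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI that points at no real page; it is only compared as text.
FAKE_NS = 'http://example.invalid/no-such-page'
doc = '<root xmlns:x="%s"><x:item/></root>' % FAKE_NS

root = ET.fromstring(doc)
# ElementTree spells qualified names as {namespace-uri}local-name.
item = root.find('{%s}item' % FAKE_NS)
print(item.tag)
```

Parsing succeeds without any network access, which is why a URI that returns "page not found" cannot cause an unbound prefix error.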

+0

You are right, my answer was wrong. – 2011-09-14 18:21:49