2012-10-23 68 views
3

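Extracting CSS style declarations with Python

I am parsing an HTML file with lxml.html, but I also need to get the HTML classes that are used as selectors in the stylesheet, for example ['c1','c2','c3','c4','c5','c6'], together with the corresponding style information.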

I extracted the style section as a string and tried to parse it with cssutils.parseString, but I ended up with this:

ERROR CSSStyleRule: No start { of style declaration found: u'<style type="text/css">&#13;' [1:28: ;] 
ERROR Selector: Unexpected CHAR. [1:1: <] 
ERROR Selector: Unexpected CHAR. [1:12: =] 
ERROR Selector: Unexpected STRING. [1:13: "text/css"] 
ERROR Selector: Unexpected CHAR. [1:24: &] 
ERROR SelectorList: Invalid Selector: <style type="text/css">&#13 
ERROR CSSStyleRule: No style declaration or "}" found: u'<style type="text/css">&#13;' 
ERROR CSSStyleRule: No start { of style declaration found: u'&#13;' [2:47: ;] 
ERROR Selector: Unexpected CHAR. [2:43: &] 
ERROR SelectorList: Invalid Selector: &#13 
ERROR CSSStyleRule: No style declaration or "}" found: u'&#13;' 
ERROR CSSStyleRule: No start { of style declaration found: u'&#13;' [3:33: ;] 
ERROR Selector: Unexpected CHAR. [3:29: &] 
ERROR SelectorList: Invalid Selector: &#13 
ERROR CSSStyleRule: No style declaration or "}" found: u'&#13;' 
ERROR CSSStyleRule: No start { of style declaration found: u'&#13;' [4:32: ;] 
ERROR Selector: Unexpected CHAR. [4:28: &] 
ERROR SelectorList: Invalid Selector: &#13 
ERROR CSSStyleRule: No style declaration or "}" found: u'&#13;' 
ERROR CSSStyleRule: No start { of style declaration found: u'&#13;' [5:34: ;] 
ERROR Selector: Unexpected CHAR. [5:30: &] 
ERROR SelectorList: Invalid Selector: &#13 
ERROR CSSStyleRule: No style declaration or "}" found: u'&#13;' 
ERROR CSSStyleRule: No start { of style declaration found: u'&#13;' [6:34: ;] 
ERROR Selector: Unexpected CHAR. [6:30: &] 
ERROR SelectorList: Invalid Selector: &#13 
ERROR CSSStyleRule: No style declaration or "}" found: u'&#13;' 
ERROR CSSStyleRule: No start { of style declaration found: u'&#13;' [7:53: ;] 
ERROR Selector: Unexpected CHAR. [7:49: &] 
ERROR SelectorList: Invalid Selector: &#13 
ERROR CSSStyleRule: No style declaration or "}" found: u'&#13;' 
ERROR CSSStyleRule: No start { of style declaration found: u'</style>' [8:13: ] 
ERROR Selector: Unexpected CHAR. [8:5: <] 
ERROR Selector: Unexpected CHAR. [8:6: /] 
ERROR Selector: Cannot end with combinator: </style> 
ERROR SelectorList: Invalid Selector: </style> 
ERROR CSSStyleRule: No style declaration or "}" found: u'</style>' 
<cssutils.css.CSSStyleSheet object encoding='utf-8' href=None media=None title=None namespaces={} at 0x308ca90> 

How can I fix this?

<style type="text/css">&#13; 
    p.c6 {font-weight: bold; text-align: left}&#13; 
    p.c5 {font-weight: bold}&#13; 
    p.c4 {text-align: left}&#13; 
    td.c3 {font-weight: bold}&#13; 
    p.c2 {text-align: center}&#13; 
    p.c1 {font-weight: bold; text-align: center}&#13; 
</style> 
+0

I think you should get rid of the surrounding 'style' tags and the printed line breaks ('&#13;') so that these errors go away. – feeela

+2

To get rid of all the errors, a lot more than that has to be removed. The markup was generated by html tidy :) – root
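For what it's worth, the cleanup feeela describes can be sketched with the standard library alone. This is only a rough illustration, assuming the serialized string has the same shape as the one in the question; `serialized` and `strip_style_wrapper` are made-up names, and the sample is abbreviated to two rules.

```python
import html
import re

# Serialized <style> element, abbreviated from the question.
serialized = (
    '<style type="text/css">&#13;\n'
    '    p.c6 {font-weight: bold; text-align: left}&#13;\n'
    '    p.c5 {font-weight: bold}&#13;\n'
    '</style>'
)


def strip_style_wrapper(markup):
    """Return the CSS inside a serialized <style> element (hypothetical helper)."""
    # Drop the opening and closing tags, keeping only what sits between them.
    inner = re.sub(r'(?is)^\s*<style\b[^>]*>|</style\s*>\s*$', '', markup)
    # Decode character references such as &#13; and discard the carriage returns.
    return html.unescape(inner).replace('\r', '')


css_source = strip_style_wrapper(serialized)
print(css_source)
```

The resulting string is plain CSS and could then be handed to a CSS parser. Not serializing the element in the first place avoids the problem altogether.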

Answers

0

What code did you use to get that error message?

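CSS parsers such as cssutils or tinycss expect to be given CSS source. It looks like you are passing the source of an HTML <style> element instead. You can use lxml.html to parse that HTML; then take the text attribute of the nodes you are interested in, rather than serializing them back to HTML.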

Your code should look something like this:

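import cssutils
import lxml.html

html = '<style>a.b {}</style>'
root = lxml.html.fromstring(html)
# '//style' finds <style> elements anywhere in the parsed tree
for style_element in root.xpath('//style'):
    # .text is the element's text content (the CSS source), not serialized HTML
    assert style_element.text == 'a.b {}'
    stylesheet = cssutils.parseString(style_element.text)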