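Extracting CSS style declarations with Python
I am using lxml.html to parse an HTML file, but I also need the HTML classes that are used as selectors in the stylesheet, e.g. ['c1','c2','c3','c4','c5','c6'], together with the corresponding style information.
I extracted the style section as a string and tried to parse it with cssutils.parseString, but I ended up with this: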
ERROR CSSStyleRule: No start { of style declaration found: u'<style type="text/css"> ' [1:28: ;]
ERROR Selector: Unexpected CHAR. [1:1: <]
ERROR Selector: Unexpected CHAR. [1:12: =]
ERROR Selector: Unexpected STRING. [1:13: "text/css"]
ERROR Selector: Unexpected CHAR. [1:24: &]
ERROR SelectorList: Invalid Selector: <style type="text/css">
ERROR CSSStyleRule: No style declaration or "}" found: u'<style type="text/css"> '
ERROR CSSStyleRule: No start { of style declaration found: u' ' [2:47: ;]
ERROR Selector: Unexpected CHAR. [2:43: &]
ERROR SelectorList: Invalid Selector: 
ERROR CSSStyleRule: No style declaration or "}" found: u' '
ERROR CSSStyleRule: No start { of style declaration found: u' ' [3:33: ;]
ERROR Selector: Unexpected CHAR. [3:29: &]
ERROR SelectorList: Invalid Selector: 
ERROR CSSStyleRule: No style declaration or "}" found: u' '
ERROR CSSStyleRule: No start { of style declaration found: u' ' [4:32: ;]
ERROR Selector: Unexpected CHAR. [4:28: &]
ERROR SelectorList: Invalid Selector: 
ERROR CSSStyleRule: No style declaration or "}" found: u' '
ERROR CSSStyleRule: No start { of style declaration found: u' ' [5:34: ;]
ERROR Selector: Unexpected CHAR. [5:30: &]
ERROR SelectorList: Invalid Selector: 
ERROR CSSStyleRule: No style declaration or "}" found: u' '
ERROR CSSStyleRule: No start { of style declaration found: u' ' [6:34: ;]
ERROR Selector: Unexpected CHAR. [6:30: &]
ERROR SelectorList: Invalid Selector: 
ERROR CSSStyleRule: No style declaration or "}" found: u' '
ERROR CSSStyleRule: No start { of style declaration found: u' ' [7:53: ;]
ERROR Selector: Unexpected CHAR. [7:49: &]
ERROR SelectorList: Invalid Selector: 
ERROR CSSStyleRule: No style declaration or "}" found: u' '
ERROR CSSStyleRule: No start { of style declaration found: u'</style>' [8:13: ]
ERROR Selector: Unexpected CHAR. [8:5: <]
ERROR Selector: Unexpected CHAR. [8:6: /]
ERROR Selector: Cannot end with combinator: </style>
ERROR SelectorList: Invalid Selector: </style>
ERROR CSSStyleRule: No style declaration or "}" found: u'</style>'
<cssutils.css.CSSStyleSheet object encoding='utf-8' href=None media=None title=None namespaces={} at 0x308ca90>
How can I fix this?
<style type="text/css">
p.c6 {font-weight: bold; text-align: left}
p.c5 {font-weight: bold}
p.c4 {text-align: left}
td.c3 {font-weight: bold}
p.c2 {text-align: center}
p.c1 {font-weight: bold; text-align: center}
</style>
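I think you should get rid of the surrounding 'style' tags and the printed line breaks ('' ) so that none of those errors occur. – feeela
There is a lot more that needs to be removed to get rid of all the errors. It was generated by HTML Tidy :) – root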
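A rough sketch of what the comments suggest: strip the surrounding style tags and the serialized line-break references before handing the text to cssutils. The names RAW, clean_style_block and extract_rules are made up for illustration, and the `&#13;` references in RAW are an assumption based on the `&` and `;` positions reported in the error log, not something shown in the question itself.

```python
import re

# Assumed shape of the extracted string: the <style> tags are still present
# and each line ends with a carriage-return reference (an assumption).
RAW = """<style type="text/css">&#13;
p.c6 {font-weight: bold; text-align: left}&#13;
p.c5 {font-weight: bold}&#13;
p.c4 {text-align: left}&#13;
td.c3 {font-weight: bold}&#13;
p.c2 {text-align: center}&#13;
p.c1 {font-weight: bold; text-align: center}&#13;
</style>"""


def clean_style_block(raw):
    """Return only the CSS text, without <style> tags or CR/LF references."""
    # Drop the opening and closing <style> tags, whatever their attributes.
    css = re.sub(r"</?style[^>]*>", "", raw, flags=re.IGNORECASE)
    # Drop numeric character references for carriage return and line feed.
    css = re.sub(r"&#(?:13|10);", "", css)
    return css.strip()


def extract_rules(css):
    """Map each selector to a dict of its declarations using cssutils."""
    # Imported here so clean_style_block works without cssutils installed.
    import cssutils

    sheet = cssutils.parseString(css)
    rules = {}
    for rule in sheet.cssRules:
        # Skip comments, @media blocks and anything that is not a style rule.
        if rule.type == rule.STYLE_RULE:
            rules[rule.selectorText] = {
                prop.name: prop.value for prop in rule.style
            }
    return rules


def class_names(selectors):
    """Collect the class names that appear in a list of selectors."""
    found = []
    for selector in selectors:
        for name in re.findall(r"\.([\w-]+)", selector):
            if name not in found:
                found.append(name)
    return found
```

With cssutils installed, `rules = extract_rules(clean_style_block(RAW))` should give one entry per selector, and `class_names(rules)` then lists the classes used in them. If the real input contains other leftovers from HTML Tidy, the cleaning step would need to be extended accordingly.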