I have saved the page source below to a text file, but extracting the data with re still returns nothing. My code is:
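import json
import re
from bs4 import BeautifulSoup
file_object = open('source_test_script.txt', mode="r")
soup = BeautifulSoup(file_object, "html.parser")
# Match a `var chartN = new Highcharts.Chart({...});` statement.
pattern = re.compile(r"^var (chart[0-9]+) = new Highcharts.Chart\(({.*?})\);$", re.MULTILINE | re.DOTALL)
# Find the <script> tag whose text matches the pattern.
scripts = soup.find("script", text=pattern)
# group(1) is the chart name; group(2) is the object literal.
profile_text = pattern.search(scripts.text).group(1)
profile = json.loads(profile_text)
print(profile["data"], profile["categories"])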
I want to extract the data for this chart from a website. Below is the source code of the chart.
<script type="text/javascript">
jQuery(function() {
var chart1 = new Highcharts.Chart({
chart: {
renderTo: 'chart1',
defaultSeriesType: 'column',
borderWidth: 2
},
title: {
text: 'Productions'
},
legend: {
enabled: false
},
xAxis: [{
categories: [1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016],
}],
yAxis: {
min: 0,
title: {
text: 'Productions'
}
},
series: [{
name: 'Productions',
data: [1,1,0,1,6,4,9,15,15,19,24,18,53,42,54,53,61,36]
}]
});
});
</script>
The site has several charts, named "chart1", "chart2", and so on. For each chart I want to extract the categories line and the data line, like this:
categories: [1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016]
data: [1,1,0,1,6,4,9,15,15,19,24,18,53,42,54,53,61,36]
I believe you can use something like Selenium for this, for example: http://stackoverflow.com/questions/10455130/can-selenium-web-driver-have-access-to-javascript-global-variables – CasualDemon
Yes, I am using Selenium to parse the HTML content. My code is: [code] req = urllib2.Request(productions_url, headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0'}); p = urllib2.urlopen(req); soup = BeautifulSoup(p.readlines()[0], 'html.parser') [/code]. My problem is how to extract these two specific lines once I have parsed the HTML. – Ilumtics
An HTML parser won't help you, because that is JavaScript. So you have to parse it yourself. – zvone
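One way to "parse it yourself" is to run a regular expression over the raw script text and pull out just the two arrays. The following is only a minimal sketch, not a general JavaScript parser. The names `CHART_RE` and `extract_charts` are made up for illustration, and it assumes every chart is declared as `var chartN = new Highcharts.Chart({...})` with one flat, purely numeric `categories` array followed by one flat, purely numeric `data` array, as in the source shown in the question. Nested arrays, quoted strings, or a chart without `categories` would need a different approach.

```python
import json
import re

# Hypothetical pattern: assumes the layout shown in the question, where each
# chart has one `categories` array followed by one `data` array.
CHART_RE = re.compile(
    r"var\s+(chart\d+)\s*=\s*new\s+Highcharts\.Chart\(\{"  # chart name
    r".*?categories:\s*(\[.*?\])"                          # first categories array
    r".*?data:\s*(\[.*?\])",                               # first data array
    re.DOTALL,
)


def extract_charts(page_source):
    """Return {chart_name: {"categories": [...], "data": [...]}}."""
    charts = {}
    for name, categories, data in CHART_RE.findall(page_source):
        # The full Highcharts config is a JavaScript object literal, not JSON,
        # but a flat array of numbers is valid JSON, so json.loads can read it.
        charts[name] = {
            "categories": json.loads(categories),
            "data": json.loads(data),
        }
    return charts


# Shortened sample in the same shape as the chart source in the question.
sample = """
jQuery(function() {
var chart1 = new Highcharts.Chart({
xAxis: [{
categories: [1999,2000,2001],
}],
series: [{
name: 'Productions',
data: [1,1,0]
}]
});
});
"""

for chart_name, values in extract_charts(sample).items():
    print(chart_name, values["categories"], values["data"])
```

Because the pattern works on plain text, it can be applied either to the whole saved file or to the `.string` of each `<script>` tag that BeautifulSoup finds. The regex matches `var` anywhere rather than only at the start of a line, so indentation in the real page should not matter.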