2016-12-30 111 views
0

Parsing JavaScript using re.findall

So I have a few problems that I'm trying to solve.

First, I want to parse this JavaScript that I got from the HTML.

$(document).ready(function() { $('#commodity-show-thumbnails').bxSlider({ mode: 'vertical', auto: false, controls: true, pager: false, minSlides: 4, maxSlides: 4, moveSlides: 1, slideWidth: 250 }); itemSelector('commodity-show-form', 'commodity-show-addcart-submit', [['color', 'Choose color'], ['size', 'Choose size']], { "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, "39807": { "params": ["Smokey Blue/Mica Blue", "37,5"]}, "39808": { "params": ["Smokey Blue/Mica Blue", "38"]}, "39809": { "params": ["Smokey Blue/Mica Blue", "38,5"]}, "39810": { "params": ["Smokey Blue/Mica Blue", "39"]}, "39811": { "params": ["Smokey Blue/Mica Blue", "40"]}, "39812": { "params": ["Smokey Blue/Mica Blue", "40,5"]}, "39814": { "params": ["Smokey Blue/Mica Blue", "42"]} }, [39805,39806,39807,39808,39809,39810,39811,39812,39814], 'main-cart', 'commodity-show-image'); });

res = re.findall(r'{ "params": (.+?)}', text) # text is where javascript text is stored 

final = [eval(i) for i in res] 

print(final) 

I get the following output:

[['Smokey Blue/Mica Blue', '36'], ['Smokey Blue/Mica Blue', '36,5'], ['Smokey Blue/Mica Blue', '37,5'], ['Smokey Blue/Mica Blue', '38'], ['Smokey Blue/Mica Blue', '38,5'], ['Smokey Blue/Mica Blue', '39'], ['Smokey Blue/Mica Blue', '40'], ['Smokey Blue/Mica Blue', '40,5'], ['Smokey Blue/Mica Blue', '42']]

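But now I don't know where to go from here. I want to find the value 39805 in

{"39805": {"params": ["Smokey Blue/Mica Blue", "36"]}

How can I parse it so that, if I look for the value associated with 36, it gives me 39805?

I'm sorry, but I'm really bad at parsing and I'm very new to this.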


It seems like you really want the result of the parse to be a dictionary like {'39805': '36', '39807': '37', ...}. Is that right? – saulspatz


Are you trying to scrape a web page whose content is generated by JavaScript? –


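@saulspatz Yes. But rather than building the complete dictionary, I was thinking of parsing for a specific value, like searching for the value 36 and getting its key, 39805 – b0baboi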

Answers

0

You can find the key for 36 like this:

import re 
import ast 

a="""$(document).ready(function() { $('#commodity-show-thumbnails').bxSlider({ mode: 'vertical', auto: false, controls: true, pager: false, minSlides: 4, maxSlides: 4, moveSlides: 1, slideWidth: 250 }); itemSelector('commodity-show-form', 'commodity-show-addcart-submit', [['color', 'Choose color'], ['size', 'Choose size']], { "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, "39807": { "params": ["Smokey Blue/Mica Blue", "37,5"]}, "39808": { "params": ["Smokey Blue/Mica Blue", "38"]}, "39809": { "params": ["Smokey Blue/Mica Blue", "38,5"]}, "39810": { "params": ["Smokey Blue/Mica Blue", "39"]}, "39811": { "params": ["Smokey Blue/Mica Blue", "40"]}, "39812": { "params": ["Smokey Blue/Mica Blue", "40,5"]}, "39814": { "params": ["Smokey Blue/Mica Blue", "42"]} }, [39805,39806,39807,39808,39809,39810,39811,39812,39814], 'main-cart', 'commodity-show-image'); });""" 
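b = re.findall(r'.*?({ ".*?} }).*}', a)[0]  # capture the object literal passed to itemSelector

d1 = ast.literal_eval(b)  # safely turn the literal into a Python dict
print(str(d1) + '\n')

# find the key whose size (second "params" entry) is '36'
for item_id, entry in d1.items():
    if entry['params'][1] == '36':
        print(item_id)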

Output (the key order of the printed dict may differ depending on the Python version):

{'39809': {'params': ['Smokey Blue/Mica Blue', '38,5']}, '39808': {'params': ['Smokey Blue/Mica Blue', '38']}, '39805': {'params': ['Smokey Blue/Mica Blue', '36']}, '39807': {'params': ['Smokey Blue/Mica Blue', '37,5']}, '39806': {'params': ['Smokey Blue/Mica Blue', '36,5']}, '39812': {'params': ['Smokey Blue/Mica Blue', '40,5']}, '39814': {'params': ['Smokey Blue/Mica Blue', '42']}, '39810': {'params': ['Smokey Blue/Mica Blue', '39']}, '39811': {'params': ['Smokey Blue/Mica Blue', '40']}} 

39805 
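
If you need more than one lookup, it may be easier to invert the mapping once and then index it by size. Here is a minimal sketch, assuming the object literal in the page is valid JSON (it is in the sample above, but that is not guaranteed for arbitrary JavaScript). The names `script`, `variants` and `size_to_id` are made up for illustration, and `script` is a shortened stand-in for the real page text.

```python
import json
import re

# Shortened stand-in for the page's script text (same shape as the sample above).
script = (
    "itemSelector('commodity-show-form', 'commodity-show-addcart-submit', "
    "[['color', 'Choose color'], ['size', 'Choose size']], "
    '{ "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, '
    '"39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]} }, '
    "[39805,39806], 'main-cart', 'commodity-show-image');"
)

# Grab the object literal: from the first '{ "' up to the first '} }'.
raw = re.search(r'\{ ".*?\} \}', script).group(0)
variants = json.loads(raw)

# Invert it: size string -> item id.
size_to_id = {entry["params"][1]: item_id for item_id, entry in variants.items()}

print(size_to_id["36"])
```

This relies on every size appearing only once; if two ids shared a size, the later one would silently overwrite the earlier one.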

I'm actually looking for the value associated with 36, so that would be 39805 – b0baboi


@b0baboi Modified. Check it now. You can accept the answer by clicking the 'tick' mark. http://meta.stackoverflow.com/a/251399/4082217 – MYGz

0

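EDIT: I just realized that in some cases the size has two numbers, like "36,5". I assume this means 36 and a half. Anyway, my original script didn't account for this, which is why it gave the wrong answer (something I carelessly failed to notice). Here's a revised script that seems to work: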

import re 
text='''$(document).ready(function() { $('#commodity-show-thumbnails').bxSlider({ mode: 'vertical', auto: false, controls: true, pager: false, minSlides: 4, maxSlides: 4, moveSlides: 1, slideWidth: 250 }); itemSelector('commodity-show-form', 'commodity-show-addcart-submit', [['color', 'Choose color'], ['size', 'Choose size']], { "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, "39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, "39807": { "params": ["Smokey Blue/Mica Blue", "37,5"]}, "39808": { "params": ["Smokey Blue/Mica Blue", "38"]}, "39809": { "params": ["Smokey Blue/Mica Blue", "38,5"]}, "39810": { "params": ["Smokey Blue/Mica Blue", "39"]}, "39811": { "params": ["Smokey Blue/Mica Blue", "40"]}, "39812": { "params": ["Smokey Blue/Mica Blue", "40,5"]}, "39814": { "params": ["Smokey Blue/Mica Blue", "42"]} }, [39805,39806,39807,39808,39809,39810,39811,39812,39814], 'main-cart', 'commodity-show-image'); });''' 
pattern = re.compile(r' "([0-9]+).*?params.*?([0-9]+(,5)?)') 

s={b:a for a,b,_ in pattern.findall(text)} 

print(s['36'], s['36,5']) 

This now prints 39805 39806, which looks right to me.
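
The pattern above hard-codes the ",5" suffix. If other size formats might show up, a slightly stricter pattern that captures whatever the last quoted string in each "params" list is could be used instead. This is only a sketch under that assumption; `sample`, `entry_re` and `size_to_id` are illustrative names, and `sample` is a shortened copy of the data above.

```python
import re

# Shortened sample in the same shape as the script text above.
sample = ('{ "39805": { "params": ["Smokey Blue/Mica Blue", "36"]}, '
          '"39806": { "params": ["Smokey Blue/Mica Blue", "36,5"]}, '
          '"39814": { "params": ["Smokey Blue/Mica Blue", "42"]} }')

# Quoted numeric id, then "params", then the last quoted string in the list.
entry_re = re.compile(
    r'"(?P<id>\d+)":\s*\{\s*"params":\s*\[[^\]]*"(?P<size>[^"]*)"\s*\]'
)

size_to_id = {m.group('size'): m.group('id') for m in entry_re.finditer(sample)}
print(size_to_id)
```

Note that this would break if a color name ever contained a `]` or an escaped quote; for data like that, a real parser is the safer route.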

Here's all the data:

for a in sorted(s):print(a, s[a]) 
36 39805 
36,5 39806 
37,5 39807 
38 39808 
38,5 39809 
39 39810 
40 39811 
40,5 39812 
42 39814
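
If the sizes need to be compared or sorted as numbers, the strings can be converted first. The sketch below assumes, as guessed above, that "36,5" means 36.5 (a decimal comma); `size_as_float` and `by_number` are hypothetical names, and `s` is retyped from the listing above so the snippet runs on its own.

```python
# Size -> id mapping, copied from the listing above.
s = {"36": "39805", "36,5": "39806", "37,5": "39807", "38": "39808",
     "38,5": "39809", "39": "39810", "40": "39811", "40,5": "39812",
     "42": "39814"}


def size_as_float(size):
    """Read '36,5' as 36.5, assuming the comma is a decimal separator."""
    return float(size.replace(",", "."))


# Same mapping, keyed by the numeric size instead of the raw string.
by_number = {size_as_float(size): item_id for size, item_id in s.items()}

print(by_number[36.5])
print(sorted(by_number))
```

Sorting the string keys happens to give the right order for this data, but it would not for sizes with a different number of digits, which is why a numeric key can be safer.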