2016-11-27 83 views

I am learning web scraping with Python, but I am stuck on scraping an Ajax page like this one. How can I scrape an Ajax web page with Python?

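I want to scrape the names and details of all the drugs that appear on the page. I have read most of the answers on Stack Overflow, but I still don't get the right data after scraping. I also tried scraping with Selenium and sending a forged POST request, but both failed.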

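So please help me with this Ajax problem, specifically on this page, where selecting an option from the dropdown triggers an Ajax request. Please also point me to some resources on scraping Ajax pages.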
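On the failed forged POST request: the 'ctl00_' prefix on the element ids suggests the page is built with ASP.NET WebForms. There, changing the dropdown posts the form back together with hidden state fields from the previous response ('__VIEWSTATE', '__EVENTVALIDATION' and similar), and a POST without them is often rejected or just returns the unchanged page. The following is a minimal sketch, assuming the page really is WebForms, that collects those hidden fields with the standard library's html.parser and builds the form data. HiddenInputCollector and build_postback_payload are hypothetical names, and the dropdown's name attribute (not its id) has to be read from the page.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects the name/value pairs of <input type="hidden"> fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        is_hidden = (attr_map.get("type") or "").lower() == "hidden"
        name = attr_map.get("name")
        if is_hidden and name:
            # A hidden input without a value attribute is posted as "".
            self.fields[name] = attr_map.get("value") or ""


def build_postback_payload(page_html, event_target, selected_value):
    """Form data imitating an ASP.NET AutoPostBack from a <select>.

    Assumptions: the page is ASP.NET WebForms, and event_target is the
    dropdown's *name* attribute (WebForms names typically use '$' where
    the ids use '_').
    """
    collector = HiddenInputCollector()
    collector.feed(page_html)
    collector.close()
    payload = dict(collector.fields)  # __VIEWSTATE, __EVENTVALIDATION, ...
    payload["__EVENTTARGET"] = event_target  # which control "changed"
    payload["__EVENTARGUMENT"] = ""
    payload[event_target] = selected_value  # the chosen option's value attribute
    return payload
```

With requests, one would GET the page inside a requests.Session(), pass response.text to build_postback_payload together with the dropdown's name and the chosen option's value, and session.post() the result back to the same URL. If the assumptions hold, the HTML of that POST response should contain the drug table for the selected state; an UpdatePanel-style partial postback may need additional fields.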

//using Selenium

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import bs4 as bs
import lxml
import requests

path_to_chrome = '/home/brutal/Desktop/chromedriver'

# Selenium 4 takes the chromedriver path through a Service object
browser = webdriver.Chrome(service=Service(path_to_chrome))

url = 'https://www.gianteagle.com/Pharmacy/Savings/4-10-Dollar-Drug-Program/Generic-Drug-Program/'

browser.get(url)
browser.find_element('xpath', '//*[@id="ctl00_RegionPage_RegionPageMainContent_RegionPageContent_userControl_StateList"]/option[contains(text(), "Ohio")]').click()

# current_url is unchanged after the Ajax update, so this fetches the original page again
new_url = browser.current_url
r = requests.get(new_url)
print(r.content)

Can you show us an example of what you tried with Selenium? –


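I tried triggering the state option so that the new Ajax content would load and I could take the URL and fetch the drug table data, but because the page stays on the same URL, I got scraped data without the information I need –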


You should provide some code –

Answer


You can download ChromeDriver here

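normalize-space is used to clean up the text taken from the page: it trims leading and trailing whitespace and collapses runs of whitespace (line breaks, tabs, repeated spaces) into a single space.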
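To see what the 'normalize-space(...)' calls return without opening a browser, here is a small pure-Python imitation; normalize_space is a hypothetical name, and lxml evaluates the real XPath function. It follows XPath 1.0, which counts only space, tab, carriage return and line feed as whitespace, so a non-breaking space ('\xa0') is left untouched. An expression wrapped in normalize-space() also gives back a single string rather than a list.

```python
import re

# XPath 1.0 whitespace: space, tab, carriage return, line feed (not \xa0).
_XPATH_WHITESPACE = re.compile(r"[ \t\r\n]+")


def normalize_space(text):
    """Imitate XPath 1.0 normalize-space(): collapse whitespace runs, then trim."""
    return _XPATH_WHITESPACE.sub(" ", text).strip(" ")


print(normalize_space("\n      Capsule \t\n   "))  # prints: Capsule
```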

from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from lxml.html import fromstring

data = {}

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('PATH TO YOUR DRIVER/chromedriver'))  # e.g. '/home/superman/www/myproject/chromedriver'
driver.get('https://www.gianteagle.com/Pharmacy/Savings/4-10-Dollar-Drug-Program/Generic-Drug-Program/')

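# Loop over the states in the dropdown
for i in range(2, 7):
    # Look the dropdown up again on every pass; the Ajax update can replace it,
    # which would leave an old reference stale
    dropdown_state = driver.find_element(by='id', value='ctl00_RegionPage_RegionPageMainContent_RegionPageContent_userControl_StateList')

    # open dropdown
    dropdown_state.click()

    # click the i-th state option (find_element_by_xpath was removed in recent Selenium 4 releases)
    driver.find_element('xpath', '//*[@id="ctl00_RegionPage_RegionPageMainContent_RegionPageContent_userControl_StateList"]/option[' + str(i) + ']').click()

    # give the Ajax request time to load the results table
    sleep(3)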

    # prepare HTML 
    page_content = driver.page_source 
    tree = fromstring(page_content) 

    state = tree.xpath('//*[@id="ctl00_RegionPage_RegionPageMainContent_RegionPageContent_userControl_StateList"]/option['+str(i)+']/text()')[0] 
    data[state] = [] 

    # Loop over the product rows listed for this state
    for line in tree.xpath('//*[@id="ctl00_RegionPage_RegionPageMainContent_RegionPageContent_userControl_gridSearchResults"]/tbody/tr[@style]'):
        med_type = line.xpath('normalize-space(.//td[@class="medication-type"])')
        generic_name = line.xpath('normalize-space(.//td[@class="generic-name"])')

        brand_name = line.xpath('normalize-space(.//td[@class="brand-name hidden-xs"])')
        strength = line.xpath('normalize-space(.//td[@class="strength"])')
        form = line.xpath('normalize-space(.//td[@class="form"])')

        qty_30_day = line.xpath('normalize-space(.//td[@class="30-qty"])')
        price_30_day = line.xpath('normalize-space(.//td[@class="30-price"])')

        qty_90_day = line.xpath('normalize-space(.//td[@class="90-qty hidden-xs"])')
        price_90_day = line.xpath('normalize-space(.//td[@class="90-price hidden-xs"])')

        data[state].append(dict(med_type=med_type,
                                generic_name=generic_name,
                                brand_name=brand_name,
                                strength=strength,
                                form=form,
                                qty_30_day=qty_30_day,
                                price_30_day=price_30_day,
                                qty_90_day=qty_90_day,
                                price_90_day=price_90_day))

print('data:', data) 
driver.quit() 
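Once the loop has filled data (a dict mapping each state name to a list of row dicts), the results can be saved with the standard csv module. This is a minimal sketch; write_drug_csv, drug_csv_text and the extra 'state' column are illustrative choices, not part of the answer.

```python
import csv
import io

# The answer's row keys, plus a "state" column taken from the dict key.
FIELDS = ["state", "med_type", "generic_name", "brand_name", "strength",
          "form", "qty_30_day", "price_30_day", "qty_90_day", "price_90_day"]


def write_drug_csv(data, out):
    """Write {state: [row_dict, ...]} to the file-like object `out` as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for state, rows in data.items():
        for row in rows:
            writer.writerow({"state": state, **row})


def drug_csv_text(data):
    """Same as write_drug_csv, but return the CSV as a string."""
    buffer = io.StringIO()
    write_drug_csv(data, buffer)
    return buffer.getvalue()
```

For a real file, open it with newline='' as the csv documentation recommends: with open('drugs.csv', 'w', newline='') as fh: write_drug_csv(data, fh).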

Thanks a lot. Can you tell me why this line ends with [0]? state = tree.xpath('//*[@id="ctl00_RegionPage_RegionPageMainContent_RegionPageContent_userControl_StateList"]/option[' + str(i) + ']/text()')[0] –


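@Abhinavrawat, because in this case (when 'normalize-space' is not used) 'tree.xpath' returns a list, e.g. ['Ohio']. With [0] I take the value out of the list, because we don't need the list, we need the actual value :). Don't forget to accept the answer :) – TitanFighter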


ok, thanks again –
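A note on the [0] explanation above: plain indexing raises IndexError when the XPath matches nothing (for example, if an option index is out of range). A small hypothetical helper makes that case explicit instead of crashing the loop:

```python
def first_or_default(results, default=""):
    """Return the first item of an XPath result list, or `default` if it is empty.

    tree.xpath('//option/text()') returns a list such as ['Ohio']; plain [0]
    raises IndexError when the expression matched nothing.
    """
    return results[0] if results else default
```

Usage would look like state = first_or_default(tree.xpath(xpath_expr)), where xpath_expr is the option expression from the answer.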