2016-04-25 53 views

I have this website: http://ga.healthinspections.us/georgia/search.cfm?start=21&1=1&f=s&r=ANY&s=&inspectionType=Food&sd=03/26/2016&ed=04/25/2016&useDate=NO&county=Fulton& I am trying to scrape it using VBA, but it is not working. What should I do?

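I have already written some code, but it does not work even for the first page. My goal is to extract a set of details like the following from each page: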

Column 1: 103 West Lounge (Food Service Inspections)
Column 2: 103 WEST PACES FERRY RD ATLANTA, GA 30318
(Skip this detail) View inspections:
Column 3: July 10, 2012 Score: 92, Grade: A
Column 4: July 26, 2013 Score: 90, Grade: A
Column 5: February 19, 2014 Score: 98, Grade: A
Column 6: December 12, 2014 Score: 100, Grade: A
Column 7: November 13, 2015 Score: 99, Grade: A

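The current code only extracts a URL from somewhere, without any of the details. It needs a review: what should be changed, or what is wrong?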

Sub Test()
    Dim IE As New InternetExplorer
    Dim html As HTMLDocument
    Dim link As Object
    Dim ws As Worksheet

    Set ws = Sheets("Sheet1")

    Application.ScreenUpdating = False
    Set IE = New InternetExplorer

    ' Test 2 pages (page 2 and page 3) starting from page 2. So far so good.
    For i = 2 To 4 Step 2

        myurl = "http://ga.healthinspections.us/georgia/search.cfm?start=" & i & "1&1=1&f=s&r=ANY&s=&inspectionType=Food&sd=03/26/2016&ed=04/25/2016&useDate=NO&county=Fulton&"
        IE.Visible = False
        IE.navigate myurl
        Do
            DoEvents
        Loop Until IE.readyState = READYSTATE_COMPLETE

        Set html = IE.document
        ' I assume here is the problem, because I need to supplement code part to find these details.
        Set link = html.getElementsByTagName("a")

        ' This part was intended to test if I can to extract at least one detail.
        For m = 1 To 2
            For Each myurl In link
                Cells(m, 1) = link

            Next
        Next m
    Next i
    'Also I tried to test with msgbox but no luck either
    'MsgBox link

    IE.quit
    Set IE = Nothing
    Application.StatusBar = ""
    Application.ScreenUpdating = True

End Sub

Maybe something is messed up, or I simply lack the knowledge. :) Any help would be appreciated.

Answers


Have you set the references to Microsoft Internet Controls and Microsoft HTML Object Library? If so, try replacing this part of your code:

Dim IE As New InternetExplorer 
Dim html As MSHTML.HTMLDocument 
Dim link As Object 
Dim ws As Worksheet 

Set ws = Sheets("Sheet1") 

Application.ScreenUpdating = False 
Set IE = New InternetExplorer 

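Sure, I have enabled both libraries, but no luck. I also changed Dim html to MSHTML.HTMLDocument. The code itself runs without errors, but it extracts a URL from somewhere, which is not what I am looking for. As far as I can tell, nothing gets extracted because of Set link = html.getElementsByTagName("a"), or something elsewhere. – spriteup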
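
One thing that stands out in the question's code: the inner loop writes link (the whole collection) to the same cell on every pass and never uses the loop variable, so each cell is simply overwritten. A minimal sketch of reading each anchor's innerText and href into its own row might look like the following. The sub name ListLinkText is made up for illustration, late binding is used so no references are needed, and whether the inspection details actually sit inside <a> elements on this site is an assumption that has to be checked against the page source.

```vba
' Sketch only: lists the visible text and the target of every link on one results page.
' Assumes a worksheet named Sheet1 exists; ListLinkText is a hypothetical name.
Sub ListLinkText()
    Dim IE As Object
    Dim html As Object
    Dim anchor As Object
    Dim ws As Worksheet
    Dim r As Long

    Set ws = ThisWorkbook.Sheets("Sheet1")
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = False
    IE.navigate "http://ga.healthinspections.us/georgia/search.cfm?start=1&1=1&f=s&r=ANY&s=&inspectionType=Food&sd=03/26/2016&ed=04/25/2016&useDate=NO&county=Fulton&"

    ' Wait until the page has loaded (4 = READYSTATE_COMPLETE)
    Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop

    Set html = IE.document
    r = 0
    ' Use the loop variable itself and advance the row for every element
    For Each anchor In html.getElementsByTagName("a")
        r = r + 1
        ws.Cells(r, 1).Value = anchor.innerText
        ws.Cells(r, 2).Value = anchor.href
    Next anchor

    IE.Quit
    Set IE = Nothing
End Sub
```

If the names, addresses and scores turn out to live in other elements (table cells, for example), the same loop shape applies with a different tag name passed to getElementsByTagName.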


You can use the method below to dump the inner text of every element on the page.

Sub DumpData()

    Dim IE As Object
    Dim itm As Object
    Dim URL As String
    Dim RowCount As Long

    ' Late binding, so no library references are required
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True

    URL = "http://ga.healthinspections.us/georgia/search.cfm?start=1&1=1&f=s&r=ANY&s=&inspectionType=Food&sd=03/26/2016&ed=04/25/2016&useDate=NO&county=Fulton&"

    ' Wait for the site to fully load (4 = READYSTATE_COMPLETE)
    IE.Navigate2 URL
    Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop

    With Sheets("Sheet1")
        .Cells.ClearContents
        RowCount = 1
        ' One row per element: tag name, id, class name, first 1024 characters of its text
        For Each itm In IE.Document.all
            .Range("A" & RowCount) = itm.tagName
            .Range("B" & RowCount) = itm.ID
            .Range("C" & RowCount) = itm.className
            .Range("D" & RowCount) = Left(itm.innerText, 1024)

            RowCount = RowCount + 1
        Next itm
    End With
End Sub

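I got this from a nice guy named Joel. He is a wizard at this kind of thing.

Once the data has been imported into your worksheet, do some simple cleanup to get rid of the extra stuff, and you should be all set.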
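
Part of that cleanup could be automated. A rough sketch, assuming the dump produced by DumpData is on Sheet1 with the element text in column D: it deletes every row whose text is empty. The sub name RemoveEmptyTextRows is hypothetical, and working out which of the remaining rows hold the inspection details still has to be done from the tag and class columns.

```vba
' Sketch only: removes dumped rows that carry no text in column D.
' Assumes the layout written by DumpData (A = tag, B = id, C = class, D = text).
Sub RemoveEmptyTextRows()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long

    Set ws = ThisWorkbook.Sheets("Sheet1")

    ' Column A always holds a tag name, so it marks the last dumped row
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    ' Loop from the bottom up so deleting a row does not shift rows not yet visited
    For r = lastRow To 1 Step -1
        If Len(Trim$(CStr(ws.Cells(r, "D").Value))) = 0 Then
            ws.Rows(r).Delete
        End If
    Next r
End Sub
```

Run it once after DumpData has finished; on a large dump, turning Application.ScreenUpdating off around the loop should make it noticeably faster.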


Hi. Thanks to Joel and to you. It does at least return something, but I am not going to clean up 893 pages by hand. It is too much of a mess. :) – spriteup