我再次遇到一個非常特殊的問題。我對VBA非常陌生,尤其是HTML,所以請耐心等待。我已經在VBA中構建了一個功能正常的網頁抓取工具,但是我想完成一些我無法弄清楚的具體任務。如何從VBA中的特定父元素HTML元素的子元素中獲取數據
Here is the HTML sample that my question refers to.
我已經更換了所有不與橢圓重要的東西。我想刮掉的重要部分是「a」標籤,「數據簡寫」(或只是innerText
)。這是一個網站名稱。有多達五個,但並不總是五個。這也只是列出最多五個網站的兩個部分之一。發佈的部分有<div class="referralsSites referring">
的子女,另一個有子女<div class="referralsSites destination">
。
我想分配給「Up」變量的第一個網站分配給「Up1」,第二個分配給「Up2」等,但僅取決於有多少個網站在「轉介」部分。我想在「目標」部分執行相同的操作,但是根據目標站點的數量,分配給「下變量」(Down1,Down2等)。
例如,如果我只是使用getElementsByClassName("websitePage-listItemLink js-tooltipTarget")
,我將無法區分推介網站和目標網站。
這是到目前爲止我的代碼:
Sub GetSimilarWebData()
Dim appIE As InternetExplorer
Dim HTML As HTMLDocument
Dim ieWindow As SHDocVw.InternetExplorer
Dim URL As String
Dim Rankings As IHTMLElementCollection, Traffic As IHTMLElementCollection, ReferSites As IHTMLElementCollection, DestSites As IHTMLElementCollection, _
rSite As IHTMLElement, rSiteNo As Long, dSite As IHTMLElement, dSiteNo As Long, GlobalRank As String, CountryName As String, CountryRank As String, _
Visits As String, Direct As String, Refer As String, Search As String, Social As String, Display As String, _
Up1 As String, Up2 As String, Up3 As String, Up4 As String, Up5 As String, _
D1 As String, D2 As String, D3 As String, D4 As String, D5 As String
Dim FraudLast As Long
CheckLast = Worksheets("Sheet1").Range("I1").End(xlDown).Offset(1).Row
webStr = Worksheets("Sheet1").Range("A" & CheckLast).Value
Set appIE = New InternetExplorer
appIE.Visible = False
appIE.navigate "https://www.similarweb.com/website/" & webStr
Do While appIE.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Connecting to SimilarWeb..."
DoEvents
Loop
Set HTML = appIE.document
Set appIE = Nothing
Application.StatusBar = ""
Set Rankings = HTML.getElementsByClassName("rankingItem-value")
GlobalRank = Rankings(0).innerText
If GlobalRank = "N/A" Then
GlobalRank = "null"
CountryName = "null"
CountryRank = "null"
Else
CountryName = HTML.getElementsByClassName("rankingItem-subTitle")(1).innerText
CountryRank = Rankings(1).innerText
End If
Visits = HTML.getElementsByClassName("engagementInfo-value engagementInfo-value--large u-text-ellipsis")(0).innerText
If InStr(Visits, "M") <> 0 Then
Visits = Replace(Visits, ".", "")
Visits = Replace(Visits, "M", "00000")
ElseIf InStr(Visits, "K") <> 0 Then
Visits = Replace(Visits, ".", "")
Visits = Replace(Visits, "K", "00")
ElseIf InStr(Visits, "B") <> 0 Then
Visits = Replace(Visits, ".", "")
Visits = Replace(Visits, "B", "00000000")
End If
Set Traffic = HTML.getElementsByClassName("trafficSourcesChart-value")
Direct = Traffic(0).innerText
Refer = Traffic(1).innerText
Search = Traffic(2).innerText
Social = Traffic(3).innerText
Display = Traffic(4).innerText
'Here's what I've started off with:
Set ReferSite = HTML.getElementsByClassName("referralsSites referring")
rSiteNo = ReferSite.Length
Set DestSite = HTML.getElementsByClassName("referralsSites destination")
dSiteNo = DestSite.Length
'For Each rSite In ReferSite
End Sub
我真的不知道如何來解決這個問題。我的代碼中的其他所有代碼都可以正常工作,但是當然如果有什麼我可以做的來提高速度也是值得歡迎的。
所有這些都是指similarweb.com上的數據。
謝謝!!這正是我所期待的。我沒有意識到'getElementsByClassName()'也可以訪問子元素。 – TradorDave