You can do this with just XHR and a regular expression, instead of the cumbersome IE:
Sub Test()
    Dim sContent
    ' Fetch the page source with a synchronous GET request
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US", False
        .Send
        sContent = .ResponseText
    End With
    ' Capture the text that follows "currInsiderVal": and write it to cell A30
    With CreateObject("VBScript.RegExp")
        .Pattern = ",""currInsiderVal"":(.*?),"
        Range("A30").Value = .Execute(sContent).Item(0).SubMatches(0)
    End With
End Sub
Here is an explanation of how the code works:
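First, an instance of the MSXML2.XMLHTTP ActiveX object is created. A GET request to the target URL is opened in synchronous mode (execution pauses until the response is received).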
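The request step above could be wrapped with a status check, roughly as in the sketch below. FetchText is a hypothetical helper name, not something from the original code, and it assumes a plain HTTP 200 means success. A network failure inside .Send would still raise a run-time error that this sketch does not handle.

```vba
' Hypothetical helper (not part of the original answer): fetch a URL
' synchronously and return the body, or an empty string on a non-200 status.
Function FetchText(sUrl As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        ' Third argument False = synchronous: .Send blocks until the response arrives
        .Open "GET", sUrl, False
        .Send
        If .Status = 200 Then
            FetchText = .ResponseText
        Else
            FetchText = ""
            Debug.Print "Request failed: " & .Status & " " & .StatusText
        End If
    End With
End Function
```

With that in place, the first With block in Test could become sContent = FetchText("...") using the same URL.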
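Then VBScript.RegExp is created. By default, the .IgnoreCase, .Global and .MultiLine properties are False. The pattern is ,"currInsiderVal":(.*?), where (.*?) is a capturing group, . means any character (except a line break), .* means zero or more characters, and .*? means as few characters as possible (lazy matching). The other characters in the pattern are matched literally. The .Execute method returns a collection of matches; since .Global is False, it contains only one match object. That match object has a collection of submatches; since the pattern contains a single capturing group, there is only one submatch.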
There are some useful MSDN articles on regular expressions:
Microsoft Beefs Up VBScript with Regular Expressions
Introduction to Regular Expressions
Here is how I came up with the code:
First, I used the browser to find the element in the webpage's DOM that contains the target value. The corresponding node is:
<td align="right" id="currrentInsiderVal">143.51</td>
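Then I made the XHR and found this node in the response HTML, but it did not contain the value (you can find the response on the Network tab of the browser developer tools after refreshing the page):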
<td align="right" id="currrentInsiderVal">
</td>
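This behavior is typical of DHTML. Dynamic HTML content is generated by scripts after the web page has loaded, either after retrieving data from the web via XHR or by just processing data already loaded with the page. So I simply searched the response for the value 143.51 and found it inside a JS function, in the fragment ,"currInsiderVal":143.51, :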
fundsArr = {"fundTotalHistVal":132.61,"mutualFunds":[[1,89,"#a71620"],[2,145,"#a71620"],[3,152,"#a71620"],[4,198,"#a71620"],[5,155,"#a71620"],[6,146,"#a71620"],[7,146,"#a71620"],[8,132,"#a71620"]],"insiderHisMaxVal":3.535,"institutions":[[1,273,"#283862"],[2,318,"#283862"],[3,351,"#283862"],[4,369,"#283862"],[5,311,"#283862"],[6,298,"#283862"],[7,274,"#283862"],[8,263,"#283862"]],"currFundData":[2,2202,"#a6001d"],"currInstData":[1,4370,"#283864"],"instHistMaxVal":369,"insiders":[[5,0.042,"#ff6c21"],[6,0.057,"#ff6c21"],[7,0.057,"#ff6c21"],[8,3.535,"#ff6c21"],[5,0],[6,0],[7,0],[8,0]],"currMax":4370,"histLineQuars":[[1,"Q2"],[2,"Q3"],[3,"Q4"],[4,"Q1<br>2015"],[5,"Q2"],[6,"Q3"],[7,"Q4"],[8,"Q1<br>2016"]],"fundHisMaxVal":198,"currInsiderData":[3,143,"#ff6900"],"currFundVal":2202.85,"quarters":[[1,"Q2"],[2,""],[3,""],[4,"Q1<br>2015"],[5,""],[6,""],[7,""],[8,"Q1<br>2016"]],"insiderTotalHistVal":3.54,"currInstVal":4370.46,"currInsiderVal":143.51,"use10YearData":"false","instTotalHistVal":263.74,"maxValue":369};
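If you want to repeat that search step from VBA rather than in the browser, something like the following sketch may help. ShowContext is a hypothetical helper name, and the 40-character margin is an arbitrary choice.

```vba
' Hypothetical helper (not part of the original answer): print the text
' surrounding the first occurrence of sNeedle in sContent.
Sub ShowContext(sContent As String, sNeedle As String)
    Dim lPos As Long
    Dim lStart As Long
    lPos = InStr(1, sContent, sNeedle, vbBinaryCompare)
    If lPos = 0 Then
        Debug.Print "Not found: " & sNeedle
    Else
        ' Start up to 40 characters before the hit, clamped to the string start
        lStart = lPos - 40
        If lStart < 1 Then lStart = 1
        Debug.Print Mid$(sContent, lStart, Len(sNeedle) + 80)
    End If
End Sub
```

Calling ShowContext sContent, "143.51" after the request would print the neighbouring text, which is where the "currInsiderVal" key shows up.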
So the regex pattern was built on that basis: it should find the fragment ,"currInsiderVal":<some text>, where <some text> is our target value.
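The same idea can be generalized to other keys in that data. The sketch below is only an illustration: ExtractValue is a hypothetical helper, it assumes the key contains no regex special characters, and it only suits simple values, since it stops at the first comma or closing brace (so arrays and strings containing commas will be cut short).

```vba
' Hypothetical helper (not part of the original answer): return the raw text
' that follows "sKey": up to the next comma or closing brace.
' Returns an empty string if the key is not found.
Function ExtractValue(sContent As String, sKey As String) As String
    Dim oMatches As Object
    With CreateObject("VBScript.RegExp")
        ' The surrounding quotes make the key match exactly,
        ' so "currInsiderVal" does not match "currInsiderData"
        .Pattern = """" & sKey & """:([^,}]*)"
        Set oMatches = .Execute(sContent)
    End With
    If oMatches.Count > 0 Then
        ExtractValue = oMatches.Item(0).SubMatches(0)
    Else
        ExtractValue = ""
    End If
End Function
```

Usage would look like Range("A30").Value = ExtractValue(sContent, "currInsiderVal"). Checking Count before reading Item(0) avoids the run-time error you would otherwise get if the page layout changes and nothing matches.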
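You are an absolute legend, I spent days trying to do this. Could you explain one or two things so that I'm not just using it mindlessly? Firstly, you've used currInsiderVal rather than currrentInsiderVal. Does that mean it looks for a word with those characters, so I'll have to make sure that what I'm looking for is unique? And does Item(0) look for the first item (what does an item actually mean in this example?) and then SubMatches(0) look for the first element within the item? Many thanks for your help! –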
@AidanO'Farrell Take a look at the description I added. – omegastripes