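Debugging htmlParse in R's XML library
This isn't the first time I've run into problems using htmlParse from the XML library, but in the past I simply gave up and used regex to pull out what I needed. I would rather do this by parsing the XML/XHTML, since we all know regular expressions are not parsers.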
That said, I find the error messages from the parse command unhelpful at best, and I don't know how to proceed. For example:
> htmlParse(getForm("http://www.takecarehealth.com/LocationSearchResults.aspx", location_query="Deer Park",location_distance=50))
Error in htmlParse(getForm("http://www.takecarehealth.com/LocationSearchResults.aspx", :
File
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head id="ctl00_Head1">
<title></title>
<script language="JavaScript" type="text/javascript">
var s_pageName = document.title;
var s_channel = "Take Care";
var s_campaign = "";
var s_eVar1 = ""
var s_eVar2 = ""
var s_eVar22 = ""
var s_eVar23 = ""
</script>
<meta name="keywords" content="take care clinic, walgreens clinic, walgreens take care clinic, take care health, urgent care clinic, walk in clinic" />
<meta name="description" content="Information about simple, quality healthcare for the whole family from Take Care Clinics at select Walgreens, including Take Care Clinic hours, providers, offers, insurance and quality of care." />
<link rel="shortcut icon" hre
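I'm glad it can see that something is there, but how do I drill down into "Error: File"?
Note that, as far as I can tell, this is well-formed XHTML. When I visit the link manually, I can run XPath queries against it and Firebug doesn't complain.
How do I debug an error like this from htmlParse?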
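One way to narrow this down is sketched below. It assumes the failure comes from htmlParse treating the response text as a file name rather than as markup, which the question does not confirm. The strings bad and page are hypothetical stand-ins, not the real getForm() response, and the exact wording of the error may differ between versions of the XML package.

```r
library(XML)

# Hypothetical input that is neither markup nor an existing file,
# used only to show how to capture the error instead of printing it.
bad <- "no-such-file-for-this-example.html"
msg <- tryCatch(htmlParse(bad), error = function(e) conditionMessage(e))

# The console shows the message from its start; the informative part
# is often at the end, so look at the last characters as well.
substring(msg, max(1, nchar(msg) - 80))

# Hypothetical stand-in for the text returned by getForm().
page <- "<html><head><title>t</title></head><body><p>Deer Park</p></body></html>"

# Inspect the first characters before parsing; anything unexpected
# ahead of the first "<" is worth checking.
substr(page, 1, 40)

# State explicitly that the argument is content, not a path or URL.
doc <- htmlParse(page, asText = TRUE)

# Confirm the parsed tree is usable with an XPath query.
xpathSApply(doc, "//p", xmlValue)
```

With the real response, the same steps would apply to the value returned by getForm(): store it in a variable first, inspect its beginning, then pass it to htmlParse with asText = TRUE.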
@ttmaccer Interesting. So it was a malformed-markup problem after all. – 2012-07-29 23:12:00
That makes sense. Thanks. – 2012-07-29 23:25:13