Parsing XML from a website with Python

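I want to parse XML from a website and I'm stuck. I'll provide the XML below; it comes from the website. I have two questions: what is the best way to read XML from a website, and how do I dig into the XML to get the rate I need?

I need the number that follows base:OBS_VALUE, which is 0.12 here.

So far I have: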

from xml.dom import minidom 
import urllib 


document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r') 
web = urllib.urlopen(document) 
get_web = web.read() 
xmldoc = minidom.parseString(document) 

ff_DataSet = xmldoc.getElementsByTagName('ff:DataSet')[0] 

ff_series = ff_DataSet.getElementsByTagName('ff:Series')[0] 

for line in ff_series: 
    price = line.getElementsByTagName('base:OBS_VALUE')[0].firstChild.data 
    print(price)

XML from the website:

<Header> <ID>FFD</ID> 
<Test>false</Test> 
<Name xml:lang="en">Federal Funds daily averages</Name> <Prepared>2013-05-08</Prepared> 
<Sender id="FRBNY"> <Name xml:lang="en">Federal Reserve Bank of New York</Name> 
<Contact> 
<Name xml:lang="en">Public Information Web Team</Name> <Email>[email protected]</Email> 
</Contact> 
</Sender> 
<!--ReportingBegin></ReportingBegin--> 
</Header> 
<ff:DataSet> <ff:Series TIME_FORMAT="P1D" DISCLAIMER="G" FF_METHOD="D" DECIMALS="2" AVAILABILITY="A"> 
<ffbase:Key> 
<base:FREQ>D</base:FREQ> 
<base:RATE>FF</base:RATE> 
<base:MATURITY>O</base:MATURITY> 
<ffbase:FF_SCOPE>D</ffbase:FF_SCOPE> 
</ffbase:Key> 
<ff:Obs OBS_CONF="F" OBS_STATUS="A"> 
<base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD> 
<base:OBS_VALUE>0.12</base:OBS_VALUE>

Source

2013-05-08 Trying_hard

If you want to stick with xml.dom.minidom, try this...

from xml.dom import minidom 
import urllib 

url_str = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily' 
xml_str = urllib.urlopen(url_str).read() 
xmldoc = minidom.parseString(xml_str) 

obs_values = xmldoc.getElementsByTagName('base:OBS_VALUE') 
# prints the first base:OBS_VALUE it finds 
print obs_values[0].firstChild.nodeValue 

# prints the second base:OBS_VALUE it finds 
print obs_values[1].firstChild.nodeValue 

# prints all base:OBS_VALUE in the XML doc 
for obs_val in obs_values: 
    print obs_val.firstChild.nodeValue

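However, if you want to use lxml, use underrun's solution. Also, your original code has some errors. You were actually trying to parse the document variable, which is the URL. You need to parse the XML returned from the website, which in your example is the get_web variable.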
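To make the point about parsing the downloaded text rather than the URL concrete, here is a minimal Python 3 sketch that can be run without network access. It assumes a trimmed, made-up stand-in for the feed: the Root element, the namespace URIs, and the second observation (0.14) are placeholders invented for illustration, not taken from the real document. In Python 3 the download step would be urllib.request.urlopen(url).read() instead of urllib.urlopen(url).read().

```python
from xml.dom import minidom

# Hypothetical stand-in for the text returned by the server. The Root
# element, the namespace URIs, and the second observation are invented.
SAMPLE_XML = """<?xml version="1.0"?>
<Root xmlns:ff="urn:example:ff" xmlns:base="urn:example:base">
  <ff:DataSet>
    <ff:Series>
      <ff:Obs>
        <base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
        <base:OBS_VALUE>0.12</base:OBS_VALUE>
      </ff:Obs>
      <ff:Obs>
        <base:TIME_PERIOD>2013-05-06</base:TIME_PERIOD>
        <base:OBS_VALUE>0.14</base:OBS_VALUE>
      </ff:Obs>
    </ff:Series>
  </ff:DataSet>
</Root>"""


def obs_values(xml_text):
    """Return the text of every base:OBS_VALUE element, in document order."""
    # Parse the XML content itself, never the URL string.
    doc = minidom.parseString(xml_text)
    # getElementsByTagName compares against the qualified tag name,
    # so the "base:" prefix has to be written exactly as in the document.
    nodes = doc.getElementsByTagName('base:OBS_VALUE')
    return [node.firstChild.nodeValue for node in nodes]


values = obs_values(SAMPLE_XML)
print(values)
```

With the real feed, the only change should be passing the downloaded text to obs_values in place of SAMPLE_XML.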

Source

2013-05-08 13:52:21 b10hazard

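Thanks. I need to use minidom. Thank you for the correction. – 2013-05-08 13:58:39

The added information is appreciated. – 2013-05-08 14:01:58

Why did you change url_str to xml_str? It should be: xml_str = urllib.urlopen(url_str).read() – Moulde 2016-01-30 18:35:17

Looking at your code: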

document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r') 
web = urllib.urlopen(document) 
get_web = web.read() 
xmldoc = minidom.parseString(document)

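I'm not sure your document is correct, unless you really want http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=dailyr, because that is what you will get (here the parentheses only group, and string literals written next to each other are concatenated automatically).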
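A quick way to see the concatenation for yourself is to evaluate the same expression on its own:

```python
# Adjacent string literals are joined into one string; the surrounding
# parentheses only group the expression and do not create a tuple.
document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')

print(document)
print(type(document).__name__)
```

Dropping the stray 'r' (it may have been meant as a file mode, which urlopen does not take) leaves the intended URL.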

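After that, you do some work to create get_web, but you don't use it on the next line. Instead, you try to parse your document, which is the URL...

Beyond that, I would definitely recommend using ElementTree, preferably lxml's ElementTree (http://lxml.de/). lxml's etree parser also accepts a file-like object, which can be a urllib object. If you go that way, then after straightening out the rest of your document you could do: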

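from lxml import etree 
import urllib 

url = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily' 
# etree.parse accepts a file-like object (Python 2 urllib, as in the question) 
tree = etree.parse(urllib.urlopen(url)) 

# Assumes the ff: and base: prefixes are declared on the root element. 
# XPath rejects the default (None) prefix, so it is dropped from the map. 
ns = dict((k, v) for k, v in tree.getroot().nsmap.items() if k) 

for obs in tree.xpath('//ff:DataSet/ff:Series/ff:Obs', namespaces=ns): 
    # xpath() returns a list, so take the first match before reading .text 
    price = obs.xpath('./base:OBS_VALUE', namespaces=ns)[0].text 
    print(price)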
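If lxml is not installed, the standard library's xml.etree.ElementTree supports the same idea with an explicit prefix-to-URI map. The following is only a sketch in Python 3 against a made-up sample: the Root element and the urn:example:... namespace URIs are placeholders, and the real URIs would have to be copied from the xmlns declarations of the actual feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; Root and the namespace URIs are invented placeholders.
SAMPLE_XML = """<?xml version="1.0"?>
<Root xmlns:ff="urn:example:ff" xmlns:base="urn:example:base">
  <ff:DataSet>
    <ff:Series>
      <ff:Obs>
        <base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
        <base:OBS_VALUE>0.12</base:OBS_VALUE>
      </ff:Obs>
    </ff:Series>
  </ff:DataSet>
</Root>"""

# Prefix-to-URI map used only to resolve the prefixes in the search paths.
NS = {'ff': 'urn:example:ff', 'base': 'urn:example:base'}

root = ET.fromstring(SAMPLE_XML)

prices = []
for obs in root.findall('.//ff:Obs', NS):
    # find() returns a single element or None, unlike lxml's xpath() list.
    value = obs.find('base:OBS_VALUE', NS)
    if value is not None:
        prices.append(value.text)

print(prices)
```

The prefixes in NS only need to match the ones used in the search paths; what has to match the document is the URI each prefix maps to.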

Source

2013-05-08 13:39:54 underrun
