Your question raises several technical challenges.
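The first is the simple matter of connecting to the server and retrieving the data. As connect() below shows, this takes only two steps: create a socket (s = socket.socket()) and connect it (s.connect(('hostname', port_number))).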
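
As a side note, the same connection step can be sketched with socket.create_connection(), which also resolves the host name and bounds how long a connection attempt may take. The helper name open_connection and the connect_timeout parameter are illustrative assumptions, not part of the program in this answer.

```python
import socket


def open_connection(host, port, connect_timeout=10.0):
    """Return a connected, blocking TCP socket (hypothetical helper)."""
    # create_connection() resolves the host name and tries each returned
    # address in turn; the timeout limits how long each attempt may take.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # Switch back to blocking mode so that quiet periods on a long-lived
    # stream do not raise a timeout error while reading.
    sock.settimeout(None)
    return sock
```

Resetting the timeout matters if the socket is later wrapped in a file object, since a timeout in the middle of a buffered read can leave that file object in an inconsistent state.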
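The next problem is retrieving the data in a useful form. A socket natively provides .recv(), but I wanted a file-like interface. Python's socket objects offer a method for exactly this, .makefile(), which is why connect() ends with return s.makefile('rb').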
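
To see what the file-like interface buys you without touching the network, here is a minimal sketch that uses socket.socketpair() as a stand-in for the client and server ends. The function name read_via_makefile and the payload are made up for illustration.

```python
import socket


def read_via_makefile(payload):
    """Send payload over a local socket pair and read it back as a file."""
    sender, receiver = socket.socketpair()
    with sender, receiver:
        sender.sendall(payload)
        # Closing the write side lets the reader see end-of-file.
        sender.shutdown(socket.SHUT_WR)
        with receiver.makefile('rb') as stream:
            first = stream.read(4)  # up to 4 bytes, blocking until available
            rest = stream.read()    # everything else, up to end-of-file
    return first, rest
```

Unlike .recv(), which may return fewer bytes than requested whenever that is all that has arrived, the buffered file object keeps reading until the requested count or end-of-file is reached.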
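Now we come to the hard part. XML is typically stored one document per file, or sent one document per TCP transmission, so the end of a document is easy to find from an end-of-file indication or a Content-Length: header. As a result, none of the Python XML APIs has a mechanism for handling multiple XML documents in a single file or string. I wrote xml_partition() to solve this: it consumes data from a file-like object and yields each XML document in the stream. (Note: the XML documents must be packed together, with no whitespace after the final > of each document.)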
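
The limitation is easy to reproduce. The sketch below uses two made-up documents and a hypothetical helper, is_single_document(), to show that the standard parser accepts one document but rejects two concatenated ones.

```python
import xml.etree.ElementTree as ET


def is_single_document(data):
    """Return True if data parses as exactly one XML document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        # Raised, among other cases, when anything follows the root element.
        return False
    return True


# Two invented documents, concatenated the way a stream would deliver them.
single = b'<alert><id>1</id></alert>'
double = single + b'<alert><id>2</id></alert>'

print(is_single_document(single))
print(is_single_document(double))
```

The second call fails because the parser treats everything after the first root element as an error, which is why the stream has to be cut at document boundaries before parsing.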
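Finally, there is a short test program (alerts()) that connects to the stream and reads a few XML documents, storing each one in its own file.
Here is the complete program for downloading emergency alerts from Pelmorex's National Alert Aggregation & Dissemination System.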
import socket
import xml.etree.ElementTree as ET


def connect():
    'Connect to pelmorex data stream and return a file-like object'
    # Set up the socket
    s = socket.socket()
    s.connect(('streaming1.naad-adna.pelmorex.com', 8080))
    return s.makefile('rb')


# We have to consume the XML data in bits and pieces
# so that we can stop precisely at the boundary between
# streamed XML documents. This function ensures that
# nothing follows a '>' in any XML fragment.
def partition(s, pattern):
    'Consume a file-like object, and yield parts defined by pattern'
    data = s.read(2048)
    while data:
        left, middle, data = data.partition(pattern)
        while left or middle:
            yield left
            yield middle
            left, middle, data = data.partition(pattern)
        data = s.read(2048)


# Split the incoming XML stream into fragments (much smaller
# than an XML document.) The end of each XML document
# is guaranteed to align with the end of a fragment.
# Use an XML parser to determine the actual end of
# a document. Whenever the parser signals the end
# of an XML document, yield what we have so far and
# start a new parser.
def xml_partition(s):
    'Read multiple XML documents from one data stream'
    parser = None
    for part in partition(s, b'>'):
        if parser is None:
            parser = ET.XMLPullParser(['start', 'end'])
            starts = ends = 0
            xml = []
        xml.append(part)
        parser.feed(part)
        for event, elem in parser.read_events():
            starts += event == "start"
            ends += event == "end"
            if starts == ends > 0:
                # We have reached the end of the XML doc
                parser.close()
                parser = None
                yield b''.join(xml)


# Typical usage:
def alerts():
    for i, xml in enumerate(xml_partition(connect())):
        # The XML is a bytes object that contains the undecoded
        # XML stream. You'll probably want to parse it and
        # somehow display the alert.
        # I'm just saving it to a file.
        with open('alert%d.xml' % i, 'wb') as fp:
            fp.write(xml)
        if i == 3:
            break


def test():
    # A test function that uses multiple XML documents in one
    # file. This avoids the wait for a natural-disaster alert.
    with open('multi.xml', 'rb') as fp:
        print(list(xml_partition(fp)))


alerts()
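
If you want to go one step further than saving each document, a rough sketch of parsing one is shown here. It assumes the alerts follow the Common Alerting Protocol 1.2 and use its namespace; that assumption, the summarize() helper, and the sample document are illustrative and have not been checked against the live feed.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed uses the CAP 1.2 namespace.
CAP_NS = {'cap': 'urn:oasis:names:tc:emergency:cap:1.2'}

# Invented sample document, standing in for one saved alert file.
SAMPLE = b'''<alert xmlns="urn:oasis:names:tc:emergency:cap:1.2">
  <identifier>example-0001</identifier>
  <sent>2020-01-01T00:00:00-00:00</sent>
  <info><headline>Example headline</headline></info>
</alert>'''


def summarize(xml_bytes):
    """Pull a few fields out of one alert document (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    return {
        'identifier': root.findtext('cap:identifier', default='',
                                    namespaces=CAP_NS),
        'sent': root.findtext('cap:sent', default='', namespaces=CAP_NS),
        # An alert may carry several info blocks, e.g. one per language.
        'headlines': [e.text for e in
                      root.findall('cap:info/cap:headline', CAP_NS)],
    }


print(summarize(SAMPLE))
```

The same function could be called on each bytes object yielded by xml_partition(), in place of or in addition to writing it to a file.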
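"TCP feed" is not a term of art in computer networking. Could you be more precise in your question? –
@Robᵩ I added more details to the post. Sorry it wasn't clearer the first time. – maldahleh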