How to use yaml.load_all with fileinput.input?

Without resorting to ''.join, is there a Pythonic way to use PyYAML's yaml.load_all with fileinput.input() to easily stream multiple documents from multiple sources?

I'm looking for something like the following (non-working example):

# example.py 
import fileinput 

import yaml 

for doc in yaml.load_all(fileinput.input()): 
    print(doc)

Expected output:

$ cat >pre.yaml <<<'--- prefix-doc' 
$ cat >post.yaml <<<'--- postfix-doc' 
$ python example.py pre.yaml - post.yaml <<<'--- hello' 
prefix-doc 
hello 
postfix-doc

Of course, yaml.load_all expects a string, bytes, or a file-like object, and fileinput.input() is none of those things, so the above example does not work.

Actual output:

$ python example.py pre.yaml - post.yaml <<<'--- hello' 
... 
AttributeError: FileInput instance has no attribute 'read'
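The mismatch can be confirmed without running the loader at all. The following is a minimal sketch that only inspects the standard-library classes; the variable names are illustrative and not part of the question.

```python
import fileinput
import io

# FileInput iterates over lines and offers readline(), but it has no read().
fileinput_has_read = hasattr(fileinput.FileInput, "read")
fileinput_has_readline = hasattr(fileinput.FileInput, "readline")
fileinput_is_iterable = hasattr(fileinput.FileInput, "__iter__")

# An ordinary text stream, by contrast, does provide read().
stringio_has_read = hasattr(io.StringIO, "read")

print(fileinput_has_read, fileinput_has_readline, fileinput_is_iterable)
print(stringio_has_read)
```

So the gap to bridge is small: something has to turn an iterator of lines into an object with a read() method.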

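It is possible to make the example work with ''.join, but that's cheating. I'm looking for a way that does not read the whole stream into memory at once.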

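We might rephrase the question as: is there some way to emulate a string, bytes, or file-like object that proxies to an underlying iterator of strings? However, I doubt that yaml.load_all actually needs the entire file-like interface, so that phrasing would ask for more than is strictly necessary.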

Ideally I'm looking for the minimal adapter that would support something like this:

for doc in yaml.load_all(minimal_adapter(fileinput.input())): 
    print(doc)

Source

2016-09-06 CJ Gaconnet

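Your minimal_adapter should take a fileinput.FileInput as a parameter and return an object that load_all can use. load_all either takes a string as its argument, which would require concatenating the input, or expects the argument to have a read() method.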

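Since your minimal_adapter needs to preserve some state, I find it clearest and easiest to implement it as an instance of a class that has a __call__ method, and to have that method return the instance and save its argument for future use. A class implemented that way should also have a read() method, because that is what gets called once the instance has been handed to load_all: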

import fileinput
import ruamel.yaml


class MinimalAdapter:
    def __init__(self):
        self._fip = None
        self._buf = None  # storage of read but unused material, maximum one line

    def __call__(self, fip):
        self._fip = fip  # store for future use
        self._buf = ""
        return self

    def read(self, size):
        if len(self._buf) >= size:
            # enough in buffer from last read, just cut it off and return
            tmp, self._buf = self._buf[:size], self._buf[size:]
            return tmp
        for line in self._fip:
            self._buf += line
            if len(self._buf) > size:
                break
        else:
            # ran out of lines, return what we have
            tmp, self._buf = self._buf, ''
            return tmp
        tmp, self._buf = self._buf[:size], self._buf[size:]
        return tmp


minimal_adapter = MinimalAdapter()

for doc in ruamel.yaml.load_all(minimal_adapter(fileinput.input())):
    print(doc)

With this, running your example invocation gives exactly the output that you want.
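The same buffering idea can be tried without any files or YAML library. The sketch below is a hypothetical variant (the name IterReader is not from the answer) that accepts any iterable of strings, which makes it easy to exercise with an in-memory list of lines.

```python
class IterReader:
    """Hypothetical adapter: exposes read(size) over any iterable of str lines."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self._buf = ""  # text pulled from the iterator but not yet returned

    def read(self, size=-1):
        if size is None or size < 0:
            # no limit requested: drain everything that is left
            data, self._buf = self._buf + "".join(self._lines), ""
            return data
        # pull lines only until the buffer can satisfy the request
        while len(self._buf) < size:
            try:
                self._buf += next(self._lines)
            except StopIteration:
                break
        data, self._buf = self._buf[:size], self._buf[size:]
        return data


# Exercise the adapter with in-memory lines and a deliberately small chunk size.
reader = IterReader(["--- prefix-doc\n", "--- hello\n", "--- postfix-doc\n"])
chunks = []
while True:
    chunk = reader.read(8)
    if not chunk:
        break
    chunks.append(chunk)

print(chunks)
```

Assuming PyYAML is installed, something like yaml.safe_load_all(IterReader(fileinput.input())) should then consume the adapter in the same way; at no point is more than one line beyond the requested size held in the buffer.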

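This is probably only more memory efficient for larger files. load_all tries to read in blocks of 1024 bytes at a time (easy to find out by putting a print statement in MinimalAdapter.read()), and fileinput does some buffering as well (use strace if you are interested in how it behaves).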
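One way to observe the requested block size, instead of relying on a print statement, is to record the size argument of every read() call. This is a minimal sketch: LoggingReader and requested_sizes are hypothetical names, and requested_sizes assumes PyYAML is installed. The size a loader asks for may differ between libraries and versions, so the sketch records it rather than assuming a value.

```python
import io


class LoggingReader:
    """Hypothetical wrapper that records the size passed to each read() call."""

    def __init__(self, stream):
        self._stream = stream
        self.sizes = []

    def read(self, size=-1):
        self.sizes.append(size)
        return self._stream.read(size)


def requested_sizes(text):
    """Return the read() sizes the YAML loader asked for (assumes PyYAML)."""
    import yaml  # imported lazily so the rest of the sketch is stdlib-only

    reader = LoggingReader(io.StringIO(text))
    list(yaml.safe_load_all(reader))  # consume every document
    return reader.sizes


# Stdlib-only check that the wrapper passes data through and logs the size.
reader = LoggingReader(io.StringIO("--- hello\n"))
first = reader.read(4)
print(repr(first), reader.sizes)
```

Calling requested_sizes("--- hello\n") then shows what the installed loader actually requests.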

This was done using ruamel.yaml, a YAML 1.2 parser, of which I am the author. It should work with PyYAML as well, of which ruamel.yaml is a derived superset.

Source

2016-09-07 05:06:08 Anthon

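The problem with fileinput.input is that the resulting object doesn't have a read method, which is what yaml.load_all is looking for. If you're willing to give up fileinput, you can just write your own class that does what you want: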

import sys
import yaml


class BunchOFiles(object):
    def __init__(self, *files):
        self.files = files
        self.fditer = self._fditer()
        # open the first source; None if no files were given
        self.fd = next(self.fditer, None)

    def _fditer(self):
        # yield each source in turn; '-' means standard input
        for fn in self.files:
            with sys.stdin if fn == '-' else open(fn, 'r') as fd:
                yield fd

    def read(self, size=-1):
        data = ''
        while self.fd is not None:
            data = self.fd.read(size)
            if data:
                break
            # current source is exhausted; advance to the next one
            self.fd = next(self.fditer, None)
        return data


bunch = BunchOFiles(*sys.argv[1:])
# PyYAML 6 and later require an explicit Loader argument
for doc in yaml.load_all(bunch, Loader=yaml.SafeLoader):
    print(doc)

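The BunchOFiles class gets you an object with a read method that will happily iterate over your list of files until everything is exhausted. Given the above code and your sample input, we get exactly the output you're looking for.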
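The read-across-sources behaviour can be checked in isolation with in-memory streams. The following sketch uses a hypothetical ChainedReader that takes already opened file-like objects instead of file names; it is an illustration of the same idea, not the answer's class.

```python
import io


class ChainedReader:
    """Hypothetical reader that serves read() from several streams in turn."""

    def __init__(self, streams):
        self._streams = iter(streams)
        self._current = next(self._streams, None)

    def read(self, size=-1):
        while self._current is not None:
            data = self._current.read(size)
            if data:
                return data
            # current stream is exhausted: move on to the next one, if any
            self._current = next(self._streams, None)
        return ""  # every stream has been consumed


sources = [
    io.StringIO("--- prefix-doc\n"),
    io.StringIO("--- hello\n"),
    io.StringIO("--- postfix-doc\n"),
]
reader = ChainedReader(sources)
pieces = []
while True:
    piece = reader.read(1024)
    if not piece:
        break
    pieces.append(piece)

print(pieces)
```

Note that a single read() never spans two sources, so a call may return fewer characters than requested; a caller that keeps reading until it gets an empty string still sees the full concatenated stream.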

Source

2016-09-07 02:41:55 larsks

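This is an excellent answer, and I wish I could mark multiple answers as accepted ("✅Acceptable"?); however, the other solution re-uses 'fileinput' rather than re-implementing or replacing it, which I think is closer to the minimal intent of the question. I can see how this answer satisfies a different kind of minimal, though, so thanks for contributing!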
