2011-02-05 84 views
2

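How to read binary data after an ASCII header in Python

I have some imaging data stored in a file that contains an ASCII text header, terminated by a null character, followed by binary data. The ASCII headers vary in length, and I'm wondering what the best way is to open the file, read the header and find the null character, and then load the binary data (in Python).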

Thanks for your help,
James

+1

Have you looked at the struct module (http://docs.python.org/library/struct.html)? – 2011-02-05 00:14:57

+0

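In addition to the struct module, if you have large uniform chunks of data (i.e. all the same type: 32-bit floats, 16-bit uints, etc.), have a look at the array module: http://docs.python.org/library/array.html Or, if you happen to be using numpy, numpy.fromfile is very useful for this sort of thing. – 2011-02-05 00:32:02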
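To illustrate the array suggestion, here is a minimal sketch for Python 3. The helper name load_uniform_block and the in-memory sample data are made up for the example, and it assumes the data is stored in the machine's native byte order.

```python
import array
import io

def load_uniform_block(stream, typecode, count):
    """Read `count` items of the given array typecode from a binary stream."""
    values = array.array(typecode)
    nbytes = values.itemsize * count
    raw = stream.read(nbytes)
    if len(raw) != nbytes:
        raise EOFError("stream ended before the block was complete")
    # frombytes interprets the data in native byte order;
    # call values.byteswap() afterwards if the file uses the other order.
    values.frombytes(raw)
    return values

# Hypothetical stand-in for the binary part of a file:
# four unsigned 16-bit integers in native byte order.
sample = array.array("H", [1, 2, 3, 65535]).tobytes()
pixels = load_uniform_block(io.BytesIO(sample), "H", 4)
```

With a real file, the stream would be the open file object, positioned just after the null terminator of the header.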

Answers

1

Would something like this work:

with open('some_file', 'rb') as f:
    # Split on the first null byte only; everything after it is the binary data.
    binary_data = f.read().split(b'\0', 1)[1]
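A slightly more defensive variant of the same idea, as a sketch for Python 3: it keeps the header text as well and fails loudly if no null terminator is present. The function name split_header and the demo file contents are invented for illustration. Like the one-liner, it reads the whole file into memory.

```python
import os
import tempfile

def split_header(path):
    """Return (header_text, binary_payload) for a file whose ASCII header
    is terminated by a single null byte."""
    with open(path, "rb") as f:
        contents = f.read()
    # partition splits at the FIRST null byte only, so any later
    # zero bytes stay inside the binary payload.
    header, sep, payload = contents.partition(b"\0")
    if not sep:
        raise ValueError("no null terminator found in %r" % path)
    return header.decode("ascii"), payload

# Build a small throwaway file to try it on (made-up contents).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"width=2 height=1\0\x01\x02\x00\x03")
    demo_path = tmp.name

header_text, payload = split_header(demo_path)
os.remove(demo_path)
```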
1

You should probably start with something like this.

with open('some file', 'rb') as infile:
    aByte = infile.read(1)
    # Read one byte at a time until the null terminator (or end of file).
    while aByte and ord(aByte) != 0:
        aByte = infile.read(1)
    # At this point, what's left is the binary data.

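The Python version matters for this kind of thing. The issue is what the read function returns. In Python 3, a file opened in binary mode returns bytes objects, whose individual elements are integers. In Python 2 it returns a str, which is why ord(aByte) is needed there.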
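One way to sidestep the version difference is to compare against a one-byte bytes literal instead of calling ord(). The following is only a sketch for Python 3; skip_header is a made-up helper name, and an in-memory stream stands in for the real file.

```python
import io

def skip_header(stream):
    """Consume bytes up to and including the first null byte.
    Returns the header bytes (without the terminator)."""
    header = bytearray()
    while True:
        byte = stream.read(1)      # a bytes object of length 0 or 1
        if not byte:
            raise EOFError("no null terminator before end of stream")
        if byte == b"\x00":        # bytes-to-bytes comparison, no ord() needed
            return bytes(header)
        header += byte

# Hypothetical data: a 3-character header, a null byte, then two data bytes.
stream = io.BytesIO(b"HDR\x00\x10\x20")
header = skip_header(stream)
binary_data = stream.read()        # everything after the terminator
```

Note that indexing a bytes object, as in b"HDR"[0], gives an int in Python 3, while read(1) still gives a length-1 bytes object.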

0

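Others have already answered your direct question, but I thought I'd add this.

When working with binary data, I often find it useful to subclass file and add various convenience methods for reading and writing packed binary data.

It's overkill for anything simple, but if you find yourself parsing lots of binary file formats, it's worth the extra effort to avoid repeating yourself.

If nothing else, hopefully it serves as a useful example of how to use struct. On a side note, this is pulled from older code and is very much Python 2.x. Python 3.x handles this (particularly strings vs. bytes) quite differently.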

import struct 
import array 

class BinaryFile(file):
    """
    Automatically packs or unpacks binary data according to a format
    when reading or writing.
    """
    def __init__(self, *args, **kwargs):
        """
        Initialization is the same as a normal file object
        %s""" % file.__doc__
        # super() already binds self, so it must not be passed again.
        super(BinaryFile, self).__init__(*args, **kwargs)

    def read_binary(self, fmt):
        """
        Read and unpack a binary value from the file based
        on string fmt (see the struct module for details).
        This will strip any trailing null characters if a string format is
        specified.
        """
        size = struct.calcsize(fmt)
        data = self.read(size)
        # Reading beyond the end of the file just returns ''
        if len(data) != size:
            raise EOFError('End of file reached')
        data = struct.unpack(fmt, data)

        # Strip trailing nulls from string fields. Rebinding the loop
        # variable would have no effect, so build a new tuple instead.
        data = tuple(item.rstrip('\x00') if isinstance(item, str) else item
                     for item in data)

        # Unpack the tuple if it only has one value
        if len(data) == 1:
            data = data[0]

        return data

    def write_binary(self, fmt, dat):
        """Pack and write data to the file according to string fmt."""
        # Try expanding input arguments (struct.pack won't take a tuple)
        try:
            dat = struct.pack(fmt, *dat)
        except (TypeError, struct.error):
            # If it's not a sequence (TypeError), or if it's a
            # string (struct.error), don't expand.
            dat = struct.pack(fmt, dat)
        self.write(dat)

    def read_header(self, header):
        """
        Reads a defined structure "header" consisting of a sequence of (name,
        format) strings from the file. Returns a dict with keys of the given
        names and values unpacked according to the given format for each item
        in "header".
        """
        header_values = {}
        # "fmt" rather than "format" to avoid shadowing the builtin.
        for key, fmt in header:
            header_values[key] = self.read_binary(fmt)
        return header_values

    def read_nullstring(self):
        """
        Reads a null-terminated string from the file. This is not implemented
        in an efficient manner for long strings!
        """
        output_string = ''
        char = self.read(1)
        # Stop at the null terminator or at end of file (empty read).
        while char and char != '\x00':
            output_string += char
            char = self.read(1)
        return output_string

    def read_array(self, type, number):
        """
        Read data from the file and return an array.array of the given
        "type" with "number" elements
        """
        size = struct.calcsize(type)
        data = self.read(size * number)
        return array.array(type, data)
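Since the file builtin no longer exists in Python 3, the class above cannot be used there as written. One possible adaptation, shown only as a rough sketch, is to wrap an already opened binary stream instead of subclassing. The class name BinaryReader, the encoding parameter, and the demo data are all made up for this example, and only two of the methods are carried over.

```python
import io
import struct

class BinaryReader:
    """Thin wrapper around a binary stream (hypothetical Python 3 variant)."""

    def __init__(self, stream):
        self.stream = stream

    def read_binary(self, fmt):
        """Read and unpack values according to a struct format string."""
        size = struct.calcsize(fmt)
        data = self.stream.read(size)
        if len(data) != size:
            raise EOFError("End of file reached")
        values = struct.unpack(fmt, data)
        # Return a bare value when the format describes a single item.
        return values[0] if len(values) == 1 else values

    def read_nullstring(self, encoding="ascii"):
        """Read up to (and consume) the next null byte, return decoded text."""
        chunk = bytearray()
        while True:
            byte = self.stream.read(1)
            if not byte or byte == b"\x00":
                break
            chunk += byte
        return chunk.decode(encoding)

# Made-up file layout: null-terminated title, then two little-endian
# unsigned 16-bit ints and one 32-bit float.
demo = io.BytesIO(b"IMG v1\x00" + struct.pack("<HHf", 640, 480, 1.5))
reader = BinaryReader(demo)
title = reader.read_nullstring()
width, height, scale = reader.read_binary("<HHf")
```

With a real file, the stream would come from open(path, 'rb'), and the header would be read with read_nullstring() before unpacking the binary part.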