2017-06-15 208 views

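Python: decompressing a zlib string

I am working on a Python script to read mzXML files. I can parse out the data I want, but it is zlib-compressed. I have been trying to decompress it with the zlib library, but it gives me the following error: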

zlib.error: Error -3 while decompressing data: incorrect header check

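I understand the error is due to a missing header check, but I do not know how to provide one. Below is the data I am trying to decode, taken from a proteomics scan: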

compressedLen="3830" precision="64" byteOrder="network" contentType="m/z-int"eJwtWXdclNcShVhjRQUVC116c+lFWFl6c+lLiVQReIpYAFFQBCxYELazgCIqFtAYlWjsGpJYYoxRX2yJGsEeXzDYYnvxHPlnfjNz5szMvfvdhnDRxQda//4J/coqyz/KxSeWn/goR9rfFH6UpTPGQxoFHId9SYED8FMO9739UZab/ATdrX8JcOVWC4CbHHQnDbrzAaOP0nhRKvUZoyHdvmqDvVyajDjrJBfwlB95DLtVtwXqWda9HXlML7oiriL0FOyCxUmQFcnZwE9YshF5K7JUsJuefEZ/7VTw62/8hfrDodBtK1Y1f5SVV81gH7t8I+yVPWJIvd2XwVc1KBf8Y/QnAl+VYow6rao7gFsZMQG6Q8Is1Lmq30Lo4wc2wV+d0wzdpEAH+prKM+AzKvsbcl0y40w6JqKOmgXlyOO4/zvga5Z+x/7FcZA1e7YCZ15oSPwzAeoUDHMH33pzGew2na8QX5uegPwBNg+oV8bAP2pxKvC1x3+D39n9Cca39ko/+D27i6k/eQJ+Q58BjH/1FnYjKz3E1U04D//orgzUXWe1EtJqsSd46oQjIa0vL6ffj78z203nwFcX0AkelzFjqEdVQEZHD6aeLkFcsFhEveAI4r2Ef5O/SAMZen4T9ZKTwFmmNKOuunVOnK/5F1hv3ShI8bMU1qMKhR6/tJnxG6wQHzy1iPmau9CvaAx//3WtSuAsFeXEt4cAJzw0APNTt/tr4Hz8ihl/8A74gwML6D9agLjwJXmMP/kbcL6604g/rUZ8cpaM9f04H3qiaxHmq+6SCnbXD87kuzoL/E57bNjfjZvA295TsN5bc4ATv3iPPur+bAHOy0+X/D068HvE1xPfmwgZPTgPfqmWHucjV4J6pf1F4PH3uw8pHSKBnKk1iH6dM5DGZ2ORRzqqCPwhWdeIHzMafMk2FyCl+krgPF/o0284Dflj5m+jbsrfZ1TyefKZf4+4EEEh67Ohf+qQ69QdAiEjfrFlvCAfeB9bAeqQOv8Oe6hyFuv1/BL4jDQ/6kIhZFDvLIy31J+/N0FrIvWw4eATVOmQP2Iz5y+/l/nF26E7/q+E9cYVQo+N3Mp+Ex7C7mwfSH/KO/Qr8f4eUpr6GLhpDQLWkz0CvFl3OJ/S3FLETd8yinyzTsDuf13GegrCYQ/tKCa+iOPi0m7O+sp9YU8oUTC+0hR6RFM58SvfIK94yWD2W9MImfjiv4yX6gE3Yw7XUWm9LfTE1b2Mb+pEHWl/DaK/JQR8mWcr6d+6g/P3jyPr3W4Nu/huKfnbazleulnE75mB+Pi4HPr3+UK6vshh/IF9wIcn1NJ/zBvznDjFjPV3fgP/Fx0biT9dBd4MMddn6QWOf7IV1z/pjU7E+X2tYf5b3E8kj+cQ38W+om64Mt+DTOBi5xVR/3MT4mPuTuH89fB7zAjUZj0vvoE9ooXrrPT1VdgT/PXofz8I+Mhj3M9k2jLIkHlv0Zes32Pk8Y+PQr2yofngSWzn+icbwbgU0WrgZKN/QL3xig7Gj3vK+diZCbvMYCHyprazXpnZWcRLOrk+ySzjIL304rSg2+4Fb+IKSpmDHFIc10Z+wQrwCky5f8lcWyGTb31O3YvrsyB7APl9T4I3yCyKumg/+oqN+xQf6MH44KUcj5A+wCWkjaT+7w/3o3Rw+IfxUUM4f4Y9HJ+4bOD8l3LdliVuAl/epqGMT+M8ex/65M/8FnGC2l30z7wAOcZyE/ubxXOO02Br1pffBRmTeIr4edOBC5qdTn9hMnjDWpzIv9AXMua0NvlKub8lfdBiveW+qD9rugf5lldgfKLqa2GXrbZHvGgOf1eydWcgxSscyFfL/T/8VSjzy3g+8/9wjn61FPlmDHej3vgCuMViW+qb+B190TaT9W5N5vilF4BHtvPTemTxnv5d3Odiu2fCLvvqAHA
Zm3Xo38/9MUF8nPwHPoc9oyuN9R3i/pfdVkb+Y9yfUhMu0X/SHjK/jOcK2Q8DwTM35wb958Tgy594kOPxUwrsOdrcR2UXsyHzIn2Z//JXkBEdZ1nf1W7gc/ee4Pj+zvlcNor7iuxuDPSChAvE3/cB31xdnjtlj3geiwqOpf+pAfyFxycxvmcZ8mV67SL+Oc+txZmL6H99Fv6SvS8Z/47zukTCc7lc63vYPafvgV3el99riVskpHwg53duEs9R8qHGsC85w/OrXIe/h8zsFki5rjns854vR175mBLoC/LX0T/uOvpdWDCefIac96pa/r7lps5pGJ/r1cRbpGNeFhRzPZPbcH4Wz/dkvP0LyMVBKsYLrBGf9Y7zInfl73f5/XrY5Z7ZwBcufEG/z0zoM3WeIo98agfnW8LvSx7I9arQnd+7PDQd/ozuLPYXyfUuv2go64uxA19cp5D8kqPQp93+nfr0m+BJ2rmf+PS7kHbts+mfaYLxdby4mv68ctQhHLia9czrgX1snxvs//JExHnM/4713S+DbrsxjPP3aBn6cjtkz3p7uK9ayfidyf9uAt47XMH4f3hODN/B71OhxXOHwzoN4hV9imF3WV4AvELvFewB6/m9KsbzXOx+egH9xivBZ3iJ50OFKe9NTnfOAaewsoDdu/cB8ii8p6Gv2M4K8vncBs46ZjzxftHgc33YQj14HaSvC+97irBMxLukX2C+T+d9z2beJxTxHyCzjH6FXZHO353n+R7qWTfQz6SmVPLntnH83zSyngUbwDtZWkN/aRukv/Zd8lcvQh/OfXnuVKx5CSn86yXxNcc4v+rhrFd6GLwB3q30N54F3uJznr8VzS+Bs97Nc52ibSnyi3feYz1HlnC81f8j/kQ/4CZ+a8x+Tn0Fv5/Un+P7/SX4Q6b/Rb4fryHOxm0k43/hOS/qZ55/FTczIcO2riS+awtkxNUy5n/QirhAMxOO95OrwPs/HEP9L57Lorp471U8fwZ9knwj8731Aj4hrwu8Si1v4MIKtoBfOYDfQeTmceBTDrmP8fCztKR/xGDEhR/byfjRRuB3zX8JfuU4nheEU7gPK615LxMee0O//VTOXzb3a+XkXuD9mnguVPo8BW7a8BXURdmQIelFxAcyXlheiLqUoavg91yfTD3CD3UlbR/BfFHGsIeuEJEvjvu+6K4N9UyuRzH3etlPTjFk0G6uJ8rZQ4B3j/Enfn4p+AQ5n/ot+5b9Kbj/KyvNEZfQfRl9KVfMhgxy4flXuXYvcF79LlJf/woytus78Cg1PEdJxt+j3rYE+cSv3xO/ez38bkXryH9YAJz7CgPWd5Tfe8jmDI7HSd4vA1dJOZ8/8nuKnNGf+J957hbncj1XXue5KUCdRPwfvD8bJemxnvuxqD+ybDD1R8MwXo5mU1jPnzOR17KD7yrKns8g/d/y/qPsvQ2cccan8Xsdj7zenx1hvW/5jmN3So78qs94r7LYVwK7amgZ+2mqg12lw/t+iHYGeFR6Ssjgc79R19cFT/D8n5FfNVGC/HGWkeQzPAmcjfoK7CqzvYx/eYT8VlwHRAe5TqjsOF7WE+ayPqcNsDtlBZPfnetscDZ/Ryqvs8AFtuaRX6RC38ZxNehXFbwWcbHpfaiHtwNvZ9ZFvujbsIe9MkCcKpHzPv1dK+tPuY580x7dYr+p3F9SGrmeq2Y9Bz4qyot6USJ4RDdrWF+pKWS8vTb9Sw9wfpv70l/Hc5nZwuPUFanw21XfIl7FdTGojfuIqnEQ6+2zg3pzPqT4fSfrb+X92svGlXo774+WlbyXqb55jfoFzkc4Xhdegc9RJuB4/7offPp6R9jvzZuIS07je5vqThV4TXdK6H/SAp5xC69wvHrvwW91Jpv53zSAzyHfE7p6gB1ksBf3A7U+78lii1bwqSfwPOfVZYW61EYO8E8o4b1AbVGNOh3nVcOutvaF7nztH+RR25kj3if6FupSC9bC71E9lvxu5LVM5bus2ssZumc
1f3fqgNuQgRG8t6rDcpEnpL8j46P2IG5KVgXxEhfU5fNQzX5m893R5V4a6yuMB09kLs+v6jKeb6esVVNfOxEySKub/a3/gHomTvqD+TRzweMq4blK3XQKftMPD1lfazTq8LuUTX1HCXCu1ziP6j0diJ9q1s76Dp9GvKEB3zHUJ1ciPkDYzvgfd0CPKOV7ifqP8+ALUw0l332es3zKeO9UP1oMXvvECvb7ZyLwVrt4/le/WcD5rtJGfL32euj610aAv37IVujO+yIQXz/6AnRLhT79BnbIMzm6HXz1JkXQp36xiH7LJMSZTOZ9sN7OAf0FxLYzn6AYunjYafDWC0dBmkdwna73Wwndtn0f9TC+v7lU2TI+rhv8Nv5Z1BN/Ap/Nh+HUv7jC+ozLWE/WMYyrQdAU5svpD2nXxne++vxxiA9c0Jf6vDDEeyxSM77EDv6Q7Z+znuXZ4LNTXCO+djr0IB3eu+o38T3c+zXPD/U7+d7k9Nlp1reX770BY/n91x/UgvQ4e4X1HU1Ff7b7uR/Wd56DtAqJpX7rMsereBX57t4Av9s+Fet7NAQ40ToF+Z+8A07g95Dxb/MhAzzeAK8Z2AK+8C/z0KdG5z+IEy30pq73jvMRy3crjf4a+B1O8/1RY8D1R+8G3780NgaIc8jfhro0nj8jj0m3IfP5813Udh/fhTVBW4B3D+C7iCZCAt3CZAP9abvB7xLC71qTk4N+PJ5uw3hrZqfA7vxhNHCaCr7PhAw2p75CHzxmq7hva9SPgfcp5T6oaSwETqTh+66mpRC4yTX8P5DmHr8zx93jGP88AbjAl9vQb4P2eeQLfsL3m4Z+4+F3G3sY+AbdB+jHtYLvUA0GnEf/hB7YG0zmIr/X0QD6HfIRb6TH98iG8JXwO4Y/h70hm+dd1wFrmH8O3/WcJUOIL+L/M6YOfEy98inqcHjPd9+Gmh3g8/7yAP34mf+rx/G9ouEH/r/BbT/PXw2Xf0Eek9zJ1J88RB2CaI5jQ89hzq+I72SNAzYj3mK9C/I06v4K/6S5fA9rtOG7m+M5vm82CoaB19GV77qN7nxPdfVzox5rhzzGfXQZv5nf0

It is peak data. Any help understanding this information would be greatly appreciated.


Actual zlib-compressed data is binary, not ASCII text like this. Perhaps this is a Base64 (or similar) encoding of the zlib-compressed data. – jasonharper


Looking into it, that does seem to be the case. –


Here is the Base64-decoded [result](https://pastebin.com/0bLsDA8H). It gives the same error. –
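To illustrate the point made in the comments, here is a minimal sketch of why zlib rejects Base64 text and how to check the decoded bytes. It uses made-up sample bytes rather than the scan data above, and assumes the payload is a standard zlib stream that was Base64-encoded afterwards. The names `raw`, `encoded`, and `decoded` are only for illustration.

```python
import base64
import zlib

# Stand-in payload: compress some bytes, then Base64-encode them,
# mimicking how the peak data appears in the file (sample data is made up).
raw = b"\x00\x01\x02\x03" * 8
encoded = base64.b64encode(zlib.compress(raw))

# Feeding the Base64 text straight to zlib fails, because the first
# characters are ASCII letters rather than a valid zlib header.
try:
    zlib.decompress(encoded)
    direct_error = None
except zlib.error as exc:
    direct_error = str(exc)

# Decoding Base64 first yields bytes that start with the zlib header byte 0x78.
decoded = base64.b64decode(encoded)
has_zlib_header = decoded[:1] == b"\x78"
restored = zlib.decompress(decoded)
```

If the decoded bytes do not start with `0x78`, the text passed to `base64.b64decode` probably contained something other than the encoded payload, for example attribute text in front of it.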

Answer


Solved the problem. I was able to read the Base64 data using this library

Here is the code snippet used to decode the data:

import struct

def parse_peaks(peaks_decoded):
    # Based on code by Taejoon Kwon (https://code.google.com/archive/p/massspec-toolbox/)
    # peaks_decoded: raw peak bytes (after Base64 decoding and, if needed, decompression).
    # This assumes 32-bit precision: each value is a 4-byte big-endian float.
    # For precision="64" data, use 8-byte doubles instead (">%dd" and len // 8).
    count = len(peaks_decoded) // 4  # integer division keeps the count an int
    values = struct.unpack(">%df" % count, peaks_decoded)

    # Values alternate: m/z, intensity, m/z, intensity, ...
    mz_list = [float(v) for v in values[0::2]]
    intensity_list = [float(v) for v in values[1::2]]
    return mz_list, intensity_list
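For completeness, here is a rough end-to-end sketch of the whole chain: Base64 decode, zlib decompress, then unpack the big-endian values. The helper `decode_peaks` and its `precision` and `compressed` parameters are hypothetical names, not part of any library. It assumes the element text is Base64 over a zlib stream and that values are stored as alternating m/z and intensity, matching the `precision="64"` and `byteOrder="network"` attributes shown in the question.

```python
import base64
import struct
import zlib


def decode_peaks(b64_text, precision=64, compressed=True):
    """Hypothetical helper: Base64 text -> (mz_list, intensity_list)."""
    data = base64.b64decode(b64_text)
    if compressed:
        data = zlib.decompress(data)

    # 'f' is a 4-byte float, 'd' an 8-byte double; '>' means big-endian.
    code, width = {32: ("f", 4), 64: ("d", 8)}[precision]
    count = len(data) // width
    values = struct.unpack(">%d%s" % (count, code), data)

    # Even positions hold m/z values, odd positions hold intensities.
    return list(values[0::2]), list(values[1::2])


# Round trip on made-up peaks to check the sketch.
pairs = [100.5, 10.0, 200.25, 20.0]
sample = base64.b64encode(zlib.compress(struct.pack(">4d", *pairs)))
mz, intensity = decode_peaks(sample, precision=64)
```

Only the Base64 payload should be passed in, not the surrounding attribute text. If the byte count is not a multiple of the value width, `struct.unpack` raises an error, which usually points to the wrong `precision` setting.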