2013-01-21 502 views
0

我在解析MIME消息的特定正文部分時遇到問題。在Python中解析MIME正文部分

我有一個電子郵件客戶端的Web界面。我想讓用戶下載電子郵件的附件。過去,每次我想下載一個附件時,我都會使用參數RFC822來調用IMAP服務器來獲取整個消息,以便我可以用Python輕鬆解析。

但是,這不是有效的,我需要一種方法來獲得所需的附件。我正在使用另一種方式,使用特定正文部分的BODY [1],BODY [2]等索引對IMAP服務器進行調用。

當我進行IMAP調用時,獲取正確的正文部分(當我調用BODYSTRUCTURE時,我正在查找的部分中的字節數加起來,所以我肯定會獲得正確的部分)。

但是,我不能將這個身體部分解析成可用的東西,或者爲此保存它。

一個具體的例子:我打電話以獲得電子郵件的[1]所述主體和獲得背面

('4 (UID 26776 BODY[2] {5318}', '/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAoHBwkHBgoJCAkLCwoMDxkQDw4ODx4WFxIZJCAmJSMg\r\nIyIoLTkwKCo2KyIjMkQyNjs9QEBAJjBGS0U+Sjk/QD3/2wBDAQsLCw8NDx0QEB09KSMpPT09PT09\r\nPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT3/wAARCABRAQIDASIA\r\nAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQA\r\nAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3\r\nODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWm\r\np6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEA\r\nAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSEx\r\nBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElK\r\nU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3\r\nuLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD2aiii\r\ngAooooAKKKKACorm4jtLaS4mbbFEpdzjOAOtS1neIf8AkXdR/wCvaT/0E1UVeSRM3yxbRbtbuC9t\r\n1ntZUliboyHIqauA+F+f+JkMnH7vj/vqu/rSvS9lUcE9jLDVvbUlNq1wooorE3CiiigAooooAKKK\r\nKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKzvEP8AyLuo/wDX\r\ntJ/6Cabruu23h/T2urlJZAPuxxLuZv8A63vXlWo+M/EXjC9FnpMUsMRORBb8sR6u3p+QrehSlJ83\r\nRGFerGKcerOp+F/XUv8Atn/7NXfE4GT0FcN4fx4Ss5jqTwS6jPt3w2vRcZxuPQHnnH5VDe6vqOuS\r\n+RGG2t0hi7/X1/HiniZqpVckThKbpUVCW534IYAggg8gilqG0QxWcKMMMsaqR6YFTVznSFFFFABR\r\nRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFITgZPSuen8daJDqLW\r\na3Xmug3SPENyRjIHJ+p7ZqowlLSKuTKcYq8nY6Kio1uIngEyyIYmGQ4PBH1qCLVLSa5Nuky+bjIU\r\n8bh7etSUcn8TGaOz05kYqwlYgg4I4qbRiLXw7ZG3SOFrmLzJmjQKZGyeSRUHxO/48dP/AOurfyqb\r\nTf8AkXNK/wCvf+prtn/usPVnn0/98n6Ikh8KzXt9LPdP5UDOWAXlmH9K6Wy0+20+Ly7WJUHc9z9T\r\n3pXdotOZ0OGWLcPqBXF6V4m1S4k0xpLvzPtTESRvaeWijBPyv0Y8dBXPClKabXQ6qlaNNpPqd5RX\r\nKWuvX8ug6HdPIhlvLsRTHYMFSW6Dt0FM0HxBqF/faZHcSIyXEM7yAIBkq5A+nFU8PNJvt/wf8iVi\r\nYNpd7fjb/M66iuZvrzVZ/Ed3ZWV7HbRQWqzDdAH3E5461lr4w1E2clyRESumLcBNvHmGTZn1x7UR\r\nw8pK6/q4SxMIuzT/AOGO6orjL7XdW0QzR3FzDds1ibmNvJ2bGBAxgHkc1saXJeJewx3+rxTySwea\r\nLcW4Q445yD0HSlKi4q9xxxCk+VJ/h/mbdFcodS1G6uNVnOqw2FnY3Bh+a3D8DHJJPqa09B1G4v7j\r\nVEuHVlt7too8Lj5QBSlSaVxxrxk7W/r+kbFFcW+v6t9in1ZbiEW0V79n+y+T1XcFzvznPNVF8T6q\r\nS8gvBuF55Iia0xGV345l6A4q1hpvqZvFwW6Z39FcRdeJr+P7VeJfWqrBeGBbEoNzoGC7s5znv0rf\r\n0HUbi/l1MXDBhb3jwx4XGFGMD3qZUJRjzMuGIhOXKv6/qxsUVzWva9d6XqsqQ7Gij057gIy9XDYH\r\nPpVOTXNV0lomu7iG8W4sZLlR5Pl+WyqGA4PI5ojQk0muopYmEW0+h2NFcjBrOq2U+nteXMN1HfWs\r\nk+wQ7PLKpvABB5Haq665rMFnpt5LdQzLqKviEQBfKO0kYOeenen9Xl3X9X/yF9aj2f8AVv8ANHbU\r\nVysXiG6e18PyCVHa7jd7gBR821CT9ORVTT/EmoPJpc0t/azi/Yq9skYDQcEjkHPbvR9Xn/Xz/wAh\r\n/Woaf12/zO1orkdI8R6tLpVpdXsFmYZiVEzT7Gc5PATHXjpntUdnrmriHSL+4uYZINSm8s24h2+X\r\nnOCGzk9O9H1eSuhLFQaTSev9fqdlRXB6Z4m1K4Gnyf2lbXM9xcCOSyWEB0XJy2Qc9Bn8as6T4gvp\r\n9Rgj1K+kt5JJin2Y2JCnk4USU5Yaav5Cji4Stbr6f5/8E7OikornOoWoLyWWC0llt4DcSopKxBgp\r\nc+mTwKnrP1LWrPTFInkzJ2jTlj/h+NAHkPiLxL4m8Rak2lyW9xbEnH2GFCGP+8erfyq7F4FvdA8P\r\n3ep6jIiSsixrbpztBdeWPTPHQV1g8Q3OoavDsVYIySMIPmIwTgt6e3SpPFF5JdeD71ZcEoY/m9cs\r\nK7qWIbnGEVZXRwV6CVOc5O7syj4SYnwlgkkLdsACeg2iofFhGn2cF8gkmldSBCi8gKfvZ9Km8I/8\r\nim3/AF+N/wCgiqXjpikGkMpKsElIIOCPmFDpqpinF92TGq6WDjNdkcoPEGu+IZ0SQrc28fAjkHyJ\r\n77uuffOa9JtvIXSbKG1mEqwQ7GI7H0NadnpFnqHh6xWeFQTAjb0G1gSAScj3qnZ+EDb3xka8byl+\r\n6EGGb2NY1qvN7iVkjpo0lH327tm5cyJFpEskgLIkBZgvUgLziuUtLPTrWSEhbyQWsgFtbTXWfnLB\r\nQQmOB82c+hrspII5rd4ZFDRupRl9QRjFVZNHs5m3Sxs5Awu6Rjs5B+Xn5eg6Y6VjGco6JmsqcZWc\r\nkc+ulWulXsDyWtz5UKyXSRG63xQ7cbiq+vzcVAunaUmlwXJkurcwRyGAQ3XzupO5hkDrk11v2C3I\r\nUOhfajRguxY7WxkEnrnA601dOt1tHtdrtA67SjyM3GMY5PAqvbT7k+wp/wApy0thpxa2nM2qCS5z\r\nbl1ucMwWQJ8x78t+VW9T0TSLEW8EkM4iuo1sSUfiNAdwJz/tAc+9bDaJYOWLQH5ju4dhtJYMSvPy\r\n/MAeMVLNplrcWwgnjMsYVlxIxY4YEHknPQmj20+4ewp/ynM3Fxp+rySzGxuJRFF9kIaUIGRpNoP4\r\nkA59DUxt7Tw7qMM5a6muBauQJ7oEKgK5Vc9TzwK310myQyFbdR5hUtgnnacj8jUzWsL3KzvGrSqp\r\nRWPYEg/zApe0la1x+yhfmtqcZfQ6ZdiS4eO+hh1Bt+xbkIkpDqhLA/d5INTQ2dvNfLNajUIzdXD7\r\n/KvNqF1zk8dRgf0rpRotgH3fZwfm3AFiQp3BuBnA+YA8VLHp9tE6tHCqlXaRcZ4Zup/Gn7WdrXF7\r\nCne9jlpLLTJLi2eOC8YXpS6S18/bCXbJyR+GahmsdMgeRZReNAswklhW8BQykb+F7jpz/hXULodg\r\ni4WAjGNpEjZTGcBTn5RyeBjrSLoGmLHsFnGRkHJyWzjHXr0o9tU7h7Cn/KZd1oWkDSWmltmIuZ0m\r\naTI8xWd1/i7AE9PTNU47SyfXrmO1lv4pPPMswW72ITk5IUDn7p49q6ePTreO0e2Cs0DrtKSOzjGM\r\nY5Jpkek2cRjMUbJsQINkjDKgk4bB+bknrnqaPaz7h7Gn2OfvZbHVLuKS7tLsSXlqIYxE4OYny2cd\r\nj8v61JfRaZdxFpYrkrZxC0G1wCVkAB/EVuPo9k6xgw48pFjjKsVKKvTBByOtMGh6eCuLcAKANoZs\r\nHGcEjOCRk8nml7SS2ZTpwd7o55b/AE+efS1FuzSQQbYB9oUpsZMEOR/FgdPerMOj2Wm6rAsUV1N9\r\nkj84JLcEx26kkfKO54NbJ0Wx3xusGx40CI0bshVRkAZBB7mppNPt5Z0mdW8xF2bg7DK9cHB5H1zR\r\n7Se1xeyhvY5nT4dPtbxmtrCYXF9GDCjSgqscgZjt/ufdOR9Kgt00m3h0xl05oHiKSQO0qKXDBgPM\r\nb8Dx7iupg0iyt3R44cNGRsJZm24BAAyeAAx46c0DSbIGEi3T9yAsecnaBnH8zR7WfcPYw7HJ2Mel\r\nB5LiGC6eCxDzfZ5LnKIwBJ2J0I54Oe9T/ZNK0u8siIdSfEp8i3eQlYWJAyEJ/wBr+ddO+mWryyO0\r\nZzKNsih2CuMbeVzg8cdKi/sLTyOYCzZyHZ2LA8YO4nPG0Y9MU/az7iVGmvsmEtppkQstNtrSdpYJ\r\nVmhLOFbdlyQzdcDaePpTLRLK4vrW7dtSnjSdNpnudyxyuMj5fbIGa6EaJYgDbCVYHO9XYPnLHO7O\r\nc/M3PfNPj0qziQJHAqqHWQAZ4ZQAp/AAUvaz7j9jDsW6KKKg0AjIIPQ1zV34NjluxJBcskbHLq/z\r\nEfQ/4101FAGbDpFpptjMttF85QgueWbj1rlNYu7ebSriwDl3mKZZOQu056967zqOawtX8LW17DM9\r\noqwXLL8pyQmfcD+lVB2kmiZpSi01c5HToZLO0aZLlLOyhY75pXwgOORj+I9OKx/EXijTddntrW2e\r\nRFtlZEmkTCyliM8clenGf0pn/CBeJ9W1U2t6iQQRHPmlv3IB7oB1P6+pr0Pw34G0rw4Fkij+0Xg6\r\n3EoBYf7o6L+HPvXYpQoy5+bmkccoSrx9ny2ibGkRtFo9lHIpV0gRWB7EKKuUUVxN3dztSsrBRRRS\r\nGFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQA\r\nneloooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKA\r\nCiiigD//2Q==\r\n'). 

這種特定響應對應於JPEG圖像附件。我嘗試提取表示身體部分的字符串(所以,我說的是以'/ 9j'開頭並以'2Q == \ r \ n'結尾的字符串)並將其作爲一個文件保存到文件中.jpg,但它不是一個有效的文件。

然後我通過那個,因爲該字符串中有\ r \ n的多個實例,該字符串可能會用換行符/回車符分割,所以我拆分了字符串並將其除去\ r \ n,然後加入子字符串並試圖將其保存到文件中。仍然不是有效的JPEG文件。

我能做些什麼來嘗試解析這個迴應?

謝謝。

+0

該部件的'BODYSTRUCTURE'命令返回什麼? –

+0

我將省略完整的BODYSTRUCTURE響應,但解析後與該部分有關的部分是:['IMAGE','JPEG',['NAME','image001.jpg'],'',None,'BASE64',5318,None,None,None] –

+2

所以它是base64編碼的。 –

回答

3

您需要解析BODYRESPONSE字符串以查看數據編碼的格式,請參閱IMAP RFC 3501, section 7.4.2。所述第五字段是內容編碼:

['IMAGE', 'JPEG', ['NAME', 'image001.jpg'], '<[email protected]>', None, 'BASE64', 5318, None, None, None] 

的字段,在順序,類型和子類型(在這種情況下這樣image/jpeg),身體參數(如字符集,格式流動,或在此文件名情況),附件標識,描述,編碼,大小,MD5簽名(如果有),處置和語言。

在這種情況下,數據是base-64編碼:

>>> imagedata = datastring.decode('base64') 
>>> imagedata[:10] 
'\xff\xd8\xff\xe0\x00\x10JFIF' 

它看起來像JPEG數據給我。