Perl MIME::Parser and encodings in nested bodies (message/rfc822)
Argh, this isn't easy. I'm trying to parse some mail with Perl. Let's take an example:
From: [email protected]
Content-Type: multipart/mixed;
boundary="----_=_NextPart_001_01CBE273.65A0E7AA"
To: [email protected]
This is a multi-part message in MIME format.
------_=_NextPart_001_01CBE273.65A0E7AA
Content-Type: multipart/alternative;
boundary="----_=_NextPart_002_01CBE273.65A0E7AA"
------_=_NextPart_002_01CBE273.65A0E7AA
Content-Type: text/plain;
charset="UTF-8"
Content-Transfer-Encoding: base64
[base64-content]
------_=_NextPart_002_01CBE273.65A0E7AA
Content-Type: text/html;
charset="UTF-8"
Content-Transfer-Encoding: base64
[base64-content]
------_=_NextPart_002_01CBE273.65A0E7AA--
------_=_NextPart_001_01CBE273.65A0E7AA
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----_=_NextPart_003_01CBE272.13692C80"
From: [email protected]
To: [email protected]
This is a multi-part message in MIME format.
------_=_NextPart_003_01CBE272.13692C80
Content-Type: multipart/alternative;
boundary="----_=_NextPart_004_01CBE272.13692C80"
------_=_NextPart_004_01CBE272.13692C80
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
=20
Viele Gr=FC=DFe
------_=_NextPart_004_01CBE272.13692C80
Content-Type: text/html;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
<html>...</html>
------_=_NextPart_004_01CBE272.13692C80--
------_=_NextPart_003_01CBE272.13692C80
Content-Type: application/x-zip-compressed;
name="abc.zip"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="abc.zip"
[base64-content]
------_=_NextPart_003_01CBE272.13692C80--
------_=_NextPart_001_01CBE273.65A0E7AA--
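This mail was sent from Outlook with another mail attached. As you can see, it is a fairly complex mail with many different content types (text/plain, text/html, message/rfc822, application/xyz)... and the message/rfc822 part is where the problem lies. I wrote a script in Perl 5.8 (Debian Squeeze) that parses this message with MIME::Parser: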
use strict;
use warnings;
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory, not in temp files
my $top_entity = $parser->parse(\*STDIN);

my $plain_body = "";
my $html_body  = "";

# Walk all parts depth-first, including those of the nested message/rfc822.
foreach my $part ($top_entity->parts_DFS) {
    my $content_type = $part->effective_type;
    my $body         = $part->bodyhandle;    # undef for container entities
    next unless $body;

    if ($content_type eq 'text/plain') {
        $plain_body .= "\n" if $plain_body ne '';
        $plain_body .= $body->as_string;
    } elsif ($content_type eq 'text/html') {
        $html_body .= "\n" if $html_body ne '';
        $html_body .= $body->as_string;
    }
}

# parsing of attachment comes later
print $plain_body;
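The first message part (the base64 content) contains German umlauts, and they are displayed correctly on STDOUT. The nested message/rfc822 is parsed automatically by MIME::Parser and merged with the top-level body into a single entity. As you can see, the nested message also contains German umlauts, this time quoted-printable encoded in iso-8859-1, and these are not displayed correctly on STDOUT. If I convert the encoding before printing, the quoted-printable umlauts are displayed correctly, but the base64-encoded ones are not. I have been trying for hours to extract the message/rfc822 part separately and convert its encoding, but nothing has helped. Can anyone help?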
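To make the symptom concrete: the two text/plain parts declare different charsets (UTF-8 and iso-8859-1), so their bytes cannot simply be concatenated and printed. Below is a minimal sketch using only core modules, not a tested fix for the script above. The helper name `part_to_text` and the base64 sample string are made up for illustration; in the real loop the charset would presumably come from something like `$part->head->mime_attr('content-type.charset')`, which is an assumption worth checking against the MIME::Head documentation.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);
use MIME::QuotedPrint qw(decode_qp);

# Hypothetical helper: turn the raw bytes of one text part into Perl
# characters, using the charset declared in that part's Content-Type.
sub part_to_text {
    my ($bytes, $charset) = @_;
    # Assumption: fall back to US-ASCII when no charset is declared.
    $charset = 'US-ASCII' unless defined $charset && length $charset;
    return decode($charset, $bytes);
}

# Stand-ins for the two text/plain parts of the sample mail, after the
# transfer encoding (base64 / quoted-printable) has been removed.
my $outer = decode_base64('R3LDvMOfZQ==');    # umlauts as UTF-8 bytes
my $inner = decode_qp('Viele Gr=FC=DFe');     # umlauts as ISO-8859-1 bytes

# Decode each part with its own charset before joining them.
my $text = join "\n",
    part_to_text($outer, 'UTF-8'),
    part_to_text($inner, 'iso-8859-1');

# Encode exactly once, on output.
binmode STDOUT, ':encoding(UTF-8)';
print $text, "\n";
```

The idea is that every part becomes a character string first, so the combined text has one consistent internal representation and is encoded only once when it is written out. Converting the whole concatenated body in one step cannot work, because one half would be decoded with the wrong charset.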
Regards
OK, thanks. I understand what the problem is. I'm now using a PHP script, which I like a lot. – rabudde 2011-05-16 04:41:34