2011-11-21

Question: Trying to extract the plain text from a MIME email body

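I'm working on a mail system. We receive emails and store them in a MySQL database. The body is parsed, the headers are stripped out, and so on. That all works fine for plain-text emails, but when we receive a MIME-formatted email, the body data stored in the database looks like this: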

This is a multi-part message in MIME format. 

------=_NextPart_000_1B20_01CCA865.03078710 
Content-Type: text/plain; 
charset=\"us-ascii\" 
Content-Transfer-Encoding: 7bit 

This Message is intended for the indicated recipients only and may be 
confidential. If this message has been sent to you in error you must take no 
action based on it, nor must you copy or show it to anyone; please inform us 
immediately and delete this message. 




------=_NextPart_000_1B20_01CCA865.03078710 
Content-Type: text/html; 
charset=\"us-ascii\" 
Content-Transfer-Encoding: quoted-printable 

<html xmlns:v=3D\"urn:schemas-microsoft-com:vml\" = 
xmlns:o=3D\"urn:schemas-microsoft-com:office:office\" = 
xmlns:w=3D\"urn:schemas-microsoft-com:office:word\" = 
xmlns:m=3D\"http://schemas.microsoft.com/office/2004/12/omml\" = 
xmlns=3D\"http://www.w3.org/TR/REC-html40\"><head><META = 
HTTP-EQUIV=3D\"Content-Type\" CONTENT=3D\"text/html; = 
charset=3Dus-ascii\"><meta name=3DGenerator content=3D\"Microsoft Word 12 = 
(filtered medium)\"><style><!-- 
/* Font Definitions */ 
@font-face 
{font-family:\"Cambria Math\"; 
panose-1:2 4 5 3 5 4 6 3 2 4;} 
@font-face 
{font-family:Calibri; 
panose-1:2 15 5 2 2 2 4 3 2 4;} 
@font-face 
{font-family:Tahoma; 
panose-1:2 11 6 4 3 5 4 4 2 4;} 
@font-face 
{font-family:Verdana; 
panose-1:2 11 6 4 3 5 4 4 2 4;} 
/* Style Definitions */ 
p.MsoNormal, li.MsoNormal, div.MsoNormal 
{margin:0cm; 
margin-bottom:.0001pt; 
font-size:11.0pt; 
font-family:\"Calibri\",\"sans-serif\";} 
a:link, span.MsoHyperlink 
{mso-style-priority:99; 
color:blue; 
text-decoration:underline;} 
a:visited, span.MsoHyperlinkFollowed 
{mso-style-priority:99; 
color:purple; 
text-decoration:underline;} 
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate 
{mso-style-priority:99; 
mso-style-link:\"Balloon Text Char\"; 
margin:0cm; 
margin-bottom:.0001pt; 
font-size:8.0pt; 
font-family:\"Tahoma\",\"sans-serif\";} 
span.EmailStyle17 
{mso-style-type:personal-compose; 
font-family:\"Calibri\",\"sans-serif\"; 
color:windowtext;} 
span.BalloonTextChar 
{mso-style-name:\"Balloon Text Char\"; 
mso-style-priority:99; 
mso-style-link:\"Balloon Text\"; 
font-family:\"Tahoma\",\"sans-serif\";} 
..MsoChpDefault 
{mso-style-type:export-only;} 
@page WordSection1 
{size:612.0pt 792.0pt; 
margin:72.0pt 72.0pt 72.0pt 72.0pt;} 
div.WordSection1 
{page:WordSection1;} 
--></style><!--[if gte mso 9]><xml> 
<o:shapedefaults v:ext=3D\"edit\" spidmax=3D\"1026\" /> 
</xml><![endif]--><!--[if gte mso 9]><xml> 
<o:shapelayout v:ext=3D\"edit\"> 
<o:idmap v:ext=3D\"edit\" data=3D\"1\" /> 
</o:shapelayout></xml><![endif]--></head><body lang=3DEN-GB link=3Dblue = 
vlink=3Dpurple><div class=3DWordSection1><p class=3DMsoNormal><span = 
style=3D\'font-size:7.5pt;font-family:\"Verdana\",\"sans-serif\";color:#1F497D= 
\'>This Message is intended for the indicated recipients only and may be = 
confidential. If this message has been sent to you in error you must = 
take no action based on it, nor must you copy or show it to anyone; = 
please inform us immediately and delete this message. </span><span = 
style=3D\'color:#1F497D\'><o:p></o:p></span></p><p = 
class=3DMsoNormal><o:p>&nbsp;</o:p></p></div></body></html> 
------=_NextPart_000_1B20_01CCA865.03078710-- 

. 

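We want to strip out everything except the text-only version. Are there any regex experts out there who can solve this? We have tried several classes and other PHP systems, but they always return the same code that was originally fed in, rather than just the text we are after. Any ideas? RegEx preferred. We were thinking along the lines of detecting text/plain, followed by a run of line breaks, to locate the plain-text content....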

You have a problem, and you decide to solve it with regular expressions. Now you have two problems.

Answers

"RegEx preferred"

No, they are not.

Use a proper MIME parser (Google turns up this one; I can't comment on its quality).
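
As a rough illustration of what "use a proper MIME parser" can look like, here is a minimal sketch in Python using the standard library's `email` package. The question's stack is PHP, so this shows the approach rather than a drop-in fix. The function name `extract_plain_text` is made up for this example, and the sketch assumes the complete raw message, including its top-level headers, is available.

```python
from email import message_from_string
from email.policy import default


def extract_plain_text(raw_message: str):
    """Return the text/plain body of a raw message, or None if there is none.

    Assumption: raw_message still contains the top-level headers, in
    particular the Content-Type header that declares the boundary.
    """
    # policy=default yields an EmailMessage, which provides get_body()
    msg = message_from_string(raw_message, policy=default)
    # get_body() searches the multipart structure for a text/plain part
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return None
    # get_content() undoes the transfer encoding and decodes the charset
    return part.get_content()
```

The parser takes care of boundaries, transfer encodings and charsets, which is the bookkeeping a regular expression would otherwise have to reimplement.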

We already tried that exact code today. No joy, it just spits out the original string. – Nick

The answer is still basically correct; you want a MIME parser, not a regular expression. – tripleee
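
One possible explanation for a parser handing back the original string, offered as a guess rather than a diagnosis: the stored body in the question no longer carries the top-level `Content-Type` header, so a parser has no way to learn the boundary. The sketch below, in Python with the standard `email` package, recovers the boundary from the first delimiter line and prepends a minimal header before parsing. The name `plain_text_from_stored_body` is hypothetical, treating the message as `multipart/alternative` is an assumption, and the boundary detection is only a heuristic.

```python
import re
from email import message_from_string
from email.policy import default

# A delimiter line is "--" followed by the boundary string (heuristic:
# a body line that merely starts with "--" would also match).
DELIMITER_LINE = re.compile(r"^--(\S+)[ \t]*\r?$", re.MULTILINE)


def plain_text_from_stored_body(body: str):
    """Extract text/plain from a MIME body whose headers were stripped."""
    match = DELIMITER_LINE.search(body)
    if match is None:
        # No delimiter found: assume the body is already plain text
        return body
    boundary = match.group(1)
    # Rebuild the header the parser needs; the subtype is an assumption
    header = (
        "MIME-Version: 1.0\n"
        f'Content-Type: multipart/alternative; boundary="{boundary}"\n'
        "\n"
    )
    msg = message_from_string(header + body, policy=default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else None
```

If the stored text still carries database escaping such as the `\"` sequences visible in the sample, it would presumably need to be unescaped before parsing. Storing the original headers alongside the body would avoid the guesswork altogether.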