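PHP: extract unique emails from a huge HTML file into an array
I am trying to get all the unique emails from an HTML page into an array. The file is huge, and there is no real pattern to where the emails appear.
Below is a sample HTML file named GetEmails.html. The actual file will contain CSS and much more code to sift through. In this example, note the varied ways the emails appear: not all of them are separated by spaces; some are separated by commas, semicolons, and so on.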
<html>
<body>
<p>This is some text and here is an email [email protected] and in this text we will see lots of emails like [email protected]; [email protected], [email protected] or even dot orgs too like [email protected] and all types such as [email protected],[email protected] and even [email protected] some might be bold [email protected] and some will look like this Email:<strong>[email protected]</strong>
</p>
<p><u>There will be pages and pages and pages of text to sift thru so get the emails into an array.</u></p>
<p>This is some text and here is an email [email protected] and in this text we will see lots of emails like [email protected]; [email protected], [email protected] or even dot orgs too like [email protected] and all types such as [email protected],[email protected] and even [email protected] some might be bold [email protected] and some will look like this Email:<strong>[email protected]</strong> and repeat This is some text and here is an email [email protected] and in this text we will see lots of emails like [email protected]; [email protected], [email protected] or even dot orgs too like [email protected] and all types such as [email protected],[email protected] and even [email protected] some might be bold [email protected] and some will look like this Email:<strong>[email protected]</strong></p>
<p> </p>
</body>
</html>
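I thought about using explode with spaces, but that might not work and might use too many resources. I am just wondering whether PHP has a simple function that would help me get all the emails into an array. Here is what I tried.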
<?php
$lines = file('GetEmails.html');
$TheEmails = array();
foreach ($lines as $line_num => $line) {
    // Check whether the line contains an email address.
    if (preg_match('/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/si', $line)) {
        // Split the tag-stripped line into words on spaces.
        $words = explode(" ", strip_tags($line));
        // Keep only the words that contain an @ sign.
        $fl_array = preg_grep("/@/", $words);
        // Add each trimmed match to the list of emails.
        foreach ($fl_array as $email) {
            $TheEmails[] = trim($email);
        }
    }
}
// Keep only the unique emails.
$UniqueEmails = array_unique($TheEmails);
?>
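The code above works, but with the huge file I will be using, I am afraid it uses resources unnecessarily. It also does not account for emails separated by commas, such as ed@ed.com,mike@mike.com
Any ideas on the best way to do this? At the very least, it would be really helpful to learn the best way to do this, even if I can only get the emails that are separated by spaces and the like.
I hope this makes sense. Thanks so much!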
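One possible way to address the resource concern is to read the file one line at a time instead of loading it all with file(). The sketch below is only an illustration: the function name `collectEmailsFromFile` is hypothetical, it assumes PHP 7 or later, it assumes no address is split across a line break, and it reuses the pattern from the question, which only covers common address shapes.

```php
<?php
// Hypothetical helper (not from the original post): scan a large file one line at a time.
function collectEmailsFromFile(string $path): array
{
    // Pattern taken from the question; it matches common addresses only.
    $pattern = '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i';
    $seen = [];

    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // fgets holds only one line in memory, unlike file(), which loads every line at once.
    while (($line = fgets($handle)) !== false) {
        // preg_match_all returns the number of matches found in this line.
        if (preg_match_all($pattern, $line, $matches)) {
            foreach ($matches[0] as $email) {
                // Using the address as an array key de-duplicates as we go.
                $seen[$email] = true;
            }
        }
    }
    fclose($handle);

    return array_keys($seen);
}
```

Calling `collectEmailsFromFile('GetEmails.html')` would return the unique addresses in the order they first appear. Because the regex is applied to the raw line, commas, semicolons, and tags next to an address do not need special handling.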
'preg_match_all'? – Tchoupi 2013-03-22 03:15:35
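It is not a duplicate, because I do not believe that question handles emails that have characters right next to them, such as a comma or < or > etc. – 2013-03-22 03:29:32
Actually, I was wrong. The code at that link works. Should I delete this post or give credit to that post? – 2013-03-22 03:33:44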
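Following up on the `preg_match_all` suggestion in the comments, here is a minimal sketch of how it could be combined with array_unique. The function name `extractUniqueEmails` and the sample string are assumptions made for illustration (PHP 7 or later is assumed), the pattern is the one from the question, and lowercasing before de-duplication is a design choice that may not suit every case.

```php
<?php
// Hypothetical helper (not from the original post): collect unique emails from a string.
function extractUniqueEmails(string $html): array
{
    // Pattern taken from the question; it matches common addresses only.
    $pattern = '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i';

    // preg_match_all finds every match, whatever punctuation or tags surround it.
    preg_match_all($pattern, $html, $matches);

    // Assumption: addresses differing only in letter case count as duplicates.
    $lowered = array_map('strtolower', $matches[0]);

    // De-duplicate, then reindex so the keys run 0, 1, 2, ...
    return array_values(array_unique($lowered));
}

// Made-up sample input for illustration.
$sample = 'Contact ed@ed.com,mike@mike.com; Email:<strong>ED@ed.com</strong>';
print_r(extractUniqueEmails($sample));
```

For a file small enough to fit in memory, the same function could be applied to the result of `file_get_contents('GetEmails.html')`.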