2016-11-28 95 views

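I want to output the number of email addresses in an HTML-formatted text file using regular expressions. I can open and read the file, but I don't know how to search it for a regex pattern. How do I search a txt file with a regular expression?

Update: OK, I tried a test text file and it works, but not on the actual HTML-formatted text file: it outputs the number of phone numbers, but not the number of email addresses.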

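#include <tchar.h>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int _tmain(int argc, _TCHAR* argv[]) 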
{ 
ifstream htmlText; 
string line; 


string eduEmail = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.+-]+\\.edu$"; 
string nonEduEmail = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.+-]+\\.com$"; 
string phoneNumbers = "[[:digit:]]{2}-[[:digit:]]{3}-[[:digit:]]{4}"; 

int eduEmails = 0; 
int nonEduEmails = 0; 
int num_phoneNumbers = 0; 

htmlText.open("ltudirectory.txt"); 


if (htmlText.good()) 
{ 
    while (getline(htmlText, line)) 
    { 
     cout << line << endl; 
     regex r_edu(eduEmail); //the pattern to search for edu emails 
     regex r_com(nonEduEmail); //the pattern to search for .com emails 
     regex r_phoneNumbers(phoneNumbers); //the pattern to search for phone numbers 


     bool eduEmail_match = regex_search(line, r_edu); 
     bool nonEmail_match = regex_search(line, r_com); 
     bool phoneNumber_match = regex_search(line, r_phoneNumbers); 


     if (eduEmail_match) 
     { 
      ++eduEmails; 
     } 
     if (nonEmail_match) 
     { 
      ++nonEduEmails; 
     } 
     if (phoneNumber_match) 
     { 
      ++num_phoneNumbers; 
     } 
    } 
} 


htmlText.close(); 
cout << "Emails ending with .edu : " << eduEmails << endl; 
cout << "Emails ending with .com : " << nonEduEmails << endl; 
cout << "Number of Phone Numbers: " << num_phoneNumbers << endl; 


system("pause"); 
return 0; 
} 

Here is a link that shows an example: http://stackoverflow.com/questions/17681670/extract-email-sub-strings-from-large-document – Gumboy

Is there a C++ example? – BigDuke6

This should do the trick: http://stackoverflow.com/questions/22406583/count-words-in-a-string/22406894#22406894 –

Answer

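#include <tchar.h>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int _tmain(int argc, _TCHAR* argv[]) 
{ 
ifstream htmlText; 
string line; 
// Unanchored pattern, so an address can be found anywhere within a line. 
string eduEmail = "(\\w+)(\\.|_)?(\\w*)@(\\w+)(\\.(\\w+))+"; 
regex e(eduEmail); // compile the pattern once, outside the loop 

int testNum = 0; // number of lines that contain at least one address 

htmlText.open("ltudirectory.txt"); 
if (htmlText.good()) 
{ 
    while (getline(htmlText, line)) 
    { 
     if (regex_search(line, e)) { 
      ++testNum; 
     } 
    } 
} 

htmlText.close(); 
cout << "Lines containing an email address: " << testNum << endl; 

system("pause"); 
return 0; 
} 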
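
A likely reason the question's patterns work on a test file but not on the HTML file is the `^` and `$` anchors: with `regex_search` on a single line they require the address to span the entire line, which is rarely the case when it is wrapped in tags. Note also that one `regex_search` per line counts matching lines, not addresses. The following is a minimal sketch, not the original poster's code: `count_matches`, `kEduEmail` and `kComEmail` are hypothetical names, and the patterns are the question's own with the anchors dropped and a `\b` added so the match stops at the end of the suffix.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): counts every
// non-overlapping match of `pattern` in `text`, not just the first one.
std::ptrdiff_t count_matches(const std::string& text, const std::regex& pattern)
{
    auto first = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto last = std::sregex_iterator();  // default-constructed = end of sequence
    return std::distance(first, last);
}

// Assumption: the question's patterns without ^ and $, so an address
// surrounded by HTML tags on the same line can still match.
const std::regex kEduEmail(R"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.+-]+\.edu\b)");
const std::regex kComEmail(R"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.+-]+\.com\b)");
```

Inside the `getline` loop, `eduEmails += count_matches(line, kEduEmail);` would then add every `.edu` address found on the line. Building the `regex` objects once, rather than on every iteration, also avoids recompiling the pattern for each line.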

Please explain your downvote. – Gumboy

Is there a faster way to search? The text file is rather long. – BigDuke6

You can read the file into memory and then run the search: http://stackoverflow.com/questions/17925051/fast-textfile-reading-in-c – Gumboy
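
The suggestion to read the file into memory first could be sketched roughly as follows. This is an illustration under assumptions, not code from the linked answer: `slurp_file` and `count_in_text` are hypothetical helper names, and whether this is actually faster than a line-by-line loop depends on the file and the standard library implementation, so it is worth measuring.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file into one string.
// Returns an empty string if the file cannot be opened.
std::string slurp_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy the entire stream in one go
    return buffer.str();
}

// Hypothetical helper: counts all non-overlapping matches in the text.
std::ptrdiff_t count_in_text(const std::string& text, const std::regex& pattern)
{
    auto first = std::sregex_iterator(text.begin(), text.end(), pattern);
    return std::distance(first, std::sregex_iterator());
}
```

A call such as `count_in_text(slurp_file("ltudirectory.txt"), std::regex("(\\w+)(\\.|_)?(\\w*)@(\\w+)(\\.(\\w+))+"))` would then scan the whole file in a single pass with a pattern that is compiled only once.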