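I want to use regular expressions to output the number of email addresses in an HTML-formatted text file. I can open and read the file, but I don't know how to search its contents for a regex pattern. How do I search a txt file with a regular expression?
Update: OK, I tried a test text file and it works, but it does not work on the actual HTML-formatted text file: it outputs the number of phone numbers, but not the number of email addresses.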
    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <regex>
    #include <string>
    using namespace std;

    int main()
    {
        ifstream htmlText;
        string line;
        // Backslashes are doubled so the regex engine receives "\." (a literal dot).
        string eduEmail = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.+-]+\\.edu$";
        string nonEduEmail = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.+-]+\\.com$";
        string phoneNumbers = "[[:digit:]]{2}-[[:digit:]]{3}-[[:digit:]]{4}";
        // Compile each pattern once, outside the read loop.
        regex r_edu(eduEmail);               // the pattern to search for .edu emails
        regex r_com(nonEduEmail);            // the pattern to search for .com emails
        regex r_phoneNumbers(phoneNumbers);  // the pattern to search for phone numbers
        int eduEmails = 0;
        int nonEduEmails = 0;
        int num_phoneNumbers = 0;
        htmlText.open("ltudirectory.txt");
        if (htmlText.good())
        {
            while (getline(htmlText, line))
            {
                cout << line << endl;
                // Each counter goes up by at most one per line that contains a match.
                bool eduEmail_match = regex_search(line, r_edu);
                bool nonEmail_match = regex_search(line, r_com);
                bool phoneNumber_match = regex_search(line, r_phoneNumbers);
                if (eduEmail_match)
                {
                    ++eduEmails;
                }
                if (nonEmail_match)
                {
                    ++nonEduEmails;
                }
                if (phoneNumber_match)
                {
                    ++num_phoneNumbers;
                }
            }
        }
        htmlText.close();
        cout << "Emails ending with .edu : " << eduEmails << endl;
        cout << "Emails ending with .com : " << nonEduEmails << endl;
        cout << "Number of Phone Numbers: " << num_phoneNumbers << endl;
        system("pause");
        return 0;
    }
Here is a link that shows an example: http://stackoverflow.com/questions/17681670/extract-email-sub-strings-from-large-document – Gumboy
Is there a C++ example? – BigDuke6
This should do the trick: http://stackoverflow.com/questions/22406583/count-words-in-a-string/22406894#22406894 –
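
One likely reason the email counts stay at zero on the real file: the two email patterns are wrapped in `^` and `$`, so they only match when the address is the entire line. That holds in a test file with one address per line, but in HTML the address sits between tags. The phone pattern has no anchors, which would explain why it still matches. Below is a minimal sketch of one way to count every occurrence with `std::sregex_iterator` instead of testing each line once. The names `count_matches`, `count_contacts`, `ContactCounts` and `read_all` are made up for illustration, and the 3-3-4 phone format and the trailing `\b` are assumptions about the data, not something confirmed by the question.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: counts every non-overlapping match of `pattern` in `text`.
std::size_t count_matches(const std::string& text, const std::regex& pattern)
{
    auto first = std::sregex_iterator(text.begin(), text.end(), pattern);
    auto last = std::sregex_iterator();  // default-constructed = end of sequence
    return static_cast<std::size_t>(std::distance(first, last));
}

// Hypothetical result type for the three counters from the question.
struct ContactCounts
{
    std::size_t edu;
    std::size_t com;
    std::size_t phones;
};

// Counts .edu addresses, .com addresses and phone numbers anywhere in `text`.
// No ^ or $ anchors, so surrounding HTML tags do not prevent a match.
ContactCounts count_contacts(const std::string& text)
{
    // Raw string literals avoid doubling the backslashes.
    static const std::regex edu(R"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.+-]+\.edu\b)");
    static const std::regex com(R"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.+-]+\.com\b)");
    // Assumption: phone numbers look like 123-456-7890.
    static const std::regex phone(R"(\d{3}-\d{3}-\d{4})");

    return ContactCounts{count_matches(text, edu),
                         count_matches(text, com),
                         count_matches(text, phone)};
}

// Hypothetical helper: reads the whole stream into one string.
std::string read_all(std::istream& in)
{
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}
```

To use it with the file from the question, open `ltudirectory.txt` with a `std::ifstream`, pass the stream to `read_all`, and pass the result to `count_contacts`. Note that an address appearing both in a `mailto:` link and in the visible text is counted twice with this approach.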