2016-05-17 49 views

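I am having trouble getting my regular expression to work. I am trying to extract only the URLs from a string. Some of the text in the string is here: pastebin.com/wA9N1Gbi. The regular expression I am trying to use, which is reported as not valid, is: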

(?<protocol>https?:\/\/)(?:(?<urlroot>[^\/?#\n\s]+))?(?<urlResource>[^?#\n\s]+)?(?<queryString>\?(?:[^#\n\s]*))?(?:#(?<fragment>[^\n\s]))?

Here is the link: regex101.com/r/bH1eS9/3

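Unfortunately it does not work. When I compile and run it I get the following error: "Unhandled exception at 0x7638DAE8 in Historik.exe: Microsoft C++ exception: std::regex_error at memory location 0x0018ED9C." Does anyone have another idea how I could do this? Is there another regex function that might be better suited to this task?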

Here is the code as it stands. Thanks in advance.

// Read the whole file into a string.
std::ifstream in("c:/Users/Petrus/Documents/History", std::ios::binary);
std::stringstream buffer;

buffer << in.rdbuf();

std::string contents(buffer.str());

unsigned counter = 0;
// The pattern under discussion: named groups, POSIX extended grammar.
std::regex word_regex(
    R"((?<protocol>https?:\/\/)(?:(?<urlroot>[^\/?#\n\s]+))?(?<urlResource>[^?#\n\s]+)?(?<queryString>\?(?:[^#\n\s]*))?(?:#(?<fragment>[^\n\s]))?)",
    std::regex::extended
    );
auto words_begin = std::sregex_iterator(contents.begin(), contents.end(), word_regex);
auto words_end = std::sregex_iterator();

// Print every sub-match of every match, numbered consecutively.
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
    std::smatch match = *i;
    for (const auto& res : match) {
        std::cout << ++counter << ": " << res << std::endl;
    }
}

Please elaborate on exactly how it is _not working unfortunately._ That is a very vague problem statement. Provide an [MCVE].
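One way to get something more specific than the unhandled-exception dialog is to construct the pattern inside a try block and print what std::regex_error reports. The sketch below is only an illustration: try_compile is a hypothetical helper name, and the message text and numeric error code are implementation-defined, so no expected output is shown.

```cpp
#include <iostream>
#include <regex>
#include <string>

// try_compile is a hypothetical helper, not part of the posted code.
// Returns true if `pattern` is accepted under `flags`; otherwise prints the
// library's diagnostic to std::cerr and returns false.
bool try_compile(const std::string& pattern,
                 std::regex::flag_type flags = std::regex::ECMAScript)
{
    try {
        std::regex re(pattern, flags);   // throws std::regex_error if invalid
        return true;
    } catch (const std::regex_error& e) {
        std::cerr << "regex_error: " << e.what()
                  << " (code " << static_cast<int>(e.code()) << ")\n";
        return false;
    }
}
```

Calling try_compile with the pattern from the question and std::regex::extended should return false, because none of the std::regex grammars has a named-group syntax such as (?<protocol>...).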

Answers


Do you need such a complicated regex? Can you get away with something a little less rigorous?

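#include <fstream>
#include <iostream>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>

// Read the whole file into a string, or throw if it cannot be opened.
std::string load_file(const std::string& filename)
{
    std::ostringstream oss;
    if(auto ifs = std::ifstream(filename, std::ios::binary))
        oss << ifs.rdbuf();
    else
        throw std::runtime_error("Failed to open file: " + filename);
    return oss.str();
}

int main(int, const char* const*)
{
    std::string s = load_file("test.txt");

    // crude... but effective?
    // scheme, then at least one non-'/' character, then a run of printable characters
    std::regex e(R"(https?:\/\/[^/]+[[:print:][:punct:]]*)");

    auto itr = std::sregex_iterator(s.begin(), s.end(), e);
    auto end = std::sregex_iterator();

    // Print each whole match with a running number.
    unsigned counter = 0;
    for(; itr != end; ++itr)
        std::cout << ++counter << ": " << itr->str(0) << '\n';
}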

Output:

1: http://boplats.vaxjo.se/ 
2: http://192.168.0.7/ 
3: http://old.honeynet.org/ 
4: http://old.honeynet.org/scans/scan15/som/som11.txt 
5: http://en.hackdig.com/ 
6: http://parallelrecovery.com/pdf-password.html 
7: http://digitalcorpora.org/corp 
8: http://tv4play.se/program/nyhetsmorgon 
9: http://bredbandskollen.se/ 
10: http://194.47.149.19/dv1482/Lab5/ 
... 
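If the individual parts of each URL are still wanted, a middle ground might be to keep the shape of the pattern from the question but use plain numbered groups in the default ECMAScript grammar, since std::regex does not offer named groups. This is a sketch under assumptions, not a full URL parser: UrlParts and extract_urls are hypothetical names, the input is assumed to be ordinary text delimited by whitespace, and the fragment group is given a `*` quantifier on the guess that matching a single character was not intended.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical container; the fields mirror the group names used in the
// question, but are filled by index because std::regex has no named groups.
struct UrlParts {
    std::string full, protocol, root, resource, query, fragment;
};

// Hypothetical helper: collects every http/https URL found in `text`.
std::vector<UrlParts> extract_urls(const std::string& text)
{
    // 1: protocol  2: host  3: path  4: query (optional)  5: fragment (optional)
    static const std::regex url_re(
        R"((https?://)([^/?#\s]+)([^?#\s]*)(\?[^#\s]*)?(?:#(\S*))?)");

    std::vector<UrlParts> out;
    for (std::sregex_iterator it(text.begin(), text.end(), url_re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // str(n) yields an empty string for a group that did not participate.
        out.push_back({m.str(0), m.str(1), m.str(2),
                       m.str(3), m.str(4), m.str(5)});
    }
    return out;
}
```

For a line such as `http://192.168.0.7/`, the entry would hold `http://` as protocol, `192.168.0.7` as root and `/` as resource, with the query and fragment left empty.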

That's perfect! I'm looking into it!


You nailed it, bro! I owe you! Much love, no homo ;)