C# regex to capture a string between two strings

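I want to capture a value enclosed in tags. I have an HTML page loaded into a string (I can't use external libraries, so I have to treat this HTML as a plain string). There are two divs whose content I need to capture, and I know their ids. I tried to capture the content with a regular expression, but I can't get it to work.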

var div_tags = Regex.Match(json, "<div id=(.*)</div>").Groups[0];

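This returns all 3 divs that have an id, but I only need the two divs whose id contains the word "mobile". So I tried another regex suggested by a colleague, but I think it is not compatible with the .NET regex evaluator.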

string titolo = Regex.Replace(json, "<div id=[.*]mobile[.*]>(.*)</div>");

This is an example of one of the divs. The only thing I need is the message. The ids of the two divs are mobileBody and mobileTitle.

<div id='mobileBody' style='display:none;'>Message</div>

What is wrong with my regex that prevents it from capturing the right text?

Source

2017-10-10 osh arko

Use an HTML parser such as HtmlAgilityPack (http://html-agility-pack.net/?z=codeplex). See also https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags –

You should use an HTML parser. – SLaks

I said I can't use external libraries, so I can't use HtmlAgilityPack. –

You can try this:
<[a-z\s]+id=[\'\"]mobile[\w]+[\'\"][\sa-zA-Z\d\'\=\;\:]*>([a-zA-Z\d\s]+)<[\/a-z\s]+>
Note that it will not match a message that contains special characters or symbols.
You can test and refine it here: https://regex101.com/r/fnYQ1o/10
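As a sketch of one way to lift that limitation, the pattern below captures everything up to the next `<` instead of listing the allowed characters. The class name `MobileDivExtractor`, the method `Extract` and the group names `id` and `text` are made up for this illustration, and it assumes the target divs contain plain text with no nested tags. It is not a general HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MobileDivExtractor
{
    // Hypothetical helper: maps each div id starting with "mobile" to the text inside that div.
    public static Dictionary<string, string> Extract(string html)
    {
        // (['""]) is the class ['"] written in a verbatim string; \1 requires the same closing quote.
        // [^>]* skips any remaining attributes; [^<]* captures any text up to the next tag.
        const string pattern = @"<div\s+id=(['""])(?<id>mobile\w*)\1[^>]*>(?<text>[^<]*)</div>";

        var result = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            result[m.Groups["id"].Value] = m.Groups["text"].Value;
        }
        return result;
    }

    public static void Main()
    {
        // Assumed sample input, modelled on the divs shown in the question.
        var html = "<div id='cort'></div>"
                 + "<div id='mobileTitle' style='display:none;'>Titolo: prova!</div>"
                 + "<div id='mobileBody' style='display:none;'>Corpo (messaggio) prova</div>";

        foreach (var pair in Extract(html))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Using negated character classes (`[^>]*`, `[^<]*`) rather than `.*` keeps each part of the match inside a single tag, which is also why the div with id `cort` is not picked up.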

EDIT - code example
This could be the code to extract the message part:

// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
var rgx = @"<[a-z\s]+id=[\']mobile[\w]+[\'][\sa-zA-Z\d\s\'\=\;\:]*>([a-zA-Z\d\s]+)<[\/a-z\s]+>";
var txt = "<!DOCTYPE html><html lang='it' xml:lang='it'><!-- <![endif]--><head><meta http-equiv='Content-Type' content='text/html; charset=UTF-8'><title>Banca Mediolanum S.p.A. | Accesso clienti</title><meta name='description' content='Banca Mediolanum S.p.A. | Accesso clienti'><meta name='keywords' content='Banca Mediolanum S.p.A. | Accesso clienti'><meta name='title' content='Banca Mediolanum S.p.A. | Accesso clienti'><meta name='author' content='Banca Mediolanum S.p.A.'><meta name='robots' content='index, follow'><meta name='viewport' content='width=1439,user-scalable=no'><link rel='shortcut icon' href='./images/favicon.ico' type='image/x-icon'><style>#cort {background-image: url(bmedonline_10set.png);background-repeat: no-repeat;background-position-x: center;height: 850px;width: auto;/*background-size: 100%;*/}@media only screen and (max-width: 768px) and (min-width: 641px) section.contactus-area.chat {}body {border: 0 none;margin: 0;padding: 0}</style></head><body class=' '><!-- Google Tag Manager --><script>(function (w, d, s, l, i) {w[l] = w[l] || [];w[l].push({'gtm.start': new Date().getTime(),event: 'gtm.js'});var f = d.getElementsByTagName(s)[0],j = d.createElement(s),dl = l != 'dataLayer' ? '&l=' + l : '';j.async = true;j.src ='//www.googletagmanager.com/gtm.js?id=' + i + dl;f.parentNode.insertBefore(j, f);})(window, document, 'script', 'dataLayer', 'GTM-KGSP');</script><!-- End Google Tag Manager --><div id='cort'></div><div id='mobileTitle' style='display:none;'>Titolo prova</div><div id='mobileBody' style='display:none;'>Corpo messaggio prova</div></body></html>"; 

/* Collect the matches, keeping only those with a non-empty captured message */
var matches = Regex.Matches(txt, rgx).Cast<Match>()
    .Where(x => !String.IsNullOrEmpty(x.Groups[1].Value))
    .ToList();
/* Aggregation without using foreach; the Count check stops Aggregate from throwing on an empty sequence */
if (matches.Count > 0)
{
    var exitString = matches.Select(x => x.Groups[1].Value).Aggregate((x, y) => x + "-" + y);
    Console.WriteLine("Match and aggregation");
    Console.WriteLine(exitString);
}

/* Using Replace with a second regex that captures both messages in one pass */
Console.WriteLine();
Console.WriteLine("Replace with another regex");
Console.WriteLine(Regex.Replace(txt, @".*<div id='mobileTitle'[\s\w\W]*>([\s\w]*)<\/div>[\s\r\n]*<div id='mobileBody'[\s\w\W]*>([\s\w]*)<\/div>.*", "$1-$2"));

Console.ReadLine();

Source

2017-10-10 19:34:48 Daniele

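Hi, thanks for the reply. When I use it in the method Regex.Match(string, "your_regex"); I get the error "Unrecognized escape sequence". If I put @ before your regex, I still get an error. How can I get rid of it? –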
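A likely cause of that error, assuming the first pattern from the answer was pasted as-is: in a regular C# string literal, sequences such as `\s` or `\w` are not valid escapes, and in a verbatim (`@"..."`) literal the `\"` inside `[\'\"]` ends the string early, because a double quote there has to be written as `""`. The sketch below shows the same shortened pattern written both ways. The class and constant names are made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimPatternDemo
{
    // Regular literal: every backslash is doubled and a double quote is written \".
    public const string Regular = "<[a-z\\s]+id=['\"]mobile\\w+['\"]";

    // Verbatim literal: backslashes are kept as typed and a double quote is written "".
    public const string Verbatim = @"<[a-z\s]+id=['""]mobile\w+['""]";

    public static void Main()
    {
        // Both literals produce the same pattern text.
        Console.WriteLine(Regular == Verbatim); // prints True

        // The pattern accepts either quote style around the id.
        Console.WriteLine(Regex.IsMatch("<div id=\"mobileBody\">", Verbatim)); // prints True
    }
}
```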

I don't know why, but that doesn't work either. I did the following: var s = Regex.Matches(json, "

