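Strange result from a PCRE regular expression
I'm using the PCRE regular expression library (http://www.pcre.org/) in C to parse and match my HTML strings. To simplify the question, suppose the source string is "aaa: bbbb:" and my pattern is "a(.*?):|b(.*?):". The ? in (.*?) makes it a non-greedy match, so I expect two matches: one is "aaa:" and the other is "bbbb:".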
Then I wrote this program:
const char *src = "aaa: bbbb:";
char *pattern = "a(.*?):|b(.*?):";
pcre *re = NULL;
//---missing out---
re = pcre_compile(pattern,    // pattern,
                  0,          // options,
                  &error,     // errptr,
                  &erroffset, // erroffset,
                  NULL);      // tableptr,
while ((rc = pcre_exec(re,          // regex ptr,
                       NULL,        // extra arg,
                       src,         // subject,
                       strlen(src), // length,
                       0,           // startoffset,
                       0,           // options,
                       ovector,     // ovector,
                       OVECCOUNT)   // ovecsize,
       ) > 0)                       // stop on PCRE_ERROR_NOMATCH and on any other (negative) error code
{
    printf("\nOK, string has matched ...there are %d matches\n\n", rc);
    for (i = 0; i < rc; i++)
    {
        const char *substring_start = src + ovector[2*i];
        int substring_length = ovector[2*i+1] - ovector[2*i];
        printf("$%2d: %.*s length: %d\n", i, substring_length, substring_start, substring_length);
    }
    src = src + ovector[1];   // move src to the end offset of the current match
    if (*src == '\0') break;  // end of the subject reached (src itself is never NULL)
}
pcre_free(re);
And I got this result:
Source : aaa: bbbb:
Pattern: "a(.*?):|b(.*?):"
OK, string has matched ...there are 2 matches
$ 0: aaa: length: 4
$ 1: aa length: 2
OK, string has matched ...there are 3 matches
$ 0: bbbb: length: 5
$ 1: length: 0
$ 2: bbb length: 3
I don't understand: why do I get "$ 1: length: 0"?
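For reference, the PCRE documentation for pcre_exec() says that rc is one more than the highest-numbered capture group that was set, and that any lower-numbered group that did not take part in the match has both of its ovector offsets set to -1. When the b alternative matches, group 1 is such a group, so the loop above computes -1 - (-1) = 0 as its length. The following is a minimal sketch of a guard, assuming ovector and rc come straight from pcre_exec(); copy_group is a hypothetical helper, not a PCRE function, and the values in the usage below are hand-written to mirror that documented behaviour for the subject " bbbb:", not captured from a real run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE): copies capture group `i` of a
 * pcre_exec() result into `buf` as a NUL-terminated string.
 * Assumes `ovector` and `rc` come from pcre_exec(), where a group that did
 * not take part in the match has both offsets set to -1.
 * Returns the number of bytes copied, or -1 if the group is unset or not
 * reported (i >= rc); in those cases `buf` is left empty. */
int copy_group(const char *subject, const int *ovector, int rc, int i,
               char *buf, size_t bufsize)
{
    assert(subject != NULL && ovector != NULL && buf != NULL);
    if (bufsize == 0)
        return -1;
    buf[0] = '\0';
    if (i < 0 || i >= rc)
        return -1;                 /* group number beyond what pcre_exec reported */

    int start = ovector[2 * i];
    int end = ovector[2 * i + 1];
    if (start < 0)
        return -1;                 /* unset group: offsets are -1, -1 */

    int len = end - start;
    if ((size_t)len >= bufsize)
        len = (int)bufsize - 1;    /* truncate so the terminator still fits */
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

Usage: with subject " bbbb:" and ovector {1, 6, -1, -1, 2, 5} (rc = 3), group 0 yields "bbbb:", group 2 yields "bbb", and group 1 returns -1 instead of printing as an empty string. In the original loop, the same effect comes from checking ovector[2*i] < 0 before printing.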
// -------------------------------------------- --------------------------------------------
@Jonathan Leffler I think your answer is correct.
Just now I tried
Source: "aaa: bbb: ccc:"
Pattern: "c(.+?):|a(.+?):|b(.+?):"
and got this result:
$ 0: aaa: length: 4
$ 1: length: 0
$ 2: aa length: 2
$ 0: bbbb: length: 5
$ 1: length: 0
$ 2: length: 0
$ 3: bbb length: 3
$ 0: cccc: length: 5
$ 1: ccc length: 3
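This confirms your answer from the other direction:
Alternatives are tried from left to right, and matching stops at the first alternative that succeeds. So aaa: is captured by a(.+?): only after the engine has tried, and failed, to match c(.+?):. In the first result, $0 shows the whole matched string, $1 is the group of the alternative c(.+?):, which did not take part in the match and therefore shows length : 0, and $2 is the capture of a(.+?):.
For b(.+?):, the group that is set belongs to the last alternative, which explains the two length : 0 lines (the unused groups of c and a).
For c(.+?):, its group is the first one, so there is no length : 0 line.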
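The reasoning above can be turned into a small check. This is a sketch under the assumption that every alternative contains exactly one capture group, in order, as in "c(.+?):|a(.+?):|b(.+?):"; matched_alternative is a hypothetical helper, not a PCRE function. It relies only on the documented pcre_exec() convention that unset groups have offsets of -1 and that rc - 1 is the highest group that was set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of PCRE). Assumes a pattern whose
 * alternatives each contain exactly one capture group, in order, and an
 * `ovector`/`rc` pair from a successful pcre_exec() call.
 * Returns the 1-based number of the alternative that matched, or 0 if no
 * capture group was set (for example rc <= 1). */
int matched_alternative(const int *ovector, int rc)
{
    assert(ovector != NULL);
    /* Scan from the highest reported group down; groups of alternatives
     * that were not used have a start offset of -1. */
    for (int g = rc - 1; g >= 1; g--) {
        if (ovector[2 * g] >= 0)
            return g;
    }
    return 0;
}
```

With this, the three results shown above would map to alternatives 2 (a), 3 (b) and 1 (c) respectively, which is exactly why only the first two print length : 0 lines.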
Could you expand the example source code so that it compiles? – thuovila 2013-03-06 11:46:38
Maybe it's because you're using '*' instead of '+'. – jcubic 2013-03-06 11:48:42