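Here's a snippet that may be useful. I really question whether you're using the best approach, so I've made some assumptions (maybe you just haven't given enough detail).
I parse the xml into an XmlDocument to work with it in code. The relevant tags ("LinkedFile") are pulled out. Each tag's text is parsed as a Uri. If that fails, the text is unescaped (percent-decoded) and trimmed, and parsing is attempted again. The end result is a list of strings containing the urls that parsed correctly. If you really need to, you can run your regular expression against this collection.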
// this is for the interactive console
#r "System.Xml.Linq"
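// System and System.Collections.Generic are imported by default in the
// interactive console; they are listed here so the snippet also works elsewhere
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;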
// sample data, as provided in the post.
string rawXml = "<SupportingDocs><LinkedFile>http://llcorp/ll/lljomet.dll/open/864606</LinkedFile><LinkedFile>http://llcorp/ll/lljomet.dll/open/1860632</LinkedFile><LinkedFile>%20http%3A%2F%2Fllenglish%2Fll%2Fll.exe%2Fopen%2F927515</LinkedFile><LinkedFile>%20http%3A%2F%2Fllenglish%2Fll%2Fll.exe%2Fopen%2F973783</LinkedFile></SupportingDocs>";
var xdoc = new XmlDocument();
xdoc.LoadXml(rawXml);
// will store urls that parse correctly
var foundUrls = new List<String>();
// temp object used to parse urls
Uri uriResult;
foreach (XmlElement node in xdoc.GetElementsByTagName("LinkedFile"))
{
    var text = node.InnerText;
    // first parse attempt
    var result = Uri.TryCreate(text, UriKind.Absolute, out uriResult);
    // any valid Uri will parse here, so limit to http and https protocols
    // see https://stackoverflow.com/a/7581824/1462295
    if (result && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
    {
        foundUrls.Add(uriResult.ToString());
    }
    else
    {
        // The above didn't parse, so check if this is an encoded string.
        // There might be leading/trailing whitespace, so fix that too
        result = Uri.TryCreate(Uri.UnescapeDataString(text).Trim(), UriKind.Absolute, out uriResult);
        // see comments above
        if (result && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
        {
            foundUrls.Add(uriResult.ToString());
        }
    }
}
// interactive output:
> foundUrls
List<string>(4) { "http://llcorp/ll/lljomet.dll/open/864606", "http://llcorp/ll/lljomet.dll/open/1860632", "http://llenglish/ll/ll.exe/open/927515", "http://llenglish/ll/ll.exe/open/973783" }
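If the duplicated check bothers you, the two parse attempts can be folded into one helper. This is only a sketch of the same idea, not a different technique: the helper name TryGetHttpUrl and the shortened two-entry sample are my own, and I've used XDocument here instead of XmlDocument purely for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Shortened sample (assumption): one plain and one percent-encoded entry from the post.
string rawXml = "<SupportingDocs>"
    + "<LinkedFile>http://llcorp/ll/lljomet.dll/open/864606</LinkedFile>"
    + "<LinkedFile>%20http%3A%2F%2Fllenglish%2Fll%2Fll.exe%2Fopen%2F927515</LinkedFile>"
    + "</SupportingDocs>";

// Hypothetical helper: true when `text` is, or percent-decodes to, an absolute http/https url.
bool TryGetHttpUrl(string text, out Uri url)
{
    // Try the raw text first, then the unescaped and trimmed text.
    foreach (var candidate in new[] { text, Uri.UnescapeDataString(text).Trim() })
    {
        if (Uri.TryCreate(candidate, UriKind.Absolute, out url)
            && (url.Scheme == Uri.UriSchemeHttp || url.Scheme == Uri.UriSchemeHttps))
        {
            return true;
        }
    }
    url = null;
    return false;
}

// Collect the text of every LinkedFile element that yields an http/https url.
List<string> urls = XDocument.Parse(rawXml)
    .Descendants("LinkedFile")
    .Select(e => TryGetHttpUrl(e.Value, out var parsed) ? parsed.ToString() : null)
    .Where(s => s != null)
    .ToList();
```

With this sample, urls should end up holding the decoded form of both entries, the same as the corresponding items in the output above.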
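You're matching two slashes after https. Those appear in the first two entries, but not in the last two, where they are percent-encoded as %2F%2F. There may be other problems, but this is the first one I see. – BurnsBA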
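To make the point about the slashes concrete, here is a small check. The pattern below is a hypothetical stand-in, since the regex from the question is not reproduced here; it only assumes that the original pattern requires a literal "//" after the scheme.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern (assumption): requires a literal "://" after http or https.
var pattern = new Regex(@"https?://");

string plain = "http://llcorp/ll/lljomet.dll/open/864606";
string encoded = "%20http%3A%2F%2Fllenglish%2Fll%2Fll.exe%2Fopen%2F927515";

// true: the plain entry contains a literal "://"
bool plainMatches = pattern.IsMatch(plain);

// false: in the encoded entry the colon and slashes appear as %3A and %2F
bool encodedMatches = pattern.IsMatch(encoded);

// true: unescaping restores "://", and Trim removes the leading space from %20
bool decodedMatches = pattern.IsMatch(Uri.UnescapeDataString(encoded).Trim());
```

So whatever the real pattern looks like, decoding the text before matching is what lets the encoded entries through.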