Regex to match anchor tags and their href


I want to run a regex over an HTML string that contains multiple anchor tags and build a dictionary mapping each link's text to its href URL.

<p>This is a simple text with some embedded <a href="http://example.com/link/to/some/page?param1=77&param2=22">links</a>. This is a <a href="https://exmp.le/sample-page/?uu=1">different link</a>.

How can I extract both the text and the href of each <a> tag in one pass?

Edit:

func extractLinks(html: String) -> Dictionary<String, String>? { 

    do { 
     let regex = try NSRegularExpression(pattern: "/<([a-z]*)\b[^>]*>(.*?)</\1>/i", options: []) 
     let nsString = html as NSString 
     let results = regex.matchesInString(html, options: [], range: NSMakeRange(0, nsString.length)) 
     return results.map { nsString.substringWithRange($0.range)} 
    } catch let error as NSError { 
     print("invalid regex: \(error.localizedDescription)") 
     return nil 
    } 
}


2017-05-05 Rao

Where is your regex code? – matt

@matt: They're waiting for you to write it. –

It's pretty bad. – Rao

First, you need to learn the basic syntax of patterns for NSRegularExpression:

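The pattern does not contain delimiters.
The pattern does not contain modifiers; you need to pass that information through options.
When you want to use the metacharacter \, you need to escape it as \\ in a Swift string literal.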

So the line that creates the NSRegularExpression instance should look like this:

let regex = try NSRegularExpression(pattern: "<([a-z]*)\\b[^>]*>(.*?)</\\1>", options: .caseInsensitive)
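A minimal check of the first two rules, using a made-up input string (`<A HREF="x">X</A>` is purely illustrative). With the PHP-style `/pattern/i` wrapper, NSRegularExpression treats the slashes and the trailing `i` as literal characters, so nothing matches; dropping the delimiters and passing `.caseInsensitive` through `options` finds the tag.

```swift
import Foundation

// Illustrative input with an upper-case tag, so case-insensitivity matters.
let text = "<A HREF=\"x\">X</A>"
let range = NSRange(location: 0, length: (text as NSString).length)

// "/" and "i" are literal characters here, and matching is case-sensitive.
let delimited = try! NSRegularExpression(pattern: "/<a\\b[^>]*>/i", options: [])
// No delimiters; the "i" modifier becomes the .caseInsensitive option.
let plain = try! NSRegularExpression(pattern: "<a\\b[^>]*>", options: .caseInsensitive)

print(delimited.numberOfMatches(in: text, options: [], range: range)) // 0
print(plain.numberOfMatches(in: text, options: [], range: range))     // 1
```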

But, as you may already know, your pattern contains nothing that matches the href attribute or captures its value.
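To see what the corrected pattern actually captures, here is a sketch that runs it against a shortened copy of the question's HTML (the URLs are trimmed for brevity). Group 1 is the tag name and group 2 is the link text, but nothing in the pattern reaches the href. It uses `range(at:)`, the Swift 4+ spelling of `rangeAt(_:)`.

```swift
import Foundation

// Shortened version of the question's sample HTML (single line, no </p>).
let html = "<p>This is a simple text with some embedded <a href=\"http://example.com/\">links</a>. This is a <a href=\"https://exmp.le/\">different link</a>."
let regex = try! NSRegularExpression(pattern: "<([a-z]*)\\b[^>]*>(.*?)</\\1>", options: .caseInsensitive)
let ns = html as NSString
let matches = regex.matches(in: html, options: [], range: NSRange(location: 0, length: ns.length))

// <p> never matches because there is no </p>; each <a>...</a> pair does.
let tags = matches.map { ns.substring(with: $0.range(at: 1)) }
let contents = matches.map { ns.substring(with: $0.range(at: 2)) }
print(tags)     // ["a", "a"]
print(contents) // ["links", "different link"]
```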

Something like this works with your example HTML:

import Foundation

// Group 1: the quoted href value (quotes included); group 2: the link text.
let pattern = "<a\\b[^>]*\\bhref\\s*=\\s*(\"[^\"]*\"|'[^']*')[^>]*>((?:(?!</a).)*)</a\\s*>"
let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
let html = "<p>This is a simple text with some embedded <a\n" +
    "href=\"http://example.com/link/to/some/page?param1=77&param2=22\">links</a>.\n" +
    "This is a <a href=\"https://exmp.le/sample-page/?uu=1\">different link</a>."
let nsHTML = html as NSString
let matches = regex.matches(in: html, options: [], range: NSRange(location: 0, length: nsHTML.length))
var resultDict: [String: String] = [:]
for match in matches {
    // Strip the surrounding quote characters from the href capture.
    let quoted = match.range(at: 1)
    let hrefRange = NSRange(location: quoted.location + 1, length: quoted.length - 2)
    let href = nsHTML.substring(with: hrefRange)
    let innerText = nsHTML.substring(with: match.range(at: 2))
    resultDict[innerText] = href
}
print(resultDict)
//->["different link": "https://exmp.le/sample-page/?uu=1", "links": "http://example.com/link/to/some/page?param1=77&param2=22"] (key order may vary)
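A sketch that folds the answer's pattern into the `extractLinks(html:)` signature from the question, returning `nil` only when the pattern fails to compile (the sample string passed in is the question's HTML on one line). Because link texts are the dictionary keys, two links with the same text collapse into one entry and the later href wins; that is a consequence of the dictionary shape the question asked for, not of the regex.

```swift
import Foundation

/// Builds a [link text: href] dictionary with the answer's regex.
/// Returns nil only if the pattern cannot be compiled.
func extractLinks(html: String) -> [String: String]? {
    let pattern = "<a\\b[^>]*\\bhref\\s*=\\s*(\"[^\"]*\"|'[^']*')[^>]*>((?:(?!</a).)*)</a\\s*>"
    let regex: NSRegularExpression
    do {
        regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
    } catch {
        print("invalid regex: \(error.localizedDescription)")
        return nil
    }
    let ns = html as NSString
    var links: [String: String] = [:]
    for match in regex.matches(in: html, options: [], range: NSRange(location: 0, length: ns.length)) {
        let quoted = match.range(at: 1)
        // Drop the opening and closing quote characters around the href value.
        let href = ns.substring(with: NSRange(location: quoted.location + 1, length: quoted.length - 2))
        links[ns.substring(with: match.range(at: 2))] = href
    }
    return links
}

let sample = "<p>This is a simple text with some embedded <a href=\"http://example.com/link/to/some/page?param1=77&param2=22\">links</a>. This is a <a href=\"https://exmp.le/sample-page/?uu=1\">different link</a>."
let links = extractLinks(html: sample)
```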

Keep in mind that my pattern above may misdetect malformed <a> tags or miss some nested structures, and it does nothing to handle HTML character entities...
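As one example of the entity problem: in real HTML the first link's href would usually be written `param1=77&amp;param2=22`, and the regex returns it verbatim. A minimal sketch of a hypothetical helper, `decodeBasicEntities`, that decodes only the five predefined XML entities; numeric references such as `&#38;` and other named entities are deliberately left alone, which is exactly why a real parser is the safer route.

```swift
import Foundation

// Hypothetical helper: decodes only &lt; &gt; &quot; &apos; &amp;.
// Numeric references and other named entities are left untouched.
func decodeBasicEntities(_ s: String) -> String {
    // &amp; is decoded last so that "&amp;lt;" becomes "&lt;", not "<".
    return s
        .replacingOccurrences(of: "&lt;", with: "<")
        .replacingOccurrences(of: "&gt;", with: ">")
        .replacingOccurrences(of: "&quot;", with: "\"")
        .replacingOccurrences(of: "&apos;", with: "'")
        .replacingOccurrences(of: "&amp;", with: "&")
}

let decodedHref = decodeBasicEntities("http://example.com/link/to/some/page?param1=77&amp;param2=22")
print(decodedHref) // http://example.com/link/to/some/page?param1=77&param2=22
```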

If you want your code to be more robust and general, you should consider adopting an HTML parser, as ColGraff and Rob suggested.
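The specific parsers ColGraff and Rob recommended are not named here, so this is a stand-in rather than their suggestion: a sketch using Foundation's `XMLParser`, which only works when the input is well-formed XHTML. It therefore assumes a corrected copy of the sample, with a closing `</p>` and the `&` in the href escaped as `&amp;`; real-world HTML frequently violates both assumptions, which is where a dedicated HTML parser earns its keep. In exchange, entity decoding comes for free: the parser hands back the href with `&amp;` already turned into `&`.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on Linux
#endif

// Collects [link text: href] pairs from well-formed XHTML.
final class LinkCollector: NSObject, XMLParserDelegate {
    private(set) var links: [String: String] = [:]
    private var currentHref: String?
    private var currentText = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName.lowercased() == "a", let href = attributeDict["href"] {
            currentHref = href
            currentText = ""
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // Accumulate text only while inside an <a href=...> element.
        if currentHref != nil { currentText += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName.lowercased() == "a", let href = currentHref {
            links[currentText] = href
            currentHref = nil
        }
    }
}

// Assumed well-formed variant of the question's sample.
let xhtml = "<p>This is a simple text with some embedded <a href=\"http://example.com/link/to/some/page?param1=77&amp;param2=22\">links</a>. This is a <a href=\"https://exmp.le/sample-page/?uu=1\">different link</a>.</p>"
let collector = LinkCollector()
let parser = XMLParser(data: Data(xhtml.utf8))
parser.delegate = collector
let ok = parser.parse()
print(ok, collector.links)
```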


2017-05-06 01:32:28 OOPer
