2017-10-20 65 views
1

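Trying to parse HTML in Swift 4 using only the standard library

I'm trying to parse some HTML and pull out every link that follows each occurrence of the string:

market_listing_row_link" href="

to collect a list of item URLs, using only the Swift 4 standard library.

I think what I need is a for loop that keeps checking characters against a condition. Once the full string is found, it starts reading the item URL that follows into an array until it reaches a double quote, then stops, and repeats this process until the end of the file. In C, which I'm slightly familiar with, we have access to a function (I think it's fgetc) that does this while advancing the position indicator for the file. Is there a similar way to do this in Swift?

My code so far only finds the first occurrence of the string, and there are 10 occurrences I need to find.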

import Foundation 

extension String {
    func slice(from: String, to: String) -> String? {
        return (range(of: from)?.upperBound).flatMap { substringFrom in
            (range(of: to, range: substringFrom..<endIndex)?.lowerBound).map { substringTo in
                String(self[substringFrom..<substringTo])
            }
        }
    }
}

let itemListURL = URL(string: "http://steamcommunity.com/market/search?appid=252490")! 
let itemListHTML = try String(contentsOf: itemListURL, encoding: .utf8) 
let itemURL = URL(string: itemListHTML.slice(from: "market_listing_row_link\" href=\"", to: "\"")!)! 

print(itemURL) 

// Prints the current first URL found matching: http://steamcommunity.com/market/listings/252490/Wyrm%20Chest 
+2

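I'm posting this as a comment rather than an answer because it doesn't directly answer your question. Have you considered using [XMLParser](https://developer.apple.com/documentation/foundation/xmlparser)? Real XML parsing is usually preferable to regular expressions when it comes to HTML; see, for example, [the famous Stack Overflow answer](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). –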

+1

@AlanKantz HTML is not XML unless it happens to actually be XHTML. – rmaddy

+0

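@AlanKantz Forget that it's HTML. I want to search a string of characters for a particular, otherwise meaningless, sequence of characters, read the characters that follow that sequence into a string variable up to a certain character, and then keep searching for the next occurrence of the sequence to repeat the process. – ANoobSwiftly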

Answers

0

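You can use a regular expression to find every string that occurs between two specific strings (see this SO answer), and use the extension method ranges(of:) from this answer to get the ranges of all the regex matches. You just need to pass the option .regularExpression to that method.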


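import Foundation

extension String {
    /// Returns the ranges of every non-overlapping match of `string` in the receiver.
    func ranges(of string: String, options: CompareOptions = .literal) -> [Range<Index>] {
        var result: [Range<Index>] = []
        var start = startIndex
        while start < endIndex, let range = range(of: string, options: options, range: start..<endIndex) {
            result.append(range)
            // After an empty match, step one character forward so the loop always advances.
            start = range.lowerBound < range.upperBound ? range.upperBound : index(range.lowerBound, offsetBy: 1, limitedBy: endIndex) ?? endIndex
        }
        return result
    }
    /// Returns every substring located between `from` and `to`.
    func slices(from: String, to: String) -> [Substring] {
        // Escape both delimiters so any regex metacharacters in them are matched literally.
        let pattern = "(?<=" + NSRegularExpression.escapedPattern(for: from) + ").*?(?=" + NSRegularExpression.escapedPattern(for: to) + ")"
        return ranges(of: pattern, options: .regularExpression)
            .map { self[$0] }
    }
}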

Playground test

let itemListURL = URL(string: "http://steamcommunity.com/market/search?appid=252490")! 
let itemListHTML = try! String(contentsOf: itemListURL, encoding: .utf8) 
let result = itemListHTML.slices(from: "market_listing_row_link\" href=\"", to: "\"") 
result.forEach({print($0)}) 

Result

http://steamcommunity.com/market/listings/252490/Night%20Howler%20AK47
http://steamcommunity.com/market/listings/252490/Hellcat%20SAR
http://steamcommunity.com/market/listings/252490/Metal
http://steamcommunity.com/market/listings/252490/Volcanic%20Stone%20Hatchet
http://steamcommunity.com/market/listings/252490/Box
http://steamcommunity.com/market/listings/252490/High%20Quality%20Bag
http://steamcommunity.com/market/listings/252490/Utilizer%20Pants
http://steamcommunity.com/market/listings/252490/Lizard%20Skull
http://steamcommunity.com/market/listings/252490/Frost%20Wolf
http://steamcommunity.com/market/listings/252490/Cloth
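
The scan-and-advance loop described in the question can also be written without regular expressions, by repeatedly calling range(of:range:) and moving the search start forward, which plays roughly the role of C's file position indicator. The following is only a minimal sketch: the method name substrings(after:upTo:) and the sample HTML with its example.com URLs are made up for illustration and are not part of the original post.

```swift
import Foundation

extension String {
    /// Hypothetical helper: returns every substring found between an
    /// occurrence of `marker` and the next occurrence of `terminator`.
    func substrings(after marker: String, upTo terminator: String) -> [Substring] {
        var results: [Substring] = []
        var searchStart = startIndex
        // Find the next marker at or after the current search position,
        // then the first terminator that follows that marker.
        while let markerRange = range(of: marker, range: searchStart..<endIndex),
              let terminatorRange = range(of: terminator, range: markerRange.upperBound..<endIndex) {
            results.append(self[markerRange.upperBound..<terminatorRange.lowerBound])
            // Resume scanning just past the terminator.
            searchStart = terminatorRange.upperBound
        }
        return results
    }
}

// Stand-in HTML (an assumption for this sketch); a real page would be downloaded first.
let sampleHTML = """
<a class="market_listing_row_link" href="http://example.com/item/One"></a>
<a class="market_listing_row_link" href="http://example.com/item/Two"></a>
"""
let links = sampleHTML.substrings(after: "market_listing_row_link\" href=\"", upTo: "\"")
links.forEach { print($0) }
```

Because the delimiters are matched literally, this variant needs no escaping, at the cost of being less flexible than a pattern.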

+0

Don't forget to use URLSession's dataTask to fetch your URL's HTML data asynchronously. –
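
As a rough illustration of that suggestion, the download could be moved to URLSession.shared.dataTask(with:completionHandler:). This is a sketch only: the helper names htmlString(from:) and fetchHTML(from:completion:) are hypothetical, the response is assumed to be UTF-8, and HTTP status codes are not checked.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives in this module on Linux
#endif

/// Hypothetical helper: decodes a response body as UTF-8 text, or returns nil.
func htmlString(from data: Data?) -> String? {
    return data.flatMap { String(data: $0, encoding: .utf8) }
}

/// Hypothetical helper: starts an asynchronous download and passes the decoded
/// HTML to `completion`, or nil if the request failed or could not be decoded.
@discardableResult
func fetchHTML(from url: URL, completion: @escaping (String?) -> Void) -> URLSessionDataTask {
    let task = URLSession.shared.dataTask(with: url) { data, _, error in
        // `error` is non-nil when the transfer itself failed.
        completion(error == nil ? htmlString(from: data) : nil)
    }
    task.resume()  // Data tasks are created suspended and must be resumed.
    return task
}
```

It could then be called as fetchHTML(from: itemListURL) { html in ... }, with the slicing done inside the closure. The closure is generally not called on the main thread, and a command-line script has to stay alive until it runs.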

+1

This is perfect! Thank you! – ANoobSwiftly