2011-05-29 70 views
5

Strip HTML tags etc. from an NSString

Possible duplicate:
Remove HTML Tags from an NSString on the iPhone

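I'd like to know the best way to strip all HTML/JavaScript etc. tags out of an NSString.

The solution I'm currently using leaves comments and some other tags behind. What would be the best way to remove them?

I'm aware of solutions such as LibXML, but I'd like some examples to work from.

Current solution: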

- (NSString *)flattenHTML:(NSString *)html trimWhiteSpace:(BOOL)trim {

    NSScanner *theScanner;
    NSString *text = nil;

    theScanner = [NSScanner scannerWithString:html];

    while ([theScanner isAtEnd] == NO) {

        // find start of tag
        [theScanner scanUpToString:@"<" intoString:NULL];
        // find end of tag
        [theScanner scanUpToString:@">" intoString:&text];

        // remove the found tag (replace it with an empty string)
        html = [html stringByReplacingOccurrencesOfString:
                   [NSString stringWithFormat:@"%@>", text]
                                               withString:@""];
    }

    // trim off whitespace
    return trim ? [html stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]] : html;
}
+0

@x3ro, vote to close as a duplicate – Mark 2011-05-29 21:33:42

+2

@Mark, he did; this comment is added automatically (for the poster's benefit) when someone votes to close. – benzado 2011-05-29 21:35:53

+0

Hmm, the close count was still zero when I looked at it – Mark 2011-05-29 21:36:54

Answers

17

Try this method to remove HTML tags from a string:

- (NSString *)stripTags:(NSString *)str
{
    NSMutableString *html = [NSMutableString stringWithCapacity:[str length]];

    NSScanner *scanner = [NSScanner scannerWithString:str];
    scanner.charactersToBeSkipped = NULL;
    NSString *tempText = nil;

    while (![scanner isAtEnd])
    {
        // copy the text that precedes the next tag
        [scanner scanUpToString:@"<" intoString:&tempText];

        if (tempText != nil)
        {
            [html appendString:tempText];
        }

        // skip the tag itself, then step over its closing '>'
        [scanner scanUpToString:@">" intoString:NULL];

        if (![scanner isAtEnd])
        {
            [scanner setScanLocation:[scanner scanLocation] + 1];
        }

        tempText = nil;
    }

    return html;
}
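For anyone who wants to try the scanning loop outside a class, here is a minimal sketch of the same idea as a free function. The name StripTags is hypothetical (it is not part of Foundation), and the sketch assumes every '<' starts a tag that ends at the next '>', so a comment or script body containing '>' would not be removed cleanly.

```objc
#import <Foundation/Foundation.h>

// Hypothetical free-function version of the scanning loop, so it can be
// called without a surrounding class. Assumes each '<' opens a tag that
// ends at the next '>'.
static NSString *StripTags(NSString *str) {
    NSMutableString *out = [NSMutableString stringWithCapacity:[str length]];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    scanner.charactersToBeSkipped = nil; // do not silently drop whitespace

    while (![scanner isAtEnd]) {
        NSString *text = nil;
        // Copy everything before the next '<' (returns NO if nothing was scanned).
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [out appendString:text];
        }
        // Skip the tag body, then step over the closing '>'.
        [scanner scanUpToString:@">" intoString:NULL];
        if (![scanner isAtEnd]) {
            [scanner setScanLocation:[scanner scanLocation] + 1];
        }
    }
    return out;
}
```

Calling StripTags(@"<p>Hello <b>world</b></p><!-- note -->") should give "Hello world": the simple comment is dropped too, because it contains no '>' before its closing "-->".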
+0

Well done!!!!!!! – 2012-10-10 17:21:35

+1

I added 'scanner.charactersToBeSkipped = NULL' to the code above to avoid words running together, as described here: http://stackoverflow.com/questions/2828737/strange-behaviour-of-nsscanner-on-simple-whitespace-removal – 2012-10-10 17:46:58

+0

OK. Thanks. – Dee 2012-10-11 06:08:14
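To illustrate the charactersToBeSkipped point from the comments, here is a rough sketch that runs the same loop with the scanner's skip set either left at its default (whitespace and newlines) or cleared. The function name StripTagsSkipping and its flag are made up for this comparison.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the same tag-stripping loop, with the scanner's
// skip set made switchable so the two behaviours can be compared.
static NSString *StripTagsSkipping(NSString *str, BOOL skipWhitespace) {
    NSMutableString *out = [NSMutableString stringWithCapacity:[str length]];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    if (!skipWhitespace) {
        // Clear the default skip set (whitespace and newlines).
        scanner.charactersToBeSkipped = nil;
    }

    while (![scanner isAtEnd]) {
        NSString *text = nil;
        // Leading skippable characters are dropped before each scan.
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [out appendString:text];
        }
        [scanner scanUpToString:@">" intoString:NULL];
        if (![scanner isAtEnd]) {
            [scanner setScanLocation:[scanner scanLocation] + 1];
        }
    }
    return out;
}
```

With the input @"<b>Hello</b> <i>world</i>", passing YES should produce "Helloworld", because the space between the two tags is skipped before the next scan, while passing NO should keep it and produce "Hello world".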