這是我第一次使用SAXParser(我在Android中使用它,但我認爲這對這個特定問題沒有什麼不同),我正在嘗試閱讀來自RSS提要的數據。到目前爲止,它在大多數情況下都非常適合我,但在遇到包含HTML編碼文本的標記時(例如<a href="http://...
),我遇到了麻煩。 characters()
方法只能在<
中作爲<
讀取,然後將下一組字符視爲單獨的實體,而不是一次取得整個內容。我寧願它只是直接讀取它,而不實際翻譯HTML。我用我的文檔處理程序(縮短)的代碼貼在下面:使用SAXParser從XML中檢索HTML編碼文本
@Override
public void startElement(String uri, String localName, String qName, Attributes attrs) throws SAXException {
if (localName.equalsIgnoreCase("channel")) {
inChannel = true;
}
if (inChannel) {
if (newFeed == null) newFeed = new Feed();
if (localName.equalsIgnoreCase("image")) {
if (feedImage == null) feedImage = new Image();
inImage = true;
}
if (localName.equalsIgnoreCase("item")) {
if (newItem == null) newItem = new Item();
if (itemList == null) itemList = new ArrayList<Item>();
inItem = true;
}
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if(!inItem) {
if(!inImage) {
if(inChannel) {
//Reached end of feed
if(localName.equalsIgnoreCase("channel")) {
newFeed.setItems((ArrayList<Item>)itemList);
finalFeed = newFeed;
newFeed = null;
inChannel = false;
return;
} else if(localName.equalsIgnoreCase("title")) {
newFeed.setTitle(currentValue); return;
} else if(localName.equalsIgnoreCase("link")) {
newFeed.setLink(currentValue); return;
} else if(localName.equalsIgnoreCase("description")) {
newFeed.setDescription(currentValue); return;
} else if(localName.equalsIgnoreCase("language")) {
newFeed.setLanguage(currentValue); return;
} else if(localName.equalsIgnoreCase("copyright")) {
newFeed.setCopyright(currentValue); return;
} else if(localName.equalsIgnoreCase("category")) {
newFeed.addCategory(currentValue); return;
}
}
}
else { //is inImage
//finished with feed image
if(localName.equalsIgnoreCase("image")) {
newFeed.setImage(feedImage);
feedImage = null;
inImage = false;
return;
} else if (localName.equalsIgnoreCase("url")) {
feedImage.setUrl(currentValue); return;
} else if (localName.equalsIgnoreCase("title")) {
feedImage.setTitle(currentValue); return;
} else if (localName.equalsIgnoreCase("link")) {
feedImage.setLink(currentValue); return;
}
}
}
else { //is inItem
//finished with news item
if (localName.equalsIgnoreCase("item")) {
itemList.add(newItem);
newItem = null;
inItem = false;
return;
} else if (localName.equalsIgnoreCase("title")) {
newItem.setTitle(currentValue); return;
} else if (localName.equalsIgnoreCase("link")) {
newItem.setLink(currentValue); return;
} else if (localName.equalsIgnoreCase("description")) {
newItem.setDescription(currentValue); return;
} else if (localName.equalsIgnoreCase("author")) {
newItem.setAuthor(currentValue); return;
} else if (localName.equalsIgnoreCase("category")) {
newItem.addCategory(currentValue); return;
} else if (localName.equalsIgnoreCase("comments")) {
newItem.setComments(currentValue); return;
} /*else if (localName.equalsIgnoreCase("enclosure")) {
To be implemented later
}*/ else if (localName.equalsIgnoreCase("guid")) {
newItem.setGuid(currentValue); return;
} else if (localName.equalsIgnoreCase("pubDate")) {
newItem.setPubDate(currentValue); return;
}
}
}
@Override
public void characters(char[] ch, int start, int length) {
currentValue = new String(ch, start, length);
}
而RSS提要我試圖解析的一個例子是this one。
任何想法?
是的,這就是薩克斯分析器的工作原理 – Falmarri 2010-10-29 06:14:51
顯然我發現了。 – kcoppock 2010-10-29 11:52:45