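Jsoup parsing HTML when writing to a file

I seem to be hitting a bug where the text is written to the file twice: the first time badly formatted, the second time correctly formatted. The method below takes in this URL after it's been converted properly. The method should print a newline between the text of each child of the divs that are children of the div "ffaq", which is where all the body text lives. Any help would be greatly appreciated. I'm fairly new to jsoup, so an explanation would also be nice.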
    /**
     * Method to deal with HTML 5 Gamefaq entries.
     * @param url The location of the HTML 5 entry to read.
     */
    public static void htmlDocReader(URL url) {
        try {
            Document doc = Jsoup.parse(url.openStream(), "UTF-8", url.toString());
            //parse pagination label
            String[] num = doc.select("div.span12").
                               select("ul.paginate").
                               select("li").
                               first().
                               text().
                               split("\\s+");
            //get the max page number
            final int max_pagenum = Integer.parseInt(num[num.length - 1]);
            //create a new file based on the url path
            File file = urlFile(url);
            PrintWriter outFile = new PrintWriter(file, "UTF-8");
            //Add every page to the text file
            for(int i = 0; i < max_pagenum; i++) {
                //if not the first page then change the url
                if(i != 0) {
                    String new_url = url.toString() + "?page=" + i;
                    doc = Jsoup.parse(new URL(new_url).openStream(), "UTF-8",
                                      new_url.toString());
                }
                Elements walkthroughs = doc.select("div.ffaq");
                for(Element elem : walkthroughs.select("div")) {
                    for(Element inner : elem.children()) {
                        outFile.println(inner.text());
                    }
                }
            }
            outFile.close();
        } catch(Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
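
A possible explanation for the duplicate output, offered as a minimal sketch rather than a confirmed diagnosis: in jsoup, `select("div")` called on a set of elements can match those elements themselves as well as their descendants. If so, `div.ffaq` is visited once as a whole (each child div's text comes out as a single run-together line) and then each nested div is visited again (its children come out line by line). The class name `FfaqLines`, the two helper methods, and the tiny sample HTML are all hypothetical; the real page structure is assumed to be `div.ffaq > div > (text elements)`, as described in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class FfaqLines {

    // Same traversal as in the question: select("div") matches div.ffaq
    // itself in addition to every div nested inside it.
    static List<String> original(Document doc) {
        List<String> lines = new ArrayList<>();
        for (Element elem : doc.select("div.ffaq").select("div")) {
            for (Element inner : elem.children()) {
                lines.add(inner.text());
            }
        }
        return lines;
    }

    // Alternative: the child combinator ">" visits only the children of
    // the divs that sit directly under div.ffaq, so each is seen once.
    static List<String> childrenOnly(Document doc) {
        List<String> lines = new ArrayList<>();
        for (Element inner : doc.select("div.ffaq > div > *")) {
            lines.add(inner.text());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for one page of the walkthrough.
        String html = "<div class=\"ffaq\"><div><p>Intro</p><p>Controls</p></div></div>";
        Document doc = Jsoup.parse(html);
        System.out.println(original(doc));     // three entries: combined text, then each paragraph
        System.out.println(childrenOnly(doc)); // two entries: one per paragraph
    }
}
```

If the assumption about the page structure holds, replacing the two nested loops in `htmlDocReader` with a single loop over `doc.select("div.ffaq > div > *")` should write each block of text once.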