How to parse this XML with javax.xml.xpath?

I'm trying to parse this XML:

<?xml version="1.0" encoding="UTF-8"?> 
<veranstaltungen> 
    <veranstaltung id="201611211500#25045271"> 
    <titel>Mal- und Zeichen-Treff</titel> 
    <start>2016-11-21 15:00:00</start> 
    <veranstaltungsort id="20011507"> 
     <name>Freizeitclub - ganz unbehindert </name> 
     <anschrift>Macht los e.V. 
Lipezker Straße 48 
03048 Cottbus 
</anschrift> 
     <telefon>xxxx xxxx </telefon> 
     <fax>0355 xxxx</fax> 
[...] 
</veranstaltungen>

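As you can see, some of the text contains spaces and even line breaks. I have trouble getting the text from the anschrift node, which I need in order to find the matching location data in the database. The problem is that instead of

Macht los e.V. Lipezker Straße 48 03048 Cottbus

the returned string is:

Macht los e.V.Lipezker Straße 4803048 Cottbus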

I know the right way to handle this should be normalize-space(), but I can't quite work out how to do it. I tried this:

// Does not work; afaik because XPath 1.0 normalizes just the first node
xPath.compile("normalize-space(veranstaltungen/veranstaltung[position()=1]/veranstaltungsort/anschrift/text())");

// Does not work
xPath.compile("veranstaltungen/veranstaltung[position()=1]/veranstaltungsort[normalize-space(anschrift/text())]");

I also tried the solution given here: xpath-normalize-space-to-return-a-sequence-of-normalized-strings

xPathExpression = xPath.compile("veranstaltungen/veranstaltung[position()=1]/veranstaltungsort"); 
NodeList result = (NodeList) xPathExpression.evaluate(doc, XPathConstants.NODESET); 

String normalize = "normalize-space(.)"; 
xPathExpression = xPath.compile(normalize); 

int length = result.getLength(); 
for (int i = 0; i < length; i++) { 
    System.out.println(xPathExpression.evaluate(result.item(i), XPathConstants.STRING)); 
}

The System.out prints:

Macht los e.V.Lipezker Straße 4803048 Cottbus

What am I doing wrong?

Update

I now have a workaround, but it can't be the real solution. The following lines show how I get the string from the HttpResponse:

try (BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), Charset.forName(charset)))) {
    final StringBuilder stringBuilder = new StringBuilder();
    String line;

    while ((line = reader.readLine()) != null) {
        // stringBuilder.append(line);
        // WORKAROUND: Add a space after each line
        stringBuilder.append(line).append(" ");
    }

    // Work with the read lines
}

I'd rather have a solid solution.


2016-11-22 aProgger

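normalize-space() strips leading and trailing whitespace and converts every other sequence of whitespace characters (including newlines) into a single space. Since your result has no space between the lines of the anschrift element's text content, something must be eating your newlines before normalize-space() gets to do its job. – Markus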
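To illustrate the comment above, here is a minimal, self-contained sketch. The class and method names are hypothetical, and the XML is a trimmed-down copy of the sample held in a string. It shows that the question's first expression, with its parentheses balanced, returns the expected address as long as the newlines are still present in the parsed DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NormalizeSpaceDemo {
    // Trimmed-down copy of the question's XML; the line breaks inside <anschrift> are kept.
    static final String SAMPLE =
            "<veranstaltungen>\n"
          + "  <veranstaltung id=\"201611211500#25045271\">\n"
          + "    <veranstaltungsort id=\"20011507\">\n"
          + "      <anschrift>Macht los e.V.\nLipezker Stra\u00dfe 48\n03048 Cottbus\n</anschrift>\n"
          + "    </veranstaltungsort>\n"
          + "  </veranstaltung>\n"
          + "</veranstaltungen>";

    // Parses the XML and returns the whitespace-normalized address of the first event.
    static String normalizedAddress(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // normalize-space() trims leading/trailing whitespace and collapses every
        // internal run of whitespace (spaces and newlines alike) into one space.
        return (String) XPathFactory.newInstance().newXPath().evaluate(
                "normalize-space(veranstaltungen/veranstaltung[position()=1]/veranstaltungsort/anschrift/text())",
                doc, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(normalizedAddress(SAMPLE));
    }
}
```

With the newlines intact, this prints the address with single spaces between the three lines, which is exactly the result the question was after.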

Apparently you originally read the XML with the following code:

try (BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), Charset.forName(charset)))) {
    final StringBuilder stringBuilder = new StringBuilder();
    String line;

    while ((line = reader.readLine()) != null) {
        stringBuilder.append(line);
    }

}

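That is where your newlines get eaten: readLine() does not return the trailing line terminator. If you then parse the contents of the stringBuilder object, you get an incorrect DOM whose text nodes no longer contain the newlines from the original XML.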
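A small sketch of that effect (class and method names are hypothetical): appending each line returned by readLine() drops the line terminators, so the address lines are glued together before the XML parser ever sees them, and normalize-space() has nothing left to collapse.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLineDemo {
    // Mimics the question's loop: append each line returned by readLine().
    static String readByLines(String input) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() returns the line content WITHOUT its "\n" or "\r\n" terminator.
                sb.append(line);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String anschrift = "<anschrift>Macht los e.V.\nLipezker Stra\u00dfe 48\n03048 Cottbus\n</anschrift>";
        // Prints <anschrift>Macht los e.V.Lipezker Straße 4803048 Cottbus</anschrift>
        System.out.println(readByLines(anschrift));
    }
}
```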


2016-11-22 10:43:48 Markus

Didn't know that. Thanks for the info. My workaround then is to check whether a line ends with ">" and, if not, append " ". – aProgger

Don't do that. You are modifying the input. Why do you want line-based reading at all? Why not parse the input stream as it is? – Markus

I should clear my head for a while. You're right. Doing that now. – aProgger
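Following Markus's suggestion to parse the input stream as it is, a minimal sketch could look like this. The class and method names are hypothetical, and a ByteArrayInputStream stands in for response.getEntity().getContent(). The raw InputStream goes straight to DocumentBuilder.parse(InputStream), so the parser sees every byte, line breaks included, and reads the encoding from the XML declaration itself.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class ParseStreamDemo {
    // Hands the raw bytes to the parser unchanged: no Reader, no line splitting.
    static Document parse(InputStream in) throws Exception {
        try (InputStream stream = in) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(stream);
        }
    }

    // Returns the whitespace-normalized text of the first <anschrift> element.
    static String address(Document doc) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate("normalize-space(//anschrift)", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<veranstaltungen><veranstaltung><veranstaltungsort>\n"
                + "<anschrift>Macht los e.V.\nLipezker Stra\u00dfe 48\n03048 Cottbus\n</anschrift>\n"
                + "</veranstaltungsort></veranstaltung></veranstaltungen>";
        // Hypothetical stand-in for response.getEntity().getContent() from the question.
        InputStream body = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(address(parse(body)));
    }
}
```

Passing the InputStream rather than a Reader also avoids having to guess the charset up front; whether the HTTP Content-Type header or the XML declaration should win is a separate decision this sketch does not address.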

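Thanks to Markus's help, I was able to solve the problem. The cause was that BufferedReader's readLine() method discards line terminators. The following code snippet works for me (it can probably be improved):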

public Document getDocument() throws IOException, ParserConfigurationException, SAXException {

    final HttpResponse response = getResponse(); // returns an HttpResponse
    final HttpEntity entity = response.getEntity();
    final Charset charset = ContentType.getOrDefault(entity).getCharset();

    // Not 100% sure if I have to close the InputStreamReader. But I guess so.
    try (InputStreamReader isr = new InputStreamReader(entity.getContent(), charset == null ? Charset.forName("UTF-8") : charset)) {
        return documentBuilderFactory.newDocumentBuilder().parse(new InputSource(isr));
    }
}


2016-11-22 12:09:54 aProgger
