2016-11-05 104 views

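Regex: match a word between two other words (MATLAB regex)

I want to parse an XML file using MATLAB regular expressions. Specifically, I would like to retrieve an array of every occurrence of the word "curvepoint" between "deposits" and "/deposits". For the XML below, the result should be a [6x1] array like this: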

<curvepoint> 
<curvepoint> 
<curvepoint> 
<curvepoint> 
<curvepoint> 
<curvepoint> 

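My attempt below does not work, because a lot of other text is interspersed between each "curvepoint" tag and the lookahead/lookbehind, and I do not know how to handle that.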

regexp(XMLText,'(?<=<deposits>)(<curvepoint>)(?=</deposits>)','match')' 

XMLText is:

<?xml version="1.0" encoding="utf-8"?> 
<interestRateCurve> 
    <effectiveasof>2016-11-07</effectiveasof> 
    <currency>EUR</currency> 
    <baddayconvention>M</baddayconvention> 
    <deposits> 
     <daycountconvention>ACT/360</daycountconvention> 
     <snaptime>2016-11-04T15:00:00.000Z</snaptime> 
     <spotdate>2016-11-09</spotdate> 
     <calendars> 
     <calendar>none</calendar> 
     </calendars> 
     <curvepoint> 
     <tenor>1M</tenor> 
     <maturitydate>2016-12-09</maturitydate> 
     <parrate>-0.00373</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>2M</tenor> 
     <maturitydate>2017-01-09</maturitydate> 
     <parrate>-0.00339</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>3M</tenor> 
     <maturitydate>2017-02-09</maturitydate> 
     <parrate>-0.00312</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>6M</tenor> 
     <maturitydate>2017-05-09</maturitydate> 
     <parrate>-0.00213</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>9M</tenor> 
     <maturitydate>2017-08-09</maturitydate> 
     <parrate>-0.0013</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>1Y</tenor> 
     <maturitydate>2017-11-09</maturitydate> 
     <parrate>-0.00071</parrate> 
     </curvepoint> 
    </deposits> 
    <swaps> 
     <fixeddaycountconvention>30/360</fixeddaycountconvention> 
     <floatingdaycountconvention>ACT/360</floatingdaycountconvention> 
     <fixedpaymentfrequency>1Y</fixedpaymentfrequency> 
     <floatingpaymentfrequency>6M</floatingpaymentfrequency> 
     <snaptime>2016-11-04T15:00:00.000Z</snaptime> 
     <spotdate>2016-11-09</spotdate> 
     <calendars> 
     <calendar>none</calendar> 
     </calendars> 
     <curvepoint> 
     <tenor>2Y</tenor> 
     <maturitydate>2018-11-09</maturitydate> 
     <parrate>-0.00157</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>3Y</tenor> 
     <maturitydate>2019-11-09</maturitydate> 
     <parrate>-0.00115</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>4Y</tenor> 
     <maturitydate>2020-11-09</maturitydate> 
     <parrate>-0.00059</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>5Y</tenor> 
     <maturitydate>2021-11-09</maturitydate> 
     <parrate>0.00017</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>6Y</tenor> 
     <maturitydate>2022-11-09</maturitydate> 
     <parrate>0.00108</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>7Y</tenor> 
     <maturitydate>2023-11-09</maturitydate> 
     <parrate>0.0021</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>8Y</tenor> 
     <maturitydate>2024-11-09</maturitydate> 
     <parrate>0.00316</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>9Y</tenor> 
     <maturitydate>2025-11-09</maturitydate> 
     <parrate>0.00419</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>10Y</tenor> 
     <maturitydate>2026-11-09</maturitydate> 
     <parrate>0.00513</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>12Y</tenor> 
     <maturitydate>2028-11-09</maturitydate> 
     <parrate>0.00673</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>15Y</tenor> 
     <maturitydate>2031-11-09</maturitydate> 
     <parrate>0.00838</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>20Y</tenor> 
     <maturitydate>2036-11-09</maturitydate> 
     <parrate>0.00966</parrate> 
     </curvepoint> 
     <curvepoint> 
     <tenor>30Y</tenor> 
     <maturitydate>2046-11-09</maturitydate> 
     <parrate>0.01006</parrate> 
     </curvepoint> 
    </swaps> 
</interestRateCurve> 

Answers


Never use regular expressions to parse XML. At best, the solution will be brittle. Use a real XML parser instead.

In MATLAB, use the xmlread, xmlwrite, and xslt functions to read, write, and transform XML.

Note that the MathWorks blog has XML posts about using these functions in MATLAB.
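
As a rough illustration of the parser route, here is a minimal sketch using xmlread and the DOM methods it exposes. The shortened xmlText is a stand-in I made up for the XMLText in the question, and the variable names (source, deposits, points, tenors, parRates) are my own. It assumes that xmlread accepts an org.xml.sax.InputSource, which lets the text be parsed in memory instead of from a file; if that is not the case in your release, write the text to a file and pass the file name.

```matlab
% Hypothetical, shortened stand-in for the XMLText in the question:
% two curve points under <deposits>, one under <swaps>.
xmlText = ['<interestRateCurve>' ...
    '<deposits>' ...
    '<curvepoint><tenor>1M</tenor><parrate>-0.00373</parrate></curvepoint>' ...
    '<curvepoint><tenor>2M</tenor><parrate>-0.00339</parrate></curvepoint>' ...
    '</deposits>' ...
    '<swaps>' ...
    '<curvepoint><tenor>2Y</tenor><parrate>-0.00157</parrate></curvepoint>' ...
    '</swaps>' ...
    '</interestRateCurve>'];

% Parse the text in memory (assumption: xmlread accepts an InputSource).
source = org.xml.sax.InputSource(java.io.StringReader(xmlText));
doc = xmlread(source);

% Restrict the search to the first <deposits> element, so that the
% <curvepoint> elements under <swaps> are never considered.
deposits = doc.getElementsByTagName('deposits').item(0);
points = deposits.getElementsByTagName('curvepoint');

n = points.getLength();
tenors = cell(n, 1);
parRates = zeros(n, 1);
for k = 1:n
    point = points.item(k - 1);  % DOM node lists are zero-based
    tenors{k} = char(point.getElementsByTagName('tenor').item(0).getTextContent());
    parRates(k) = str2double(char( ...
        point.getElementsByTagName('parrate').item(0).getTextContent()));
end
```

Run against the full XMLText from the question, the same code should yield six entries, one per curve point inside deposits, because the lookup starts from the deposits node rather than from the whole document.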


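Thanks kj, but I don't have time to work through MATLAB's XML functions (this is only a very small part of my "project"). I think a regex is only as brittle as the logic behind it; an experienced regex user should be able to find a tight expression. I have already spent a lot of time on this problem trying to get a specific solution. – user152112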


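Do as you wish, but if you really understood how poorly a regex is matched to the job of parsing XML, you would not be asking this question. The risks are well documented elsewhere. I have no more time to go over them again than you have to learn to do it the right way. – kjhughes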