2012-04-24 32 views
1

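Parsing nested tags in Perl

I want to parse an XML file that contains nested sets of tags. I tried the Perl XML::Simple API: the values of single tags are parsed fine, but I cannot get at the values of the nested tags.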

<archetype> 
    <original_language></original_language> 
    <description></description> 
    <archetype_id></archetype_id> 
    <definition></definition> 
    <ontology></ontology> 
</archetype> 
The definition section contains the details of the item. For example:

<definition> 
. 
. 
<node_id>at0004</node_id> 
<attributes xsi:type="C_SINGLE_ATTRIBUTE"> 
<rm_attribute_name>value</rm_attribute_name> 
+<existence> </existence> 
<children xsi:type="C_DV_QUANTITY"> 
    <rm_type_name>DV_QUANTITY</rm_type_name> 
    +<occurrences></occurrences> 
    <node_id/> 
    +<property></property> 
    <list> 
    <magnitude> 
     <lower_included>true</lower_included> 
     <upper_included>false</upper_included> 
     <lower_unbounded>false</lower_unbounded> 
     <upper_unbounded>false</upper_unbounded> 
     <lower>0.0</lower> 
     <upper>1000.0</upper> 
</magnitude> 
<units>mm[Hg]</units> 
</list> 
</children> 
</attributes> 
. 
. 
</definition> 

From the example file above I would like to get output like this:

node_id - > at0004 
    magnitude -> lower -> 0.0 
    magnitude -> higher -> 1000.0 

Please guide me on how to filter out this content.


It would help if you included your current code. That way we can point out where you went wrong instead of just handing you a complete answer. – 2012-04-24 10:36:14

Answers

2

You need to learn about references: perlreftut, perlref, perldsc.

use strictures; 
use XML::Simple qw(:strict); 

my $root = XMLin(<<'XML', ForceArray => 0, KeyAttr => undef); 
<definition> 
. 
. 
<node_id>at0004</node_id> 
<attributes xsi:type="C_SINGLE_ATTRIBUTE"> 
<rm_attribute_name>value</rm_attribute_name> 
+<existence> </existence> 
<children xsi:type="C_DV_QUANTITY"> 
    <rm_type_name>DV_QUANTITY</rm_type_name> 
    +<occurrences></occurrences> 
    <node_id/> 
    +<property></property> 
    <list> 
    <magnitude> 
     <lower_included>true</lower_included> 
     <upper_included>false</upper_included> 
     <lower_unbounded>false</lower_unbounded> 
     <upper_unbounded>false</upper_unbounded> 
     <lower>0.0</lower> 
     <upper>1000.0</upper> 
</magnitude> 
<units>mm[Hg]</units> 
</list> 
</children> 
</attributes> 
. 
. 
</definition> 
XML 

my $m = $root->{attributes}{children}{list}{magnitude}; 
printf <<'TEMPLATE', $root->{node_id}, $m->{lower}, $m->{upper}; 
node_id -> %s 
    magnitude -> lower -> %.1f 
    magnitude -> higher -> %.1f 
TEMPLATE 

use Data::Dump::Streamer qw(Dump); Dump $root; 

Output:

node_id -> at0004 
    magnitude -> lower -> 0.0 
    magnitude -> higher -> 1000.0 

$HASH1 = { 
    attributes => { 
     children => { 
      content => [("\n +") x 2], 
      list => { 
       magnitude => { 
        lower   => '0.0', 
        lower_included => 'true', 
        lower_unbounded => 'false', 
        upper   => '1000.0', 
        upper_included => 'false', 
        upper_unbounded => 'false' 
       }, 
       units => 'mm[Hg]' 
      }, 
      node_id  => {}, 
      occurrences => {}, 
      property  => {}, 
      rm_type_name => 'DV_QUANTITY', 
      "xsi:type" => 'C_DV_QUANTITY' 
     }, 
     content   => "\n+", 
     existence   => {}, 
     rm_attribute_name => 'value', 
     "xsi:type"  => 'C_SINGLE_ATTRIBUTE' 
    }, 
    content => [("\n.\n.\n") x 2], 
    node_id => 'at0004' 
}; 
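
The dump above is a hash of hashes, so reaching a nested value is a chain of hash dereferences. As a rough sketch of that idea using only core Perl, the snippet below walks a hand-built stand-in for part of that structure. The `dig` helper is hypothetical (it is not part of XML::Simple); it stops at the first missing key instead of autovivifying intermediate hashes, which can be handy when some elements are optional.

```perl
use strict;
use warnings;

# Hand-built stand-in for part of the structure shown in the dump above.
my $root = {
    node_id    => 'at0004',
    attributes => {
        existence => {},
        children  => {
            list => {
                magnitude => { lower => '0.0', upper => '1000.0' },
                units     => 'mm[Hg]',
            },
        },
    },
};

# dig() is a hypothetical helper: follow a list of hash keys and return
# undef as soon as a step is missing or is not a hash reference.
sub dig {
    my ($node, @path) = @_;
    for my $key (@path) {
        return unless ref $node eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

my $lower   = dig($root, qw(attributes children list magnitude lower));
my $missing = dig($root, qw(attributes children list magnitude median));

printf "lower   -> %s\n", $lower;
printf "missing -> %s\n", defined $missing ? $missing : '(not present)';
```

Writing `$root->{attributes}{children}{list}{magnitude}{lower}` directly gives the same value when every level exists; the helper only matters when a level might be absent.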
1

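Here is an XML::Twig program that does this, although I made some assumptions that you may need to adjust. I don't know whether <definition> can hold more than one node_id/attributes pair, so I wrote this to handle multiple pairs: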

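#!/usr/bin/env perl

use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <magnitude> element has been parsed.
        magnitude => sub {
            my $m    = $_;
            my $hash = $m->simplify;   # child elements become hash keys
            # The <node_id> that belongs to this magnitude is the sibling
            # just before the enclosing <attributes> element.
            my $node_id = $m->parent('attributes')->prev_sibling('node_id')->text;
            print "node_id -> $node_id\n",
                "\tmagnitude -> lower -> $hash->{lower}\n",
                "\tmagnitude -> higher -> $hash->{upper}\n";
        },
    },
);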

$twig->parse(*DATA); 


__END__ 
<definition> 

<node_id>at0004</node_id> 
<attributes xsi:type="C_SINGLE_ATTRIBUTE"> 
    <rm_attribute_name>value</rm_attribute_name> 
    <existence> </existence> 
    <children xsi:type="C_DV_QUANTITY"> 
     <rm_type_name>DV_QUANTITY</rm_type_name> 
     <occurrences></occurrences> 
     <node_id/> 
     <property></property> 
     <list> 
      <magnitude> 
       <lower_included>true</lower_included> 
       <upper_included>false</upper_included> 
       <lower_unbounded>false</lower_unbounded> 
       <upper_unbounded>false</upper_unbounded> 
       <lower>0.0</lower> 
       <upper>1000.0</upper> 
      </magnitude> 
      <units>mm[Hg]</units> 
     </list> 
    </children> 
</attributes> 

<node_id>at0005</node_id> 
<attributes xsi:type="C_SINGLE_ATTRIBUTE"> 
    <rm_attribute_name>value</rm_attribute_name> 
    <existence> </existence> 
    <children xsi:type="C_DV_QUANTITY"> 
     <rm_type_name>DV_QUANTITY</rm_type_name> 
     <occurrences></occurrences> 
     <node_id/> 
     <property></property> 
     <list> 
      <magnitude> 
       <lower_included>true</lower_included> 
       <upper_included>false</upper_included> 
       <lower_unbounded>false</lower_unbounded> 
       <upper_unbounded>false</upper_unbounded> 
       <lower>100.9</lower> 
       <upper>998.7</upper> 
      </magnitude> 
      <units>mm[Hg]</units> 
     </list> 
    </children> 
</attributes> 

</definition>
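
If the values are needed later rather than printed straight away, one possible variation is to collect them into a hash keyed by node_id. The sketch below assumes the same layout as the sample data above (each <attributes> element directly preceded by its <node_id>), trimmed to the relevant elements. The handler is attached to <list> rather than <magnitude> because <units> follows <magnitude> and has not been parsed yet when a magnitude handler fires. The name `%range_for` is made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed copy of the sample data; only the elements used here are kept.
my $xml = <<'XML';
<definition>
  <node_id>at0004</node_id>
  <attributes>
    <children>
      <list>
        <magnitude><lower>0.0</lower><upper>1000.0</upper></magnitude>
        <units>mm[Hg]</units>
      </list>
    </children>
  </attributes>
  <node_id>at0005</node_id>
  <attributes>
    <children>
      <list>
        <magnitude><lower>100.9</lower><upper>998.7</upper></magnitude>
        <units>mm[Hg]</units>
      </list>
    </children>
  </attributes>
</definition>
XML

my %range_for;    # node_id => { lower => ..., upper => ..., units => ... }

my $twig = XML::Twig->new(
    twig_handlers => {
        # Fires at </list>, when both <magnitude> and <units> are available.
        list => sub {
            my ($t, $list) = @_;
            my $m          = $list->first_child('magnitude') or return;
            my $attributes = $list->parent('attributes')      or return;
            my $id_elt     = $attributes->prev_sibling('node_id') or return;
            $range_for{ $id_elt->text } = {
                lower => $m->first_child_text('lower'),
                upper => $m->first_child_text('upper'),
                units => $list->first_child_text('units'),
            };
        },
    },
);
$twig->parse($xml);

for my $id (sort keys %range_for) {
    my $r = $range_for{$id};
    print "$id: $r->{lower} .. $r->{upper} $r->{units}\n";
}
```

Keeping the data in a hash makes it easy to filter or reformat afterwards, at the cost of holding all results in memory.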