2012-04-20

Modifying an XML document with Perl XML::SAX

I am trying to modify certain parts of an XHTML document using XML::SAX, but all my attempts have failed.

Here is what I am trying to do:

#!/usr/bin/perl 
package MyHandler; 
use strict; 
use warnings; 

use base qw(XML::SAX::Base); 
use Data::Dumper; 

sub start_element { 
    my $self = shift; 
    my $data = shift; 

    if($data->{LocalName} eq 'span') { 
     $data->{LocalName} = 'naps'; 
    } 

    $self->SUPER::start_element($data); # GOOD (and easy) ! 
    #print Dumper($data); 
} 

1; 

#============================ 
# Main program
#============================ 
use strict; 
use warnings; 

use XML::SAX::ParserFactory; 
use XML::SAX::Writer; 

my $out; 

my $o = XML::SAX::Writer->new(Output => \$out); 
my $h = MyHandler->new(Handler => $o); 
my $p = XML::SAX::ParserFactory->parser(Handler => $h); 

# Slurp the whole DATA section; $/ is undefined only inside the do block
my $data = do { local $/; <DATA> };
$p->parse_string($data); 
print $out; 


__DATA__ 
<?xml version="1.0" encoding="UTF-8"?> 
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:wicket="http://wicket.apache.org/dtds.data/wicket-xhtml1.4-strict.dtd"> 
<body> 
<wicket:panel> 
    <form wicket:id="mvpForm"> 
     <span>Edit Information: </span> 
     <input type="checkbox" wicket:id="editForm"/> 

     <span>Name: </span> 
     <span wicket:id="name"></span> 
     <input type="text" wicket:id="nameEdit"/> 

     <span>Last Name: </span> 
     <span wicket:id="lastName"></span> 
     <input type="text" wicket:id="lastNameEdit"/> 

     <span>DOB: </span> 
     <span wicket:id="dob"></span> 
     <input type="text" wicket:id="dobEdit"/> 


     <span>Occupation: </span> 
     <span wicket:id="occupation"></span> 
     <input type="text" wicket:id="occupationEdit"/> 


     <span>Gender: </span> 
     <span wicket:id="gender"></span> 
     <span wicket:id="genderEdit"/> 

     <input type="submit" wicket:id="submit"/> 

    </form> 
</wicket:panel> 
</body> 
</html> 

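The basic idea is to change every "span" to "naps" and write the resulting modified XML to stdout. I would also like to know whether SAX can be used to merge chunks of XML. In other words, if I find a specific element that expands into something else, how can I merge the expansion into the output on STDOUT?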

E.g. from:

<xmltag> 
    <expandable/> 
</xmltag> 

to:

<xmltag> 
    <expanded> 
     This is an expanded element 
    </expanded> 
</xmltag> 

Thanks.

Answers

To answer my own question about merging/expanding elements, here is a snippet showing how to do it with SAX:

#!/usr/bin/perl 
package MyHandler; 
use strict; 
use warnings; 

use base qw(XML::SAX::Base); 
use Data::Dumper; 

use XML::SAX::ParserFactory; 
use XML::SAX::Writer; 

sub start_element { 
    my $self = shift; 
    my $data = shift; 

    if($data->{LocalName} eq 'expand') { 
     $self->{in_include}++; 
     my $p = XML::SAX::ParserFactory->parser(Handler => $self); 
     $p->parse_string("<expanded>This is my expanded tag</expanded>"); 
     return; 
    } 

    #$data->{Attributes} = undef; 
    $self->SUPER::start_element($data); 
    #print Dumper($data); 
} 

sub characters { 
    my $self = shift; 
    my $data = shift; 

    #print "Data is $data->{Data}" if defined $data->{Data}; 
    $self->SUPER::characters($data); 
} 

sub end_element { 
    my ($self, $element) = @_; 
    if ($element->{LocalName} eq "expand") { 
     $self->{in_include}--; 
    } else { 
     $self->SUPER::end_element($element); 
    } 
} 

sub start_document { # same for end_document 
    my($self, $data) = @_; 
    return if($self->{in_include}); 
    $self->SUPER::start_document($data); 
} 

sub end_document { # same logic as start_document
    my($self, $data) = @_; 
    return if($self->{in_include}); 
    $self->SUPER::end_document($data); 
} 

1; 

#============================ 
# Main program
#============================ 
use strict; 
use warnings; 

use XML::SAX::ParserFactory; 
use XML::SAX::Writer; 

my $out; 

my $o = XML::SAX::Writer->new(Output => \$out); 
my $h = MyHandler->new(Handler => $o); 
my $p = XML::SAX::ParserFactory->parser(Handler => $h); 

# Slurp the whole DATA section; $/ is undefined only inside the do block
my $data = do { local $/; <DATA> };
$p->parse_string($data); 
print $out; 


__DATA__ 
<?xml version="1.0" encoding="UTF-8"?> 
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:wicket="http://wicket.apache.org/dtds.data/wicket-xhtml1.4-strict.dtd"> 
<body> 
<wicket:panel> 
    <form wicket:id="mvpForm"> 
     <span>Edit Information: </span> 
     <input type="checkbox" wicket:id="editForm"/> 

     <span>Name: </span> 
     <span wicket:id="name"></span> 
     <input type="text" wicket:id="nameEdit"/> 

     <span>Last Name: </span> 
     <span wicket:id="lastName"></span> 
     <input type="text" wicket:id="lastNameEdit"/> 

     <span>DOB: </span> 
     <span wicket:id="dob"></span> 
     <input type="text" wicket:id="dobEdit"/> 

     <span>Occupation: </span> 
     <span wicket:id="occupation"></span> 
     <input type="text" wicket:id="occupationEdit"/> 

     <span>Gender: </span> 
     <span wicket:id="gender"></span> 
     <span wicket:id="genderEdit"/> 

     <input type="submit" wicket:id="submit"/> 

     <expand/> 

    </form> 
</wicket:panel> 
</body> 
</html> 

The <expand/> tag will be replaced with <expanded>This is my expanded tag</expanded>.

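Basically, all that is needed is to create a new parser and pass it a file or string to parse. Note, however, that there are a couple of pitfalls. The first is to stop propagating the events of the tag you have intercepted for expansion. In other words, whenever you expand or nest a tag, do not call $self->SUPER::start_element or $self->SUPER::end_element for it; this keeps the replaced tag from ending up in the output. The second is that you need to intercept start_document/end_document and skip calling the parent's versions while a nested document is being parsed, otherwise the following error is produced: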

Trying to pop context without push context at /usr/share/perl5/XML/NamespaceSupport.pm line 79, chunk 1.

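In other words, some cleanup is failing:

This message is being triggered because XML::NamespaceSupport does some initialization on the start_document event and some cleanup on the end_document event. The problem is that in your code the main document has one pair of such events, and each included document adds a nested pair. When the second end_document event occurs, there is nothing left to clean up, hence the message. Taken from here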
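A possible alternative that sidesteps the nested parser, and with it the start_document/end_document pitfall, is to emit the replacement events by hand. The following is only a minimal sketch: the package name ExpandByHand and the helper _element are hypothetical, the event hash layout assumes an un-namespaced element without attributes, and the input is a reduced stand-in for the document above.

```perl
#!/usr/bin/perl
package ExpandByHand;    # hypothetical handler name, not from the thread
use strict;
use warnings;
use base qw(XML::SAX::Base);

# Build a minimal element event for an un-namespaced tag without attributes.
sub _element {
    my ($name) = @_;
    return {
        Name         => $name,
        LocalName    => $name,
        Prefix       => '',
        NamespaceURI => '',
        Attributes   => {},
    };
}

sub start_element {
    my ($self, $el) = @_;
    if ($el->{LocalName} eq 'expand') {
        # Swallow <expand> and send the replacement events downstream ourselves.
        $self->SUPER::start_element(_element('expanded'));
        $self->SUPER::characters({ Data => 'This is my expanded tag' });
        $self->SUPER::end_element(_element('expanded'));
        return;
    }
    $self->SUPER::start_element($el);
}

sub end_element {
    my ($self, $el) = @_;
    return if $el->{LocalName} eq 'expand';    # its start event was swallowed too
    $self->SUPER::end_element($el);
}

package main;
use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $out     = '';
my $writer  = XML::SAX::Writer->new(Output => \$out);
my $handler = ExpandByHand->new(Handler => $writer);
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_string('<xmltag><expand/></xmltag>');
print $out, "\n";
```

This only suits replacements that are easy to describe as a handful of events. When the expansion already exists as an XML file or string, the nested parser shown above is probably less work.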

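It seems that the writer takes the element name from the Name key rather than from LocalName. So instead of modifying LocalName, modify Name to get the desired result.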

if($data->{LocalName} eq 'span') { 
    $data->{LocalName} = 'naps'; 
} 

Change it to

if($data->{LocalName} eq 'span') { 
    $data->{Name} = 'naps'; 
} 
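
One thing to watch: the closing tag is written from the end_element event, which (depending on the parser) may arrive as a separate hash, so renaming only in start_element can leave a mismatched </span>. Below is a minimal sketch that renames both events and keeps LocalName in sync with Name. The package name RenameSpan and the helper _rename are hypothetical, and the input is a reduced stand-in for the XHTML in the question.

```perl
#!/usr/bin/perl
package RenameSpan;    # hypothetical handler name, not from the thread
use strict;
use warnings;
use base qw(XML::SAX::Base);

# Rename span to naps in a SAX element event, preserving any prefix.
sub _rename {
    my ($el) = @_;
    return unless $el->{LocalName} eq 'span';
    my $prefix = $el->{Prefix};
    $el->{LocalName} = 'naps';
    $el->{Name} = (defined $prefix && length $prefix) ? "$prefix:naps" : 'naps';
    return;
}

sub start_element {
    my ($self, $el) = @_;
    _rename($el);
    $self->SUPER::start_element($el);
}

sub end_element {
    my ($self, $el) = @_;
    _rename($el);    # the closing tag has to be renamed as well
    $self->SUPER::end_element($el);
}

package main;
use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $out     = '';
my $writer  = XML::SAX::Writer->new(Output => \$out);
my $handler = RenameSpan->new(Handler => $writer);
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_string('<root><span>Name: </span></root>');
print $out, "\n";
```

Setting both keys is a defensive choice: it should give the same result whether the downstream handler reads Name or rebuilds it from Prefix and LocalName.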

What about adding text nodes? – daxim 2012-04-20 07:16:08

I don't think SAX supports adding nodes. It might be possible in a dirty way! – tuxuday 2012-04-20 07:34:33

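Thanks, that was a bit unexpected tho :). Yes, it looks like the best approach is to create another SAX parser when an expandable node is found, but how do I merge it with the main processing pipeline? I will experiment some more; there may be a solution after all. – dryajov 2012-04-20 16:47:10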

SAX is not the best tool for such trivial changes. Consider a DOM implementation.

use strictures; 
use XML::LibXML qw(); 
my $dom = XML::LibXML->load_xml(…); 

for my $e ($dom->findnodes('//*')) { 
    $e->setNodeName('naps') if 'span' eq $e->nodeName; 
    if ('expandable' eq $e->nodeName) { 
     $e->setNodeName('expanded'); 
     $e->appendText('This is an expanded element'); 
    } 
} 
print $dom->toString; # ->toFile 
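
Since the load_xml arguments are elided above, here is a self-contained variation as a rough sketch. It assumes the input is available as a string, uses a reduced stand-in for the question's XHTML, and selects elements through an XPathContext with a registered namespace prefix (x is an arbitrary prefix chosen for this example) instead of walking every node.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Reduced stand-in for the XHTML document from the question.
my $xml = '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        . '<span>Name: </span><expandable/>'
        . '</body></html>';

my $dom = XML::LibXML->load_xml(string => $xml);

# Elements live in the default XHTML namespace, so XPath needs a prefix for it.
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');

# Rename every span element.
$_->setNodeName('naps') for $xpc->findnodes('//x:span');

# Replace each expandable element with an expanded one that carries text.
for my $e ($xpc->findnodes('//x:expandable')) {
    $e->setNodeName('expanded');
    $e->appendText('This is an expanded element');
}

my $result = $dom->toString;
print $result;
```

Matching on the namespace rather than on nodeName avoids touching a span that belongs to some other vocabulary, at the cost of a little setup.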

Thanks, this works too; my only concern is how memory-intensive it will be. – dryajov 2012-04-20 19:33:19

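Here is an XML::Twig based solution, which I find easier to use than SAX (but then I may be a bit biased ;-). It is quite efficient, since only one span (or expandable) element is kept in memory at a time.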

#!/usr/bin/perl 

use strict; 
use warnings; 

use XML::Twig; 

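XML::Twig->new(
    twig_roots => {
        # rename each span, then flush it so it is printed and freed
        span       => sub { $_->set_tag('naps')->flush; },
        # print a replacement element instead of the original one
        expandable => sub { XML::Twig::Elt->new(expanded => 'this is an expanded element')->print; },
    },
    twig_print_outside_roots => 1,    # everything else is printed unchanged
)->parse(\*DATA);    # parse() accepts an open filehandle; parsefile() expects a file name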
__DATA__ 
<?xml version="1.0" encoding="UTF-8"?> 
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:wicket="http://wicket.apache.org/dtds.data/wicket-xhtml1.4-strict.dtd"> 
<body> 
<wicket:panel> 
    <form wicket:id="mvpForm"> 
     <span>Edit Information: </span> 
     <input type="checkbox" wicket:id="editForm"/> 

     <span>Name: </span> 
     <span wicket:id="name"></span> 
     <input type="text" wicket:id="nameEdit"/> 

     <span>Last Name: </span> 
     <span wicket:id="lastName"></span> 
     <input type="text" wicket:id="lastNameEdit"/> 

     <span>DOB: </span> 
     <span wicket:id="dob"></span> 
     <input type="text" wicket:id="dobEdit"/> 


     <span>Occupation: </span> 
     <span wicket:id="occupation"></span> 
     <input type="text" wicket:id="occupationEdit"/> 


     <span>Gender: </span> 
     <span wicket:id="gender"></span> 
     <span wicket:id="genderEdit"/> 

     <input type="submit" wicket:id="submit"/> 

    </form> 

<xmltag> 
    <expandable/> 
</xmltag> 

</wicket:panel> 
</body> 
</html> 

+1, easy things easy – daxim 2012-04-20 11:06:00

At least for the users of the module ;-) – mirod 2012-04-20 11:37:30

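Thanks, that looks really simple and convenient. I had been trying to stay away from DOM/tree-based solutions because they are usually more memory-intensive, but Twig strikes a nice balance between the convenience of DOM and the lightness of SAX. Thanks! – dryajov 2012-04-20 16:44:22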