How can I convert text to XML using Perl?
The input text file contains the following:
....
ponies B-pro
were I-pro
used I-pro
A O
report O
of O
indirect B-cd
were O
. O
...
The output XML file should be:
<sen>
<base id="pro">
<w id="1">ponies</w>
<w id="2">were</w>
<w id="3">were</w>
</base>A report of
<base id="cd">indirect</base> were
</sen>
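I want to produce the XML file by reading the text file. B- marks the beginning of a tag, I- means the word belongs inside that tag, and "O" means the word is outside any base tag, so it appears only inside the enclosing <sen> tag.
I tried the following code: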
#!/usr/local/bin/perl -w
open(my $f, "input.txt") or die "Can't";
open(my $o, ">output.xml") or die "Can't";
my $c;
sub read_line {
    my $fh = shift;
    if ($fh and my $line = <$fh>) {
        chomp($line);
        my @words = split(/\t/, $line);
        my $word = $words[0];
        my $group = $words[1];
        if($word eq "."){
            return;
        }
        else{
            if($group ne 'O'){
                my @b = split(/\-/, $group);
                if($b[0] eq 'B'){
                    my $e = "<e id=\"";
                    $e .= $b[1] . "\">";
                    $e .= $word . "</e>";
                    return $e;
                }
                if($b[0] eq 'I'){
                    my $w = "<w id=\"";
                    $w .= $c . "\">";
                    $w .= $word . "</w>";
                    $c++;
                    return $w;
                }
            }
            else{
                $c = 2;
                return $word;
            }
        }
    }
    return;
}
sub get_text(){
    my $txt = "";
    my $r = read_line($f);
    while($r){
        if($r =~ m/[[:punct:]]/){
            chop($txt);
            $txt .= " " . $r . " ";
        }
        else{
            $txt .= $r . " ";
        }
        $r = read_line($f);
    }
    chop($txt);
    return "<sen>" . $txt . ".</sen>";
}
But instead I am getting this as output:
<sen>
<base id="pro"> ponies </base>
<w id="2">were</w>
<w id="3">were</w>
A report of
<base id="cd">indirect</base> were
</sen>
I really need help.
Thanks.
Don't try to generate XML by mashing strings together. Use a proper XML module. – Quentin 2010-12-06 23:05:50
Help me through it! – aliocee 2010-12-06 23:21:46
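To illustrate the advice above about using a proper XML module, here is a minimal sketch built on XML::Writer from CPAN (it is not part of core Perl, so it has to be installed). The `@chunks` data and the `chunks_to_xml` helper are hypothetical names made up for this example. It also assumes that a chunk with a single word is written as plain text inside `<base>`, as in the desired output, and that `<w>` ids restart at 1 in each `<base>`; the question does not settle either point.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Writer;    # CPAN module, not part of core Perl

# Hypothetical, hard-coded chunks standing in for the parsed input file.
my @chunks = (
    { id   => 'pro', words => [ 'ponies', 'were', 'used' ] },
    { text => 'A report of' },
    { id   => 'cd', words => ['indirect'] },
    { text => 'were' },
);

# Serialize the chunks; the writer escapes <, > and & in all text for us.
sub chunks_to_xml {
    my (@chunks) = @_;
    my $xml    = '';
    my $writer = XML::Writer->new( OUTPUT => \$xml );

    $writer->startTag('sen');
    for my $chunk (@chunks) {
        if ( exists $chunk->{id} ) {
            my @words = @{ $chunk->{words} };
            $writer->startTag( 'base', id => $chunk->{id} );
            if ( @words == 1 ) {
                # Assumption: a lone word gets no <w> wrapper.
                $writer->characters( $words[0] );
            }
            else {
                # Assumption: <w> ids restart at 1 inside each <base>.
                my $n = 0;
                $writer->dataElement( 'w', $_, id => ++$n ) for @words;
            }
            $writer->endTag('base');
        }
        else {
            # Text outside any <base>, padded so words do not run together.
            $writer->characters(" $chunk->{text} ");
        }
    }
    $writer->endTag('sen');
    $writer->end();
    return $xml;
}

print chunks_to_xml(@chunks), "\n";
```

Because the writer tracks open elements, a missing `endTag` is reported as an error instead of silently producing a malformed file, which is the kind of mistake that string concatenation hides.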
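There are a number of ambiguities in your question. Is "indirect" really supposed to be text directly inside the `<base>`, rather than getting a `<w>`? Do `<w>` IDs just increment globally? (XML forbids reusing an ID.) What should happen if we see `blah I-foo` immediately after `blah B-bar` (the base IDs don't match)? I have some working code, but I can't say it is *correct* without answers to these questions. – hobbs 2010-12-07 02:40:59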
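One way to make those ambiguities explicit is to separate the grouping step from the XML output. The sketch below uses only core Perl and is not a definitive answer. The `group_chunks` name is invented for the example, and it assumes that an I- tag whose id does not match the currently open chunk starts a new chunk, and that a line whose word is "." ends the sentence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group "word TAG" lines into chunks: { id => ..., words => [...] } for
# B-/I- runs, { words => [...] } for runs of O words.
sub group_chunks {
    my (@lines) = @_;
    my @chunks;
    for my $line (@lines) {
        my ( $word, $tag ) = split ' ', $line;
        next unless defined $tag;    # skip blank or filler lines such as "...."
        last if $word eq '.';        # assumption: "." ends the sentence

        if ( $tag eq 'O' ) {
            # Extend the previous run of O words, or start a new one.
            if ( @chunks && !exists $chunks[-1]{id} ) {
                push @{ $chunks[-1]{words} }, $word;
            }
            else {
                push @chunks, { words => [$word] };
            }
            next;
        }

        my ( $kind, $id ) = split /-/, $tag, 2;
        next unless defined $id;     # ignore tags that are neither O nor X-id

        if (   $kind eq 'I'
            && @chunks
            && exists $chunks[-1]{id}
            && $chunks[-1]{id} eq $id )
        {
            push @{ $chunks[-1]{words} }, $word;    # continue the open chunk
        }
        else {
            # B-id, or (assumption) an I-id that matches no open chunk.
            push @chunks, { id => $id, words => [$word] };
        }
    }
    return @chunks;
}

my @chunks = group_chunks(
    "ponies\tB-pro", "were\tI-pro", "used\tI-pro",
    "A\tO",          "report\tO",   "of\tO",
    "indirect\tB-cd", "were\tO",    ".\tO",
);
for my $chunk (@chunks) {
    my $label = exists $chunk->{id} ? $chunk->{id} : 'O';
    print "$label: @{ $chunk->{words} }\n";
}
# prints:
# pro: ponies were used
# O: A report of
# cd: indirect
# O: were
```

The resulting list can then be handed to whichever XML module you use, and each rule for a mismatched tag lives in one clearly marked branch that is easy to change once the intended behaviour is known.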