2011-01-06 82 views
0

Parsing a YAML-like text file into a hash structure

I have the following text file:

country = { 
    tag = ENG 
    ai = { 
     flags = { } 
     combat = { ROY WLS PUR SCO EIR FRA DEL USA QUE BGL MAH MOG VIJ MYS DLH GUJ ORI JAI ASS MLC MYA ARK PEG TAU HYD } 
     continent = { "Oceania" } 
     area = { "America" "Maine" "Georgia" "Newfoundland" "Cuba" "Bengal" "Carnatic" "Ceylon" "Tanganyika" "The Mascarenes" "The Cape" "Gold" "St Helena" "Guiana" "Falklands" "Bermuda" "Oregon" } 
     region = { "North America" "Carribean" "India" } 
     war = 50 
     ferocity = no 
    } 
    date = { year = 0 month = january day = 0 } 
} 

What I'm trying to do is parse this text into a Perl hash structure, so that the Data::Dumper output of the result looks like this:

$VAR1 = { 
      'country' => { 
         'ai' => { 
            'area' => [ 
               'America', 
               'Maine', 
               'Georgia', 
               'Newfoundland', 
               'Cuba', 
               'Bengal', 
               'Carnatic', 
               'Ceylon', 
               'Tanganyika', 
               'The Mascarenes', 
               'The Cape', 
               'Gold', 
               'St Helena', 
               'Guiana', 
               'Falklands', 
               'Bermuda', 
               'Oregon' 
              ], 
            'combat' => [ 
               'ROY', 
               'WLS', 
               'PUR', 
               'SCO', 
               'EIR', 
               'FRA', 
               'DEL', 
               'USA', 
               'QUE', 
               'BGL', 
               'MAH', 
               'MOG', 
               'VIJ', 
               'MYS', 
               'DLH', 
               'GUJ', 
               'ORI', 
               'JAI', 
               'ASS', 
               'MLC', 
               'MYA', 
               'ARK', 
               'PEG', 
               'TAU', 
               'HYD' 
               ], 
            'continent' => [ 
                'Oceania' 
                ], 
            'ferocity' => 'no', 
            'flags' => [], 
            'region' => [ 
               'North America', 
               'Carribean', 
               'India' 
               ], 
            'war' => 50 
           }, 
         'date' => { 
            'day' => 0, 
            'month' => 'january', 
            'year' => 0 
            }, 
         'tag' => 'ENG' 
         } 
     }; 

A hard-coded version might look like this:

#!/usr/bin/perl 
use Data::Dumper; 
use warnings; 
use strict; 

my $ret; 

$ret->{'country'}->{tag} = 'ENG'; 
$ret->{'country'}->{ai}->{flags} = []; 
my @qw = qw(ROY WLS PUR SCO EIR FRA DEL USA QUE BGL MAH MOG VIJ MYS DLH GUJ ORI JAI ASS MLC MYA ARK PEG TAU HYD); 
$ret->{'country'}->{ai}->{combat} = \@qw; 
$ret->{'country'}->{ai}->{continent} = ["Oceania"]; 
$ret->{'country'}->{ai}->{area} = ["America", "Maine", "Georgia", "Newfoundland", "Cuba", "Bengal", "Carnatic", "Ceylon", "Tanganyika", "The Mascarenes", "The Cape", "Gold", "St Helena", "Guiana", "Falklands", "Bermuda", "Oregon"]; 
$ret->{'country'}->{ai}->{region} = ["North America", "Carribean", "India"]; 
$ret->{'country'}->{ai}->{war} = 50; 
$ret->{'country'}->{ai}->{ferocity} = 'no'; 
$ret->{'country'}->{date}->{year} = 0; 
$ret->{'country'}->{date}->{month} = 'january'; 
$ret->{'country'}->{date}->{day} = 0; 

sub hash_sort { 
    my ($hash) = @_; 
    return [ (sort keys %$hash) ]; 
} 

# Sortkeys expects a code reference; \hash_sort would call the sub instead
$Data::Dumper::Sortkeys = \&hash_sort; 

print Dumper($ret); 

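I have to admit I have a huge problem dealing with the nested curly braces. I tried to solve it with greedy and non-greedy matching, but that doesn't seem to do the trick. I have also read about extended patterns (such as (?PARNO)), but I have absolutely no idea how to use them for my particular problem. The order of the data is irrelevant, since I have the hash_sort subroutine. I would appreciate any help.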

+2

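What created the text file? My solution would be to find a way to create the text file so that it really is a YAML file. Otherwise, this is just crazy! Creating it in a standard format is easier, and reading it is easier too! – 2011-01-06 17:02:44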

+0

Paradox savefiles, huh? – Oesor 2011-01-06 18:36:22

Answers

3

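I broke it down into a few simple assumptions:

  1. An entry consists of an identifier followed by an equals sign.
  2. An entry is one of three basic types: a level, a set, or a single value.
  3. A set takes one of three forms: 1) a space-separated list of quoted strings; 2) key-value pairs; 3) a qw-like list of unquoted words.
  4. A set of key-value pairs must use an identifier as each key and either a run of non-space characters or a quoted string as each value.

See the interspersed comments.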

use strict; 
use warnings; 

my $simple_value_RE 
    = qr/^ \s* (\p{Alpha}\w*) \s* = \s* ([^\s{}]+ | "[^"]*") \s* $/x 
    ; 
my $set_or_level_RE 
    = qr/^ \s* (\w+) \s* = \s* [{] (?: ([^}]+) [}])? \s* $/x 
    ; 
my $quoted_set_RE 
    = qr/^ \s* (?: "[^"]+" \s+)* "[^"]+" \s* $/x 
    ; 
my $associative_RE 
    = qr/^ \s* 
     (?: \p{Alpha}\w* \s* = \s* (?: "[^"]+" | \S+) \s+)* 
     \p{Alpha}\w* \s* = \s* (?: "[^"]+" | \S+) 
     \s* $ 
    /x 
    ; 
my $pair_RE = qr/ \b (\p{Alpha}\w*) \s* = \s* ("[^"]+" | \S+)/x; 

sub get_level { 
    my $handle = shift; 
    my %level; 
    while (<$handle>) { 
     # if the first character on the line is a close, then we're done 
     # at this level 
     last if m/^\s*[}]/; 
     my ($key, $value); 

     # get simple values 
     if (($key, $value) = m/$simple_value_RE/) { 
      # done. 
     } 
     elsif (($key, my $complete_set) = m/$set_or_level_RE/) { 
      if ($complete_set) { 
       if ($complete_set =~ m/$quoted_set_RE/) { 
        # Pull all quoted values with global flag 
        $value = [ $complete_set =~ m/"([^"]+)"/g ]; 
       } 
       elsif ($complete_set =~ m/$associative_RE/) { 
        # Build a hashref. $pair_RE has two capture groups, so matching 
        # with the global flag in list context returns a flat 
        # key, value, key, value, ... list that fills the hash directly. 
        # Quoted values keep their surrounding double quotes. 
        $value = { $complete_set =~ m/$pair_RE/g }; 
       } 
       else { 
        # qw-like 
        $value = [ split(' ', $complete_set) ]; 
       } 
      } 
      else { 
       $value = get_level($handle); 
      } 
     } 
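     # skip blank or unrecognised lines rather than storing under an undef key 
     next unless defined $key; 
     $level{ $key } = $value; 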
    } 
    return wantarray ? %level : \%level; 
} 

# Expects the sample text in a __DATA__ section at the end of the script (not shown). 
my %base = get_level(\*DATA); 
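
The three set forms from assumption 3 can also be tried in isolation. The following is only a minimal sketch, not part of the answer's code: `parse_set` is a hypothetical helper name, and it assumes that quoted strings contain no embedded double quotes and that values in key-value pairs contain no `=`.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the text between "{" and "}" into a Perl value.
sub parse_set {
    my ($body) = @_;

    # Form 1: nothing but double-quoted strings -> array ref
    if ($body =~ /^ \s* (?: "[^"]*" \s* )+ $/x) {
        return [ $body =~ /"([^"]*)"/g ];
    }

    # Form 2: nothing but key = value pairs -> hash ref
    my $pair = qr/ (\p{Alpha}\w*) \s* = \s* ( "[^"]*" | [^\s=]+ ) /x;
    if ($body =~ /^ \s* (?: $pair \s* )+ $/x) {
        # Two capture groups plus /g in list context yield
        # key, value, key, value, ... (quoted values keep their quotes)
        my %pairs = $body =~ /$pair/g;
        return \%pairs;
    }

    # Form 3: qw-like bare words; an empty body gives an empty array ref
    return [ split ' ', $body ];
}
```

For instance, `parse_set(' year = 0 month = january day = 0 ')` returns a hash reference, while `parse_set(' ROY WLS PUR ')` returns an array reference.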
2

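Well, as David suggested, the easiest way would be to have the file generated in a standard format. JSON, YAML, or XML would be much easier to parse. If you really need to parse this format, I would write a grammar for it using Regexp::Grammars (if you can require Perl 5.10) or Parse::RecDescent (if you can't). It will be a bit tricky, especially because you seem to use braces for both hashes and arrays, but it should be doable.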

2

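The content looks quite regular. Why not perform a few substitutions on the content to convert it into hash syntax, and then eval it? That would be a quick and dirty way to convert it.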
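
A rough sketch of that substitute-then-eval idea might look like the following. The helper names (`quote`, `to_perl_source`, `parse_by_eval`) are made up for illustration, and the sketch assumes that quoted strings never contain braces, brackets, `=`, or double quotes. Since it relies on string `eval`, it should only be used on files you trust.

```perl
use strict;
use warnings;

# Hypothetical helper: emit a token as a Perl literal.
sub quote {
    my ($token) = @_;
    $token =~ s/^"(.*)"$/$1/s;                        # drop surrounding double quotes
    return $token if $token =~ /^-?(?:0|[1-9]\d*)$/;  # plain integers stay numeric
    $token =~ s/([\\'])/\\$1/g;                       # escape for single quoting
    return "'$token'";
}

# Hypothetical helper: rewrite the save-file text as Perl hash syntax.
sub to_perl_source {
    my ($text) = @_;

    # 1. Brace groups without "=" or nested braces are lists: { a b } -> ['a', 'b']
    $text =~ s{ \{ ([^{}=]*) \} }{
        my $body = $1;
        '[' . join(', ', map { quote($_) } $body =~ /"[^"]*"|\S+/g) . ']'
    }gex;

    # 2. Scalar entries: key = value -> 'key' => 'value',
    $text =~ s{ (\w+) \s* = \s* ( "[^"]*" | [^\s{}\[\]]+ ) }{
        my ($key, $value) = ($1, $2);
        quote($key) . ' => ' . quote($value) . ','
    }gex;

    # 3. Remaining entries open a block or a list: key = { -> 'key' => {
    $text =~ s{ (\w+) \s* = \s* (?=[\{\[]) }{'$1' => }gx;

    # 4. Every closing brace or bracket ends an entry, so add a comma
    $text =~ s/([}\]])/$1,/g;

    return $text;
}

sub parse_by_eval {
    my ($text) = @_;
    # The leading "+" makes Perl treat "{" as a hash constructor, not a block
    my $data = eval('+{' . to_perl_source($text) . '}')
        or die "generated source did not compile: $@";
    return $data;
}
```

Unlike the regex-per-line approach, this does not care how the entries are spread across lines, but any input that breaks the stated assumptions will produce source that fails to compile or compiles to something unintended.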

Assuming you know the grammar, you could also write a parser.
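
As an illustration of what such a parser could look like, here is a small hand-written recursive-descent sketch using only core Perl. The grammar is guessed from the sample in the question, not taken from any specification, and all function names are hypothetical. It assumes that a block whose second token is `=` holds key-value pairs, that any other block is a flat list, and that quoted strings contain no embedded double quotes.

```perl
use strict;
use warnings;

# Split the text into quoted strings, braces, "=" and bare words.
sub tokenize {
    my ($text) = @_;
    return [ $text =~ / "[^"]*" | [{}=] | [^\s{}="]+ /gx ];
}

sub unquote {
    my ($token) = @_;
    $token =~ s/^"(.*)"$/$1/s;
    return $token;
}

# Parse "key = value" entries until a closing brace or the end of input.
sub parse_pairs {
    my ($tokens) = @_;
    my %hash;
    while (@$tokens && $tokens->[0] ne '}') {
        my $key = unquote(shift @$tokens);
        my $eq  = shift @$tokens;
        die "expected '=' after '$key'\n" unless defined $eq && $eq eq '=';
        $hash{$key} = parse_value($tokens);
    }
    return \%hash;
}

# Parse a single value: a scalar, a nested block of pairs, or a flat list.
sub parse_value {
    my ($tokens) = @_;
    my $token = shift @$tokens;
    die "unexpected end of input\n" unless defined $token;
    return unquote($token) unless $token eq '{';

    my $value;
    if (@$tokens > 1 && $tokens->[0] ne '}' && $tokens->[1] eq '=') {
        $value = parse_pairs($tokens);      # recursion handles the nesting
    }
    else {
        my @list;
        push @list, unquote(shift @$tokens)
            while @$tokens && $tokens->[0] ne '}';
        $value = \@list;
    }
    die "missing '}'\n" unless @$tokens && shift(@$tokens) eq '}';
    return $value;
}

sub parse_text {
    my ($text) = @_;
    my $tokens = tokenize($text);
    my $data   = parse_pairs($tokens);
    die "unexpected '}'\n" if @$tokens;
    return $data;
}
```

Because the nesting is handled by the mutual recursion between `parse_pairs` and `parse_value`, no regular expression ever has to match a balanced pair of braces. All scalar values come back as strings, so a number such as `50` is stored as the string `50`.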