2013-03-08 76 views
5

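Perl CSV to hash

I have a CSV file that contains comment text before the header row and the data, and I would like to read it in as a hash for further manipulation. The primary key of the hash will be a combination of two data values. How do I: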

  1. Search for the header row using the pattern "index"
  2. Use the header fields as the hash keys
  3. Read in the rest of the file

Example CSV

# 
# 
# 
# 
Description information of source of file. 

index,label,bit,desc,mnemonic 
6,370,11,three,THRE 
9,240,23,four,FOR 
11,120,n/a,five,FIV 

Example of the desired hash

('37011'  => { 'index' => '6',  'label' => '370', 'bit' => '11',  'desc' => 'three', 'mnemonic' => 'THRE' },
 '24023'  => { 'index' => '9',  'label' => '240', 'bit' => '23',  'desc' => 'four',  'mnemonic' => 'FOR' },
 '120n/a' => { 'index' => '11', 'label' => '120', 'bit' => 'n/a', 'desc' => 'five',  'mnemonic' => 'FIV' })

Answers

9

What you need is the Text::CSV module:

#!/usr/bin/env perl 
use strict; 
use warnings; 
use Data::Dumper; 
use Text::CSV; 

my $filename = 'test.csv'; 

# watch out for the encoding!
open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Can't open $filename: $!";

# skip to the header
my $header = '';
while (<$fh>) {
    if (/^index,/) {
        $header = $_;
        last;
    }
}

my $csv = Text::CSV->new 
    or die "Text::CSV error: " . Text::CSV->error_diag; 

# define column names  
$csv->parse($header); 
$csv->column_names([$csv->fields]); 

# parse the rest 
while (my $row = $csv->getline_hr($fh)) { 
    my $pkey = $row->{label} . $row->{bit}; 
    print Dumper { $pkey => $row }; 
} 

$csv->eof or $csv->error_diag; 
close $fh; 
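
The loop above dumps each row on its own. To end up with the single hash the question asks for, the rows can be collected under the `label . bit` key instead. The following is only a sketch under a few assumptions: the sample data is held in an in-memory filehandle so it runs without `test.csv`, and the names `$data` and `%records` are illustrative, not part of the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

# Sample input kept in memory (assumption: same layout as the question's file).
my $data = <<'END';
#
#
Description information of source of file.

index,label,bit,desc,mnemonic
6,370,11,three,THRE
9,240,23,four,FOR
11,120,n/a,five,FIV
END

open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

my $csv = Text::CSV->new({ binary => 1 })
    or die "Text::CSV error: " . Text::CSV->error_diag;

# Skip the preamble; the first line starting with "index," is the header.
my $header;
while (my $line = <$fh>) {
    if ($line =~ /^index,/) {
        $header = $line;
        last;
    }
}
die "No header line found\n" unless defined $header;

chomp $header;
$csv->parse($header) or die "Cannot parse header line\n";
$csv->column_names([ $csv->fields ]);

# Collect every remaining row into one hash keyed by label . bit.
my %records;
while (my $row = $csv->getline_hr($fh)) {
    $records{ $row->{label} . $row->{bit} } = $row;
}
close $fh;

print join(', ', sort keys %records), "\n";
```

If two rows share the same `label` and `bit`, the later row silently replaces the earlier one, so a real script may want to check `exists $records{$key}` first.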
+0

[Text::CSV::Simple](https://metacpan.org/pod/Text::CSV::Simple) makes this even simpler. – 大師作品 2014-06-06 13:53:53

3

You could always do something like this:

#!/usr/bin/env perl 

use strict; 
use warnings; 

my %hash; 
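while (<DATA>) { last if /^index,/ }   # Skip ahead to the header line
my $labels = $_;                       # $_ still holds the header line
chomp $labels;
my @keys = split /,/, $labels;         # Header fields become the hash keys
while (<DATA>) {
    chomp;
    next unless /\S/;                  # Ignore blank lines
    my @a = split /,/;
    my %h;
    @h{@keys} = @a;                    # Hash slice: header name => value
    $hash{ $a[1] . $a[2] } = \%h;      # Primary key is label . bit
}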

while(my ($K, $H) = each %hash){ 
    print "$K :: "; 
    while(my($k, $v) = each(%$H)) { 
        print $k . "=>" . $v . " ";
    } 
    print "\n"; 
} 

__DATA__ 

# 
# 
# 
# 
Description information of source of file. 

index,label,bit,desc,mnemonic 
6,370,11,three,THRE 
9,240,23,four,FOR 
11,120,n/a,five,FIV 
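
One caveat with splitting on commas by hand: it works for the sample data, but it does not understand quoting. The short sketch below uses a made-up line (not from the question's file) in which one field contains a comma inside double quotes, to show how a plain `split` breaks it apart.

```perl
use strict;
use warnings;

# Hypothetical input line: the fourth field contains a comma inside quotes.
my $line = '6,370,11,"three, or so",THRE';

# A plain split treats every comma as a separator, quoted or not.
my @fields = split /,/, $line;

print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

The quoted field comes back as two pieces with the quote characters still attached. If the input can contain quoted fields, a real CSV parser is the safer choice.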
+0

I agree... if you know your input format, there is no need to pull in ugly modules and thousands of lines of code. – 2013-06-18 07:16:42

1

A simple, pasteable parser

sub parse_csv {
    my ($f, $s, %op) = @_;                      # file, callback sub, options
    my $d = $op{delim} ? $op{delim} : "\t";     # delimiter, can be a regex
    open(my $in, '<', $f) or die "Can't open $f: $!";
    $_ = <$in>; chomp;
    my @h = map { s/"//g; lc } split /$d/;      # header assumed, could be an option
    $h[0] = "id" if $h[0] eq "";                # r compatible
    while (<$in>) {
        chomp;
        my @d = split /$d/;
        s/^"//, s/"$// for @d;                  # any file with junk in it should fail anyway
        push @h, "id" if (@h == (@d - 1));      # r compat
        my %d = map { $h[$_] => $d[$_] } (0..$#d);
        &{$s}(\%d);                             # call the callback with a ref to the row hash
    }
    close $in;
}

Example:

use Data::Dumper;

parse_csv("file.txt", sub {
    die Dumper $_[0];   # dump the first record and stop
});

Note that things like $. and $_ are still available inside the sub.

+0

What does the line 'push @h, "id" if (@h == (@d - 1));' or the line '&{$s}(\%d);' do? What is 'lc'? What does 'r compatible' mean? – msciwoj 2015-12-23 07:59:27
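
For readers with the same questions, here is a small standalone sketch of what those constructs do. The variable names and sample values are made up for illustration. As for 'r compatible', it presumably refers to files exported from R, where the header can be one field shorter than the data rows; that reading is an assumption, not something the answer confirms.

```perl
use strict;
use warnings;

# A callback stored in a scalar, as parse_csv expects for its second argument.
my $s = sub {
    my ($rec) = @_;
    return join ',', map { "$_=$rec->{$_}" } sort keys %$rec;
};
my %d = (id => 1, name => 'x');

# &{$s}(\%d) dereferences the code reference and calls it with a reference
# to %d. $s->(\%d) is the more common spelling of the same call.
my $old_style = &{$s}(\%d);
my $new_style = $s->(\%d);
print "$old_style\n";

# lc with no argument lowercases $_, so this lowercases every header name.
my @h = map { lc } qw(Index LABEL Bit);
print "@h\n";

# If the header has exactly one field fewer than the data row,
# append an extra column name "id" so the counts match.
my @hdr = qw(a b);
my @row = qw(1 2 3);
push @hdr, 'id' if @hdr == @row - 1;
print "@hdr\n";
```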

1

Text::CSV::Simple has been around doing this job since 2005...

From the documentation:

# Map the fields to a hash 
my $parser = Text::CSV::Simple->new; 
$parser->field_map(qw/id name null town/); 
my @data = $parser->read_file($datafile); 

...simple!
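
With a field map, `read_file` is documented to return a list of hashrefs rather than a hash keyed by `label . bit`, so the composite key from the question would still need a second step. A minimal sketch of that step follows. The `@data` list here is hand-written to stand in for what `read_file` would return with `field_map(qw/index label bit desc mnemonic/)`, and it assumes the comment preamble and header line have already been skipped or filtered out, which this snippet does not attempt.

```perl
use strict;
use warnings;

# Hand-written stand-in for the list of hashrefs returned by read_file
# (assumption: one hashref per data row, keyed by the field_map names).
my @data = (
    { index => '6',  label => '370', bit => '11',  desc => 'three', mnemonic => 'THRE' },
    { index => '9',  label => '240', bit => '23',  desc => 'four',  mnemonic => 'FOR'  },
    { index => '11', label => '120', bit => 'n/a', desc => 'five',  mnemonic => 'FIV'  },
);

# Re-index the rows under the composite key label . bit.
my %records;
$records{ $_->{label} . $_->{bit} } = $_ for @data;

print join(', ', sort keys %records), "\n";
```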