2012-02-15 124 views
1

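How to extract multiple columns from a CSV file using Perl

I am very new to Perl and hope someone can help me with this problem. I need to extract three columns from a CSV file that has embedded commas. This is what the format looks like: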

"ID","URL","DATE","XXID","DATE-LONGFORMAT" 

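I need to extract the DATE column, the XXID column, and the column after XXID. Note that the rows do not necessarily all have the same number of columns.

The XXID column contains a 2-letter prefix that does not always start with the same letter. It can be almost any letter of the alphabet. The length is always the same.

Finally, once these three columns are extracted, I need to sort on the XXID column and get a count of the duplicates.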

Answers

0

You should definitely use a CPAN library to parse CSV, because you will never account for all the quirks of the format on your own.

See: How can I parse quoted CSV in Perl with a regex?

See: How do I efficiently parse a CSV file in Perl?
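As a small illustration of the kind of quirk meant here, the sketch below compares a plain `split` on commas with `Text::CSV` on a single line. The sample line is made up for this example (the question only shows the header row), and it assumes `Text::CSV` is installed from CPAN.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample line: two of the quoted fields contain a comma.
my $line = '"1","http://example.com/a,b","2012-02-15","AB12345","February 15, 2012"';

# Naive approach: split on every comma, including those inside quotes.
my @naive = split /,/, $line;

# Text::CSV honours the quotes and returns the five real fields.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV->error_diag();
$csv->parse($line) or die "Parse failed: " . $csv->error_diag();
my @fields = $csv->fields;

printf "split gave %d fields, Text::CSV gave %d\n",
    scalar @naive, scalar @fields;
# prints: split gave 7 fields, Text::CSV gave 5
```

The naive version also leaves the quote characters attached to each piece, which is one more thing a hand-rolled parser would have to clean up.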

However, here is a very naive and non-idiomatic solution for the specific string you provided:

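use strict;
use warnings;

my $string = '"ID","URL","DATE","XXID","DATE-LONGFORMAT"';

my @words  = ();
my $word   = "";
my $quotec = '"';
my $quoted = 0;    # true while we are inside a quoted field

# Walk the string one character at a time
foreach my $c (split //, $string)
{
    if ($quoted)
    {
        if ($c eq $quotec)
        {
            # closing quote: the current field is complete
            $quoted = 0;
            push @words, $word;
            $word = "";
        }
        else
        {
            $word .= $c;
        }
    }
    elsif ($c eq $quotec)
    {
        # opening quote: start collecting a new field
        $quoted = 1;
    }
    # anything outside quotes (the separating commas) is ignored
}

for (my $i = 0; $i < scalar @words; ++$i)
{
    print "column " . ($i + 1) . " = $words[$i]\n";
}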
3

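Here is a sample script that uses the Text::CSV module to parse your CSV data. Consult the module's documentation to find the settings that suit your data.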

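#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

# Read rows from the __DATA__ section at the end of the script
while (my $row = $csv->getline(*DATA)) {
    print "Date: $row->[2]\n";     # DATE column
    print "Col#1: $row->[3]\n";    # XXID column
    print "Col#2: $row->[4]\n";    # column after XXID
}

# Sample input: the header line from the question; replace with real rows
__DATA__
"ID","URL","DATE","XXID","DATE-LONGFORMAT"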
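Since the question mentions that rows do not all have the same number of columns, a short row would leave some of those fields undefined. The following is one possible sketch of the same extraction that fills missing fields with an empty string. The two sample rows and the in-memory filehandle are assumptions made for illustration, not part of the original data.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample rows; the second row has no column after XXID.
my $data = <<'END';
"1","http://example.com/a,b","2012-02-15","AB12345","February 15, 2012"
"2","http://example.com/c","2012-02-16","CD67890"
END

# Read the sample text through an in-memory filehandle.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV->error_diag();

my @extracted;
while (my $row = $csv->getline($fh)) {
    # Take DATE, XXID and the column after it (indices 2..4);
    # fields missing from a short row become empty strings.
    push @extracted, [ map { $_ // '' } @{$row}[2 .. 4] ];
}
close $fh;

print join(' | ', @$_), "\n" for @extracted;
```

With a real file, `$fh` would simply be opened on the file name instead of on `\$data`.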
3

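I published a module called Tie::Array::CSV, which lets Perl interact with your CSV file as if it were a native Perl nested array. If you use it, you can write your search logic and apply it as though your data were already in an array of array references. Take a look!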

#!/usr/bin/env perl 

use strict; 
use warnings; 

use File::Temp; 
use Tie::Array::CSV; 
use List::MoreUtils qw/first_index/; 
use Data::Dumper; 

# this builds a temporary file from the script's DATA section
# (the CSV rows that follow __DATA__ are not shown here);
# normally you would just make $file the filename
my $file = File::Temp->new; 
print $file <DATA>; 
######### 

tie my @csv, 'Tie::Array::CSV', $file; 

#find column from data in first row 
my $colnum = first_index { /^\w.{6}$/ } @{$csv[0]}; 
print "Using column: $colnum\n"; 

#extract that column 
my @column = map { $csv[$_][$colnum] } (0..$#csv); 

#build a hash of repetitions 
my %reps; 
$reps{$_}++ for @column; 

print Dumper \%reps; 
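The question also asks for the XXID values to be sorted and for a count of the duplicates. `Dumper` prints the hash in no particular order, so one way to finish the job is sketched below. The `@column` values here are hypothetical stand-ins for the extracted column, and this part needs only core Perl.

```perl
use strict;
use warnings;

# Hypothetical XXID values standing in for the extracted column.
my @column = qw(CD67890 AB12345 CD67890 EF11111 AB12345 CD67890);

# Count how many times each value occurs.
my %reps;
$reps{$_}++ for @column;

# Keep only the values seen more than once, sorted alphabetically.
my @duplicates = sort { $a cmp $b } grep { $reps{$_} > 1 } keys %reps;

printf "%s occurs %d times\n", $_, $reps{$_} for @duplicates;
# prints:
# AB12345 occurs 2 times
# CD67890 occurs 3 times
```

Dropping the `grep` would list every XXID with its count rather than only the repeated ones.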