How can I split fixed-width columns in Perl?

Programming is so new to me that I don't know how to describe the problem, and I apologize for that.

I have a Perl script that gets a variable from an internal tool. Its contents aren't always the same, but they always follow this pattern:

darren.local   1987 A  Sentence1 
darren.local   1996 C  Sentence2 
darren.local   1991 E  Sentence3 
darren.local   1954 G  Sentence4 
darren.local   1998 H  Sentence5

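In Perl, what's the easiest way to get each of these lines into its own variable? Depending on what the internal tool spits out, each line will be different, and there may be more than five lines. The capital letter in each line is what the lines eventually get sorted by (all the As, all the Cs, all the Es, and so on). Should I be looking at regular expressions?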

2009-12-12 scraft3613

Where do these data/lines live? Does your internal tool put them into a single variable? Or do you need to read the text from a file? – 2009-12-12 16:04:33

The tool puts them into a single variable. – scraft3613 2009-12-12 16:17:26

There are Perl newbies!!! +1 – nes1983 2009-12-12 16:23:44

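I like to use unpack for this sort of thing. It's fast, flexible, and reversible.

You just need to know the position of each column, and unpack can automatically trim the extra whitespace from each column.

If you change something in one of the columns, it's easy to get back to the original layout by repacking with the same format: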

my $format = 'A23 A8 A7 A*';   # column widths: 23, 8, 7, then the rest of the line 
my @grades; 

while(<DATA>) { 
    # 'A' fields come back with their trailing whitespace stripped 
    my($machine, $year, $letter, $sentence) = 
     unpack($format, $_); 

    # save the original line too (newline included), which might be useful later 
    push @grades, [ $machine, $year, $letter, $sentence, $_ ]; 
    } 

# sort on the letter column (index 2) as a string 
my @sorted = sort { $a->[2] cmp $b->[2] } @grades; 

foreach my $tuple (@sorted) { 
    print $tuple->[-1]; 
    } 

# go the other way, especially if you changed things 
foreach my $tuple (@sorted) { 
    print pack($format, @$tuple[0..3]), "\n"; 
    } 

__END__ 
darren.local           1987    A      Sentence1
darren.local           1996    C      Sentence2
darren.local           1991    E      Sentence3
darren.local           1954    G      Sentence4
darren.local           1998    H      Sentence5

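Now, there's one extra consideration. It sounds like you might have this big chunk of multi-line text in a single variable. Handle it as you would a file by opening a filehandle on a reference to the scalar. The filehandle then takes care of the rest: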

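my $lines = '...multiline string...'; 

# open a read-only filehandle on a reference to the scalar 
open my($fh), '<', \ $lines or die "Cannot open in-memory filehandle: $!"; 

while(<$fh>) { 
     # ... same as before ... 
     }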
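
To make the in-memory filehandle idea concrete, here is a minimal, self-contained sketch. The contents of $lines are a made-up stand-in for the tool's output, and the format 'A15 A5 A3 A*' is an assumption chosen to match the column widths of that sample; adjust it to your real data.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the tool's output: one scalar holding several lines.
# Column widths here are 15, 5 and 3 characters, then the rest of the line.
my $lines = join "\n",
    'darren.local   1996 C  Sentence2',
    'darren.local   1987 A  Sentence1',
    'darren.local   1991 E  Sentence3';

my $format = 'A15 A5 A3 A*';   # assumed widths; must match the real columns

# Open a read-only filehandle on a reference to the scalar.
open my $fh, '<', \$lines or die "Cannot open in-memory filehandle: $!";

my @records;
while (my $line = <$fh>) {
    chomp $line;
    # 'A' fields have their trailing spaces stripped by unpack.
    my ($machine, $year, $letter, $sentence) = unpack $format, $line;
    push @records, [ $machine, $year, $letter, $sentence ];
}
close $fh;

# Sort by the letter column and print one record per line.
my @sorted = sort { $a->[2] cmp $b->[2] } @records;
print join(', ', @$_), "\n" for @sorted;
```

Reading through a filehandle means the loop body is the same whether the text comes from a file, from DATA, or from a variable.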

2009-12-12 17:22:24

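You could also use the format 'A23 A8 A7 A*'. – 2009-12-12 17:39:13

A nice example of readable Perl... (even for a once-every-two-years user) – Rook 2009-12-12 17:42:13

I'm not sure which format you saw, since I got it wrong the first time I posted, but we ended up with the same format. – 2009-12-12 17:42:56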

use strict; 
use warnings; 

# this puts each line in the array @lines 
my @lines = <DATA>; # <DATA> is a special filehandle that treats 
        # everything after __END__ as if it was a file 
        # It's handy for testing things 

# Iterate over the array of lines and for each iteration 
# put that line into the variable $line 
foreach my $line (@lines) { 
    chomp $line;   # drop the trailing newline so it doesn't end up in $col4 

    # Use split to 'split' each $line with the regular expression /\s+/ 
    # /\s+/ means match one or more whitespace characters. 
    # The 4 is a limit: split returns at most 4 fields, so any whitespace 
    # after the third separator is not treated as a separator and is kept in $col4 
    my ($col1, $col2, $col3, $col4) = split(/\s+/, $line, 4); 

    # here you can do whatever you need to with the data 
    # in the columns. I just print them out 
    print "$col1, $col2, $col3, $col4\n"; 
} 


__END__ 
darren.local   1987 A  Sentece1 
darren.local   1996 C  Sentece2 
darren.local   1991 E  Sentece3 
darren.local   1954 G  Sentece4 
darren.local   1998 H  Sentece5

2009-12-12 16:23:26 Nifle

For text where every line looks like this:

my ($domain, $year, $grade, @text) = split /\s+/, $line;

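I use an array for the sentence because it isn't clear whether the final sentence will contain spaces or not. The @text array can then be joined into a new string as needed. If the final sentence has no spaces, @text can be changed to $text.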
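
As a rough sketch of what joining @text back together looks like, and of how it compares with giving split a limit, consider the following. The sample line is invented for illustration; note that joining with a single space does not restore runs of whitespace inside the sentence.

```perl
use strict;
use warnings;

# Hypothetical line whose last column contains spaces, including a double space.
my $line = 'darren.local   1987 A  A sentence  with spaces';

# Without a limit, the sentence is split into words as well.
my ($domain, $year, $grade, @text) = split /\s+/, $line;
my $joined = join ' ', @text;    # the double space is collapsed to one

# With a limit of 4, the fourth field keeps the rest of the line untouched.
my ($d, $y, $g, $rest) = split /\s+/, $line, 4;

print "joined:  [$joined]\n";
print "limited: [$rest]\n";
```

If the exact spacing of the last column matters, the limited split is the safer of the two.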

2009-12-12 16:23:51

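That will also split up the sentence. If you're going to use split in this situation, use the third argument to limit the number of elements it returns. If the last column has significant whitespace, you'll lose part of the data. – 2009-12-12 17:39:03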

Assuming the text has been put into a variable $info, you can split it into separate lines using Perl's built-in split function:

my @lines = split("\n", $info);

where @lines is an array made up of your lines. "\n" is the regular expression for a newline. You can loop through each line as follows:

foreach my $line (@lines) { 
    # do something with $line.... 
}

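You can then split each line on whitespace (the regular expression \s+, where \s is a whitespace character and + means one or more times). Write the pattern as /\s+/ rather than "\s+": in a double-quoted string the backslash is lost and the pattern becomes s+, which splits on the letter s:

my @fields = split(/\s+/, $line);

You can then access each field directly by its array index: $fields[0], $fields[1], and so on.

Or you can do this:

my ($var1, $var2, $var3, $var4) = split(/\s+/, $line);

This puts the fields of each line into separate named variables.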

Now, if you want to sort your lines by the character in the third column, you can do this:

my @lines = split("\n", $info); 
my @arr =(); # declare new array 

foreach (@lines) { 
    my @fields = split(/\s+/, $_); 
    push(@arr, \@fields); # add @fields REFERENCE to @arr 
}

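Now you have an "array of arrays". It can easily be sorted as follows (using cmp, because the third column holds letters, and the numeric <=> would treat them all as 0):

my @sorted = sort { $a->[2] cmp $b->[2] } @arr;

This sorts @arr by the third element (index 2) of @fields.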
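
Putting the pieces above together, a minimal sketch might look like this. The contents of $info are invented for illustration, and the tie-break on the year column is an extra assumption, not something the question asked for.

```perl
use strict;
use warnings;

# Hypothetical contents of $info, deliberately out of order.
my $info = "darren.local 1996 C Sentence2\n"
         . "darren.local 1987 A Sentence1\n"
         . "darren.local 1954 C Sentence4\n";

# Build an array of array references, one per line.
my @arr = map { [ split /\s+/, $_, 4 ] } split /\n/, $info;

# Letters compare as strings (cmp); years compare as numbers (<=>).
my @sorted = sort { $a->[2] cmp $b->[2] or $a->[1] <=> $b->[1] } @arr;

print join(' ', @$_), "\n" for @sorted;
```

Using cmp for the letter column matters here: a numeric comparison of "A" and "C" would see both as 0 and leave the order effectively unsorted.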

Edit 2: To put the lines that share the same third column into their own variable, do this:

my %hash =();    # declare new hash 

foreach my $line (@arr) {           # loop through lines (array references) 
    my @fields = @$line;            # dereference the field array 

    my $el = $fields[2];            # get our key - the character in the third column 
    my $text = join(' ', @fields);  # rebuild the line from its fields 

    if (exists $hash{ $el }) {      # check if key already in hash 
        $hash{ $el } .= "\n" . $text;   # append new line to hash value 
    } else { 
        $hash{ $el } = $text;       # first line seen for this key 
    } 
}

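Now you have a hash keyed by the third-column character, where the value for each key holds the lines containing that key. You can then loop over the hash and print the output or otherwise use the hash values.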
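
As an alternative sketch of the same grouping, each hash value can be an array reference instead of one long string, which makes it easy to walk the groups afterwards. The sample $info and the name %by_letter are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical tool output with a repeated letter.
my $info = "darren.local 1987 A Sentence1\n"
         . "darren.local 1996 C Sentence2\n"
         . "darren.local 1991 A Sentence3\n";

my %by_letter;    # letter => array reference of full lines
for my $line (split /\n/, $info) {
    # Third whitespace-separated field; the limit keeps the sentence intact.
    my $letter = (split /\s+/, $line, 4)[2];
    push @{ $by_letter{$letter} }, $line;   # autovivifies the array on first use
}

# Walk the groups in letter order and print the lines in each.
for my $letter (sort keys %by_letter) {
    print "$letter:\n";
    print "  $_\n" for @{ $by_letter{$letter} };
}
```

With this layout, all the "A" lines are available as @{ $by_letter{A} }, however many the tool returns.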

2009-12-12 16:24:05

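If you're going to use split in this situation, use the third argument to limit the number of elements it returns. If the last column has significant whitespace, you'll lose part of the data. – 2009-12-12 17:39:42

Thanks Richard. Each line needs to be grouped by its capital letter. Depending on the output of the query, I could have as many as 20 lines or as few as 2. The lines with a "C" need to go into one variable, the lines with a "B" need to go into their own variable, and so on. – scraft3613 2009-12-12 17:40:21

With the sort in my answer above, your array ends up in alphanumeric order, so the "A" lines come first, then the "B" lines, and so on. If you want to put all the "A" lines into a separate variable, then (as with any programming problem) there are many possibilities. You could use a keyed hash/map with the character "A" etc. as your key, where the value is either a) an array of lines or b) a single string to which you append each subsequent line. For a tutorial on using hashes, see here. – 2009-12-12 17:57:11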

Use CPAN, and my module DataExtract::FixedWidth:

#!/usr/bin/env perl 
use strict; 
use warnings; 
use DataExtract::FixedWidth; 

my @rows = <DATA>; 

my $defw = DataExtract::FixedWidth->new({ heuristic => \@rows, header_row => undef }); 

use Data::Dumper; 

print Dumper $defw->parse($_) for @rows; 

__DATA__ 
darren.local   1987 A  Sentence1 
darren.local   1996 C  Sentence2 
darren.local   1991 E  Sentence3 
darren.local   1954 G  Sentence4 
darren.local   1998 H  Sentence5

It doesn't get any simpler than that.

2011-09-28 18:29:09
