2012-07-09 155 views

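Combining the elements of rows

I am looking for a solution to this problem: I have a tab-delimited file like the one shown in the blockquote below. As you can see, some lines match on their first part (the bold fields).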

chr4 164440449 165354407 G1 P8002-51-75
chr1 220871675 220962596 G2 P2368-132-84
chr1 220871675 220962596 G2 P2369-152-116
chr1 220871675 220962596 G2 P2371-180-82
chr1 220871675 220962596 G2 P2372-223-129
chr1 220871675 220962596 G2 P2373-153-96
chr1 220871675 220962596 G2 P2370-104-78
chr5 126198405 126416440 G3 P9333-135-146
chr5 126198405 126416440 G3 P9334-151-116

Using AWK or Perl, how can I get the following output while preserving the tab-delimited format?

chr4 164440449 165354407 G1 P8002-51-75
chr1 220871675 220962596 G2 P2368-132-84 P2369-152-116 P2371-180-82 P2372-223-129 P2373-153-96 P2370-104-78
chr5 126198405 126416440 G3 P9333-135-146 P9334-151-116

The general idea is to merge lines that share the same first part, appending the last field of each matching line.

See: http://stackoverflow.com/questions/11389990/group-rows-in-text-file-and-aggregate-corresponding-rows-to-column/11390214#11390214 – kev 2012-07-09 09:08:39

Is your data already grouped as shown, or might some lines appear out of order? – 2012-07-09 10:49:49

Answers

1

One way:

perl -ane '
    ## Save all fields but the last one as the key to compare between rows.
    $key = join qq|\t|, @F[ 0 .. $#F - 1 ];

    ## In first line or when current key is equal to previous key, save last
    ## field in an array and stop processing current row (unless it is the
    ## last line of the input).
    if ($. == 1 || $key eq $pkey) {
        $pkey = $key;
        push @value, $F[ $#F ];
        next unless eof;
    }

    ## Reaching this point means the key changed or the input ended, so print
    ## the previous key with its values and begin to save the new one.
    printf qq|%s\n|, join qq|\t|, $pkey, @value;
    @value = ();
    push @value, $F[ $#F ];

    ## Exception: Last line with a new key, print it.
    if (eof && $pkey ne $key) {
        printf qq|%s\n|, join qq|\t|, $key, @value;
    }

    ## Save previous key.
    $pkey = $key;
' infile

Assuming infile contains the data from your question, the output will be:

chr4 164440449  165354407  G1  P8002-51-75 
chr1 220871675  220962596  G2  P2368-132-84 P2369-152-116 P2371-180-82 P2372-223-129 P2373-153-96 P2370-104-78 
chr5 126198405  126416440  G3  P9333-135-146 P9334-151-116 
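
Since the question also asks about AWK, here is a rough sketch of the same adjacent-row idea in awk. It is not from the original answer: the function name merge_rows is made up for illustration, and it assumes tab-separated input in which rows sharing a key are already next to each other, as in the sample data.

```shell
#!/bin/sh
# merge_rows: hypothetical helper name, not part of the original thread.
# Assumption: input is tab-separated and rows with the same key are adjacent.
merge_rows() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        # Key = every field except the last one.
        key = $1
        for (i = 2; i < NF; i++) key = key FS $i

        if (NR > 1 && key == prev) {
            # Same key as the previous row: append only the last field.
            line = line OFS $NF
        } else {
            # New key: flush the previous group, then start a new one.
            if (NR > 1) print line
            line = $0
        }
        prev = key
    }
    # Flush the final group.
    END { if (NR > 0) print line }' "$@"
}

# Usage with a small tab-separated sample taken from the question.
printf 'chr5\t126198405\t126416440\tG3\tP9333-135-146\nchr5\t126198405\t126416440\tG3\tP9334-151-116\n' | merge_rows
```

Like the Perl one-liner, this keeps only one group in memory at a time, so it should cope with large files, but it will not merge rows whose matching keys are separated by other rows.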

WOOOOW!!!! @Birei thank you very much, this is so useful :) – FoRsUs 2012-07-09 09:21:58

2
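my %hash;
while (<DATA>) {
    ## Key: everything before the last field; value: the last field.
    ## Skip lines that do not match (for example, blank lines).
    my ($x, $y) = /^(.*)\s([-\w]+)\s*$/ or next;
    push @{ $hash{$x} }, $y;
}
## Note: each() returns the keys in no particular order.
while (my ($k, $v) = each %hash) {
    ## Join the key and its values so a tab also separates the key from the
    ## first value.
    print join("\t", $k, @{$v}), "\n";
}
__DATA__
chr4 164440449 165354407 G1 P8002-51-75
chr1 220871675 220962596 G2 P2368-132-84
chr1 220871675 220962596 G2 P2369-152-116
chr1 220871675 220962596 G2 P2371-180-82
chr1 220871675 220962596 G2 P2372-223-129
chr1 220871675 220962596 G2 P2373-153-96
chr1 220871675 220962596 G2 P2370-104-78
chr5 126198405 126416440 G3 P9333-135-146
chr5 126198405 126416440 G3 P9334-151-116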
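
If the rows might not be grouped (as asked in the comments) and the output should keep the order in which keys first appear, a hash-based variant can be sketched in awk along these lines. This is only an illustration, not part of the original answer: the function name merge_rows_any_order is hypothetical, and it assumes tab-separated input with at least two fields per line.

```shell
#!/bin/sh
# merge_rows_any_order: hypothetical helper name, not part of the original thread.
# Assumptions: tab-separated input, at least two fields per line.
merge_rows_any_order() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        # Key = every field except the last one.
        key = $1
        for (i = 2; i < NF; i++) key = key FS $i

        if (!(key in vals)) {
            # First time this key is seen: remember its position.
            order[++n] = key
            vals[key] = $NF
        } else {
            # Key seen before: append the last field to its values.
            vals[key] = vals[key] OFS $NF
        }
    }
    # Print the groups in first-seen order.
    END { for (j = 1; j <= n; j++) print order[j], vals[order[j]] }' "$@"
}

# Usage: the two chr1 rows are deliberately separated by a chr5 row.
printf 'chr1\t220871675\t220962596\tG2\tP2368-132-84\nchr5\t126198405\t126416440\tG3\tP9333-135-146\nchr1\t220871675\t220962596\tG2\tP2369-152-116\n' | merge_rows_any_order
```

The trade-off is memory: every key and its values are held until the end of the input, which is fine for small files but worth keeping in mind for very large ones.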