Collecting data from similar columns

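I want to filter data from a Unix text file. I have data in a Unix text file like the following:

How can I modify the data above, or create data like the following from it, using awk?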

A 200 100 
B 300 600 700 
C 400

I am not very good at awk, and I believe awk or perl is best suited for this.

Source

2012-03-16 Vijay

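awk '
# Append each second field to the string stored under the first field.
{
    r[$1] = ($1 in r) ? r[$1] OFS $2 : $2
}
# After all input is read, print each key with its collected values.
END {
    for (R in r)
        print R, r[R]
}' infile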

If the order of the first field's values matters, more code is needed. The solution will then depend on your awk implementation and version.

Explanation:

r[$1] = $1 in r ? r[$1] OFS $2 : $2

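This sets the value of element $1 of the array r:

If the key $1 already exists in r ($1 in r), append OFS and $2 to the existing value.
Otherwise, set it to $2.

The expression condition ? value_if_true : value_if_false is the ternary operator. For more information, see ternary operation.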
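For comparison, here is a minimal sketch of the same append-or-set logic in Python, using a conditional expression in place of awk's ternary operator. The function name collect and the sample input are assumptions made for illustration; the question's actual input is not shown, so the sample was chosen only to be consistent with the output listed in the question.

```python
OFS = " "  # stands in for awk's default output field separator


def collect(lines):
    """Join the second fields of all lines that share a first field."""
    r = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        key, value = fields[0], fields[1]
        # Same shape as: r[$1] = $1 in r ? r[$1] OFS $2 : $2
        r[key] = r[key] + OFS + value if key in r else value
    return r


# Assumed sample input, not taken from the original question.
sample = ["A 200", "B 300", "A 100", "C 400", "B 600", "B 700"]

for key, joined in collect(sample).items():
    print(key, joined)
```

Unlike awk's for (R in r), whose traversal order is unspecified, a Python dict iterates in insertion order, so the keys come out in the order they were first seen.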

Source

2012-03-16 08:10:11

Could you please explain the line 'r[$1] = $1 in r ? r[$1] OFS $2 : $2'? – Vijay 2012-03-16 08:39:38

It gave me a syntax error – Vijay 2012-03-16 08:43:26

Hi @peter, I have added an explanation. – 2012-03-16 08:46:46

This is not language-specific. It is more like pseudocode, but here is the idea:

- Get all lines in an array 
- Set a target dictionary of arrays 

- Go through the array : 
     - Split the string using ' '(space) as the delimiter, into array parts 
     - If there is no dictionary entry for `parts[0]` (e.g. 'A') yet, 
     create it. 
     - Add `parts[1]` (e.g. 100) to `dictionary(parts[0])`

That's it! :-)

I would do it this way, probably in Python, but that is a matter of taste.
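The pseudocode above could be sketched in Python roughly as follows. This is one possible reading of it, not the answerer's own code: the function names group_by_first_field and format_groups are hypothetical, and the sample lines are an assumed input chosen to match the output shown in the question.

```python
from collections import defaultdict


def group_by_first_field(lines):
    """Build the 'dictionary of arrays': first field -> list of second fields."""
    groups = defaultdict(list)  # a missing key starts as an empty list
    for line in lines:
        parts = line.split()  # split on runs of whitespace
        if len(parts) < 2:
            continue  # ignore blank or incomplete lines
        groups[parts[0]].append(parts[1])
    return groups


def format_groups(groups):
    """Return one 'key v1 v2 ...' string per key, sorted by key."""
    return [" ".join([key] + values) for key, values in sorted(groups.items())]


# Assumed sample input, not taken from the original question.
sample = ["A 200", "B 300", "A 100", "C 400", "B 600", "B 700"]

for row in format_groups(group_by_first_field(sample)):
    print(row)
```

Using defaultdict(list) folds the "if there is no entry yet, create it" step into the lookup itself. To process a real file, an open file object can be passed in place of sample, since iterating over it yields lines.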

Source

2012-03-16 07:44:39

You can do it like this, but with Perl there is always more than one way to do it:

my %hash; 
while(<>) { 
    my($letter, $int) = split(" "); 
    push @{ $hash{$letter} }, $int; 
} 

for my $key (sort keys %hash) { 
    print "$key " . join(" ", @{ $hash{$key} }) . "\n"; 
}

It should look like this:

$ cat data.txt | perl script.pl 
A 200 100 
B 300 600 700 
C 400

Source

2012-03-16 07:50:30

A one-liner: 'perl -i.bk -e 'my %rows; while(<>){chomp; if($_ ne ""){my($id,$value)=split(); print "$id $value\n"; push(@{$rows{$id}},$value);}} foreach(keys %rows){print $_." @{$rows{$_}}\n";}' data.txt' – Ilion 2012-03-16 07:55:59

Thanks for your answer – Vijay 2012-03-16 08:56:07

Using awk and sorting the output inside it (asort is a GNU awk extension):

awk '
    { data[$1] = (data[$1] ? data[$1] " " : "") $2 }
    END {
        for (i in data) {
            idx[++j] = i
        }
        n = asort(idx)
        for (i = 1; i <= n; i++) {
            print idx[i] " " data[idx[i]]
        }
    }
' infile

Using the external program sort:

awk '
    { data[$1] = (data[$1] ? data[$1] " " : "") $2 }
    END {
        for (i in data) {
            print i " " data[i]
        }
    }
' infile | sort

The output of both commands is:

A 200 100 
B 300 600 700 
C 400

Source

2012-03-16 08:38:09 Birei

Using sed:

Content of script.sed:

## First line. Newline will separate data, so add it after the content. 
## Save it in 'hold space' and read next one. 
1 { 
    s/$/\n/ 
    h 
    b 
} 

## Append content of 'hold space' to current line. 
G 

## Search if first char (\1) in line was saved in 'hold space' (\4) and add 
## the number (\2) after it. 
s/^\(.\)\( *[0-9]\+\)\n\(.*\)\(\1[^\n]*\)/\3\4\2/ 

## If last substitution succeed, goto label 'a'. 
ta 

## Here last substitution failed, so it is the first appearance of the 
## letter, add it at the end of the content. 
s/^\([^\n]*\n\)\(.*\)$/\2\1/ 

## Label 'a'. 
:a 

## Save content to 'hold space'. 
h 

## In last line, get content of 'hold space', remove last newline and print. 
$ { 
    x 
    s/\n*$// 
    p 
}

Run it like this:

sed -nf script.sed infile

And the result:

A 200 100 
B 300 600 700 
C 400

Source

2012-03-16 09:07:19 Birei

This might work for you:

sort -sk1,1 file | sed ':a;$!N;s/^\([^ ]*\)\(.*\)\n\1/\1\2/;ta;P;D' 
A 200 100 
B 300 600 700 
C 400

Source

2012-03-16 13:26:36 potong
