2011-06-01 88 views
-1

I have a bunch of collected data and want to filter specific information out of it with AWK (time, average and β). For example:

1.00 3 4 
1.00 0 1 
51.00 1 4 
84.00 3 4 
95.00 0 2 
110.00 2 4 
120.00 0 1 
121.00 1 2 
124.00 2 4 
158.00 3 4 
159.00 1 3 
172.00 0 4 
214.00 0 4 
223.00 2 4 
224.00 1 2 
228.00 1 4 
229.00 0 1 
232.00 2 3 
233.00 3 4 
233.00 1 3 
246.00 0 2 
292.00 0 3 
294.00 0 4 
294.00 2 4 
294.00 3 4 
318.00 1 2 
331.00 0 1 
383.00 2 4 
402.00 3 4 

Then, the output I want to generate looks like this:

node_src node_dst time_repeated time1 time2 ... average_time ß 

Details:

* node_src = 2nd column 
* node_dst = 3rd column 
* time_repeated = the number of times the same pair of nodes is repeated, e.g. 3 4 is repeated 5 times 
* time1, time2, ... = the values of column 1 for that pair 
* average_time = the average length of the intervals between consecutive times (see the example below) 
* ß = time_repeated/average_time 

I am trying to produce a result like this:

node1 node2 nbrepeated time1 time2 time3 time4 time5 time6 time7 average ß 
2  4  6   110.0 124.0 223.0 294.0 383.0 461.0 543.0 6.0  0   
2  3  1   232.0 402.0 0.0 0.0 0.0 0.0 0.0 1.0  0  
1  3  2   159.0 233.0 521.0 0.0 0.0 0.0 0.0 2.0  4  
1  2  4   121.0 224.0 318.0 461.0 573.0 0.0 0.0 4.0  5  
0  4  4   172.0 214.0 294.0 415.0 543.0 0.0 0.0 4.0  5  
0  2  5   95.0 246.0 415.0 536.0 572.0 588.0 0.0 5.0  :  
0  3  3   292.0 403.0 455.0 588.0 0.0 0.0 0.0 3.0  :  
1  4  2   51.0 228.0 494.0 0.0 0.0 0.0 0.0 2.0  :  
0  1  4   1.0 120.0 229.0 331.0 536.0 0.0 0.0 4.0  :  
3  4  6   1.0 84.0 158.0 233.0 294.0 402.0 431.0 6.0  : 

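I am not able to find the average time and ß, because the calculation is complicated. For a row of times like the following, the average time is computed like this: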

121.0 224.0 318.0 461.0 573.0 

    avg_time = ((224-121)+(318-224)+(461-318)+(573-461))/4 

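The challenge here is to make it dynamic, because the number of time fields is unknown, and to do it in bash.

Here is the code, thanks to Glenn Jackman: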
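One way to make the interval average independent of the number of time fields is to loop over however many values there are. The following is only a minimal sketch, assuming the times are passed as arguments in ascending order. The function name `interval_avg` is hypothetical, and awk is used here only because bash has no floating-point arithmetic.

```shell
#!/bin/bash
# interval_avg TIME...: print the mean gap between consecutive times.
# (Hypothetical helper for illustration only.)
interval_avg() {
    # Fewer than two times means there is no interval to average.
    if [ "$#" -lt 2 ]; then
        echo "0.00"
        return
    fi
    # Pass all times as one line; awk sums the consecutive differences
    # and divides by the number of intervals (NF - 1).
    echo "$@" | awk '{
        sum = 0
        for (i = 2; i <= NF; i++) sum += $i - $(i - 1)
        printf "%.2f\n", sum / (NF - 1)
    }'
}

interval_avg 121.0 224.0 318.0 461.0 573.0   # prints 113.00
```

Because the loop runs up to `NF`, the same function works for two times or for twenty.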

#!/bin/bash 


declare -A t 

while read tm f1 f2; do 
    t["$f1:$f2"]+=" $tm" 
done < "$1" 

max=0 
for key in "${!t[@]}"; do 
    set -- ${t[$key]} 
    [[ $# -gt $max ]] && max=$# 
done 

{ 
    printf "field1 field2 nbrepeated" 
    for i in $(seq $max); do printf " %s" time$i; done 
    echo " average_time beta" 


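    for key in "${!t[@]}"; do 
        f1=${key%:*} 
        f2=${key#*:} 
        set -- ${t[$key]} 
        f3=$(($# - 1))    # number of intervals between the recorded times 
        f4=$(($# - 1))    # placeholder: should become the average time 
        f5=1              # placeholder: should become beta 
        printf "%d %d %d" $f1 $f2 $f3 
        for i in $(seq $max); do 
            printf " %.1f" ${1-0} 
            shift 
        done 
        printf " %.1f %.1f" $f4 $f5 
        echo "" 
    done 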
} | column -t 

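Modifications needed:

  1. Find the average time: avg_time
  2. Find beta

P.S.: Usually, to find an average, people compute sum/NR, but that is not what I need here.

Case solved: here is the output
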

field1 field2 nbrepeated time1 time2 time3 time4 time5 time6 time7 average_time beta 
2  4  6   110.0 124.0 223.0 294.0 383.0 461.0 543.0 72.16   0.08 
2  3  1   232.0 402.0 0.0 0.0 0.0 0.0 0.0 170.00  0.00 
1  3  2   159.0 233.0 521.0 0.0 0.0 0.0 0.0 181.00  0.01 
1  2  4   121.0 224.0 318.0 461.0 573.0 0.0 0.0 113.00  0.03 

Follow-up question: http://stackoverflow.com/questions/6198882/awk-calculate-the-average-for-different-interval-of-time – daxim 2011-06-01 10:27:43


That is a bit too much code for my taste. Could you narrow the question down and include what you have tried? – 2011-06-01 10:39:36


See also: http://stackoverflow.com/questions/6185305/awk-regroup-by-lines-pattern – ex001 2011-06-01 10:42:11

Answers

1

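First, note that the average formula can be simplified: the intermediate times cancel out, so only the last and the first time matter. For example:

121.0 224.0 318.0 461.0 573.0 
avg_time = (573.0 - 121.0)/4 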
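The simplification can be written out as a telescoping sum. The notation here is an assumption made for illustration: the times of one node pair are called t_0, ..., t_n, so there are n intervals.

```latex
\[
\text{avg\_time}
  = \frac{1}{n}\sum_{i=1}^{n}\bigl(t_i - t_{i-1}\bigr)
  = \frac{t_n - t_0}{n},
\qquad\text{e.g.}\quad
\frac{573 - 121}{4} = \frac{452}{4} = 113 .
\]
```

Every intermediate time appears once with a plus sign and once with a minus sign, which is why only the first and the last value survive.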

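I added the following section to compute the average and beta:

avg=0 
beta=0 
# skip the division when there is only one time (no interval) 
if [ "$f3" -ne 0 ] 
then 
    total=$(bc <<< "${@: -1} - $1")        # last time minus first time 
    avg=$(bc <<< "scale=2; $total/$f3")    # average interval length 
    beta=$(bc <<< "scale=2; $f3/$avg") 
fi 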
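The snippet relies on the positional parameters that `set --` fills in: `$1` is the first time and `${@: -1}` is the last one. A small sketch of just that mechanism, using a made-up sample string in place of one entry of the array `t` (bash is required for `${@: -1}`):

```shell
#!/bin/bash
# Hypothetical sample value, shaped like one entry of the array t.
times=" 121.00 224.00 318.00"

# Unquoted on purpose: word splitting turns the string into $1, $2, $3.
set -- $times

first=$1          # first time stamp
last=${@: -1}     # last time stamp (the space before -1 is required)
count=$#          # number of time stamps

echo "$first $last $count"   # prints: 121.00 318.00 3
```

Without the space, `${@:-1}` would be parsed as the "use a default value" expansion instead of a negative offset.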

The full script becomes:

declare -A t 

while read tm f1 f2; do 
    t["$f1:$f2"]+=" $tm" 
done < f.txt 

max=0 
for key in "${!t[@]}"; do 
    set -- ${t[$key]} 
    [[ $# -gt $max ]] && max=$# 
done 

{ 
    printf "field1 field2 nbrepeated" 
    for i in $(seq $max); do printf " %s" time$i; done 
    echo " average_time beta" 


    for key in "${!t[@]}"; do 
        f1=${key%:*} 
        f2=${key#*:} 
        set -- ${t[$key]} 
        f3=$(($# - 1)) 

        avg=0 
        beta=0 
        # don't want to divide by zero if we have only one time 
        if [ "$f3" -ne 0 ] 
        then 
            total=$(bc <<< "${@: -1} - $1")    # last time minus first time 
            avg=$(bc <<< "scale=2; $total/$f3") 
            beta=$(bc <<< "scale=2; $f3/$avg") 
        fi 

        printf "%d %d %d" $f1 $f2 $f3 
        for i in $(seq $max); do 
            printf " %.1f" ${1-0} 
            shift 
        done 

        printf " %.2f %.2f" $avg $beta 
        echo "" 
    done 
} | column -t 

Output:

field1 field2 nbrepeated time1 time2 time3 time4 time5 time6 average_time beta 
2  4  4   110.0 124.0 223.0 294.0 383.0 0.0 68.25   0.05 
2  3  0   232.0 0.0 0.0 0.0 0.0 0.0 0.00   0.00 
1  3  1   159.0 233.0 0.0 0.0 0.0 0.0 74.00   0.01 
1  2  2   121.0 224.0 318.0 0.0 0.0 0.0 98.50   0.02 
0  4  2   172.0 214.0 294.0 0.0 0.0 0.0 61.00   0.03 
0  2  1   95.0 246.0 0.0 0.0 0.0 0.0 151.00  0.00 
0  3  0   292.0 0.0 0.0 0.0 0.0 0.0 0.00   0.00 
1  4  1   51.0 228.0 0.0 0.0 0.0 0.0 177.00  0.00 
0  1  3   1.0 120.0 229.0 331.0 0.0 0.0 110.00  0.02 
3  4  5   1.0 84.0 158.0 233.0 294.0 402.0 80.20   0.06 
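Since the question mentions AWK, the same grouping and arithmetic could also be done in a single awk pass, which avoids starting `bc` several times per row. This is only an alternative sketch, not the script above. It assumes the input has exactly three whitespace-separated columns, no blank lines, and is sorted by time as in the sample. The function name `summarize` is hypothetical, and the individual time columns are left out for brevity.

```shell
#!/bin/bash
# summarize FILE: print one row per (src, dst) pair:
#   src dst intervals avg_interval beta
# (Hypothetical helper for illustration only.)
summarize() {
    awk '
    {
        key = $2 " " $3
        if (!(key in first)) first[key] = $1   # earliest time seen for the pair
        last[key] = $1                         # latest time seen so far
        count[key]++
    }
    END {
        for (key in count) {
            n = count[key] - 1                 # number of intervals
            avg = beta = 0
            if (n > 0) avg = (last[key] - first[key]) / n
            if (avg > 0) beta = n / avg
            printf "%s %d %.2f %.2f\n", key, n, avg, beta
        }
    }' "$1" | sort
}
```

Called as `summarize f.txt`, it prints the rows sorted by node pair. Note that awk's `%.2f` rounds, while `bc` with `scale=2` truncates, so the last digit can differ from the output shown above.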

Thank you very much. Could I ask you to post this code? Thanks in advance. – ex001 2011-06-01 11:02:49