我對Linux非常陌生,想用bash/awk找到x = 0.975和y = 0.025(95%置信區間)的x點然後我可以用它來給我寬峯的寬度(數據有點像鐘形曲線)。x,y使用awk指向95%的置信區間
這是一組具有x數據,y值,像這樣(注意:我打算DX增量要小得多獲得了更爲/細點):
0 0
0.100893 0
0.201786 0
0.302679 0
0.403571 0
0.504464 0
0.605357 0
0.70625 0
0.807143 0
0.908036 0
1.00893 0
1.10982 0
1.21071 0
1.31161 0
1.4125 0.00173803
1.51339 0.0186217
1.61429 0.0739904
1.71518 0.211295
1.81607 0.725379
1.91696 2.34137
2.01786 4.69752
2.11875 6.58415
2.21964 6.06771
2.32054 8.57593
2.42143 11.7745
2.52232 12.4957
2.62321 13.0301
2.72411 11.1008
2.825 11.4504
2.92589 12.6537
3.02679 12.1584
3.12768 11.0262
3.22857 6.89166
3.32946 5.88521
3.43036 6.48794
3.53125 5.0121
3.63214 2.70189
3.73304 0.914824
3.83393 0.154436
3.93482 0.0286775
4.03571 0.00533823
4.13661 0.00024829
4.2375 0
4.33839 0
4.43929 0
4.54018 0
4.64107 0
4.74196 0
4.84286 0
4.94375 0
5.04464 0
5.14554 0
5.24643 0
5.34732 0
5.44821 0
5.54911 0
首先我要規範化y數據,所以這些數值加起來總共爲1(基本上給了我在y點找到x的概率)。
然後,我想確定標記數據集95%置信區間開始和結束的x值。我解決這個問題的方法是使用第2列y值的運行總和,然後通過這種方式執行runsum/'sum'
,這些值應該從0-1填充(見下文)。 (注:我用柱-t清理輸出一點點)
sum=$(awk 'BEGIN {sum=0} {sum+=$2} END {print sum}' mydata.txt)
awk '{runsum += $2} ; {if (runsum!=0) {print $0,$2/'$sum',runsum/'$sum'} else{print $0,"0","0"}}' mydata.txt | column -t
這給:
0 0 0 0
0.100893 0 0 0
0.201786 0 0 0
0.302679 0 0 0
0.403571 0 0 0
0.504464 0 0 0
0.605357 0 0 0
0.70625 0 0 0
0.807143 0 0 0
0.908036 0 0 0
1.00893 0 0 0
1.10982 0 0 0
1.21071 0 0 0
1.31161 0.00136559 8.92134e-06 8.92134e-06
1.4125 0.0259463 0.000169506 0.000178427
1.51339 0.159775 0.0010438 0.00122223
1.61429 0.552197 0.00360748 0.00482971
1.71518 1.2808 0.00836741 0.0131971
1.81607 2.20568 0.0144096 0.0276067
1.91696 3.29257 0.0215102 0.049117
2.01786 4.27381 0.0279206 0.0770376
2.11875 7.10469 0.0464146 0.123452
2.21964 9.56549 0.062491 0.185943
2.32054 11.3959 0.0744489 0.260392
2.42143 8.16116 0.0533165 0.313709
2.52232 9.08145 0.0593287 0.373037
2.62321 9.3105 0.0608251 0.433863
2.72411 10.8084 0.0706108 0.504473
2.825 10.4597 0.0683328 0.572806
2.92589 9.81763 0.0641382 0.636944
3.02679 9.06295 0.0592079 0.696152
3.12768 8.84222 0.0577659 0.753918
3.22857 10.285 0.0671915 0.82111
3.32946 8.37618 0.0547212 0.875831
3.43036 7.02052 0.0458648 0.921696
3.53125 4.82589 0.0315273 0.953223
3.63214 3.39214 0.0221607 0.975384
3.73304 2.2402 0.0146351 0.990019
3.83393 1.06194 0.00693761 0.996956
3.93482 0.350213 0.00228793 0.999244
4.03571 0.091619 0.000598543 0.999843
4.13661 0.0217254 0.000141931 0.999985
4.2375 0.00211046 1.37875e-05 0.999999
4.33839 0 0 0.999999
4.43929 0 0 0.999999
4.54018 0 0 0.999999
4.64107 0 0 0.999999
4.74196 0 0 0.999999
4.84286 0 0 0.999999
4.94375 0 0 0.999999
5.04464 0 0 0.999999
5.14554 0 0 0.999999
5.24643 0 0 0.999999
5.34732 0 0 0.999999
5.44821 0 0 0.999999
5.54911 0 0 0.999999
我想我可以用它來尋找X點,其中y = 0.975和y = .025並解決我的問題,但你們知道更優雅的方式,這是做我認爲是什麼?
爲什麼選擇awk?我不想開始「我的腳本語言比你的更好」的辯論,但我認爲你可能會更容易地使用perl,python或ruby的更多東西來做這件事...... – jwatkins 2014-10-22 01:58:25
說實話,就像我說的我對腳本非常陌生,這正是我所熟悉的:) – Patrick 2014-10-22 02:00:05
夠公平的!我自己做了很多「不要做的事」,所以爲什麼不呢? :D – jwatkins 2014-10-22 02:09:25