Combining multiple files and organizing them in bash

I have column-based data in 115 files in one directory that I want to combine. Here is an example of what the files look like:

File one

 
Phenotype Marker Value1 Value2 Value3 
P1   1:54390 0.2948 0.4837 0.2198 
P2   1:54390 0.3482 0.6583 0.1937 
P3   1:54390 0.1983 0.1837 0.4177 
P4   1:54390 0.9128 0.9930 0.0043 
P5   1:54390 0.1938 0.0109 0.6573 
P1   1:69402 0.2039 0.2340 0.2346 
P2   1:69402 0.0239 0.3545 0.1987 
P3   1:69402 0.8239 0.8677 0.4177 
P4   1:69402 0.2498 0.3099 0.0765 
P5   1:69402 0.0982 0.0198 0.

File two

 
Phenotype Marker Value1 Value2 Value3 
P1   9:21048 0.8568 0.1231 0.1654 
P2   9:21048 0.1244 0.3213 0.1223 
P3   9:21048 0.9869 0.1231 0.4776 
P4   9:21048 0.3543 0.7657 0.0033 
P5   9:21048 0.1231 0.3213 0.8578 
P1   9:87758 0.1231 0.8768 0.4653 
P2   9:87758 0.7657 0.5435 0.8845 
P3   9:87758 0.9879 0.8437 0.7464 
P4   9:87758 0.1231 0.9879 0.5523 
P5   9:87758 0.9879 0.9868 0.0006

So basically every file has its own unique set of markers, and every phenotype (P1, P2, P3, P4, P5) has one row for each of those markers.

There are two things I want to do:

A. I want a single file that looks like this (below), with the data organized by phenotype:

 
Phenotype Marker Value1 Value2 Value3 
P1   1:54390 0.2948 0.4837 0.2198 
P1   1:69402 0.2039 0.2340 0.2346 
P1   9:21048 0.8568 0.1231 0.1654 
P1   9:87758 0.1231 0.8768 0.4653 
P2   1:54390 0.3482 0.6583 0.1937 
P2   1:69402 0.0239 0.3545 0.1987 
P2   9:21048 0.1244 0.3213 0.1223
P2   9:87758 0.7657 0.5435 0.8845
P3   1:54390 0.1983 0.1837 0.4177 
P3   1:69402 0.8239 0.8677 0.4177 
P3   9:21048 0.9869 0.1231 0.4776 
P3   9:87758 0.9879 0.8437 0.7464 
P4   1:54390 0.9128 0.9930 0.0043 
P4   1:69402 0.2498 0.3099 0.0765 
P4   9:21048 0.3543 0.7657 0.0033 
P4   9:87758 0.1231 0.9879 0.5523 
P5   1:54390 0.1938 0.0109 0.6573 
P5   1:69402 0.0982 0.0198 0.
P5   9:21048 0.1231 0.3213 0.8578 
P5   9:87758 0.9879 0.9868 0.0006

I'd like to do this in bash. Can anyone offer some insight? I'm very new to this language!

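B. Once I have this huge file, I would also like to save separate files based on phenotype (I plan to do some intermediate quality-control steps), so I would have 5 files, P1, P2, P3, P4, and P5, each with its respective data in the other columns.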


2013-05-07 Sheila

To solve A, you can use the approach proposed by spiehr. To solve B:

# Name of your big merged file
BIG_FILE='...'

TYPES='P1 P2 P3 P4 P5'
for T in $TYPES; do
    # Reduce the input file to all lines starting with $T
    # (one of P1, P2, etc.) followed by whitespace,
    # and write them to a file named accordingly
    grep "^$T[[:space:]]" "$BIG_FILE" > "file_$T"
done
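If the phenotype names are not known in advance, they could be read from the first column instead of being typed out. The sketch below assumes a whitespace-separated merged file with a single header line (called merged.txt here, a hypothetical name, filled with a few rows from the question) and also copies that header into each per-phenotype file.

```shell
#!/usr/bin/env bash
# Sketch: split a merged file into one file per phenotype without
# hard-coding P1..P5. merged.txt is a hypothetical name.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# A few sample rows taken from the question
cat > merged.txt <<'EOF'
Phenotype Marker Value1 Value2 Value3
P1   1:54390 0.2948 0.4837 0.2198
P1   9:21048 0.8568 0.1231 0.1654
P2   1:54390 0.3482 0.6583 0.1937
P2   9:21048 0.1244 0.3213 0.1223
EOF

header=$(head -n 1 merged.txt)

# Distinct values of column 1, skipping the header line
types=$(tail -n +2 merged.txt | awk '{print $1}' | sort -u)

for T in $types; do
    {
        printf '%s\n' "$header"
        # Phenotype followed by whitespace, so P1 would not also match P10
        grep "^$T[[:space:]]" merged.txt
    } > "file_$T"
done

ls file_*
```

Repeating the header in every output file keeps each one readable on its own during the quality-control steps; drop the printf line if you only want the data rows.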


2013-05-07 19:53:44

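To get the header line with the column titles:

# file1 stands for any one of the data files (they all share the same header)
head -n 1 file1 > tmpfile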

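The data can then be appended like this:

for file in $(ls); do
    # append every line except the header (line 1) of each file
    tail -n +2 "${file}" >> tmpfile2
done
# sort so the rows are grouped by phenotype, then add them below the header
sort tmpfile2 >> tmpfile
rm tmpfile2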

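tmpfile now contains the header followed by the sorted data from all files. Instead of $(ls), you can use another command that lists only your relevant files.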
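A minimal sketch of the same header-plus-sorted-data steps, using a glob instead of $(ls) and writing the result outside the data directory so the glob can never pick it up. The names data/, file1.txt, file2.txt and combined.txt are hypothetical, and the sample rows come from the question.

```shell
#!/usr/bin/env bash
# Sketch: header from one file, then all data lines sorted together.
# Directory and file names are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
mkdir "$workdir/data"
cd "$workdir"

# Two small sample inputs taken from the question
printf '%s\n' 'Phenotype Marker Value1 Value2 Value3' \
    'P1   1:54390 0.2948 0.4837 0.2198' \
    'P2   1:54390 0.3482 0.6583 0.1937' > data/file1.txt
printf '%s\n' 'Phenotype Marker Value1 Value2 Value3' \
    'P1   9:21048 0.8568 0.1231 0.1654' \
    'P2   9:21048 0.1244 0.3213 0.1223' > data/file2.txt

files=(data/*.txt)
{
    head -n 1 "${files[0]}"          # header from the first file only
    for file in "${files[@]}"; do
        tail -n +2 "$file"           # every line except the header
    done | LC_ALL=C sort             # byte-order sort groups rows by phenotype
} > combined.txt

cat combined.txt
```

LC_ALL=C makes the sort order independent of the user's locale; the plain sort in the answer works too, but its ordering of spaces and punctuation can vary between systems.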

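To get only the entries whose first column is "P3", you can use grep:

grep '^P3[[:space:]]' tmpfile | tr -s ' ' | cut -d ' ' -f 2-

The cut command cuts off the first column, since you probably don't need it anymore. cut splits on tabs by default, so for this space-separated data tr -s ' ' first squeezes the runs of spaces and cut -d ' ' then splits on single spaces. The [[:space:]] after P3 keeps the pattern from also matching a longer name such as P30.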
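awk splits fields on runs of blanks by default, so it can do the filtering and the column removal in one step without tr or cut. A minimal sketch, assuming tmpfile is the combined file described above (recreated here with a few sample rows from the question):

```shell
#!/usr/bin/env bash
# Sketch: keep rows whose first column is exactly P3, minus that column.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 'Phenotype Marker Value1 Value2 Value3' \
    'P3   1:54390 0.1983 0.1837 0.4177' \
    'P4   1:54390 0.9128 0.9930 0.0043' \
    'P3   1:69402 0.8239 0.8677 0.4177' > tmpfile

# Blanking $1 rebuilds the line with single spaces and a leading space,
# which sub() then removes; works for any number of value columns.
awk '$1 == "P3" { $1 = ""; sub(/^ /, ""); print }' tmpfile > p3_only

cat p3_only
```

The exact comparison $1 == "P3" avoids the prefix problem of a regular expression such as ^P3.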


2013-05-07 19:30:28 spiehr

This doesn't cover the phenotypes. – 2013-05-07 19:43:30

The sort is in there now; that was sort of a typo... – spiehr 2013-05-07 19:45:03

Use 'for file in *; do' instead of calling 'ls'. – chepner 2013-05-07 21:55:07

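#!/usr/bin/gawk -f
# Needs GNU awk: PROCINFO["sorted_in"] is a gawk extension
/^Phenotype/ { hd = $0; next }    # remember the header line (the same in every file)
{ rw[$0] }                         # store each data line as an array index
END {
    print hd
    PROCINFO["sorted_in"] = "@ind_str_asc"   # loop over indices in ascending string order
    for (each in rw) print each
}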
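The script above needs GNU awk. Saved under a hypothetical name such as merge.awk, it would be run over the data files, e.g. gawk -f merge.awk *.txt > combined.out, assuming the inputs end in .txt. Where only a POSIX awk is available, a rough equivalent prints the header once and lets sort -u do the ordering and deduplication that the array indices provide; a minimal sketch on sample rows from the question:

```shell
#!/usr/bin/env bash
# Sketch: portable counterpart of the gawk script (no PROCINFO).
# a.txt, b.txt and combined.out are hypothetical names.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 'Phenotype Marker Value1 Value2 Value3' \
    'P2   1:54390 0.3482 0.6583 0.1937' \
    'P1   1:54390 0.2948 0.4837 0.2198' > a.txt
printf '%s\n' 'Phenotype Marker Value1 Value2 Value3' \
    'P1   9:21048 0.8568 0.1231 0.1654' > b.txt

{
    head -n 1 a.txt                        # header, printed once
    # all non-header lines, sorted; -u drops duplicates like the array keys do
    awk '!/^Phenotype/' a.txt b.txt | LC_ALL=C sort -u
} > combined.out

cat combined.out
```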


2013-05-07 19:33:40

I would write the first step as:

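{
    sed 1q file1         # header line, taken from one data file (file1)
    sed -s 1d * | sort   # GNU sed: -s restarts line numbers per file, so 1d drops every header
} > file_all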

Then:

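awk '
    FNR == 1 {head = $0; next}        # save the header line
    !seen[$1]++ {print head > $1}     # first row of a phenotype: start its file with the header
    {print > $1}                      # append the row to the file named after column 1
' file_all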

This results in files named "P1", "P2", etc.
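To see both steps end to end, here is a minimal sketch on two sample files built from the question's data. It swaps the sed header removal for awk 'FNR > 1', which skips the first line of each input file with any POSIX awk (plain sed 1d treats all files as one stream and would only drop the first file's header). The in/ directory and file names are hypothetical; keeping the inputs in their own directory also stops a re-run from globbing in file_all or the P1, P2 output files.

```shell
#!/usr/bin/env bash
# Sketch: merge, sort, then split by phenotype. Names are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
mkdir "$workdir/in"
cd "$workdir"
printf '%s\n' 'Phenotype Marker Value1 Value2 Value3' \
    'P1   1:54390 0.2948 0.4837 0.2198' \
    'P2   1:54390 0.3482 0.6583 0.1937' > in/one.txt
printf '%s\n' 'Phenotype Marker Value1 Value2 Value3' \
    'P1   9:21048 0.8568 0.1231 0.1654' \
    'P2   9:21048 0.1244 0.3213 0.1223' > in/two.txt

# Step 1: one header, then every file's data lines sorted together
{
    sed 1q in/one.txt
    awk 'FNR > 1' in/*.txt | LC_ALL=C sort
} > file_all

# Step 2: split file_all into files named after column 1
awk '
    FNR == 1 {head = $0; next}
    !seen[$1]++ {print head > $1}
    {print > $1}
' file_all

cat P1
```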


2013-05-08 02:30:26
