2016-08-30 50 views

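bash to grep files for matches but include the unique header line

The bash loop below iterates over a directory and runs grep on every .txt file. What I am trying to do is include each file's header line in the filtered result. Currently the header is printed to stdout, and the two new filtered files have no header. The code below seems close, but I can't seem to get each file's own header into its output. Thank you :).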

bash

for file in /home/cmccabe/compare/*.txt ; do 
bname=$(basename $file) 
pref=${bname%%.txt} 
[ "$file" = /home/cmccabe/compare/${pref}_filtered.txt ] && continue 
head -n 1 "$file" 
grep -wFf /home/cmccabe/compare/list $file > /home/cmccabe/compare/${pref}_filtered.txt 
done 

file1

Index Chromosomal Position Gene  
4 43394661 SLC2A1 
22 166870221 SCN1A 
22 166870952 CBS 

file2

Chrom Position Gene Symbol Target ID 
chr22 40742831 ADSL AMPL3764590328 
chr22 40745898 ADSL AMPL5177720331 
chr5 125885803 ALDH7A1 AMPL4306766150 
chr5 178555085 FBN1 AMPL4306766155 

list (used for grep)

SLC2A1 
SCN1A 
ADSL 
ALDH7A1 

Desired file1_filtered output

Index Chromosomal Position Gene 
4 43394661 SLC2A1 
22 166870221 SCN1A 

Desired file2_filtered output

Chrom Position Gene Symbol Target ID 
chr22 40742831 ADSL AMPL3764590328 
chr22 40745898 ADSL AMPL5177720331 
chr5 125885803 ALDH7A1 AMPL4306766150 

Answers


With GNU grep and bash's process substitution, the header line is fed to grep as one more pattern, so it matches itself:

grep -wf <(head -n 1 file1; cat list) file1 

Output:

 
Index Chromosomal Position Gene  
4 43394661 SLC2A1 
22 166870221 SCN1A 

grep -wf <(head -n 1 file2; cat list) file2 

Output:

 
Chrom Position Gene Symbol Target ID 
chr22 40742831 ADSL AMPL3764590328 
chr22 40745898 ADSL AMPL5177720331 
chr5 125885803 ALDH7A1 AMPL4306766150 

Or without process substitution: 'head -n 1 file1; grep -wf list file1' – Cyrus
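Applied to the loop from the question, the idea in the comment above could look roughly like the following. This is only a sketch: it builds space-separated copies of the sample data in a temporary directory (the `dir`, `list`, and `out` variables are illustrative, not the asker's real paths), groups `head` and `grep` in braces so both write to the same output file, and uses `tail -n +2` so the header is never tested against the gene list.

```shell
#!/usr/bin/env bash
# Sketch: write each header plus its matching rows to <name>_filtered.txt.
# dir, list and out are assumptions; the demo uses sample data in a temp directory.
dir=$(mktemp -d)
list="$dir/list"

printf '%s\n' 'Index Chromosomal Position Gene' '4 43394661 SLC2A1' \
    '22 166870221 SCN1A' '22 166870952 CBS' > "$dir/file1.txt"
printf '%s\n' 'Chrom Position Gene Symbol Target ID' \
    'chr22 40742831 ADSL AMPL3764590328' 'chr22 40745898 ADSL AMPL5177720331' \
    'chr5 125885803 ALDH7A1 AMPL4306766150' \
    'chr5 178555085 FBN1 AMPL4306766155' > "$dir/file2.txt"
printf '%s\n' SLC2A1 SCN1A ADSL ALDH7A1 > "$list"

for file in "$dir"/*.txt; do
    # Skip output left over from an earlier run.
    case $file in *_filtered.txt) continue ;; esac
    out=${file%.txt}_filtered.txt
    {
        head -n 1 "$file"                        # header line
        tail -n +2 "$file" | grep -wFf "$list"   # matching data rows only
    } > "$out"
done
```

Because the redirection is attached to the whole brace group, the header no longer goes to stdout, which is what the original loop was doing.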


You're going about this the wrong way. Read why-is-using-a-shell-loop-to-process-text-considered-bad-practice and then just do this:

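awk '
BEGIN { FS="\t" }                      # input files are assumed to be tab-separated
NR==FNR { genes[$0]; next }            # first file: load the gene list
FNR==1 {                               # header line of each data file
    close(out)
    out = FILENAME
    sub(/\.txt$/,"_filtered&",out)     # file1.txt -> file1_filtered.txt
    g = 0
    for (i=1; i<=NF; i++) {
        if ($i ~ /^Gene/) {            # matches "Gene" and "Gene Symbol"
            g = i
        }
    }
}
(FNR==1) || ($g in genes) { print > out }
' /home/cmccabe/compare/list /home/cmccabe/compare/*.txt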

This will be more robust, efficient, and portable than what you are currently doing.