2017-02-20 58 views
1

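I am trying to merge several files in a folder that share a similar naming pattern into one file, irrespective of the directory. How can I match files with the same name and merge them in a shell script?

The file structure is as follows: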

20170219-A20-L1-AB1234_S1_R1_001.txt 
20170211-B21-L3-AB1234-2_S1_R1_001.txt 
20170210-C20-L1-AB1234-3_S1_R1_001.txt 
20170211-B21-L3-AB1234-2_S2_R1_001.txt 
20170210-C20-L1-AB1234-3_S2_R1_001.txt 

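My criterion is to find the files containing _S1 and _S2, then merge all the _S1 files into one new file and all the _S2 files into another new file.

My expected output would be 20170219-B21-L3-AB1234-2_S1_R1_001_merge.txt and 20170219-B21-L3-AB1234-2_S2_R1_001_merge.txt. I have no specific requirement for the merged file names, but I would like the merged files to be in the same folder.

I have been trying the grep and cut commands, but my for loop does not work properly. I find regular expressions in the shell hard to understand.

Please help me build the logic.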

+4

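If you post your code, we can better help you debug it. – Guest

+1

Also, "doesn't work" is about as unhelpful a description as you can give –

+1

@anon The edit makes it look as if HTML tags are part of the file structure... that is probably not the intent. –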

Answers

4

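Either of the earlier solutions is suitable, but neither will merge files from other directories. To recreate your problem I did the following, and then tried to solve it according to your initial request:

Create the files as you described: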

$ touch $(date +%Y%m%d)_{A,B}{20,21}_L{1,3}_AB1234_{1,3}_S{1,2}_R1_001.txt 
$ touch $(date +%Y%m%d)_{A,B}{20,21}_L{1,3}_AB1234_S{1,2}_R1_001.txt 
$ ls | wc -l 
48 

Create a variable myText holding 48 lines of random text generated with Lorem Ipsum:

$ echo "${myText}" | wc -l 
    48 
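
The answer does not show how myText was filled. As a stand-in only, and not the author's method, the sketch below builds 48 numbered placeholder lines instead of real Lorem Ipsum text; the function name make_text and the wording of each line are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print N numbered placeholder lines.
# The answer used Lorem Ipsum text; this stand-in only reproduces the line count.
make_text() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        printf 'placeholder line %d\n' "$i"
        i=$((i + 1))
    done
}

# One line per test file created above (48 files in total).
myText=$(make_text 48)
```

With this in place, the wc -l check above should again report 48.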

Give each file one line from myText:

$ ls -t1 | awk '{print NR" "$0}' | while read i j; do echo "${myText}" | awk -v var=${i} 'NR==var {print}' >> ${j}; done 
$ for i in `ls -t1`; do echo -n " ${i}: "; cat ${i}; done 
20170219_B21_L3_AB1234_3_S1_R1_001.txt: This is additional line two 
20170219_B21_L3_AB1234_3_S2_R1_001.txt: line three 
... 
20170219_A20_L3_AB1234_S1_R1_001.txt: Phasellus ut quam eu lacus aliquet vehicula. 
20170219_A20_L1_AB1234_S1_R1_001.txt: Proin nec orci accumsan, pharetra sapien sed, gravida arcu. 
20170219_B21_L3_AB1234_S2_R1_001.txt: Lorem ipsum dolor sit amet, consectetur adipiscing elit 

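Then I merged all the ...S1... files and all the ...S2... files. This finds every file that matches my criteria from my home directory downwards. To append rather than overwrite, use cat >> file instead of cat > file; which one you want depends on whether the merged files are cleaned up before the script is run again: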

$ find ~ -type f -iname "[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[AB]*S1*" -exec cat {} + > AB1234_S1_R1_001_merged.txt 
$ find ~ -type f -iname "[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[AB]*S2*" -exec cat {} + > AB1234_S2_R1_001_merged.txt 
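
The two find commands differ only in the tag, so they can be folded into one small function. The following is a minimal sketch under stated assumptions, not the answer's exact commands: the function name merge_tag, its argument order and the looser name pattern *_TAG_*.txt are all hypothetical. It also skips any file ending in _merged.txt so that an earlier output lying under the search root is not read back in.

```shell
#!/bin/bash
# Hypothetical helper: merge_tag ROOT TAG OUTFILE
# Concatenates every regular file below ROOT whose name contains _TAG_
# and ends in .txt into OUTFILE. The order of the files is whatever
# find returns, which is not guaranteed to be sorted.
merge_tag() (
    root=$1 tag=$2 out=$3
    # -name is case-sensitive; "! -name" leaves out earlier merged outputs.
    # "{} +" hands many file names to a single cat invocation.
    find "$root" -type f -name "*_${tag}_*.txt" ! -name '*_merged.txt' \
        -exec cat {} + > "$out"
)
```

A call such as merge_tag ~ S1 AB1234_S1_R1_001_merged.txt would then correspond to the first command above, and the same call with S2 to the second. Because the output is opened with >, each run overwrites the previous result.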

Result:

$ for i in `ls | grep merged`; do echo; echo "--- ${i} ---"; cat ${i}; done 

--- AB1234_S1_R1_001_merged.txt --- 
Donec et ante tempor, hendrerit est ut, egestas massa. 
Donec laoreet erat a sapien finibus venenatis. 
Etiam eget urna eu ipsum dapibus aliquet. 
Phasellus ut quam eu lacus aliquet vehicula. 
Phasellus sed lorem ac odio rutrum vehicula. 
Aliquam ac eros ut risus fringilla fringilla. 
Curabitur a purus ultricies sem venenatis auctor. 
Praesent dignissim justo non diam ultrices, nec fermentum lectus dictum. 
Donec imperdiet mi sit amet quam iaculis rhoncus. 
Nam vitae neque vehicula, consectetur dui porttitor, placerat libero. 
Nulla eget diam iaculis augue interdum posuere. 
Fusce a diam ac neque accumsan sagittis. 
Sed feugiat mi eget augue euismod, et laoreet urna dictum. 
This is additional line two 
Vestibulum egestas tellus non justo fringilla viverra eget eu neque. 
Aliquam porttitor nisi nec laoreet vestibulum. 
Donec congue diam ut leo commodo mattis. 
Quisque egestas odio sit amet diam efficitur, non accumsan magna blandit. 
Donec convallis metus at iaculis pellentesque. 
Nam a ligula venenatis, consectetur lectus et, dictum erat. 
Proin nec orci accumsan, pharetra sapien sed, gravida arcu. 
Curabitur volutpat nibh nec leo tempus, at sagittis lacus euismod. 
Mauris blandit sem ac lectus varius lobortis. 
In eu ipsum et felis lobortis dictum. 

--- AB1234_S2_R1_001_merged.txt --- 
Aenean id orci sit amet lacus tincidunt molestie. 
Duis pretium tellus dapibus lorem rhoncus, at tincidunt mauris pellentesque. 
Integer hendrerit mauris sit amet nunc aliquam, id congue justo pulvinar. 
Praesent dapibus augue ac enim consequat, vitae feugiat enim scelerisque. 
This is additional line one 
Sed sit amet dolor accumsan, commodo magna at, aliquet neque. 
Quisque porttitor sapien sed orci vulputate, ac porta ante sollicitudin. 
In malesuada leo sit amet purus accumsan porttitor commodo eu eros. 
Integer ut odio elementum, viverra velit at, molestie nulla. 
Suspendisse suscipit lorem id suscipit consectetur. 
Donec vulputate nibh eget imperdiet volutpat. 
Curabitur sit amet libero eget nulla viverra iaculis sit amet eget eros. 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Maecenas imperdiet nisl quis arcu blandit, sed pretium mi auctor. 
Sed sit amet nunc faucibus, ultricies elit quis, sodales magna. 
Nulla pharetra mauris eu quam sollicitudin ornare in et metus. 
Ut convallis nibh in tempus fringilla. 
In ornare erat quis sodales hendrerit. 
Phasellus molestie erat commodo est venenatis, ullamcorper tempus elit hendrerit. 
Nam mollis ante in purus suscipit, quis facilisis risus efficitur. 
Integer pellentesque sem eget diam ultrices, eget vulputate ante pharetra. 
Mauris ac nisl vitae sapien lacinia ornare nec nec felis. 
line three 
Sed dapibus ipsum eu purus interdum, at varius libero ornare. 

Does this answer the question?

2

Something like this:

#!/bin/bash 

for i in 'S1' 'S2' 
do 
    cat *_"$i"_R[0-9]*_[0-9]*.txt > "$i".txt 
done 

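For each element of the list given in the for statement (S1 and S2 in this case), cat reads the files that match the pattern and the output is sent to a single file. The merged output files will be S1.txt and S2.txt. Note that the pattern is a shell glob rather than a true regular expression; you can make it stricter if needed.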
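
One thing the loop does not guard against is a tag with no matching files: the glob is then passed to cat as a literal string, cat reports an error, and an empty output file is still created. A possible guard is sketched below; the function name merge_tag_in_dir and its arguments are hypothetical and not part of the answer.

```shell
#!/bin/bash
# Hypothetical helper: merge_tag_in_dir DIR TAG
# Merges DIR/*_TAG_R<n>_<n>.txt into DIR/TAG.txt, or fails without
# creating an output file when nothing matches.
merge_tag_in_dir() (
    dir=$1 tag=$2
    # Replace the positional parameters with the list of matching files.
    set -- "$dir"/*_"$tag"_R[0-9]*_[0-9]*.txt
    # An unmatched glob stays as the literal pattern, so test the first entry.
    if [ ! -e "$1" ]; then
        echo "no files found for $tag in $dir" >&2
        return 1
    fi
    cat "$@" > "$dir/$tag.txt"
)
```

Calling merge_tag_in_dir . S1 and then merge_tag_in_dir . S2 should give the same S1.txt and S2.txt as the loop, with the files concatenated in the shell's sorted glob order.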

1

The following will help if the files to be searched are in the working directory:

cat *_S1_R1_001.txt > 20170219-B21-L3-AB1234-2_S1_R1_001_merge.txt 
cat *_S2_R1_001.txt > 20170219-B21-L3-AB1234-2_S2_R1_001_merge.txt
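
If the set of tags is not known in advance, the tag can be read from each file name instead of being hard-coded. The sketch below is one way to do that, assuming names of the form shown in the question (..._S<number>_R<number>_<number>.txt); the function name merge_all_tags and the output names S1_merge.txt, S2_merge.txt and so on are hypothetical choices, since the question places no requirement on the merged file names.

```shell
#!/bin/bash
# Hypothetical helper: merge_all_tags DIR
# Appends every DIR/*_S<n>_R<n>_<n>.txt file to DIR/S<n>_merge.txt.
merge_all_tags() (
    cd "$1" || exit 1
    # Start clean so that running the function again does not append twice.
    rm -f ./S[0-9]*_merge.txt
    for f in ./*_S[0-9]*_R[0-9]*_[0-9]*.txt; do
        [ -e "$f" ] || continue      # the pattern matched nothing
        rest=${f##*_S}               # e.g. "1_R1_001.txt"
        tag=S${rest%%_*}             # e.g. "S1"
        cat "$f" >> "${tag}_merge.txt"
    done
)
```

Running merge_all_tags . in the folder from the question should leave S1_merge.txt and S2_merge.txt next to the input files. The output names contain no _S<n>_R part, so they are not picked up as inputs on a later run.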