我試圖與它發生在File1
的this和this問題,即在File2
每串/線路匹配的主題結合起來(每串僅發生一次)匹配同時打印出現在File2
上的整行,同時打印每個匹配之間的行(即序列號爲File2
)。AWK/SED:文件和打印一切的匹配模式之間
文件1
>GAXI01000525.151.1950 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Hexapoda;Ellipura;Collembola;Tetrodontophora bielanensis (giant springtail)
CCUGGUUGAUCCUGCCAGUAGUCAUAUGCUUGUCUCAAA
GAUUAAGCCAUGCAUGUCUAAGUUCAAGCAAAAAUAAAG
ACCGCGAAUGGCUCAUUAUAUCAGUUAUGGUUCCUUAGA
ACUUACUACUUGGAUAACUGUGGUAAUUCUAGAGCUAAU
>GAXI01000526.151.1950 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Hexapoda;Ellipura;Collembola;Tetrodontophora bielanensis (giant springtail)
CCUGGUUGAUCCUGCCAGUAGUCAUAUGCUUGUCUCAAAGAU
UAAGCCAUGCAUGUCUAAGUUCAAGCAAAAAUAAAGUGAAAC
>GAXI01005455.1.1233 Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Chryseobacterium;Tetrodontophora bielanensis (giant springtail)
CUUUCGAAAGGAAGAUUAAUACCCCAUAACAUA
>GAXI01006199.29.1525 Bacteria;Chlamydiae;Chlamydiae;Chlamydiales;Simkaniaceae;Candidatus Rhabdochlamydia;Tetrodontophora bielanensis (giant springtail)
AGAAUUUGAUCUUGGUUCAGAUUGAAUGCUGG
UGCAAGUCGAACGAAGCUAGAGGGCAACCUCU
文件2
>GAXI01000525.151.1950
>GAXI01006199.29.1525
我到目前爲止有:
awk 'FNR==NR{a[$0];next} $1 in a' file2 file1 > output
這給:
>GAXI01000525.151.1950 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Hexapoda;Ellipura;Collembola;Tetrodontophora bielanensis (giant springtail)
>GAXI01006199.29.1525 Bacteria;Chlamydiae;Chlamydiae;Chlamydiales;Simkaniaceae;Candidatus Rhabdochlamydia;Tetrodontophora bielanensis (giant springtail)
我想這樣的:
>GAXI01000525.151.1950 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Hexapoda;Ellipura;Collembola;Tetrodontophora bielanensis (giant springtail)
CCUGGUUGAUCCUGCCAGUAGUCAUAUGCUUGUCUCAAA
GAUUAAGCCAUGCAUGUCUAAGUUCAAGCAAAAAUAAAG
ACCGCGAAUGGCUCAUUAUAUCAGUUAUGGUUCCUUAGA
ACUUACUACUUGGAUAACUGUGGUAAUUCUAGAGCUAAU
>GAXI01006199.29.1525 Bacteria;Chlamydiae;Chlamydiae;Chlamydiales;Simkaniaceae;Candidatus Rhabdochlamydia;Tetrodontophora bielanensis (giant springtail)
AGAAUUUGAUCUUGGUUCAGAUUGAAUGCUGG
UGCAAGUCGAACGAAGCUAGAGGGCAACCUCU
原始文件包含數千行,以便以最快的速度解決方案表示讚賞,無論是AWK,sed的或其他任何東西......
恕我直言,科學家們可以使用專爲他們工作而設計的工具獲得更快更好的結果。使用像https://metacpan.org/release/BioPerl和/或https://metacpan.org/release/FAST這樣的工具肯定可以更有效地實現目標...... – jm666
當然,雖然我沒有這樣做每天查詢。 –