2013-02-19

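Perl: join function and arrays

I combined two sequence files, so I now have one file containing two sequences. I split the sequences into a @chars array, because later I have to compare them character by character. However, one of the sequences spans two lines. I want to use the join function to combine those two lines, but I don't know how.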

Example:

SEQ 1

ACGTATATATTATATCTGGCGCTATCGATGCTATCGAT 
CGATGCGCG 

SEQ 2

AGTGAGCGTAGCTAGCGGCGCGATCTAGCTA 
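A minimal sketch of the goal described above, using the example sequences typed in as literals (in the real script they would come from the files): `join('', ...)` glues the two SEQ 1 lines into one string, `split //` turns each sequence into an array of characters, and a loop compares them position by position. The subroutine `compare_chars` is a hypothetical helper, not something from the question.

```perl
use strict;
use warnings;

# SEQ 1 (two lines) and SEQ 2 from the example, as literals
my @seq1_lines = ('ACGTATATATTATATCTGGCGCTATCGATGCTATCGAT', 'CGATGCGCG');
my $seq2       = 'AGTGAGCGTAGCTAGCGGCGCGATCTAGCTA';

# join with an empty separator concatenates the lines into one sequence
my $seq1 = join('', @seq1_lines);

# Hypothetical helper: compare two sequences character by character over
# the length of the shorter one; returns (matching positions, compared positions)
sub compare_chars {
    my ($x, $y) = @_;
    my @x_chars = split //, $x;
    my @y_chars = split //, $y;
    my $len = @x_chars < @y_chars ? scalar(@x_chars) : scalar(@y_chars);
    my $same = 0;
    for my $i (0 .. $len - 1) {
        $same++ if $x_chars[$i] eq $y_chars[$i];
    }
    return ($same, $len);
}

my ($same, $compared) = compare_chars($seq1, $seq2);
print "SEQ 1 is ", length($seq1), " characters long after joining\n";
print "$same of $compared compared positions match\n";
```

Comparing only up to the shorter length is a design choice for this sketch; sequences of different lengths may need an alignment instead.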

My code so far:

#!/usr/bin/perl
use strict;
use warnings;

# open file 1
open(my $seq1, "<", "file1.fa") or die $!;
# open file 2
open(my $seq2, "<", "file2.fa") or die $!;
# open combined file
open(my $combined, ">", "combined.txt") or die $!;

# read file 1, skip header line, write to combined file
while (my $line = <$seq1>) {
    if ($line =~ /^>/) {
        next;
    }
    else {
        print $combined $line;    # $line already ends in "\n"
    }
}
# read file 2, skip header line, write to combined file on new line
while (my $line2 = <$seq2>) {
    if ($line2 =~ /^>/) {
        next;
    }
    else {
        print $combined $line2;
    }
}
# close (and flush) the combined file before reading it back
close($combined);

# need to open combined file for reading
open(my $combined2, "<", "combined.txt") or die $!;
# read through combined file line by line
while (my $seqs = <$combined2>) {
    chomp($seqs);
    # split sequences into characters
    my @chars = split(//, $seqs);
    # the sequence from file1 is on 2 separate lines. Need to join these
    # lines together
}
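One way to do the joining, sketched under the assumption that every sequence starts with a `>` header line and that all lines up to the next header belong to that sequence. Because the headers are what mark the boundaries, this reads the original .fa files rather than combined.txt (where the headers have already been dropped). Sequence lines are collected in an array and glued together with `join('', ...)` when the next header or the end of input is reached. `read_sequences` and the demo headers are hypothetical; the demo reads from an in-memory string so it runs without the files.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA-style input from a filehandle and return
# one string per sequence, with wrapped lines joined together.
sub read_sequences {
    my ($fh) = @_;
    my @seqs;
    my @lines;    # sequence lines of the record currently being read
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;    # strip newline and trailing whitespace
        next if $line eq '';
        if ($line =~ /^>/) {
            # a new header: finish the previous record, if any
            push @seqs, join('', @lines) if @lines;
            @lines = ();
        }
        else {
            push @lines, $line;
        }
    }
    push @seqs, join('', @lines) if @lines;    # last record
    return @seqs;
}

# Demo input shaped like the question's data (headers are made up)
my $data = <<'END';
>seq1
ACGTATATATTATATCTGGCGCTATCGATGCTATCGAT
CGATGCGCG
>seq2
AGTGAGCGTAGCTAGCGGCGCGATCTAGCTA
END

open(my $fh, '<', \$data) or die $!;
my @seqs = read_sequences($fh);
close($fh);

for my $seq (@seqs) {
    my @chars = split //, $seq;    # ready for char-by-char comparison
    print scalar(@chars), " characters: $seq\n";
}
```

In the real script the same helper could be called once per file handle, e.g. on `$seq1` and `$seq2` from the code above.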

How do you know when to join the sequences? How do you know that a sequence is broken across two lines and needs to be merged? – Glenn 2013-02-19 05:55:13


What are you trying to produce? – darch 2013-02-19 17:32:17

Answers


Consider using Bio::SeqIO to read your FASTA file, since it handles sequences that span multiple lines:

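use strict;
use warnings;
use Bio::SeqIO;

# Bio::SeqIO parses the FASTA file one record at a time
my $in = Bio::SeqIO->new(-file => "file1.fa", -format => 'Fasta');

while (my $seq = $in->next_seq) {
    # seq() returns the whole sequence as one string, wrapped lines already joined
    my $sequence = $seq->seq;
    print $sequence, "\n";
}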

Contents of file1.fa:

>seq0 
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF 
>seq1 
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME 
LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM 
>seq2 
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK 
>seq3 
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK 

Output:

FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF 
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLMELKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM 
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK 
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK 
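If installing BioPerl is not an option, plain Perl can produce the same one-line-per-sequence output. This alternative (not part of the answer above) sets the input record separator `$/` to `"\n>"`, so each read returns one whole FASTA record; the first line of the record is the header and the remaining lines are joined. The helper `fasta_records` and the shortened demo data are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [id, sequence] pairs from a FASTA
# filehandle, reading one whole record at a time.
sub fasta_records {
    my ($fh) = @_;
    my @records;
    local $/ = "\n>";    # each read now stops right after "\n>"
    while (my $chunk = <$fh>) {
        chomp $chunk;            # removes a trailing "\n>" (the value of $/)
        $chunk =~ s/^>//;        # only the very first record still starts with ">"
        my ($header, @lines) = split /\n/, $chunk;
        next unless defined $header;
        my ($id) = split ' ', $header;    # first word of the header line
        my $seq = join '', @lines;
        $seq =~ s/\s+//g;                 # drop stray spaces or \r characters
        push @records, [$id, $seq];
    }
    return @records;
}

# Demo: the first two records of file1.fa from the answer
my $data = <<'END';
>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>seq1
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
END

open(my $fh, '<', \$data) or die $!;
my @records = fasta_records($fh);
close($fh);

print $_->[1], "\n" for @records;
```

Because `$/` is changed with `local`, normal line-by-line reading is restored as soon as the subroutine returns.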

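I assume your sequences are separated by a ">" sign, which is why you use if ($_ =~ />/) to skip the header lines. If not, reply in a comment and I'll change the code. Try the following: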

use strict;
use warnings;

# open file 1
open(my $fil1, "<", "file1.fa") or die $!;
# open file 2
open(my $fil2, "<", "file2.fa") or die $!;
# open combined file
open(my $combined, ">", "combined.txt") or die $!;

# read file 1: a header line starts a new line in the combined file,
# sequence lines are written without their newline so they join up
while (<$fil1>) {
    s/\s+$//;    # strip newline (and any trailing whitespace)
    if (/^>/) {
        print $combined "\n";
    }
    else {
        print $combined $_;
    }
}
# read file 2 the same way
while (<$fil2>) {
    s/\s+$//;
    if (/^>/) {
        print $combined "\n";
    }
    else {
        print $combined $_;
    }
}
# end the last sequence with a newline and flush the file
print $combined "\n";
close($combined);

Then check combined.txt to see whether each sequence is now on its own line.
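To get back to the character-by-character step from the question, a combined.txt with one sequence per line can be read line by line: the blank line produced by the first header is skipped, and every remaining line is one complete sequence to `split //` into characters. A sketch, with the hypothetical helper `sequences_as_chars` taking a filehandle so it can be tried on an in-memory string instead of the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: read one sequence per line from a filehandle and
# return a list of array refs, each holding one sequence's characters.
sub sequences_as_chars {
    my ($fh) = @_;
    my @result;
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;       # strip newline and trailing whitespace
        next if $line eq '';     # skip the blank line left by the first header
        push @result, [ split //, $line ];
    }
    return @result;
}

# Demo on an in-memory string shaped like combined.txt; in the real script
# this would be: open(my $in, '<', 'combined.txt') or die $!;
my $combined = "\nACGTATATATTATATCTGGCGCTATCGATGCTATCGATCGATGCGCG\nAGTGAGCGTAGCTAGCGGCGCGATCTAGCTA\n";
open(my $in, '<', \$combined) or die $!;
my @seqs = sequences_as_chars($in);
close($in);

for my $i (0 .. $#seqs) {
    printf "sequence %d: %d characters, first is %s\n",
        $i + 1, scalar @{ $seqs[$i] }, $seqs[$i][0];
}
```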