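How to deal with multi-threaded Perl code that leaves zombies behind
It looks like using pipes inside threads can turn the threads into zombies. Strictly speaking, it is the commands in the pipeline that become zombies, not the threads themselves. It does not happen every time, which is annoying because it makes the real problem hard to track down. How should this be handled? What causes it? Is it related to the pipes? How can it be avoided?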
The following code creates the sample files.
#buildTest.pl
use strict;
use warnings;
sub generateChrs{
    my ($outfile, $num, $range)=@_;
    open OUTPUT, "|gzip>$outfile";
    my @set=('A','T','C','G');
    my $cnt=0;
    while ($cnt<$num) {
        # body...
        my $pos=int(rand($range));
        my $str = join '' => map $set[rand @set], 1 .. rand(200)+1;
        print OUTPUT "$cnt\t$pos\t$str\n";
        $cnt++
    }
    close OUTPUT;
}
sub new_chr{
    my @chrs=1..22;
    push @chrs,("X","Y","M", "Other");
    return @chrs;
}
for my $chr (&new_chr){
    generateChrs("$chr.gz",50000,100000)
}
The following code occasionally creates zombies. The cause or trigger is still unknown.
#paralRM.pl
use strict;
use threads;
use Thread::Semaphore;
my $s = Thread::Semaphore->new(10);
sub rmDup{
    my $reads_chr=$_[0];
    print "remove duplication $reads_chr START TIME: ",`date`;
    return 0 if(!-s $reads_chr);
    my $dup_removed_file=$reads_chr . ".rm.gz";
    $s->down();
    open READCHR, "gunzip -c $reads_chr |sort -n -k2 |" or die "Error: cannot open $reads_chr";
    open OUTPUT, "|sort -k4 -n|gzip>$dup_removed_file";
    my ($last_id, $last_pos, $last_reads)=split('\t',<READCHR>);
    chomp($last_reads);
    my $last_length=length($last_reads);
    my $removalCnts=0;
    while (<READCHR>) {
        chomp;
        my @line=split('\t',$_);
        my ($id, $pos, $reads)=@line;
        my $cur_length=length($reads);
        if($last_pos==$pos){
            #may dup
            if($cur_length>$last_length){
                ($last_id, $last_pos, $last_reads)=@line;
                $last_length=$cur_length;
            }
            $removalCnts++;
            next;
        }else{
            #not dup
        }
        print OUTPUT join("\t",$last_id, $last_pos, $last_reads, $last_length, "\n");
        ($last_id, $last_pos, $last_reads)=@line;
        $last_length=$cur_length;
    }
    print OUTPUT join("\t",$last_id, $last_pos, $last_reads, $last_length, "\n");
    close OUTPUT;
    close READCHR;
    $s->up();
    print "remove duplication $reads_chr END TIME: ",`date`;
    #unlink("$reads_chr")
    return $removalCnts;
}
sub parallelRMdup{
    my @chrs=@_;
    my %jobs;
    my @removedCnts;
    my @processing;
    foreach my $chr(@chrs){
        while (${$s}<=0) {
            # body...
            sleep 10;
        }
        $jobs{$chr}=async {
            return &rmDup("$chr.gz")
        }
        push @processing, $chr;
    };
    #wait for all threads finish
    foreach my $chr(@processing){
        push @removedCnts, $jobs{$chr}->join();
    }
}
sub new_chr{
    my @chrs=1..22;
    push @chrs,("X","Y","M", "Other");
    return @chrs;
}
&parallelRMdup(&new_chr);
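Do all the threads report sensible start and end times? I can't see anything obviously wrong in the code that would prevent the threads from being joined. However, there are some bad practices: ① Did you miss a semicolon after the `async` block? ② Don't busy-wait when spawning, and don't dereference the Semaphore object. Instead, you could `down` the semaphore before spawning; calling `up` at the end of the thread would be much better. ③ You should assert programmatically that all entries in `@chrs` are unique, otherwise you will only join the last thread created for a given `$chr`. – amon 2013-05-09 07:43:52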
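Points ② and ③ of the comment above could look roughly like the sketch below. It is an illustration, not the asker's code: `work`, `run_all`, and `$MAX_WORKERS` are hypothetical names, and `work` is a trivial stand-in for `rmDup`. It assumes a Perl built with thread support. The spawning loop blocks in `down()` instead of sleeping and peeking inside the semaphore, each thread calls `up()` when it finishes, and duplicate job names are rejected before any thread is started.

```perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;

my $MAX_WORKERS = 3;    # hypothetical limit on concurrent threads
my $slots = Thread::Semaphore->new($MAX_WORKERS);

# Hypothetical stand-in for rmDup: just returns the length of its argument.
sub work {
    my ($name) = @_;
    return length $name;
}

sub run_all {
    my @names = @_;

    # Point 3: refuse duplicate names, since %jobs keeps one thread per name.
    my %seen;
    my @dupes = grep { $seen{$_}++ } @names;
    die "duplicate job names: @dupes\n" if @dupes;

    my %jobs;
    for my $name (@names) {
        # Point 2: down() blocks until a slot is free, so no sleep loop
        # and no dereferencing of the semaphore object is needed.
        $slots->down();
        $jobs{$name} = threads->create(sub {
            my $result = eval { work($name) };
            my $err = $@;
            $slots->up();    # release the slot even if work() died
            die $err if $err;
            return $result;
        });
    }

    # Join every thread and pair each name with its return value.
    return map { $_ => $jobs{$_}->join() } @names;
}

my %counts = run_all(qw(1 2 X Other));
print "$_\t$counts{$_}\n" for sort keys %counts;
```

Taking the slot in the parent and releasing it in the worker means at most `$MAX_WORKERS` threads exist at any time, rather than all of them being created up front and then queuing on the semaphore.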
The zombies are created in the pipes (sort, gzip, etc.). Thanks for your suggestions. I learned a lot! – Gahoo 2013-05-09 10:53:58
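Since the zombies are the piped commands, one general precaution is to make sure every pipe handle is closed and that the result of `close` is checked: on a handle opened to a pipe, `close` waits for the child process and stores its exit status in `$?`. The sketch below shows that pattern with a lexical filehandle and a list-form `open`. It is not confirmed in this thread to cure the problem in the threaded script, and `sorted_lines` is a hypothetical helper name. It assumes a Unix-like system with `sort` on the `PATH`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a file through an external `sort -n`
# and make sure the child process is waited for.
sub sorted_lines {
    my ($path) = @_;

    # List-form open bypasses the shell; the lexical handle is private
    # to this call instead of being a global bareword like READCHR.
    open(my $in, '-|', 'sort', '-n', $path)
        or die "cannot start sort for $path: $!";
    my @lines = <$in>;
    chomp @lines;

    # close() on a pipe handle waits for the child, so it is reaped here;
    # a false return means a system error or a non-zero exit status.
    close($in)
        or die $! ? "error closing sort pipe: $!"
                  : "sort exited with status " . ($? >> 8);
    return @lines;
}

# Small demonstration input written to a temporary file.
my ($fh, $tmp) = tempfile(UNLINK => 1);
print {$fh} "$_\n" for (30, 4, 17);
close($fh) or die "cannot close $tmp: $!";

my @sorted = sorted_lines($tmp);
print "@sorted\n";
```

In `rmDup` the same idea would mean replacing the bareword `READCHR` and `OUTPUT` handles with lexical ones and checking both `close` calls, so that a failed or unreaped `sort` or `gzip` shows up as an error instead of passing silently.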