2010-09-09 226 views
0

我有一點小小的應用程序,並試圖添加多線程。這裏是代碼(MyMech是用來處理HTTP錯誤WWW ::機械化子類):線程應用程序意外終止

#!/usr/bin/perl 

use strict; 
use MyMech; 
use File::Basename; 
use File::Path; 
use HTML::Entities; 
use threads; 
use threads::shared; 
use Thread::Queue; 
use List::Util qw(max sum); 

my $page = 1; 
my %CONFIG = read_config(); 

my $mech = MyMech->new(autocheck => 1); 
$mech->quiet(0); 

$mech->get($CONFIG{BASE_URL} . "/site-map.php"); 

my @championship_links = 
    $mech->find_all_links(url_regex => qr/\d{4}-\d{4}\/$/); 

foreach my $championship_link (@championship_links) { 

    my @threads; 

    my $queue   = Thread::Queue->new; 
    my $queue_processed = Thread::Queue->new; 

    my $url = sprintf $championship_link->url_abs(); 

    print $url, "\n"; 

    next unless $url =~ m{soccer}i; 

    $mech->get($url); 

    my ($last_round_loaded, $current_round) = 
     find_current_round($mech->content()); 

    unless ($last_round_loaded) { 

     print "\tLoading rounds data...\n"; 

     $mech->submit_form(

      form_id => "leagueForm", 
      fields => { 

       round => $current_round, 
      }, 
     ); 
    } 

    my @match_links = 
     $mech->find_all_links(url_regex => qr/matchdetails\.php\?matchid=\d+$/); 

    foreach my $link (@match_links) { 

     $queue->enqueue($link); 
    } 

    print "Starting printing thread...\n"; 

    my $printing_thread = threads->create(
     sub { printing_thread(scalar(@match_links), $queue_processed) }) 
     ->detach; 

    push @threads, $printing_thread; 

    print "Starting threads...\n"; 

    foreach my $thread_id (1 .. $CONFIG{NUMBER_OF_THREADS}) { 

     my $thread = threads->create(
      sub { scrape_match($thread_id, $queue, $queue_processed) }) 
      ->join; 
     push @threads, $thread; 
    } 

    undef $queue; 
    undef $queue_processed; 

    foreach my $thread (threads->list()) { 

     if ($thread->is_running()) { 

      print $thread->tid(), "\n"; 
     } 
    } 

    #sleep 5; 
} 

print "Finished!\n"; 

sub printing_thread { 

    my ($number_of_matches, $queue_processed) = @_; 

    my @fields = 
     qw (
      championship 
      year 
      receiving_team 
      visiting_team 
      score 
      average_home 
      average_draw 
      average_away 
      max_home 
      max_draw 
      max_away 
      date 
      url 
     ); 

    while ($number_of_matches) { 

     if (my $match = $queue_processed->dequeue_nb) { 

      open my $fh, ">>:encoding(UTF-8)", $CONFIG{RESULT_FILE} or die $!; 

      print $fh join("\t", @{$match}{@fields}), "\n"; 
      close $fh; 

      $number_of_matches--; 
     } 
    } 

    threads->exit(); 
} 

sub scrape_match { 

    my ($thread_id, $queue, $queue_processed) = @_; 

    while (my $match_link = $queue->dequeue_nb) { 

     my $url = sprintf $match_link->url_abs(); 

     print "\t$url", "\n"; 

     my $mech = MyMech->new(autocheck => 1); 
     $mech->quiet(0); 

     $mech->get($url); 

     my $match = parse_match($mech->content()); 
     $match->{url} = $url; 

     $queue_processed->enqueue($match); 
    } 

    return 1; 
} 

,我有一些奇怪的事情,此代碼。有時它運行,但有時它會退出而沒有錯誤(在->detach點)。我知道@match_links包含數據,但線程不會被創建,它只是關閉。通常在處理第二個$championship_link條目後終止。

可能是我做錯了什麼?

更新 這裏是find_current_round子程序代碼(但我敢肯定它是不相關的問題):

sub find_current_round { 

    my ($html) = @_; 

    my ($select_html) = $html =~ m{ 

    <select\s+name="round"[^>]+>\s* 
    (.+?) 
    </select> 
    }isx; 

    my ($option_html, $current_round) = $select_html =~ m{ 

    (<option\s+value="\d+"(?:\s+ selected="selected")?>(\d+)</option>)\Z 
    }isx; 

    my ($last_round_loaded) = $option_html =~ m{selected}; 

    return ($last_round_loaded, $current_round); 
} 
+0

您的代碼缺少'find_current_round'子例程。你可以發佈它嗎? – Zaid 2010-09-09 14:00:24

+0

@Zaid:我已經發布了find_current_round的代碼。 – gangabass 2010-09-10 00:34:46

回答

0

第一關 - 不使用dequeue_nb()。這是一個壞主意,因爲如果一個隊列暫時爲空,它會返回undef,並且你的線程將退出。

改爲使用dequeue和和enddequeue會阻止,但是一旦你的隊列中有你的隊列,就會退出。

你也在用你的線程做一些非常奇怪的事情 - 我建議你很少想要detach一個線程。你只是假設你的線程將在你的程序之前完成,這不是一個好的計劃。

同樣,

my $thread = threads->create(
     sub { scrape_match($thread_id, $queue, $queue_processed) }) 
     ->join; 

你生成一個線程,然後立即加入它。因此,join調用將...阻止您的線程退出。你根本不需要線程來做到這一點......

你也可以在你的foreach循環中定義你的隊列。我不認爲這是一個好計劃。我會建議,而不是 - 在外部範圍,併產生定義數量的'工人'線程(和一個「打印」線程)。

然後通過隊列機制提供它們。否則,你最終會創建多個隊列實例,因爲它們是詞法範圍的。

而一旦你已經完成排隊的東西,發出$queue -> end這將終止while循環。

您也不需要給線程$thread_id,因爲他們已經有一個。請嘗試:threads -> self -> tid();