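Web::Scraper and Perl
I have the script below, which fetches the list of all courses from my school's CS department. I want to extract the CRN (course number) and other important information so that I can put it in a database for users to browse through a web app.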
Here is an example URL: http://courses.illinois.edu/cis/2011/spring/schedule/CS/411.html
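I want to extract information from pages like this one. The first level of the scraper just builds the individual page URLs from the full course listing. Once I'm on a course-specific catalog page, I use a second scraper to try to get all the information I want. For some reason, even though the CRN and the course instructor are both 'td' elements, my scraper returns nothing when I scrape for them. I tried scraping specifically for 'div' instead, and I got a bunch of information for each relevant page. So somehow I can't get the 'td' elements, even though I am scraping the right pages.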
use strict;
use warnings;
use URI;
use Web::Scraper;

# First level: collect the text of every div.ws-row on the course index page
# into the 'array' result key.
my $tweets = scraper {
    # process "h4.ws-ds-name.detail-title", "array[]" => 'TEXT';
    process "div.ws-row", "array[]" => 'TEXT';
};
my $res = $tweets->scrape(URI->new("http://courses.illinois.edu/cis/2011/spring/schedule/CS/index.html?skinId=2169"));

foreach my $elem (@{ $res->{array} || [] }) {
    # The course number is the four characters starting at offset 2 of the row text.
    my $coursenum = substr($elem, 2, 4);

    # Second level: this is the selector that comes back empty for me.
    my $secondLevel = scraper {
        process "td.ws-row", "array2[]" => 'TEXT';
    };
    my $res2 = $secondLevel->scrape(URI->new("http://courses.illinois.edu/cis/2011/spring/schedule/CS/$coursenum.html"));

    # Fall back to an empty list so a missing key does not die under strict.
    my @curr = @{ $res2->{array2} || [] };
    print scalar(@curr);
    print "---------------------", "\n";
    foreach my $elem2 (@curr) {
        print $elem2, " ", "\n";
    }
    print "---------------------", "\n";
}
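To narrow this down, here is a reduced test I can run without hitting the site. It is only a sketch: the HTML string is made up for illustration (I have not confirmed that the real course pages are structured this way), and the result key `cells` is a name I picked for this example. It compares the selector from my script, `td.ws-row`, which only matches a `td` whose own class is `ws-row`, with the descendant selector `div.ws-row td`, which matches any `td` nested inside a `div` of that class.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical markup, NOT copied from the real course page: the row class
# sits on a <div>, and the <td> cells inside it carry no class of their own.
my $html = <<'HTML';
<html><body>
<div class="ws-row">
  <table><tr><td>12345</td><td>Instructor Name</td></tr></table>
</div>
</body></html>
HTML

# Selector from my script: only matches <td> elements whose own class is ws-row.
my $by_class = scraper {
    process "td.ws-row", "cells[]" => 'TEXT';
};

# Descendant selector: matches any <td> nested inside a div.ws-row.
my $by_descendant = scraper {
    process "div.ws-row td", "cells[]" => 'TEXT';
};

# scrape() also accepts a string of HTML, so no network access is needed here.
my $res_class      = $by_class->scrape($html);
my $res_descendant = $by_descendant->scrape($html);

# The key may be absent when nothing matched, so default to an empty list.
my @class_cells      = @{ $res_class->{cells}      || [] };
my @descendant_cells = @{ $res_descendant->{cells} || [] };

print "td.ws-row matched:     ", scalar(@class_cells), "\n";
print "div.ws-row td matched: ", scalar(@descendant_cells), "\n";
print "  $_\n" for @descendant_cells;
```

If the real pages do put the `ws-row` class on the `div` rather than on the cells, that would be consistent with what I am seeing, but I would need to check the actual page source to be sure.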
Any ideas?
Thanks
The way I'm using Web::Scraper – 2011-04-18 02:38:51