How do I use Perl regular expressions to match Chinese characters?

I need to match some Chinese characters in UTF-8 encoded HTML, and I wrote the following test code:

#! /usr/bin/perl 

use strict; 
use LWP::UserAgent; 
use Encode; 

my $ua = new LWP::UserAgent; 

my $request = HTTP::Request->new('GET'); 
my $url = 'http://www.boc.cn/sourcedb/whpj/'; 
$request->url($url); 

my $res = $ua->request($request) ; 

my $str_chinese = encode("utf8" ,"英磅") ; 
# my $str_chinese = "英磅" ; 


my $str_english = "English" ; 
#my $html = decode("utf8" , $res->content) ; 
my $html = $res->content ; 

if ($html =~ /$str_chinese/) { 
    print "chinese word matched" ; 
}else { 
    print "chinese word unmatched\n" ; 
} 

if ($html =~ /$str_english/i) { 
    print "english word matched\n" ; 
}else { 
    print "english word unmatched\n" ; 
}

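The output shows that the script does not match Chinese characters that are present in the HTML. Can you give me some hints on how to solve this?

Source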

2009-12-23 Haiyuan Zhang

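You should use the decoded_content method of the HTTP::Message class instead. It decodes the response body into a Perl character string, so manual decoding is not necessary.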

#!/usr/bin/env perl 
use utf8; 
use strict; 
use LWP::UserAgent; 

my $html = LWP::UserAgent->new 
    ->get('http://www.boc.cn/sourcedb/whpj/') 
    ->decoded_content; 

my $str_chinese = '首頁'; 
my $str_english = 'English'; 

if ($html =~ /$str_chinese/) { 
    print "chinese word matched\n"; 
} else { 
    print "chinese word unmatched\n"; 
} 

if ($html =~ /$str_english/i) { 
    print "english word matched\n"; 
} else { 
    print "english word unmatched\n"; 
}

Output:

chinese word matched 
english word matched
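
To see why the decoded text matches while the raw bytes do not, here is a minimal offline sketch. It assumes the file is saved as UTF-8; the sample markup is hypothetical and stands in for the real page, and the explicit decode call only approximates what decoded_content does for a UTF-8 response (the real method also takes the charset and content encoding from the response headers).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                          # literals below are character strings
use Encode qw(encode decode);

# Hypothetical stand-in for $res->content: raw UTF-8 bytes.
my $raw_bytes = encode('UTF-8', '<td>英磅</td><td>English</td>');

# Roughly what decoded_content does for a UTF-8 page.
my $html = decode('UTF-8', $raw_bytes);

# The pattern is a character string, so it only matches decoded text.
my $matched_decoded = ($html      =~ /英磅/) ? 1 : 0;
my $matched_raw     = ($raw_bytes =~ /英磅/) ? 1 : 0;

print "decoded: $matched_decoded, raw: $matched_raw\n";
```

The point of the sketch is that both sides of the match must be in the same form: characters against characters, or bytes against bytes.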

Source

2009-12-23 13:29:15 daxim

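@daxim: I can't run the above script you provided under Windows; Perl complains about malformed UTF-8 characters. The editor I use is gvim 7.2. – 2009-12-23 16:02:13

As I wrote before, you must tell gvim to save the file as UTF-8. http://stackoverflow.com/questions/1945221#1945756 – daxim 2009-12-24 12:46:13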
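
If getting the editor to save as UTF-8 is awkward, one possible workaround (not something suggested in this thread) is to keep the source pure ASCII and write the characters as code point escapes. The sketch below assumes a hypothetical page fragment given as hard-coded UTF-8 bytes; U+806F is just an example CJK code point.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode);

# Pure-ASCII source: no `use utf8` needed, editor encoding is irrelevant.
my $str_chinese = "\x{806F}";            # one CJK character, U+806F

# Hypothetical stand-in for a fetched page: U+806F as UTF-8 bytes.
my $raw_bytes = "<a>\xE8\x81\xAF</a>";
my $html      = decode('UTF-8', $raw_bytes);

my $matched = ($html =~ /$str_chinese/) ? 1 : 0;
print "matched: $matched\n";
```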

Since you have put UTF-8 characters in your source code, you must add:

use utf8;

This tells Perl that the script itself is written in UTF-8.
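
A small sketch of the difference the pragma makes, assuming this file is saved as UTF-8. The pragma is lexically scoped, so the same literal can be compared with and without it:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Without the pragma, the literal is the raw bytes found in the file.
my $len_without = do { no utf8; length("聯") };

# With the pragma, the same literal is a single character.
my $len_with = do { use utf8; length("聯") };

print "without use utf8: $len_without\n";
print "with use utf8: $len_with\n";
```

A literal that Perl sees as three bytes will not match text that has been decoded into characters, which is why the pragma matters once the page content is decoded.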

Source

2009-12-23 10:08:28

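I ran your code and the Chinese characters did not match.

Then I checked the HTML, and it does not contain those characters, so that may be the reason for the mismatch. I then tried a different character (聯) and removed the encode call, i.e. my $str_chinese = "聯";

With this change, the code ran and the character matched.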
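
A likely reason that removing encode helps, sketched under the assumption that the script is saved as UTF-8 and has no use utf8: the literal is already a string of UTF-8 bytes, so calling encode on it encodes those bytes a second time. The page fragment below is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode);

# No `use utf8` here, so this literal is 3 raw UTF-8 bytes.
my $literal = "聯";

# encode() treats each byte as a Latin-1 character and encodes it again.
my $double_encoded = encode('UTF-8', $literal);

# Hypothetical stand-in for $res->content: raw bytes from the server.
my $page_bytes = "<a>聯</a>";

my $plain_matches  = ($page_bytes =~ /\Q$literal\E/)        ? 1 : 0;
my $double_matches = ($page_bytes =~ /\Q$double_encoded\E/) ? 1 : 0;

printf "literal: %d bytes, double-encoded: %d bytes\n",
    length($literal), length($double_encoded);
print "plain: $plain_matches, double-encoded: $double_matches\n";
```

Matching bytes against bytes works here, but it depends on the script file and the page using the same encoding; decoding the page and using character strings avoids that dependency.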

Source

2009-12-23 10:13:55 RahulJ
