How do I get a list of all Unicode characters that have a given property?

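How can I get a list of all characters that have a given property without iterating over the entire Unicode range? In particular, I want a list of all characters that are digits (i.e. that match /\d/). I have looked at Unicode::UCD, which is useful for determining the properties of a given character, but it does not seem to offer a way to get the list of characters that have a property.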

2009-07-25 Chas. Owens

The lists of Unicode characters in each class are generated from the Unicode specification when perl is compiled, and are typically stored in /usr/lib/perl-YOURPERLVERSION/unicore/lib/gc_sc/

For example, the list of Unicode character ranges that match IsDigit (a.k.a. \d) is stored in the file /usr/lib/perl-YOURPERLVERSION/unicore/lib/gc_sc/Digit.pl
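
Since the files under unicore are internal to perl, a sketch that goes through a documented interface may be useful alongside them. The following is not from the answer above: it assumes Perl 5.16 or later, where Unicode::UCD exports `prop_invlist`, and the helper name `chars_with_property` is made up for illustration. `prop_invlist` returns an inversion list, i.e. the code points at which the property alternately starts and stops matching, so only the matching ranges are walked.

```perl
use strict;
use warnings;
use Unicode::UCD qw(prop_invlist);

# Hypothetical helper: return every code point that has the given property.
sub chars_with_property {
    my ($property) = @_;

    # Even indexes start a matching range, odd indexes start a
    # non-matching one.
    my @invlist = prop_invlist($property);

    my @code_points;
    for (my $i = 0; $i < @invlist; $i += 2) {
        my $start = $invlist[$i];
        # An odd-length list means the last range is open-ended; cap it
        # at the last Unicode code point (an assumption of this sketch).
        my $end = $i + 1 < @invlist ? $invlist[$i + 1] - 1 : 0x10FFFF;
        push @code_points, $start .. $end;
    }
    return @code_points;
}

# "Nd" is the General_Category value Decimal_Number.
my @digits = chars_with_property("Nd");
printf "%d decimal digits, first is U+%04X\n", scalar @digits, $digits[0];
```

This still loops, but only over the ranges that match, not over every code point.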

2009-07-25 16:51:25 tetromino

Thanks, that is pretty much what I was looking for. I will still have to loop over them to build a list, but at least it won't take all day. – 2009-07-25 16:57:12

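Exactly which characters /\d/ matches depends on your regex implementation (although the standard 0-9 are guaranteed). In the case of perl, the perl locale is used to define which characters are considered letters and digits.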

2009-07-25 16:28:39 ewanm89

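Perl converts strings to utf8 before running them through the regex engine. The only thing the perl locale affects is how raw byte strings are converted to utf8. Once a string is in utf8, perl always uses the same definition of IsDigit, regardless of locale. – tetromino 2009-07-25 16:56:13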
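
A small check of the point in this comment, as a sketch that is not part of the original thread: with no locale in effect, /\d/ matches a non-ASCII decimal digit in a character string. The `/a` modifier used for contrast is an assumption about your perl version (it needs Perl 5.14 or later) and restricts \d to 0-9.

```perl
use strict;
use warnings;

my $arabic_three = "\x{0663}";    # ARABIC-INDIC DIGIT THREE

# Unicode rules: \d matches any character with General_Category Nd.
my $unicode_match = $arabic_three =~ /\d/ ? 1 : 0;

# With /a (Perl 5.14+), \d matches only the ASCII digits 0-9.
my $ascii_only_match = $arabic_three =~ /\d/a ? 1 : 0;
my $plain_three      = "3" =~ /\d/a ? 1 : 0;

print "U+0663 matches /\\d/:  $unicode_match\n";
print "U+0663 matches /\\d/a: $ascii_only_match\n";
print "'3' matches /\\d/a:    $plain_three\n";
```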

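There is no way to do this without iterating over all of the characters. (Even if you create one giant string out of them and use a regex, you still have to loop at least once to build the string.)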

2009-07-25 18:02:06

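Happily, part of the Perl build process creates a set of files under `unicore` in the lib directory that have already done a lot of the work for you. I don't know whether they are official; I have a question in to the Perl 5 Porters list to find out whether they are safe to use. – 2009-07-25 20:45:29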

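Even better than unicore/lib/gc_sc/Digit.pl is unicore/To/Digit.pl. It is a direct mapping from Unicode digit characters (well, their code point offsets) to their numeric values. That means that instead of: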

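use Unicode::Digits qw/digits_to_int/;

# $digits[$n] collects every character whose numeric value is $n
my @digits;
for (split "\n", require "unicore/lib/gc_sc/Digit.pl") {
    # each line holds the start and end of a range, in hex
    my ($s, $e) = map hex, split;
    $e = $s unless defined $e;    # no end value: treat as a single code point
    for (my $ord = $s; $ord <= $e; $ord++) {
        my $chr = chr $ord;
        push @{$digits[digits_to_int $chr]}, $chr;
    }
}

# turn each list of characters into a character class
for my $i (0 .. 9) {
    my $re = join '', "[", @{$digits[$i]}, "]";
    $digits[$i] = qr/$re/;
}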

I can say:

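my @digits;
for (split "\n", require "unicore/To/Digit.pl") {
    # each line holds a code point (in hex) and its numeric value
    my ($ord, $val) = split;
    my $chr = chr hex $ord;
    push @{$digits[$val]}, $chr;
}

# turn each list of characters into a character class
for my $i (0 .. 9) {
    my $re = join '', "[", @{$digits[$i]}, "]";
    $digits[$i] = qr/$re/;
}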

Or even better:

my @digits;
for (split "\n", require "unicore/To/Digit.pl") {
    my ($ord, $val) = split;
    # append a \x{...} escape so the class is built from code points
    $digits[$val] .= "\\x{$ord}";
}
@digits = map { qr/[$_]/ } @digits;
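
For a perl where the unicore files are laid out differently, here is a hedged sketch of the same idea built only on Unicode::UCD. It is not the method from this answer: it assumes Perl 5.16 or later, so that both `prop_invlist` and `num` are exported, and the variable names `@classes` and `@digit_re` are my own.

```perl
use strict;
use warnings;
use Unicode::UCD qw(prop_invlist num);

# $classes[$value] accumulates "\x{...}" escapes for every decimal
# digit whose numeric value is $value.
my @classes;

# Inversion list for General_Category=Decimal_Number: even indexes
# start a matching range, odd indexes end it (exclusive).
my @invlist = prop_invlist("Nd");
for (my $i = 0; $i < @invlist; $i += 2) {
    my $start = $invlist[$i];
    my $end = $i + 1 < @invlist ? $invlist[$i + 1] - 1 : 0x10FFFF;
    for my $cp ($start .. $end) {
        my $value = num(chr $cp);    # numeric value of this digit, 0..9
        next unless defined $value;
        $classes[$value] .= sprintf "\\x{%X}", $cp;
    }
}

# One compiled character class per numeric value.
my @digit_re = map { qr/[$_]/ } @classes;

print "U+0663 is a three\n" if "\x{0663}" =~ $digit_re[3];
```

The result has the same shape as the `@digits` array above: `$digit_re[7]` matches any character that represents a seven.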

2009-07-25 20:12:01
