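The only way I can imagine this working is to go through every possible combination of the letters and compare each one against a dictionary. The fastest way to compare them against a dictionary is to turn the dictionary into a hash. That way, you can quickly look up whether a string is a valid word.
I keyed my dictionary hash on each word's letters, lowercased and with any non-alphabetic characters removed just to be safe. For the value, I store the actual dictionary word. For example: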
cant => "can't",
google => "Google",
This way, I can display the correctly spelled word.
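A minimal sketch of that keying scheme, with a three-word list standing in for `/usr/share/dict/words`. The `normalize_key` helper and the sample words are hypothetical, for illustration only:

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase a dictionary word and strip every
# non-alphabetic character to get its lookup key ("can't" => "cant").
sub normalize_key {
    my ($word) = @_;
    ( my $key = lc $word ) =~ s/[^[:alpha:]]//g;
    return $key;
}

# A tiny stand-in for /usr/share/dict/words.
my @words = ( "can't", "Google", "ladle" );

# Key => dictionary spelling, so a match can be shown as the dictionary spells it.
my %dictionary;
$dictionary{ normalize_key($_) } = $_ for @words;

# Checking a candidate is a single hash lookup.
for my $candidate (qw(cant google ldale)) {
    my $found = exists $dictionary{$candidate} ? $dictionary{$candidate} : "(not a word)";
    print "$candidate => $found\n";
}
```

Because the key is normalized but the value is not, `cant` finds `can't` and `google` finds `Google`, while a non-word such as `ldale` simply misses.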
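I found Math::Combinatorics, which looked good but didn't work quite the way I hoped. You give it a list of items, and it returns every combination of however many items you specify. So I figured all I had to do was split the letters into a list of single letters and simply loop through all the possible combinations!
Nope... that gave me all the *unordered* combinations. I then had to take each combination and list every possible permutation of its letters. Blah! Ptooy! Yech!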
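The program uses Math::Combinatorics's `combine` and `permute` for these two steps. To show why combinations alone are not enough, here is a sketch that re-implements both steps in plain core Perl. The `combinations` and `permutations` subs are hypothetical helpers, not the module's API:

```perl
use strict;
use warnings;

# Unordered k-combinations: each subset appears exactly once, in its
# original order (so "AB" but never "BA").
sub combinations {
    my ( $k, @items ) = @_;
    return ( [] ) if $k == 0;
    return ()     if @items < $k;
    my ( $first, @rest ) = @items;
    # Either take $first and choose k-1 from the rest, or skip $first.
    return ( ( map { [ $first, @$_ ] } combinations( $k - 1, @rest ) ),
             combinations( $k, @rest ) );
}

# Every ordering of a list.
sub permutations {
    my @items = @_;
    return ( [] ) unless @items;
    my @result;
    for my $i ( 0 .. $#items ) {
        my @others = @items;
        my ($picked) = splice @others, $i, 1;
        push @result, map { [ $picked, @$_ ] } permutations(@others);
    }
    return @result;
}

my @combos = map { join '', @$_ } combinations( 2, qw(A B L) );
print "@combos\n";    # AB AL BL

my @perms = map { join '', @$_ } map { permutations(@$_) } combinations( 2, qw(A B L) );
print "@perms\n";     # AB BA AL LA BL LB
```

The combinations give only `AB AL BL`; a word like `LAB` only shows up after each combination is permuted.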
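Hence the infamous loop within a loop. Actually, three loops:
* The outer loop simply counts the number of letters from 1 up to the number of letters in the word.
* The next one finds all unordered combinations of that many letters.
* Finally, the innermost loop takes each unordered combination and returns a list of its permutations.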
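As an aside, if you would rather not depend on Math::Combinatorics, the second and third loops can be collapsed into one recursion that grows each candidate one letter at a time. This is a swapped-in technique (generating ordered selections, i.e. k-permutations, directly) rather than what the author's program does, and `arrangements` is a hypothetical name. A sketch on the letters `ABL`:

```perl
use strict;
use warnings;

# Hypothetical helper: recursively extend $prefix with each letter still
# in @$remaining, recording every prefix as a candidate. Each input
# position is used at most once, so this yields all ordered selections
# of 1..N letters (k-permutations).
sub arrangements {
    my ( $prefix, $remaining, $out ) = @_;
    for my $i ( 0 .. $#$remaining ) {
        my @rest = @$remaining;
        my ($letter) = splice @rest, $i, 1;
        my $word = $prefix . $letter;
        $out->{$word} = 1;                     # record this candidate
        arrangements( $word, \@rest, $out );   # try one more letter on the end
    }
    return;
}

my %candidates;
arrangements( '', [ split //, 'ABL' ], \%candidates );
print join( ' ', sort keys %candidates ), "\n";
```

Duplicate strings (for example from the two Ls in `EBLAIDL`) collapse because they are hash keys, the same way `%valid_word_hash` deduplicates in the original. Each recorded candidate could be checked against `%dictionary` at the point it is stored.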
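Now I can finally compare these letter combinations against my dictionary. Surprisingly, the program ran much faster than I expected, considering it has to turn a 235,886-word dictionary into a hash and then go through three levels of loops to find every permutation of every combination of every possible number of letters. The whole program runs in under two seconds.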
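The speed is less surprising once you count the candidates. Treating every letter position as distinct, the number of ordered selections of k letters out of n is P(n, k) = n!/(n-k)!, so seven letters give at most 7 + 42 + 210 + 840 + 2520 + 5040 + 5040 = 13,699 strings to try, each checked with a single hash lookup. A small sketch of that arithmetic (`arrangements_count` is a hypothetical name):

```perl
use strict;
use warnings;

# Number of ordered arrangements of k letters drawn from n distinct
# positions: P(n, k) = n! / (n - k)! = n * (n-1) * ... * (n-k+1).
sub arrangements_count {
    my ( $n, $k ) = @_;
    my $product = 1;
    $product *= $n - $_ for 0 .. $k - 1;
    return $product;
}

my $n     = length 'EBLAIDL';    # 7 letters
my $total = 0;
$total += arrangements_count( $n, $_ ) for 1 .. $n;
print "$total\n";    # 13699
```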
```perl
#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);
use autodie;
use Math::Combinatorics;

use constant {
    LETTERS    => "EBLAIDL",
    DICTIONARY => "/usr/share/dict/words",
};

#
# Create Dictionary Hash
#
open my $dict_fh, "<", DICTIONARY;
my %dictionary;
while ( my $word = <$dict_fh> ) {
    chomp $word;
    # Strip *every* non-alphabetic character (note the /g): "can't" => "cant"
    ( my $key = $word ) =~ s/[^[:alpha:]]//g;
    $dictionary{ lc $key } = $word;
}
close $dict_fh;

#
# Now take the letters and create a Perl list of them.
#
my @letter_list = split // => LETTERS;
my %valid_word_hash;

#
# Outer Loop: This is a range from one-letter combinations up to the
# maximum number of letters
#
foreach my $num_of_letters ( 1 .. scalar @letter_list ) {
    #
    # combine() returns a list of array references: every unordered
    # combination of $num_of_letters letters. From there, we need to
    # take the permutations of each of those combinations.
    #
    foreach my $combo_ref ( combine( $num_of_letters, @letter_list ) ) {
        my @combo = @{$combo_ref};
        #
        # For each combination of letters $num_of_letters long,
        # permute() generates every ordering of those letters.
        #
        foreach my $word_letters_ref ( permute(@combo) ) {
            my $word = lc( join "" => @{$word_letters_ref} );
            #
            # This $word is just a possible candidate for a word.
            # We now have to compare it to the words in the dictionary
            # to verify it's a word.
            #
            if ( exists $dictionary{$word} ) {
                $valid_word_hash{$word} = $dictionary{$word};
            }
        }
    }
}

#
# I got lazy here... Just dumping out the list of actual words.
# You need to go through this list to find your longest and
# shortest words. Number of syllables? That's trickier; you could
# try splitting on CVC and CVVC boundaries, where C = consonant
# and V = vowel.
#
# This prints the lowercase keys; print the hash values instead
# to show each word as the dictionary spells it.
#
say join "\n", sort keys %valid_word_hash;
```
Running this program produces:
```
$ ./test.pl | column
a al balei bile del i lai
ab alb bali bill delia iba laid
abdiel albe ball billa dell ibad lea
abe albi balled billed della id lead
abed ale balli blad di ida leal
abel alible be blade dial ide led
abide all bea blae dib idea leda
abie alle bead d die ideal lei
able allie beal da dieb idle leila
ad allied bed dab dill ie lelia
ade b beid dae e ila li
adib ba bel dail ea ill liable
adiel bad bela dal ed l libel
ae bade beld dale el la lid
ai bae belial dali elb lab lida
aid bail bell dalle eld label lide
aide bal bella de eli labile lie
aiel bald bid deal elia lad lied
ail baldie bide deb ell lade lila
aile bale bield debi ella ladle lile
```
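The program leaves finding the longest and shortest words as an exercise. One way to do it, sketched with a few words copied from the output above standing in for `keys %valid_word_hash` (the sample list is an assumption for illustration):

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Stand-in for the keys of %valid_word_hash from the program above.
my @words = qw(a ab able abide ladle liable labile belial);

# Longest and shortest words by character count; grep keeps ties
# in their original order.
my $max_len  = max map { length($_) } @words;
my $min_len  = min map { length($_) } @words;
my @longest  = grep { length($_) == $max_len } @words;
my @shortest = grep { length($_) == $min_len } @words;

print "longest:  @longest\n";     # liable labile belial
print "shortest: @shortest\n";    # a
```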
Number of syllables? That's a tough one... I'd be interested to see what you come up with! – 2012-02-02 00:15:25
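**Syllabification** is trivial in Spanish, but hard in English. The existing modules do a poor job of it; I wrote my own version that does better, but I can't lay my hands on it right now. The `Lingua::ES::Syllabify` module for *castellano* by Alberto Montero Asenjo gives Mon·te·ro A·sen·jo. – tchrist 2012-02-02 00:16:51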
Should `ABL` return `BALL`? Or can each letter be used only once? – ikegami 2012-02-02 01:51:02
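In the program above, each letter *position* is used at most once, so `EBLAIDL` (which has two Ls) can yield `ball`, but `ABL` could not. If you would rather start from the dictionary and test each word against the available letters, a letter-count check enforces the "each letter at most once" rule directly. This swaps the permutation approach for a per-word multiset check; `can_spell` is a hypothetical helper:

```perl
use strict;
use warnings;

# Returns true if $word can be spelled from $letters, using each letter
# at most as many times as it appears in $letters (case-insensitive).
sub can_spell {
    my ( $word, $letters ) = @_;
    my %available;
    $available{$_}++ for split //, lc $letters;
    for my $char ( split //, lc $word ) {
        return 0 unless $available{$char};    # letter missing or used up
        $available{$char}--;
    }
    return 1;
}

for my $pair ( [ 'ball', 'ABL' ], [ 'ball', 'EBLAIDL' ], [ 'lab', 'ABL' ] ) {
    my ( $word, $letters ) = @$pair;
    printf "%-5s from %-8s %s\n", $word, $letters, can_spell( $word, $letters ) ? 'yes' : 'no';
}
```

Running every dictionary word through a check like this costs one pass over the word list, independent of how many letters you start with.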