Another attempt in Perl:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;   # read_file()
use Tie::File;

# Usage:
#
#   $ perl WordCount.pl <Files>
#
# Example:
#
#   $ perl WordCount.pl *.text
#
# Counts words in all files given as arguments.
# The words are taken from the file "WordList" (one word per line).
# The output is appended to the file "WordCount.out" in the format implied
# in the following example:
#
#   File,Word1,Word2,Word3,...
#   File1,0,5,3,...
#   File2,6,3,4,...
#   ...

### Configuration
my $CaseSensitive     = 1;    # 0 or 1
my $OutputSeparator   = ",";  # another option might be "\t" (TAB)
my $RemoveHyphenation = 0;    # 0 or 1. Careful, may be too greedy.
###

# One word per line; skip blank lines, since an empty word would
# match at every word boundary.
my @WordList = read_file("WordList");
chomp @WordList;
@WordList = grep { /\S/ } @WordList;

# Each element of @Output is one line of WordCount.out; push() appends.
tie(my @Output, 'Tie::File', "WordCount.out")
    or die "Cannot open WordCount.out: $!";
push(@Output, join($OutputSeparator, "File", @WordList));

for my $InFile (@ARGV) {
    my $Text = read_file($InFile);

    # Rejoin words hyphenated across a line break ("fre-\nquencies").
    $Text =~ s/-\n//g if $RemoveHyphenation;

    my @Counts;
    for my $Word (@WordList) {
        # \Q...\E matches the word literally (regex metacharacters escaped);
        # \b...\b restricts the match to whole words.
        my $Pattern = $CaseSensitive ? qr/\b\Q$Word\E\b/ : qr/\b\Q$Word\E\b/i;

        # A list assignment in scalar context yields the number of matches,
        # so words that never occur are counted as 0.
        my $Count = () = $Text =~ /$Pattern/g;
        push(@Counts, $Count);
    }
    push(@Output, join($OutputSeparator, $InFile, @Counts));
}
untie @Output;
```
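To try the counting step without creating any files, it can be pulled out into a subroutine. This is only a sketch: `count_words` is a hypothetical helper that is not part of the script, and it assumes the same whole-word matching rules (`\b`, literal words via `\Q...\E`, optional case folding).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns, in word-list
# order, how often each word in @$words occurs as a whole word in $text.
sub count_words {
    my ($text, $words, $case_sensitive) = @_;
    my @counts;
    for my $word (@$words) {
        # \Q...\E makes the word match literally; \b limits it to whole words.
        my $pattern = $case_sensitive ? qr/\b\Q$word\E\b/ : qr/\b\Q$word\E\b/i;
        # List assignment in scalar context returns the number of matches.
        my $n = () = $text =~ /$pattern/g;
        push @counts, $n;
    }
    return @counts;
}

my $text  = "Linux words, linux frequencies. Science!";
my @words = qw(linux frequencies science words);

print join(",", "sample", count_words($text, \@words, 1)), "\n";  # prints "sample,1,1,0,1"
print join(",", "sample", count_words($text, \@words, 0)), "\n";  # prints "sample,2,1,1,1"
```

With case sensitivity on, "Linux" and "Science" are not counted as "linux" and "science"; with it off they are.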
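When I ran the script on your question saved as the file `wc-test` and on Robert Gamble's answer saved as `wc-ans-test`, the output file looked like this:

```
File,linux,frequencies,science,words
wc-ans-test,2,2,2,12
wc-test,1,3,1,3
```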
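The output is a comma-separated values (CSV) file (but you can change the separator in the script). It should be readable by any spreadsheet application. For plotting, I would recommend gnuplot, which is fully scriptable, so you can tweak the plot independently of the input data.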
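Because every line is just fields joined by the separator, the file is also easy to reshape in Perl before plotting. A minimal sketch, assuming the default "," separator, that no file name or word contains a comma, and that the file holds the output of a single run (one header line; repeated runs append more headers): the hypothetical `transpose_counts` helper turns the rows around so there is one line per word and one column per input file, which may be more convenient if each word should become one group in a chart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): takes lines in the
# WordCount.out format ("File,Word1,Word2,..." then one row per file) and
# returns them transposed: one row per word, one column per file.
# Assumes "," as separator and no commas inside fields.
sub transpose_counts {
    my @lines = @_;
    my @rows;
    for my $line (@lines) {
        chomp $line;                      # works on the copy in @lines
        next unless length $line;         # ignore empty lines
        push @rows, [ split /,/, $line, -1 ];
    }
    my @out;
    for my $col (0 .. $#{ $rows[0] }) {
        push @out, join(",", map { $_->[$col] } @rows);
    }
    return @out;
}

my @csv = (
    "File,linux,frequencies,science,words",
    "wc-ans-test,2,2,2,12",
    "wc-test,1,3,1,3",
);
print "$_\n" for transpose_counts(@csv);
```

For the sample output above this prints `File,wc-ans-test,wc-test` followed by one line per word, for example `words,12,3`.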
I'm curious: what is the word list? (And what kind of text?) – 2008-11-24 23:19:06
Articles. The term list consists of key terms from the field. – fdsayre 2008-11-24 23:33:20