2013-10-30 59 views

Display a message to the user

I have read in a file and split its contents into an array of words:

file1 = File.open("spam1.txt","rb") 
file1_contents = file1.read 
file1 = file1_contents.split(' ') 

I can count the frequency of each word using a hash, and sort by the number of occurrences:

freqs1 = Hash.new(0) 
file1.each { |word| freqs1[word] +=1} 
freqs1 = freqs1.sort_by {|x,y| y} 
freqs1.reverse! 
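
For illustration, here is a minimal sketch of the same counting and sorting step on an in-memory string, so it can be run without spam1.txt. The sample text and the variable names are assumptions made for this example only. Note that sort_by returns an array of [word, count] pairs rather than a hash.

```ruby
# Hypothetical sample text standing in for the contents of spam1.txt.
sample_text = "money fund money transfer money fund"

words = sample_text.split(' ')

# Count occurrences; Hash.new(0) makes missing keys default to 0.
counts = Hash.new(0)
words.each { |word| counts[word] += 1 }

# Sort by descending count. The result is an array of [word, count] pairs.
sorted = counts.sort_by { |_word, count| -count }

sorted.each { |word, count| puts "#{word} #{count}" }
```

Negating the count inside sort_by gives descending order directly, which avoids the separate reverse! call.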

I can also print the results for the user like this:

freqs1.each { |word, freq| puts word + ' ' + freq.to_s }

I want to display a message to the user if the array file1 or the hash freqs1 contains certain words multiple times.

I had a (bad) idea: iterate over the freqs1 hash and display the appropriate message to the user:

freqs1.each{|word,freq| 
    if ((word == ('business' || 'fund' || 'funds' || 'account' ||'transfer' || 'money')) && freq > 2) || (word == 'Iraq' && freq > 1) then 
     puts "File 1 is suspected as spam mail - suspicious word frequency" 
    else 
     puts "File 1 does not appear to be spam email" 
    end 
} 

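However, this is silly of me, because it acts on every element of the hash and so prints a message for every word.

How can I display a single message to the user if words like business, fund, funds, account and so on appear more than twice?

Thanks for any help.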

Answers

1

If all you want to improve is that last statement, try this (untested, but it should work):

bad_words = %w{business fund funds account transfer money} 
is_spam = freqs1.any? do |word, freq| 
    (freq > 2 && bad_words.include?(word)) || (word == 'Iraq' && freq > 1) 
end 

if is_spam 
    puts "File 1 is suspected as spam mail - suspicious word frequency" 
else 
    puts "File 1 does not appear to be spam email" 
end 

Enumerable#any? does most of the work for you, and extracting the list of bad words also helps readability.
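
As a side note, the condition in the question has a second problem besides running once per element: a chain of strings joined with || evaluates to its first operand, so the comparison only ever tests against 'business'. The short sketch below shows this, and shows include? and any? behaving as intended. The sample data in freqs is a hypothetical stand-in for the sorted frequency list.

```ruby
# A || chain returns its first truthy operand, so only 'business' is compared.
first_only = ('business' || 'fund' || 'money')
puts first_only                 # business
puts('fund' == first_only)      # false

# Array#include? tests membership against every listed word.
bad_words = %w[business fund funds account transfer money]
puts bad_words.include?('fund') # true

# Hypothetical sample data: [word, frequency] pairs.
freqs = [['money', 3], ['hello', 5]]

# any? yields one boolean for the whole collection, so only one message
# needs to be printed afterwards.
suspicious = freqs.any? { |word, freq| freq > 2 && bad_words.include?(word) }
puts suspicious                 # true
```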

1

I would do something like this:

word_filter = [ 
{count: 2, words: ['business','fund','funds','account','transfer','money']}, 
{count: 1, words: ['iraq']} 
] 

alert  = "File 1 is suspected as spam mail - suspicious word frequency" 
calm_message = "File 1 does not appear to be spam email" 

grouped_words = file1.group_by{|x|x}.map{|x,array|[x,array.size]} 

appears_to_be_spam = grouped_words.any? { |word, count|
    word_filter.any? do |filter|
        filter[:words].include?(word.downcase) && count > filter[:count]
    end
}

puts appears_to_be_spam ? alert : calm_message 
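
One possible way to make this reusable is to wrap the check in a method that takes the word array and the rule table as arguments. The following is only a sketch: the method name spam? and the variable rules are made up for this example. It also differs from the code above in one respect, which is an assumption about the desired behaviour: words are downcased before counting, so "Money" and "money" add to the same total.

```ruby
# Hypothetical rule table: a word is suspicious when it occurs more than
# :count times. Thresholds mirror the ones used above.
rules = [
  { count: 2, words: %w[business fund funds account transfer money] },
  { count: 1, words: %w[iraq] }
]

# Returns true if any word exceeds the threshold of a rule that lists it.
def spam?(words, rules)
  counts = Hash.new(0)
  # Downcase before counting so case variants share one total (assumption).
  words.each { |word| counts[word.downcase] += 1 }
  counts.any? do |word, count|
    rules.any? { |rule| rule[:words].include?(word) && count > rule[:count] }
  end
end

puts spam?(%w[Money money MONEY hello], rules) # true
puts spam?(%w[money money hello], rules)       # false
puts spam?(%w[Iraq iraq], rules)               # true
```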

Thanks - this works. @Nick Veys answered earlier, so I had to accept his answer, but I like this approach. – Tom