2008-10-22 105 views
3

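There are some good examples of how to calculate word frequencies in C#, but none of them is comprehensive, and I really need one in VB.NET. What is the best way to calculate word frequency in VB.NET?

My current approach is limited to one word per frequency count. What is the best way to change this so that I get a completely accurate word frequency list?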

wordFreq = New Hashtable()

Dim words As String() = Regex.Split(inputText, "(\W)")
For i As Integer = 0 To words.Length - 1
    If words(i) <> "" Then
        Dim realWord As Boolean = True
        For j As Integer = 0 To words(i).Length - 1
            If Char.IsLetter(words(i).Chars(j)) = False Then
                realWord = False
            End If
        Next j

        If realWord = True Then
            If wordFreq.Contains(words(i).ToLower()) Then
                wordFreq(words(i).ToLower()) += 1
            Else
                wordFreq.Add(words(i).ToLower, 1)
            End If
        End If
    End If
Next

Me.wordCount = New SortedList

For Each de As DictionaryEntry In wordFreq
    If wordCount.ContainsKey(de.Value) = False Then
        wordCount.Add(de.Value, de.Key)
    End If
Next

I would prefer an actual code snippet, but a generic "oh yeah... use this and run that" would work as well.

Answers

2
Imports System.Collections.Generic

Public Class CountWords

    ' Returns each distinct word (lower-cased, letters only) with its number of occurrences.
    Public Function WordCount(ByVal str As String) As Dictionary(Of String, Integer)
        Dim ret As New Dictionary(Of String, Integer)
        Dim word As String = ""

        str = str.ToLower()
        ' Run one position past the end so the final word is flushed as well.
        For index As Integer = 0 To str.Length
            If index < str.Length AndAlso Char.IsLetter(str(index)) Then
                word &= str(index)
            ElseIf word.Length > 0 Then
                ' A non-letter (or the end of the text) closes the current word.
                If ret.ContainsKey(word) Then
                    ret(word) += 1
                Else
                    ret(word) = 1
                End If
                word = ""
            End If
        Next

        Return ret
    End Function

End Class

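Then, for a quick demo application, create a WinForms app with a multiline text box called InputBox, a ListView called OutputList, and a button called CountBtn. Create two columns in the list view, "Word" and "Freq". Select the "Details" view type. Add an event handler for CountBtn. Then use the following code: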

Imports System.Windows.Forms.ListViewItem

Public Class MainForm

    Private WordCounts As CountWords = New CountWords

    Private Sub CountBtn_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles CountBtn.Click
        OutputList.Items.Clear()
        Dim ret As Dictionary(Of String, Integer) = Me.WordCounts.WordCount(InputBox.Text)
        For Each item As String In ret.Keys
            Dim litem As ListViewItem = New ListViewItem
            litem.Text = item
            Dim csitem As ListViewSubItem = New ListViewSubItem(litem, ret.Item(item).ToString())

            litem.SubItems.Add(csitem)
            OutputList.Items.Add(litem)
        Next

        ' A width of -1 autosizes a column to its widest item; do it once, after the list is filled.
        Word.Width = -1
        Freq.Width = -1
    End Sub
End Class

You did a terrible thing making me write this in VB, and I will never forgive you.

:p

Good luck!

EDIT

Fixed the empty-string bug and the letter-case bug.

3

This might be what you're looking for:

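Dim Words = "Hello World))))) This is a test Hello World"
' Keep only tokens made up entirely of letters (so "World)))))" is skipped),
' then count each distinct token. Requires Imports System.Linq.
Dim CountTheWords = From str In Words.Split(" "c) _
                    Where str.Length > 0 AndAlso str.All(Function(c) Char.IsLetter(c)) _
                    Group By str Into Count()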

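I just tested it and it works.

EDIT! I added code to make sure it only counts letters and not symbols.

FYI: I found an article on how to use LINQ while targeting .NET 2.0. It feels a bit dirty, but it might help someone: http://weblogs.asp.net/fmarguerie/archive/2007/09/05/linq-support-on-net-2-0.aspx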

+0

I'm using .NET 2.0, so unfortunately I can't use LINQ. – ine 2008-10-22 04:59:52

+0

Awww, that totally just stuffed it up. – 2008-10-22 05:03:47

+0

It would have been so easy for you. – 2008-10-22 05:04:24

2

Pretty close, but \w+ is a good regular expression to match with (it matches word characters only).

' Requires: Imports System.Text.RegularExpressions and System.Collections.Generic
Public Function CountWords(ByVal inputText As String) As Dictionary(Of String, Integer)
    Dim frequency As New Dictionary(Of String, Integer)

    ' Regex.Matches returns every match; Regex.Match would return only the first one.
    For Each wordMatch As Match In Regex.Matches(inputText, "\w+")
        Dim word As String = wordMatch.Value.ToLower()
        If frequency.ContainsKey(word) Then
            frequency(word) += 1
        Else
            frequency.Add(word, 1)
        End If
    Next
    Return frequency
End Function
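
The SortedList in the question is keyed by the count, so two words with the same count cannot both be stored, which is why only one word per frequency survives. One possible way around that, as a minimal sketch that should work on .NET 2.0: copy the dictionary's entries into a list and sort that list by count. The names FrequencySortSketch, SortByFrequency and CompareByCount are made up for illustration, and the alphabetical tie-break is an assumption, not something the question asks for.

```vbnet
Imports System
Imports System.Collections.Generic

Module FrequencySortSketch

    ' Hypothetical comparer: higher counts first, ties broken alphabetically.
    Private Function CompareByCount(ByVal a As KeyValuePair(Of String, Integer), _
                                    ByVal b As KeyValuePair(Of String, Integer)) As Integer
        Dim byCount As Integer = b.Value.CompareTo(a.Value)
        If byCount <> 0 Then Return byCount
        Return String.CompareOrdinal(a.Key, b.Key)
    End Function

    ' Hypothetical helper: copies every word/count pair into a list and sorts it,
    ' so words that share a count are all kept.
    Public Function SortByFrequency(ByVal frequency As Dictionary(Of String, Integer)) _
        As List(Of KeyValuePair(Of String, Integer))
        Dim sorted As New List(Of KeyValuePair(Of String, Integer))(frequency)
        sorted.Sort(AddressOf CompareByCount)
        Return sorted
    End Function

    Sub Main()
        ' Sample counts; in practice this would be the dictionary returned by CountWords.
        Dim frequency As New Dictionary(Of String, Integer)
        frequency.Add("hello", 2)
        frequency.Add("world", 2)
        frequency.Add("test", 1)

        For Each pair As KeyValuePair(Of String, Integer) In SortByFrequency(frequency)
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value)
        Next
    End Sub

End Module
```

In this sample both "hello" and "world" stay in the result even though they share the count 2, and "test" comes last.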