2012-01-03 49 views
6

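Finding the most frequently occurring words in a string in C#

I'm trying to find the words with the highest number of occurrences in a string.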

For example:

Hello World This is a great world, This World is simply great 

From the string above, I'm trying to compute a result like the following:

  • world, 3
  • great, 2
  • hello, 1
  • this, 2

but ignoring any word shorter than 3 characters, such as "is", which occurs twice.

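I've looked at Dictionary<key, value> pairs, and I've looked at LINQ's GroupBy extension. I know the solution lies somewhere between the two, but I can't work out the algorithm or how to get this done.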

+4

Is this homework? – dasblinkenlight 2012-01-03 02:12:18

+0

This is similar: http://stackoverflow.com/questions/8630235/finding-the-number-of-occurences-strings-in-a-specific-format-occur-in-a-given-t/8630247#8630247 – Matthias 2012-01-03 02:17:27

+0

@dasblinkenlight - No, this isn't homework. I'm trying to extract meta keywords and save them in the database for each record. – Thr3e 2012-01-03 05:22:16

Answers

14

Using LINQ and a regular expression:

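using System.Linq;
using System.Text.RegularExpressions;

var groups = Regex.Split("Hello World This is a great world, This World is simply great".ToLower(), @"\W+")
    .Where(s => s.Length >= 3)           // ignore words shorter than 3 characters
    .GroupBy(s => s)
    .OrderByDescending(g => g.Count());  // each group's Key is the word, Count() its frequency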
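On the Dictionary/GroupBy connection the question asks about: GroupBy produces the groups, and ToDictionary turns them into a Dictionary<string, int> of counts. A minimal sketch, assuming the same lowercase regex split and a 3-character minimum; the variable names are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

const string text = "Hello World This is a great world, This World is simply great";

// Group equal words, then map each group's key to the number of items in it.
Dictionary<string, int> counts = Regex.Split(text.ToLower(), @"\W+")
    .Where(s => s.Length >= 3)
    .GroupBy(s => s)
    .ToDictionary(g => g.Key, g => g.Count());

// A dictionary has no defined order, so sort by count when printing.
foreach (var pair in counts.OrderByDescending(p => p.Value))
    Console.WriteLine("{0}, {1}", pair.Key, pair.Value);
```

Dictionary enumeration order is not guaranteed, which is why the sketch sorts by Value before printing; a map like this is also a convenient shape for saving keywords per record.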
+3

Nice answer, but I wouldn't recommend this solution; it would be better to understand dictionaries than the use of regex or LINQ. Just saying. – DarthVader 2012-01-03 02:19:20

+0

+1 Best answer, since it takes punctuation into account... – xandercoded 2012-01-03 02:32:47

+0

@DarthVader I agree on a mathematical level. From a programming point of view, understanding LINQ is just as important. – 2012-01-03 03:53:19

0
using System;
using System.Linq;

string words = "Hello World This is a great world, This World is simply great".ToLower();

var results = words.Split(' ').Where(x => x.Length >= 3)   // ignore words shorter than 3 characters
           .GroupBy(x => x)
           .Select(x => new { Count = x.Count(), Word = x.Key })
           .OrderByDescending(x => x.Count);                // most frequent first

foreach (var item in results)
    Console.WriteLine("{0} occurred {1} times", item.Word, item.Count);

Console.ReadLine();

To get the most frequent word:

string mostFrequent = results.First().Word;
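First() returns a single word even when several words share the highest count. With Split(' ') on the sample sentence, world, this and great all occur twice (the third world stays attached to its comma as "world,"). A minimal sketch that keeps every word tied for the top count, assuming the same lowercase, space-split input:

```csharp
using System;
using System.Linq;

string words = "Hello World This is a great world, This World is simply great".ToLower();

var counts = words.Split(' ')
    .Where(x => x.Length >= 3)
    .GroupBy(x => x)
    .Select(g => new { Word = g.Key, Count = g.Count() })
    .ToList();

// Find the highest count (0 if no word qualified), then keep every word that reaches it.
int max = counts.Count > 0 ? counts.Max(c => c.Count) : 0;
var topWords = counts.Where(c => c.Count == max).Select(c => c.Word).ToList();

Console.WriteLine("{0} (x{1})", string.Join(", ", topWords), max);
// world, this, great (x2)
```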

+0

Instead of 'Word = x.First()', you can access the group key via 'Word = x.Key'. – 2012-01-03 02:34:12

+0

Thanks @TathamOddie, didn't know that. *btw*, thanks for FormsAuthenticationExtensions ;) – xandercoded 2012-01-03 02:39:18

3
using System;
using System.Linq;

const string input = "Hello World This is a great world, This World is simply great";
var words = input
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
    .Where(w => w.Length >= 3)
    .GroupBy(w => w)
    .OrderByDescending(g => g.Count());

foreach (var word in words)
    Console.WriteLine("{0}x {1}", word.Count(), word.Key);   // word is a group: Count() is its frequency, Key the word

// 2x World 
// 2x This 
// 2x great 
// 1x Hello 
// 1x world, 
// 1x simply 

It's not perfect, since it doesn't trim the comma, but it at least shows you how to do the grouping and filtering.
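One way to get rid of the comma, sketched under the assumption that a fixed set of punctuation characters is enough for your text: pass those characters to Split along with the space, and group on a lowercased key so World and world are counted together. The separator list is illustrative, not exhaustive:

```csharp
using System;
using System.Linq;

const string input = "Hello World This is a great world, This World is simply great";

// Illustrative separator set: space plus common punctuation.
char[] separators = { ' ', ',', '.', ';', ':', '!', '?', '(', ')', '"' };

var words = input
    .Split(separators, StringSplitOptions.RemoveEmptyEntries)
    .Where(w => w.Length >= 3)
    .GroupBy(w => w.ToLowerInvariant())   // "World" and "world" land in the same group
    .OrderByDescending(g => g.Count())
    .ToList();

foreach (var word in words)
    Console.WriteLine("{0}x {1}", word.Count(), word.Key);

// 3x world
// 2x this
// 2x great
// 1x hello
// 1x simply
```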

3

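So I would avoid LINQ, regular expressions and the like, because it sounds like you're trying to find an algorithm and understand it, rather than use some function that does it for you.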

Not that those aren't valid solutions. They are, of course.

Try something like this:

using System;
using System.Collections.Generic;

Dictionary<string, int> dictionary = new Dictionary<string, int>();

string sInput = "Hello World, This is a great World. I love this great World";
sInput = sInput.Replace(",", ""); //Just cleaning up a bit
sInput = sInput.Replace(".", ""); //Just cleaning up a bit
string[] arr = sInput.Split(' '); //Create an array of words

foreach (string word in arr) //let's loop over the words
{
    if (word.Length >= 3) //if it meets our criteria of at least 3 letters
    {
        if (dictionary.ContainsKey(word)) //if it's in the dictionary
            dictionary[word] = dictionary[word] + 1; //Increment the count
        else
            dictionary[word] = 1; //put it in the dictionary with a count of 1
    }
}

foreach (KeyValuePair<string, int> pair in dictionary) //loop through the dictionary
    Console.WriteLine("Key: {0}, Count: {1}", pair.Key, pair.Value);
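The loop above counts every word but never picks out the highest count, which is what the question asks for. Keeping to the same no-LINQ approach, a minimal sketch that scans the dictionary once and remembers the best entry. The dictionary is hard-coded here with the counts the loop produces for its sample sentence, so the snippet runs on its own:

```csharp
using System;
using System.Collections.Generic;

// Counts as produced by the loop above for "Hello World, This is a great World. I love this great World".
var dictionary = new Dictionary<string, int>
{
    ["Hello"] = 1, ["World"] = 3, ["This"] = 1, ["great"] = 2, ["love"] = 1, ["this"] = 1
};

// Track the word with the largest count seen so far.
string bestWord = null;
int bestCount = 0;
foreach (KeyValuePair<string, int> pair in dictionary)
{
    if (pair.Value > bestCount)
    {
        bestWord = pair.Key;
        bestCount = pair.Value;
    }
}

Console.WriteLine("Most frequent: {0} ({1} times)", bestWord, bestCount);
// Most frequent: World (3 times)
```

On a tie, this keeps whichever word the dictionary yields first, and Dictionary does not guarantee an enumeration order.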
+0

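You could change 'string sInput = "Hello World, This is a great World. I love this great World";' to 'string sInput = "Hello World, This is a great World. I love this great World".ToLower();' to make it case-insensitive. Personally, I like case sensitivity, because it tells you about proper nouns and the beginnings of sentences even without punctuation, so you don't lose anything. The downside: World and world are not equal. – Jordan 2012-01-03 02:49:28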

+0

+1 Good teaching post. I'd give the comment another +1 if I could. – 2012-01-03 03:29:11

+1

Regarding your comment about casing, there is a simpler solution: create the dictionary with a case-insensitive comparer: 'var dictionary = new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase);' – 2012-01-03 03:31:47
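Combining that suggestion with the counting loop gives a minimal sketch like the following. It uses StringComparer.OrdinalIgnoreCase rather than the InvariantCultureIgnoreCase from the comment (either works for this input; the choice is this sketch's assumption), and TryGetValue, which needs one lookup instead of ContainsKey followed by the indexer:

```csharp
using System;
using System.Collections.Generic;

string sInput = "Hello World, This is a great World. I love this great World";

// Words that differ only in case share one entry; the key keeps its first spelling.
var dictionary = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

foreach (string word in sInput.Split(new[] { ' ', ',', '.' }, StringSplitOptions.RemoveEmptyEntries))
{
    if (word.Length < 3)
        continue; // ignore words shorter than 3 characters

    // TryGetValue sets count to 0 when the word is not in the dictionary yet.
    dictionary.TryGetValue(word, out int count);
    dictionary[word] = count + 1;
}

foreach (KeyValuePair<string, int> pair in dictionary)
    Console.WriteLine("Key: {0}, Count: {1}", pair.Key, pair.Value);
```

The stored key keeps the spelling of its first occurrence, so this sentence yields This with a count of 2.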

2

I wrote a string processor class. You can use it.

Example:

metaKeywords = bodyText.Process(blackListWords: prepositions).OrderByDescending().TakeTop().GetWords().AsString(); 

Class:

using System;
using System.Collections.Generic;
using System.Linq;

public static class StringProcessor
{ 
    private static List<String> PrepositionList; 

    public static string ToNormalString(this string strText) 
    { 
     if (String.IsNullOrEmpty(strText)) return String.Empty; 
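     char chNormalKaf = (char)1603;     // U+0643 ARABIC LETTER KAF
     char chNormalYah = (char)1610;     // U+064A ARABIC LETTER YEH
     char chNonNormalKaf = (char)1705;  // U+06A9 ARABIC LETTER KEHEH (Persian kaf)
     char chNonNormalYah = (char)1740;  // U+06CC ARABIC LETTER FARSI YEH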
     string result = strText.Replace(chNonNormalKaf, chNormalKaf); 
     result = result.Replace(chNonNormalYah, chNormalYah); 
     return result; 
    } 

    public static List<KeyValuePair<String, Int32>> Process(this String bodyText, 
     List<String> blackListWords = null, 
     int minimumWordLength = 3, 
     char splitor = ' ', 
     bool perWordIsLowerCase = true) 
    { 
     string[] btArray = bodyText.ToNormalString().Split(splitor);
     Dictionary<String, Int32> wordsDic = new Dictionary<String, Int32>();
     foreach (string word in btArray) 
     { 
      if (word != null) 
      { 
       string lowerWord = word; 
       if (perWordIsLowerCase) 
        lowerWord = word.ToLower(); 
       var normalWord = lowerWord.Replace(".", "").Replace("(", "").Replace(")", "") 
        .Replace("?", "").Replace("!", "").Replace(",", "") 
        .Replace("<br>", "").Replace(":", "").Replace(";", "") 
        .Replace("،", "").Replace("-", "").Replace("\n", "").Trim(); 
       if (normalWord.Length >= minimumWordLength && !normalWord.IsMemberOfBlackListWords(blackListWords))
       { 
        if (wordsDic.ContainsKey(normalWord)) 
        { 
         var cnt = wordsDic[normalWord]; 
         wordsDic[normalWord] = ++cnt; 
        } 
        else 
        { 
         wordsDic.Add(normalWord, 1); 
        } 
       } 
      } 
     } 
     List<KeyValuePair<String, Int32>> keywords = wordsDic.ToList(); 
     return keywords; 
    } 

    public static List<KeyValuePair<String, Int32>> OrderByDescending(this List<KeyValuePair<String, Int32>> list, bool isBasedOnFrequency = true) 
    { 
     List<KeyValuePair<String, Int32>> result = null; 
     if (isBasedOnFrequency) 
      result = list.OrderByDescending(q => q.Value).ToList(); 
     else 
      result = list.OrderByDescending(q => q.Key).ToList(); 
     return result; 
    } 

    public static List<KeyValuePair<String, Int32>> TakeTop(this List<KeyValuePair<String, Int32>> list, Int32 n = 10) 
    { 
     List<KeyValuePair<String, Int32>> result = list.Take(n).ToList(); 
     return result; 
    } 

    public static List<String> GetWords(this List<KeyValuePair<String, Int32>> list) 
    { 
     List<String> result = new List<String>(); 
     foreach (var item in list) 
     { 
      result.Add(item.Key); 
     } 
     return result; 
    } 

    public static List<Int32> GetFrequency(this List<KeyValuePair<String, Int32>> list) 
    { 
     List<Int32> result = new List<Int32>(); 
     foreach (var item in list) 
     { 
      result.Add(item.Value); 
     } 
     return result; 
    } 

    public static String AsString<T>(this List<T> list, string separator = ", ")
    {
     // string.Join places the separator only between items, so there is no trailing ", ".
     return string.Join(separator, list);
    }

    private static bool IsMemberOfBlackListWords(this String word, List<String> blackListWords) 
    { 
     bool result = false; 
     if (blackListWords == null) return false; 
     foreach (var w in blackListWords) 
     { 
      if (w.ToNormalString().Equals(word)) 
      { 
       result = true; 
       break; 
      } 
     } 
     return result; 
    } 
} 
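For comparison, a minimal sketch of the same pipeline (split, lowercase, filter by minimum length and blacklist, sort by frequency, take the top N, join) written with standard LINQ operators. TopKeywords and its parameters are hypothetical names for this sketch, not part of the class above. It keeps the blacklist in a HashSet<string>, so each check is a hash lookup instead of a loop over a list; it handles only the punctuation in its separator array, and it does not apply the Persian/Arabic character normalization that ToNormalString performs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the top-n keywords as a comma-separated string.
static string TopKeywords(string bodyText, IEnumerable<string> blackListWords,
                          int minimumWordLength = 3, int n = 10)
{
    var blackList = new HashSet<string>(blackListWords, StringComparer.OrdinalIgnoreCase);
    char[] separators = { ' ', '\n', '.', ',', '!', '?', ':', ';', '(', ')' };

    var top = bodyText
        .Split(separators, StringSplitOptions.RemoveEmptyEntries)
        .Select(w => w.ToLowerInvariant())
        .Where(w => w.Length >= minimumWordLength && !blackList.Contains(w))
        .GroupBy(w => w)
        .OrderByDescending(g => g.Count())   // stable sort: ties keep first-occurrence order
        .Take(n)
        .Select(g => g.Key);

    return string.Join(", ", top);
}

string keywords = TopKeywords(
    "Hello World This is a great world, This World is simply great",
    new[] { "this", "simply" },
    n: 3);

Console.WriteLine(keywords);
// world, great, hello
```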