Comparing two strings for equality


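I want to add "clean text" to an ArrayList, that is, the text with prepositions and certain other kinds of words removed.

I already have all the banned words in PH, a single string with the words separated by commas ("word1,word2,etc..."). textEnArray holds the text of a book, read from an ordinary file.

I want to check whether each value in textEnArray is equal to one of the banned words. If it does not match, I add it to an ArrayList named totEnArray.

My trouble is that the foreach does not seem to compare the values correctly, even when two values are the same: it filters nothing and adds all of the text to the ArrayList.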

public static ArrayList topFive(string nomFitxer){ 
    ArrayList totEnArray = new ArrayList(); 

    string totElText = File.ReadAllText(nomFitxer); 
    string PH = File.ReadAllText(GetValues.obtenirRutaFitxerBlackList()); 
    char[] delimiterCharsText = { ' ',',', '.', ':', '\t' }; 
    string[] arrayPH = PH.Split(','); 
    string[] textEnArray = totElText.Split(delimiterCharsText); 

    foreach (string paraulaProhibida in arrayPH){ 
        foreach (string text in textEnArray){ 
            if (!(paraulaProhibida.Contains(text))){ 
                totEnArray.Add(text); 
            } 
        } 
    } 
    return totEnArray; 
}

Source

2016-11-23 PIIZH

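From what I understand, I think you should change 'paraulaProhibida.Contains(text)' to 'text.Contains(paraulaProhibida)'.

@BojanB - I think the OP should use '!arrayPH.Contains(text)'. The whole 'foreach (string paraulaProhibida in arrayPH)' loop should be removed. – Enigmativity

@Enigmativity 'arrayPH' holds the banned words, which in the OP's example should be single words, whereas 'text' from textEnArray can be a sentence or at least several words. Why would you search for a sentence inside a word? Am I missing something here?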

I'm deliberately not giving you a complete answer; I just want to show you what your code could look like. Try this:

public static List<string> topFive() 
{ 
    string totElText = "this is, or is not, the source text and should, mostly, be ok"; 
    string PH = "the,is,not"; 
    char[] delimiterCharsText = { ' ', ',', '.', ':', '\t' }; 
    string[] arrayPH = PH.Split(','); 
    string[] textEnArray = totElText.Split(delimiterCharsText, StringSplitOptions.RemoveEmptyEntries); 

    return new List<string>(textEnArray.Where(text => !arrayPH.Contains(text))); 
}

In this case it gives:

 
this 
or 
source 
text 
and 
should 
mostly 
be 
ok

Source

2016-11-23 12:12:46 Enigmativity

As @Enigmativity pointed out in the comments, you should drop the outer foreach and look each word up in the whole array of banned words. Like this:

public static ArrayList topFive(string nomFitxer){ 
    ArrayList totEnArray = new ArrayList(); 

    string totElText = File.ReadAllText(nomFitxer); 
    string PH = File.ReadAllText(GetValues.obtenirRutaFitxerBlackList()); 
    char[] delimiterCharsText = { ' ',',', '.', ':', '\t' }; 
    string[] arrayPH = PH.Split(','); 
    string[] textEnArray = totElText.Split(delimiterCharsText); 

    // Single loop: keep a word only if it is not in the banned list 
    // (Contains on an array requires "using System.Linq;") 
    foreach (string text in textEnArray){ 
        if (!(arrayPH.Contains(text))){ 
            totEnArray.Add(text); 
        } 
    } 
    return totEnArray; 
}

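You can also add && !String.IsNullOrEmpty(text) to the if statement so that empty strings are not added to the result.

The reason the result always contains all of the text is that each pass of the outer foreach filters out only one banned word. A word that is skipped in the first iteration is still added in the second, third, ... iterations, when it is compared against the other banned words.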
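To make the effect concrete, here is a minimal sketch that runs both versions on made-up data. The class name FilterDemo, the method names, and the sample word lists are illustrative assumptions and are not part of the question's code; the files are replaced by in-memory arrays.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterDemo
{
    // Nested loops as in the question: every word is tested once per banned word,
    // so a word rejected for one banned word is still added for the others.
    public static List<string> NestedLoops(string[] words, string[] banned)
    {
        var result = new List<string>();
        foreach (string bannedWord in banned)
            foreach (string word in words)
                if (!bannedWord.Contains(word))
                    result.Add(word);
        return result;
    }

    // Single loop: each word is tested once against the whole banned list.
    public static List<string> SingleLoop(string[] words, string[] banned)
    {
        var result = new List<string>();
        foreach (string word in words)
            if (!String.IsNullOrEmpty(word) && !banned.Contains(word))
                result.Add(word);
        return result;
    }

    public static void Main()
    {
        string[] words = { "the", "cat", "is", "here" };   // hypothetical sample text
        string[] banned = { "the", "is" };                 // hypothetical blacklist

        Console.WriteLine(string.Join(" ", NestedLoops(words, banned))); // cat is here the cat here
        Console.WriteLine(string.Join(" ", SingleLoop(words, banned)));  // cat here
    }
}
```

With two banned words the nested version walks the text twice, so the banned words reappear in the output and the allowed words are duplicated.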

Source

2016-11-23 11:37:08

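As far as I understand you, you want to

Load the blacklist (or stop words) into the paraulaProhibida collection
Filter these words out of the nomFitxer file

You can implement something like this: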

string blackListFileName = GetValues.obtenirRutaFitxerBlackList(); 

// A HashSet lookup is O(1); a lookup in the obsolete ArrayList is O(N) 
HashSet<string> paraulaProhibida = new HashSet<string>(File 
    .ReadLines(blackListFileName) 
    .SelectMany(line => line.Split(new char[] { ',', ';' }, StringSplitOptions.None)), 
    StringComparer.OrdinalIgnoreCase);

The main difficulty is extracting a word. In natural languages (English, Spanish, etc.) a word can well be a very complex concept:

    I cannot          // 2 words (shall we split "cannot" into "can" and "not"?) 
    I may not         // 3 words 
    Forget-me-not     // 1 word 
    Do not forget me  // 4 words 
    It's an IT; it is // "It" and "it" are the same, IT is a different (acronym) 
    per cent          // do we have 1 word? 2 words? 
    George W. Bush    // is "W" a word?

That's why I suggest using regular expressions to extract the words; a simple first attempt:

"[\p{L}'\-]+"

Enumerate all the words that are not in paraulaProhibida and materialize them into an array:

string pattern = @"[\p{L}'\-]+"; 

    string[] textEnArray = File 
    .ReadLines(nomFitxer) 
    .SelectMany(line => Regex.Matches(line, pattern) 
     .OfType<Match>() 
     .Select(match => match.Value)) 
    .Where(word => !paraulaProhibida.Contains(word)) 
    .ToArray();
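The two snippets above can be tried together without any files. The following is a sketch under stated assumptions: the class WordFilter, the method FilterWords, and the sample lines and blacklist are hypothetical, and the text is passed in memory instead of being read with File.ReadLines. Unlike the snippet above, it uses StringSplitOptions.RemoveEmptyEntries and Trim so that stray separators or spaces in the blacklist do not produce empty entries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFilter
{
    // lines: the text to filter; blackList: banned words separated by ',' or ';'
    public static string[] FilterWords(IEnumerable<string> lines, string blackList)
    {
        // Case-insensitive set of banned words
        var banned = new HashSet<string>(
            blackList.Split(new char[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries)
                     .Select(w => w.Trim()),
            StringComparer.OrdinalIgnoreCase);

        // A "word" is a run of letters, apostrophes and hyphens
        string pattern = @"[\p{L}'\-]+";

        return lines
            .SelectMany(line => Regex.Matches(line, pattern)
                .OfType<Match>()
                .Select(match => match.Value))
            .Where(word => !banned.Contains(word))
            .ToArray();
    }

    public static void Main()
    {
        string[] lines = { "The cat IS here.", "It's a forget-me-not, is it not?" };
        Console.WriteLine(string.Join("|", FilterWords(lines, "the,is;not")));
        // cat|here|It's|a|forget-me-not|it
    }
}
```

Because the set uses StringComparer.OrdinalIgnoreCase, "The" and "IS" are removed even though the blacklist is lower case, while "forget-me-not" survives as a single word.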

Source

2016-11-23 12:04:28

If you want to check whether each phrase in textEnArray contains a banned word, and eliminate the ones that do, you can replace your loops with this:

totEnArray = new ArrayList(textEnArray.Where(x => !arrayPH.Any(y => x.Contains(y))).ToList());

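This solves your problem without changing your code much, but your code could be improved... for example, you could use an array or a List instead of an ArrayList...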
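Note that x.Contains(y) is a substring test, so it also removes entries that merely contain a banned word as part of a longer word. Which behaviour is wanted depends on whether textEnArray holds phrases or single words. A minimal sketch of the difference, using hypothetical names (MatchKinds, RemoveBySubstring, RemoveByEquality) and made-up data:

```csharp
using System;
using System.Linq;

public static class MatchKinds
{
    // Drops every item that contains a banned word anywhere inside it
    public static string[] RemoveBySubstring(string[] items, string[] banned) =>
        items.Where(x => !banned.Any(y => x.Contains(y))).ToArray();

    // Drops only items that are exactly equal to a banned word
    public static string[] RemoveByEquality(string[] items, string[] banned) =>
        items.Where(x => !banned.Contains(x)).ToArray();

    public static void Main()
    {
        string[] items = { "this", "island", "cat", "is" };  // hypothetical sample data
        string[] banned = { "is" };

        Console.WriteLine(string.Join(" ", RemoveBySubstring(items, banned))); // cat
        Console.WriteLine(string.Join(" ", RemoveByEquality(items, banned)));  // this island cat
    }
}
```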

Source

2016-11-23 12:21:51
