Compare words, and also look for plurals and -ing forms?

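I have two word lists, say LIST1 and LIST2. I want to compare LIST1 against LIST2 to find duplicates, but the comparison should also find the plural and the -ing form of each word. For example: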

Suppose LIST1 has the word "account" and LIST2 has the words "accounts, accounting". When I compare them, the result should show that both words match "account".

I'm doing this in PHP, and the lists are stored in MySQL tables.

2010-12-06 daron

I know this is an old thread, but I just added an answer. If you have a minute, please take a look and let me know what you think. – quickshiftin 2016-04-19 19:50:47

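What I would do is compare your word directly against LIST2 and, at the same time, for every word you're comparing against, remove your word and look at what's left over: ing, s, or es, indicating a plural or an -ing word (this should be accurate enough). If that isn't good enough, you'll have to come up with an algorithm that produces plural words, since pluralizing isn't as simple as adding an S.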

Duplicate Ending List 
s 
es 
ing 

LIST1 
Gas 
Test 

LIST2 
Gases 
Tests 
Testing

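Now compare LIST1 to LIST2. In the same comparison loop, compare the items directly, and also remove the LIST1 word from the current LIST2 word. Then just check whether what remains is in the Duplicate Ending List.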

Hope that makes sense.
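A minimal PHP sketch of the leftover-ending approach described above, assuming PHP 8+ (for `str_starts_with`). The function names `matchesWithEnding` and `findDuplicates` are hypothetical, and like the approach itself it only recognizes a LIST2 word that is the LIST1 word plus exactly one listed ending.

```php
<?php

// Endings treated as "the same word" (the Duplicate Ending List above).
const DUPLICATE_ENDINGS = ['s', 'es', 'ing'];

/**
 * True if $candidate equals $word, or equals $word followed by exactly
 * one of the endings in DUPLICATE_ENDINGS (case-insensitive).
 */
function matchesWithEnding(string $word, string $candidate): bool
{
    $word = strtolower($word);
    $candidate = strtolower($candidate);

    // Direct comparison first.
    if ($word === $candidate) {
        return true;
    }

    // Remove the LIST1 word from the front of the LIST2 word...
    if ($word === '' || !str_starts_with($candidate, $word)) {
        return false;
    }
    $leftover = substr($candidate, strlen($word));

    // ...and check whether what remains is a known ending.
    return in_array($leftover, DUPLICATE_ENDINGS, true);
}

/**
 * For each word in $list1, collect the words in $list2 that match it.
 *
 * @return array<string, string[]>
 */
function findDuplicates(array $list1, array $list2): array
{
    $matches = [];
    foreach ($list1 as $word) {
        foreach ($list2 as $candidate) {
            if (matchesWithEnding($word, $candidate)) {
                $matches[$word][] = $candidate;
            }
        }
    }
    return $matches;
}

print_r(findDuplicates(['Gas', 'Test', 'account'], ['Gases', 'Tests', 'Testing', 'accounts', 'accounting']));
```

With the example lists, `Gas` matches `Gases`, `Test` matches `Tests` and `Testing`, and `account` matches `accounts` and `accounting`.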


2010-12-06 17:33:26 Veign

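The problem is that, at least in English, plurals are not all formed with a standard suffix, and neither are present participles. You can approximate with word + 'ing' and word + 's', but that will give both false positives and false negatives.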
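To make those false positives and negatives concrete, here is a small PHP sketch of the naive word + 's' / word + 'ing' rule (the function names `naiveForms` and `naiveMatch` are hypothetical) run against a few words it gets wrong.

```php
<?php

// Naive rule: a candidate matches if it is word, word + 's' or word + 'ing'.
function naiveForms(string $word): array
{
    return [$word, $word . 's', $word . 'ing'];
}

function naiveMatch(string $word, string $candidate): bool
{
    return in_array($candidate, naiveForms($word), true);
}

// False negatives: real inflections the naive rule misses.
var_dump(naiveMatch('make', 'making'));    // bool(false): the final "e" is dropped
var_dump(naiveMatch('child', 'children')); // bool(false): irregular plural
var_dump(naiveMatch('city', 'cities'));    // bool(false): y -> ies

// False positive: a different word that happens to look like word + 's'.
var_dump(naiveMatch('new', 'news'));       // bool(true)
```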

If you like, you can handle it directly in MySQL:

-- In MySQL, + is numeric addition; CONCAT() is needed to join strings
SELECT DISTINCT l2.word
    FROM LIST1 l1, LIST2 l2
    WHERE l1.word = l2.word OR CONCAT(l1.word, 's') = l2.word OR CONCAT(l1.word, 'ing') = l2.word;


2010-12-06 17:38:23 Orbling

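You could use an algorithm called Porter stemming to map each list item to its stem, and then compare the stems. PHP implementations of the Porter Stemming algorithm can be found here or here.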
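A sketch of the compare-the-stems idea in PHP 8+. The linked Porter implementations are not reproduced here, so the stemmer is passed in as a callable; `crudeStem` is a hypothetical, deliberately crude suffix-stripper used only so the example runs, and it is NOT the Porter algorithm. In real code you would pass a Porter stemmer instead (for example the `stem_english()` function used in the last answer, if it is available).

```php
<?php

/**
 * Hypothetical, deliberately crude stemmer: strips one common suffix.
 * It only exists so this example runs; it is NOT the Porter algorithm.
 */
function crudeStem(string $word): string
{
    $word = strtolower($word);
    foreach (['ing', 'es', 's'] as $suffix) {
        // Keep at least three characters so short words are not mangled.
        if (strlen($word) - strlen($suffix) >= 3 && str_ends_with($word, $suffix)) {
            return substr($word, 0, -strlen($suffix));
        }
    }
    return $word;
}

/**
 * Map every word in both lists to its stem and return the pairs whose stems are equal.
 *
 * @param callable(string): string $stem
 * @return array<int, array{0: string, 1: string}>
 */
function matchByStem(array $list1, array $list2, callable $stem): array
{
    // Index LIST2 by stem once instead of re-stemming inside the inner loop.
    $byStem = [];
    foreach ($list2 as $word) {
        $byStem[$stem($word)][] = $word;
    }

    $pairs = [];
    foreach ($list1 as $word) {
        foreach ($byStem[$stem($word)] ?? [] as $match) {
            $pairs[] = [$word, $match];
        }
    }
    return $pairs;
}

print_r(matchByStem(['account', 'Gas'], ['accounts', 'accounting', 'Gases'], 'crudeStem'));
```

Indexing LIST2 by stem keeps the comparison to one stemmer call per word, which matters if the stemmer is expensive or the lists are large.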


2010-12-06 17:38:41

+1 for Porter; I was just starting to research stemming myself :) – RobertPitt 2010-12-06 17:41:59

Nice. Never heard of this technique before. – Veign 2010-12-06 17:43:02

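The function at this link outputs the plural of a word:

http://www.exorithm.com/algorithm/view/pluralize

Something similar could be written for gerunds and present participles (the -ing form).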
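A rough PHP 7.4+ sketch of the generate-the-forms idea. It does not reproduce the linked exorithm function: `pluralize`, `ingForm` and `findForms` are hypothetical names covering only a few regular English spelling rules, so irregular plurals (child/children) and consonant doubling (run/running) are still missed.

```php
<?php

/** Rough regular-plural rules; irregular nouns are not handled. */
function pluralize(string $word): string
{
    if (preg_match('/(s|x|z|ch|sh)$/', $word)) {
        return $word . 'es';                  // gas -> gases, box -> boxes
    }
    if (preg_match('/[^aeiou]y$/', $word)) {
        return substr($word, 0, -1) . 'ies';  // city -> cities
    }
    return $word . 's';                       // account -> accounts
}

/** Rough -ing rules; consonant doubling is not handled. */
function ingForm(string $word): string
{
    // Drop a silent final "e" (make -> making), but not in "ee" (see -> seeing)
    // or in two-letter words (be -> being).
    if (preg_match('/.[^e]e$/', $word)) {
        return substr($word, 0, -1) . 'ing';
    }
    return $word . 'ing';                     // test -> testing
}

/** Return the words in $list2 equal to $word or to one of its generated forms. */
function findForms(string $word, array $list2): array
{
    $word = strtolower($word);
    $forms = [$word, pluralize($word), ingForm($word)];

    return array_values(array_filter(
        $list2,
        fn (string $candidate): bool => in_array(strtolower($candidate), $forms, true)
    ));
}

print_r(findForms('account', ['accounts', 'accounting', 'accountant']));
```

Generating forms from the LIST1 word, rather than stripping endings from the LIST2 word, means each new spelling rule only has to be written once.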


2010-12-06 20:15:41

You might consider writing this with the Doctrine Inflector class together with a stemmer.

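Here is the algorithm at a high level:

Split the search string and process each word individually
Lowercase the search word
Strip special characters
Singularize, replacing the differing portion with a wildcard ('%')
Stem, replacing the differing portion with a wildcard ('%')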
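Both "replace the differing portion with a wildcard" steps come down to finding where two strings start to differ. Pulled out as standalone helpers (the names `commonPrefixLength` and `wildcardFrom` are hypothetical, not part of the answer), the trick looks roughly like this:

```php
<?php

/**
 * Length of the common prefix of $a and $b.
 * XOR-ing two strings yields "\0" wherever the bytes agree, and PHP truncates
 * the result to the shorter string, so the leading run of "\0" bytes is the
 * shared prefix.
 */
function commonPrefixLength(string $a, string $b): int
{
    return strspn($a ^ $b, "\0");
}

/**
 * If $normalized (a singular or stemmed form) differs from $word, keep the
 * shared prefix and replace the rest with the MySQL LIKE wildcard '%'.
 */
function wildcardFrom(string $word, string $normalized): string
{
    $prefix = commonPrefixLength($word, $normalized);
    return $prefix < strlen($word) ? substr($word, 0, $prefix) . '%' : $word;
}

echo wildcardFrom('supplies', 'supply'), "\n"; // suppl%
echo wildcardFrom('office', 'offic'), "\n";    // offic%
```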

Here's the function I put together:

// Requires doctrine/inflector 1.x, which provides the static Inflector::singularize()
// (2.x replaced it with an instance API built via InflectorFactory).
// stem_english() is assumed to be an available English (Porter) stemmer.
use Doctrine\Common\Inflector\Inflector;

/**
 * Use inflection and stemming to produce a good search string to match subtle
 * differences in a MySQL table.
 *
 * @param string $sInputString The string you want to base the search on
 * @param string $sSearchTable The table you want to search in (trusted identifier)
 * @param string $sSearchField The field you want to search (trusted identifier)
 * @return string The SQL query
 */
function getMySqlSearchQuery($sInputString, $sSearchTable, $sSearchField)
{
    // Split on any run of whitespace and skip empty tokens
    $aInput = preg_split('/\s+/', strtolower(trim($sInputString)), -1, PREG_SPLIT_NO_EMPTY);
    $aSearch = [];
    foreach ($aInput as $sInput) {
        // Strip special characters; this also keeps quotes out of the LIKE literal
        $sInput = preg_replace('/[^a-z0-9]/', '', $sInput);
        if ($sInput === '') {
            continue;
        }

        //--------------------
        // Inflect
        //--------------------
        $sInflected = Inflector::singularize($sInput);

        // Replace the part of the input where it differs from the inflected string
        // with a % (wildcard) for the MySQL query. ($a ^ $b) is "\0" wherever the
        // bytes agree, so strspn() returns the length of the common prefix.
        $iPosition = strspn($sInput ^ $sInflected, "\0");

        if ($iPosition < strlen($sInput)) {
            $sInput = substr($sInflected, 0, $iPosition) . '%';
        }

        //--------------------
        // Stem
        //--------------------
        $sStemmed = stem_english($sInput);

        // Likewise replace the part where the stemmed string differs with a %
        $iPosition = strspn($sInput ^ $sStemmed, "\0");

        if ($iPosition < strlen($sInput)) {
            $aSearch[] = substr($sStemmed, 0, $iPosition) . '%';
        } else {
            $aSearch[] = $sInput;
        }
    }

    $sSearch = implode(' ', $aSearch);
    // Table and field names are interpolated as-is: only pass trusted identifiers
    return "SELECT * FROM $sSearchTable WHERE LOWER($sSearchField) LIKE '$sSearch';";
}

Here are a few test strings I ran through it:

Input String: Mary's Hamburgers 
SearchString: SELECT * FROM LIST2 WHERE LOWER(some_field) LIKE 'mary% hamburger%'; 

Input String: Office Supplies 
SearchString: SELECT * FROM LIST2 WHERE LOWER(some_field) LIKE 'offic% suppl%'; 

Input String: Accounting department 
SearchString: SELECT * FROM LIST2 WHERE LOWER(some_field) LIKE 'account% depart%';

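It may not be perfect, but it's a good start anyway! Where it falls down is when multiple matches are returned: there is no logic to determine the best match. That's what things like MySQL fulltext search and Lucene are for. Thinking about it a bit more, you could use levenshtein to rank multiple results with this approach!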
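A minimal sketch of that ranking idea, using PHP's built-in `levenshtein()` to order the rows returned by the LIKE query by edit distance to the original input, closest first. The function name `rankByLevenshtein` is hypothetical, and it assumes PHP 7.4+ for the arrow function.

```php
<?php

/**
 * Order candidate strings by Levenshtein distance to $input, closest first.
 * Comparison is case-insensitive; the original strings are returned.
 *
 * @param string[] $candidates
 * @return string[]
 */
function rankByLevenshtein(string $input, array $candidates): array
{
    $input = strtolower($input);
    usort(
        $candidates,
        fn (string $a, string $b): int =>
            levenshtein($input, strtolower($a)) <=> levenshtein($input, strtolower($b))
    );
    return $candidates;
}

// Rows a query like LIKE 'hamburger%' might return
print_r(rankByLevenshtein('hamburgers', ['Hamburger Stand', 'Hamburgers', 'hamburger']));
```

Raw edit distance knows nothing about word meaning, so it works best as a tiebreaker among rows the inflection/stemming query has already matched.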


2016-04-19 19:48:16 quickshiftin
