2009-06-02 74 views
4

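Noise words in SQL Server 2005 full-text search

I am trying to run a full-text search over a series of names in my database. This is my first attempt at using full-text search. Currently I take the search string that was entered and put a NEAR condition between each term (i.e. an entered phrase of "Kings of Leon" becomes "Kings NEAR of NEAR Leon").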

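Unfortunately, I have found that this tactic produces a false negative: the word "of" is a noise word, so SQL Server drops it when it builds the index. As a result, "Kings Leon" matches correctly, but "Kings of Leon" does not.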
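The NEAR-joining step described above can be sketched roughly as follows. This is only an illustration of the string handling; the class and method names (`NearQueryBuilder`, `Build`) are hypothetical and do not come from the original post.

```csharp
using System;

public static class NearQueryBuilder
{
    // Joins the whitespace-separated terms of a phrase with NEAR.
    // Hypothetical helper: it does no noise-word handling at all,
    // so a term such as "of" ends up in the search condition.
    public static string Build(string phrase)
    {
        string[] terms = phrase.Split(
            new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" NEAR ", terms);
    }
}
```

Calling `NearQueryBuilder.Build("Kings of Leon")` yields `Kings NEAR of NEAR Leon`, which is the condition that fails to match because "of" is not in the index.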

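My colleague has suggested taking all of the noise words defined in MSSQL\FTData\noiseENG.txt and putting them in the .NET code, so that the noise words can be stripped out before the full-text search is executed.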
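One way to picture that suggestion is a token-based filter run before the terms are joined. The sketch below is an assumption-laden illustration, not the colleague's actual code: `NoiseWordFilter` and `RemoveNoise` are hypothetical names, and the inline word list is a small sample standing in for the contents of the noise-word file.

```csharp
using System;
using System.Collections.Generic;

public static class NoiseWordFilter
{
    // Hypothetical sample list; a real list would be loaded from the
    // noise-word file. Comparison ignores case.
    private static readonly HashSet<string> NoiseWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "a", "an", "and", "of", "the", "to" };

    // Returns the terms of the phrase that are not noise words,
    // in their original order.
    public static string[] RemoveNoise(string phrase)
    {
        var kept = new List<string>();
        foreach (string term in phrase.Split(
            new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!NoiseWords.Contains(term))
            {
                kept.Add(term);
            }
        }
        return kept.ToArray();
    }
}
```

With this filter in place, `string.Join(" NEAR ", NoiseWordFilter.RemoveNoise("Kings of Leon"))` produces `Kings NEAR Leon`. Working on whole tokens avoids accidentally cutting noise words out of the middle of longer words.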

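Is this the best solution? Is there some auto-magic setting I can change in SQL Server that will do this for me? Or perhaps just a better solution that doesn't feel quite so hacky?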

+0

On a previous project we used SQL Server full-text search and stripped the noise words using C#. – Kane 2009-06-02 06:46:45

Answers

4

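Full-text search works off the search criteria you give it. You can remove the noise words from the noise-word file, but you risk bloating your index size by doing that. Robert Cain has a lot of good information about this on his blog: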

http://arcanecode.com/2008/05/29/creating-and-customizing-noise-words-in-sql-server-2005-full-text-search/

To save some time, you can look at how this method removes them and copy the code and the word list:

 public string PrepSearchString(string sOriginalQuery) 
    { 
     string strNoiseWords = @" 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | $ | ! | @ | # | $ | % |^| & | * | (|) | - | _ | + | = | [ | ] | { | } | about | after | all | also | an | and | another | any | are | as | at | be | because | been | before | being | between | both | but | by | came | can | come | could | did | do | does | each | else | for | from | get | got | has | had | he | have | her | here | him | himself | his | how | if | in | into | is | it | its | just | like | make | many | me | might | more | most | much | must | my | never | now | of | on | only | or | other | our | out | over | re | said | same | see | should | since | so | some | still | such | take | than | that | the | their | them | then | there | these | they | this | those | through | to | too | under | up | use | very | want | was | way | we | well | were | what | when | where | which | while | who | will | with | would | you | your | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z "; 

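     // Split the pipe-delimited list. Most entries keep their surrounding
     // spaces, so they only match whole, space-delimited words.
     string[] arrNoiseWord = strNoiseWords.Split('|');

     // Note: a noise word at the very start or end of the query has no space
     // on one side, so this loop does not remove it. Matching is case-sensitive.
     foreach (string noiseword in arrNoiseWord)
     {
      sOriginalQuery = sOriginalQuery.Replace(noiseword, " ");
     }
     // Collapse any runs of spaces left behind by the replacements.
     while (sOriginalQuery.Contains("  "))
     {
      sOriginalQuery = sOriginalQuery.Replace("  ", " ");
     }
     return sOriginalQuery.Trim();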
    } 

However, I would probably do this with a Regex.Replace, which should be much faster than looping. I just don't have a quick example to post.
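A Regex.Replace version along the lines the answer mentions might look like the sketch below. It is an illustration under stated assumptions rather than the answerer's code: the names `NoiseWordRegex` and `Strip` are hypothetical, the word list is abbreviated, and no claim is made here about its speed relative to the loop.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NoiseWordRegex
{
    // Abbreviated sample list (assumption); extend with the full list.
    private static readonly string[] NoiseWords =
        { "about", "after", "all", "an", "and", "of", "the", "to" };

    // (?<!\S) and (?!\S) require whitespace or a string boundary on each
    // side, so noise words at the start or end of the query also match.
    // Regex.Escape keeps entries such as "$" or "(" from acting as syntax.
    private static readonly Regex NoisePattern = new Regex(
        @"(?<!\S)(?:"
            + string.Join("|", NoiseWords.Select(w => Regex.Escape(w)))
            + @")(?!\S)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Strip(string query)
    {
        // Replace each noise word with a space, then collapse whitespace.
        string stripped = NoisePattern.Replace(query, " ");
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

For example, `NoiseWordRegex.Strip("The Kings of Leon")` returns `Kings Leon`. Because the lookarounds do not consume the neighbouring spaces, consecutive noise words are each matched in a single pass.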

+1

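After adding the following line to the start of the method, it works correctly: sOriginalQuery = " " + sOriginalQuery + " "; This step is necessary so that a noise word can be matched when it is the first or last word of the search phrase. – 2009-06-04 01:33:27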

0

Here is a working function. The file noiseENU.txt is copied as-is from \Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData

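    ' Requires: Imports System.Text.RegularExpressions
    ' ReadFile is not shown here; it is assumed to return the file's contents as a string.
    Public Function StripNoiseWords(ByVal s As String) As String
        Dim NoiseWords As String = ReadFile("/Standard/Core/Config/noiseENU.txt").Trim
        ' Turn the whitespace-separated list into an alternation: about|after|all|also etc.
        Dim NoiseWordsRegex As String = Regex.Replace(NoiseWords, "\s+", "|")
        ' Match whole words only, together with one optional space on either side
        NoiseWordsRegex = String.Format("\s?\b(?:{0})\b\s?", NoiseWordsRegex)
        ' Replace each noise word with a space
        Dim Result As String = Regex.Replace(s, NoiseWordsRegex, " ", RegexOptions.IgnoreCase)
        ' Eliminate any multiple spaces, then trim the ends
        Result = Regex.Replace(Result, "\s+", " ")
        Return Result.Trim()
    End Function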