2012-02-04 104 views

Answers

2

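My friend, if you are trying to reinvent automatic grammar checking, you are embarking on a perilous task. Natural language is full of exceptions, guaranteed to slip past any small set of rules you think will do the job.

Anyhow, here is a soberingly naive stab at it. This code works for the example you gave: it will remove the superfluous "a". But be aware that if you care about preserving grammar, syntax, and semantics, you should not remove every repeated word. Automatically removing a repeated "that" will work wonders here:

I love that that website.

But it will drag the grammar down to a very informal level and change the writer's intent here:

She said that that is a great website.

And removing repeats will ruin absolutely everything here: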

That that is is that that is not is not that that is that that is is not true is not true.

Not to mention this:

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.

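Brace for disaster! Still, the code works for your example (and more), and it gives you a framework to build on and fine-tune so that it works in most of the cases relevant to you.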

Sub RemoveRepeatedWords()
    Dim shp As Shape
    Dim str As String
    Dim wordArr() As String
    Dim words As Collection
    Dim iWord As Long
    Dim thisWord As String
    Dim nextWord As String
    Dim newText As String

    For Each shp In ActivePresentation.Slides(1).Shapes
        If shp.HasTextFrame Then
            'Get the text
            str = shp.TextFrame.TextRange.Text
            'Split it into an array of words
            wordArr = Split(str, " ")

            'Transfer to a Collection, easier to deal with than array.
            Set words = New Collection
            For iWord = LBound(wordArr) To UBound(wordArr)
                words.Add wordArr(iWord)
            Next iWord

            'Look for repeats, working backwards so removals don't shift
            'the items that are still to be checked.
            For iWord = words.Count - 1 To 1 Step -1
                thisWord = words.Item(iWord)
                nextWord = words.Item(iWord + 1)

                'Make sure commas don't get in the way of a comparison
                'e.g. "This is a great, great site" is fine
                'but "This site is great great, and I love it" is not.
                nextWord = Replace(nextWord, ",", "")
                'Add whatever other filtering you feel is appropriate.
                'e.g. period, case sensitivity, etc.

                If LCase(thisWord) = LCase(nextWord) Then
                    If LCase(thisWord) = "that" Then
                        'Do nothing. "He said that that was great." is ok.
                        'This is just an example. "had" is another.
                        'Add more filtering here.
                    Else
                        words.Remove iWord + 1
                    End If
                End If
            Next iWord

            'Assemble the text with repeats removed,
            'without leaving a trailing space at the end.
            newText = ""
            For iWord = 1 To words.Count
                If iWord > 1 Then newText = newText & " "
                newText = newText & words.Item(iWord)
            Next iWord

            'Finally, put it back on the slide.
            shp.TextFrame.TextRange.Text = newText
        End If
    Next shp
End Sub
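The "add more filtering here" comments in the loop above could be filled in along these lines. This is only a rough sketch: the helper names `NormalizeWord` and `IsAllowedRepeat`, the punctuation list, and the list of allowed words are all made up for illustration, and you would need to tune them to your own text.

```vba
'Hypothetical helpers; names and word lists are illustrative assumptions.

'Remove common punctuation characters and lower-case the word.
Function NormalizeWord(ByVal w As String) As String
    Dim ch As Variant
    For Each ch In Array(",", ".", ";", ":", "!", "?")
        w = Replace(w, ch, "")
    Next ch
    NormalizeWord = LCase$(w)
End Function

'True for words that can legitimately appear twice in a row.
Function IsAllowedRepeat(ByVal w As String) As Boolean
    Dim allowed As Variant
    For Each allowed In Array("that", "had", "is")
        If NormalizeWord(w) = allowed Then
            IsAllowedRepeat = True
            Exit Function
        End If
    Next allowed
End Function
```

Inside the loop you would then compare `LCase(thisWord)` against `NormalizeWord(nextWord)` and only call `words.Remove iWord + 1` when `IsAllowedRepeat(thisWord)` is False. As with the comma handling, the punctuation is stripped from `nextWord` only, so that "great, great" and "great. Great" are still left alone.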
+0

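If you refine this, then for each shape it might be better to grab all the text from all the items and put it in a single string called paragraph. Remove all commas. Split on any period (".") into another array called sentences, then within each sentence, split on a single space and test each pair of adjacent words side by side. – RetroCoder 2012-02-04 22:06:52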

+0

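You make a good point that some doubled words may be correct; a modification that prompts the user about these would be useful. Your code certainly works (I tested it) and is logically designed, but it takes a longer route than a ready-made parsing tool. – brettdj 2012-02-05 02:46:57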

+0

Thanks, my friend. You rock!!! – 2012-02-05 06:50:06

0

Regular expressions make this nice and easy:

Function remove_duplicates()
    Dim txt As String
    Dim word_match As Object
    Dim wrd As Variant

    txt = "Stackoverflow is a a great site"

    Set word_match = CreateObject("vbscript.regexp")
    word_match.IgnoreCase = True
    word_match.Global = True

    'For each word, collapse "word word" into a single "word".
    For Each wrd In Split(txt, " ")
        word_match.Pattern = wrd & " " & wrd
        txt = word_match.Replace(txt, wrd)
    Next

    MsgBox txt
End Function
+1

I'm no RegExp expert, but I'm pretty sure this is not a particularly good way to use it... – 2012-02-05 14:46:24

+0

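@Jean-FrançoisCorbett That's funny, because you said above _I'll have to learn RegExp at some point_. Thanks for your opinion on regular expressions, from someone who doesn't know regular expressions! – rikAtee 2012-02-05 15:41:55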

+1

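Are we being sarcastic... Your use of regex is basically a roundabout way of emulating the native VBA 'Replace' function: it is exactly equivalent to 'txt = Replace(txt, wrd & " " & wrd, wrd)'. So yes, I know enough to see that your use of regex brings no particular benefit and only adds complexity. – 2012-02-05 15:54:26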

1

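This is a classic RegExp application: using a backreference, it removes all the repeated words in a single shot (rather than looping word by word).

Note that if you want detailed help with accessing the underlying PPT text, you will need to provide more information about where in the slide(s) the text occurs.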

Sub TestString()
    MsgBox ReducedText("stackoverflow stackoverflow Stackoverflow is a a great site")
End Sub

Function ReducedText(strIn As String) As String
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        .IgnoreCase = True
        .Global = True
        .Pattern = "\b(\w+)\b(\s+\1\b)+"
        ReducedText = .Replace(strIn, "$1")
    End With
End Function
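In the absence of more detail about where the text lives, one possible way to apply `ReducedText` to a presentation is sketched below. It assumes the text sits in ordinary text frames directly on the slides (not in tables, grouped shapes, or SmartArt), and the name `ReduceAllSlides` is just an illustrative choice.

```vba
'Sketch only: assumes ReducedText (above) is in the same module and that
'the text lives in ordinary text frames directly on each slide.
Sub ReduceAllSlides()
    Dim sld As Slide
    Dim shp As Shape

    For Each sld In ActivePresentation.Slides
        For Each shp In sld.Shapes
            'Skip shapes that cannot hold text or are currently empty.
            If shp.HasTextFrame Then
                If shp.TextFrame.HasText Then
                    With shp.TextFrame.TextRange
                        .Text = ReducedText(.Text)
                    End With
                End If
            End If
        Next shp
    Next sld
End Sub
```

Be aware that writing the whole `.Text` back in one go may flatten mixed character formatting within a shape, so try it on a copy of the presentation first.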
+0

+1 I've been putting it off, but I'll have to learn RegExp at some point... – 2012-02-05 14:48:55