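Working with MS Word XML

It's always hard for me to explain what my problem is (especially in English, which isn't my first language), so I apologize in advance if this is convoluted or too trivial ;).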

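What I need to do is "parse" a Word XML document in a specific way. The files converted to XML contain parts placed between fixed markers (such as [...] or /* ... */ or anything else), and I need each of those parts to stay in one piece as a single block of text. Word, however, turns this: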

[SOME_TEXT.SOME_OTHER_TEXT]

into something like:

<w:r> 
    <w:rPr><not relevant /></w:rPr> 
    <w:t> 
     [SOME_TEXT. 
    </w:t> 
</w:r> 
<w:r> 
    <w:rPr><not relevant /></w:rPr> 
    <w:t> 
     SOME_OTHER_TEXT 
    </w:t> 
</w:r> 
<w:r> 
    <w:rPr><not relevant /></w:rPr> 
    <w:t> 
     ] 
    </w:t> 
</w:r>

instead of:

<w:r> 
    <w:rPr><not relevant /></w:rPr> 
    <w:t> 
     [SOME_TEXT.SOME_OTHER_TEXT] 
    </w:t> 
</w:r>

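I tried setting Application.Options.StoreRSIDOnSave to false, using plain formatting for all the text, turning off spell checking and so on, but Word still splits some strings "at random" (especially when they were pasted from somewhere else rather than typed by hand). I also can't tell the people who will create these XML documents to do a hundred other things before they can use their files in my application, so I need to take care of preparing the files myself. I'm wondering what the best and simplest solution would be: read the file through XmlDocument, loop through the nodes and remove the ones inside a marker, taking care to close those that need closing and leaving the text between /* and */ clean; or do the same thing, but by reading the file as plain text. Or maybe someone has a better idea (like some clever regex ;))? I'd be very grateful for any help.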
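The XmlDocument route described above could be sketched roughly as follows. This is only a minimal illustration, not a tested solution: the module name RunMerger, the method MergeMarkerRuns and its wordNs parameter are made up for this example, it assumes the markers are [ and ], that the split runs are direct children of the same w:p paragraph, and that losing the formatting (w:rPr) of the merged-away runs is acceptable.

```vb
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml

Module RunMerger
    ' wordNs is the namespace bound to the "w" prefix in the file (assumption: pass
    ' "http://schemas.microsoft.com/office/word/2003/wordml" for Word 2003 XML or
    ' "http://schemas.openxmlformats.org/wordprocessingml/2006/main" for Word 2007+).
    Public Sub MergeMarkerRuns(doc As XmlDocument, wordNs As String)
        Dim ns As New XmlNamespaceManager(doc.NameTable)
        ns.AddNamespace("w", wordNs)

        ' Snapshot the node lists so that removing runs does not disturb the iteration.
        Dim paragraphs As List(Of XmlNode) = doc.SelectNodes("//w:p", ns).Cast(Of XmlNode)().ToList()
        For Each paragraph As XmlNode In paragraphs
            Dim openText As XmlNode = Nothing   ' w:t that holds a "[" with no "]" yet
            Dim runs As List(Of XmlNode) = paragraph.SelectNodes("w:r", ns).Cast(Of XmlNode)().ToList()

            For Each run As XmlNode In runs
                Dim textNode As XmlNode = run.SelectSingleNode("w:t", ns)
                If textNode Is Nothing Then Continue For

                If openText IsNot Nothing Then
                    ' Still inside a marker: move this run's text into the first run.
                    openText.InnerText &= textNode.InnerText
                    paragraph.RemoveChild(run)
                    If Not IsUnclosed(openText.InnerText) Then openText = Nothing
                ElseIf IsUnclosed(textNode.InnerText) Then
                    openText = textNode
                End If
            Next
        Next
    End Sub

    ' True when the last "[" in the text has no "]" after it.
    Private Function IsUnclosed(text As String) As Boolean
        Return text.LastIndexOf("["c) > text.LastIndexOf("]"c)
    End Function
End Module
```

To use it, one would load the file into an XmlDocument with Load, call MergeMarkerRuns with the namespace the file actually declares for the w prefix, and write the result back with Save. Runs nested inside other elements (hyperlinks, for example) are not handled by this sketch.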

//Edit: I managed to solve the problem. My solution is maybe a bit "lame", but it works perfectly ;)

Imports System.IO 
Imports System.Text 
Imports System.Text.RegularExpressions 

' True while we are between an opening "[" and its closing "]" 
Dim MyMarkedString As Boolean = False 
' True while we are inside a <w:t> element 
Dim MyTextOpened As Boolean = False 
Dim MyFile As String = File.ReadAllText(pFileName) 
Dim MyFileCopy As New StringBuilder() 
' Each match is either a single tag or a piece of text between two tags 
For Each foundPart As Match In Regex.Matches(MyFile, "((<\??/?)(?:[^:\s>]+:)?(\w+).*?(/?\??>))|(?!<)(\[?((?!<).)+\]?)") 
    Dim part As String = foundPart.Value 
    If part.Equals("<w:t>") OrElse part.Contains("<w:t ") Then 
     MyTextOpened = True 
     ' Inside a marker the run boundaries are dropped, so the text stays in one <w:t> 
     If Not MyMarkedString Then MyFileCopy.Append(part) 
    ElseIf part.Equals("</w:t>") OrElse part.Contains("</w:t ") Then 
     MyTextOpened = False 
     If Not MyMarkedString Then MyFileCopy.Append(part) 
    ElseIf MyTextOpened Then 
     ' Text: a "[" without "]" opens a marker, a "]" without "[" closes it 
     If Not MyMarkedString AndAlso part.Contains("[") AndAlso Not part.Contains("]") Then 
      MyMarkedString = True 
     ElseIf MyMarkedString AndAlso part.Contains("]") AndAlso Not part.Contains("[") Then 
      MyMarkedString = False 
     End If 
     MyFileCopy.Append(part) 
    ElseIf Not MyMarkedString Then 
     ' Everything outside a marker is copied unchanged; tags between the runs of a marker are skipped 
     MyFileCopy.Append(part) 
    End If 
Next 
File.WriteAllText(pCopyName, MyFileCopy.ToString())

2009-11-23 brovar

May I suggest another approach: read the XML as a plain string, remove all the XML elements and check the resulting string.

Imports System.IO 
Imports System.Text.RegularExpressions 

Dim readFile As String = File.ReadAllText("yourPathToFile.doc") 
' Remove simple tags; tags with attributes are not matched by this pattern 
readFile = Regex.Replace(readFile, "<[a-zA-Z0-9/:]+>", String.Empty) 

For Each foundPart As Match In Regex.Matches(readFile, "\[[a-zA-Z0-9]+\]") 
     ' do something here with the things we found 
Next

You may need some additional handling, e.g. replacing spaces etc.

Edit: Yes, I'm aware that the regular expressions are far from perfect for this...

Edit2: RegEx to remove XML Tags with content
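As a rough illustration of the "strip the tags, then look at the remaining string" idea, a slightly more tolerant variant might look like the sketch below. The module name MarkerFinder and the function FindMarkers are invented for this example. It assumes that markers use [ and ], that marker names contain no whitespace of their own (all whitespace inside a marker is discarded), and that no attribute value contains a ">" character. It only reports the markers; it does not repair the document.

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module MarkerFinder
    ' Returns the text of every [ ... ] marker found once all tags are removed.
    Public Function FindMarkers(xml As String) As List(Of String)
        ' Drop every tag, including tags that carry attributes.
        Dim textOnly As String = Regex.Replace(xml, "<[^>]+>", String.Empty)

        Dim markers As New List(Of String)
        For Each found As Match In Regex.Matches(textOnly, "\[\s*([^\[\]]+?)\s*\]")
            ' Remove the line breaks and indentation left behind by the stripped tags.
            markers.Add(Regex.Replace(found.Groups(1).Value, "\s+", String.Empty))
        Next
        Return markers
    End Function
End Module
```

Called on the three-run example from the question, this should yield the single entry SOME_TEXT.SOME_OTHER_TEXT, because the run boundaries disappear together with the tags.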

2009-11-23 10:10:33 Bobby

Actually, I was trying to come up with some regex I could use. Hopefully this isn't a dark corner ;) – brovar 2009-11-23 10:36:19

I also found this question, maybe it helps: http://stackoverflow.com/questions/121656/regular-expression-to-remove-xml-tags-and-their-content – Bobby 2009-11-23 10:52:57