Is there a way to strip all unnecessary MS Word formatting from FCKEditor?

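I have installed FCKeditor, and pasting from MS Word adds a lot of unnecessary formatting. I want to keep some things, such as bold, italics, and bullets. I have searched the web, but the solutions I found strip everything, even the things I want to keep like bold and italics. Is there a way to strip only the unnecessary Word formatting?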

Source

2009-08-28 user161433

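Anyone who has ever maintained a CMS knows the evils you speak of. Good luck finding an answer. We just let them paste from Word, and then I have a routine that removes non-displayable characters from the database. – Steve 2009-08-29 16:38:26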

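Here is the solution I use to scrub incoming HTML from rich text editors. It is written in VB.NET and I don't have time to convert it to C#, but it is pretty straightforward: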

Imports System.Text.RegularExpressions

Public Shared Function CleanHtml(ByVal html As String) As String
    ' Cleans all manner of evils from the rich text editors in IE, Firefox, Word, and Excel
    ' Only returns acceptable HTML, and converts line breaks to <br />
    ' Acceptable HTML includes HTML-encoded entities.
    html = html.Replace("&" & "nbsp;", " ").Trim() ' concat here due to SO formatting
    ' Does this have HTML tags?
    If html.IndexOf("<") >= 0 Then
        ' Make all tags lowercase (note: this lowercases attribute values as well)
        html = Regex.Replace(html, "<[^>]+>", AddressOf LowerTag)
        ' Filter out anything except allowed tags
        ' Problem: this strips attributes, including href from <a>
        ' http://stackoverflow.com/questions/307013/how-do-i-filter-all-html-tags-except-a-certain-whitelist
        Dim AcceptableTags As String = "i|b|u|sup|sub|ol|ul|li|br|h2|h3|h4|h5|span|div|p|a|img|blockquote"
        ' The \b keeps short names such as "b" or "i" from also whitelisting <body> or <input>
        Dim WhiteListPattern As String = "</?(?(?=(?:" & AcceptableTags & ")\b)notag|[a-zA-Z0-9]+)(?:\s[a-zA-Z0-9\-]+=?(?:([""']?).*?\1?)?)*\s*/?>"
        html = Regex.Replace(html, WhiteListPattern, "", RegexOptions.Compiled)
        ' Make all BR/br tags look the same, and trim them of whitespace before/after
        html = Regex.Replace(html, "\s*<br[^>]*>\s*", "<br />", RegexOptions.Compiled)
    End If
    ' No CRs
    html = html.Replace(ControlChars.Cr, "")
    ' Convert remaining LFs to line breaks
    html = html.Replace(ControlChars.Lf, "<br />")
    ' Trim BRs at the end of any string, and spaces on either side
    Return Regex.Replace(html, "(<br />)+$", "", RegexOptions.Compiled).Trim()
End Function

Public Shared Function LowerTag(ByVal m As Match) As String
    Return m.ToString().ToLower()
End Function

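In your case, you would modify the list of "approved" HTML tags in "AcceptableTags". The code will still strip all the useless attributes (and, unfortunately, useful ones like href and src; hopefully those are not important to you).

Of course, this requires a trip to the server. If you don't want that, you would need to add some kind of "clean up" button to the toolbar that calls JavaScript to scrub the editor's current text. Unfortunately, "paste" is not an event that can be trapped to clean up the markup automatically, and cleaning after every OnChange would make the editor unusable (because changing the markup changes the text cursor position).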

Source

2009-08-28 23:35:11 richardtallent

Wow... this is awesome. But I do need links and basic HTML tags. – user161433 2009-08-29 00:59:47
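For anyone who, like the commenter above, needs links to survive the cleanup, one option is to rebuild every allowed tag from scratch and copy across only the attributes you want. The following is a minimal sketch of that idea, not the answerer's method: the class name WordPasteCleaner, the method Clean, and the tag list are hypothetical, and it assumes attribute values never contain a ">" character. It is a formatting cleaner only, not a security sanitizer (it does not check the href scheme, for example).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordPasteCleaner
{
    // Hypothetical whitelist for illustration; adjust to taste.
    private static readonly HashSet<string> AllowedTags = new HashSet<string>
    {
        "b", "i", "u", "strong", "em", "ul", "ol", "li", "p", "br", "a"
    };

    // HTML comments (including Word's conditional comments) and <!...> declarations.
    private static readonly Regex Declarations =
        new Regex(@"<!--.*?-->|<![^>]*>", RegexOptions.Singleline);

    // Any opening or closing tag. Assumes attribute values contain no '>'.
    private static readonly Regex Tag =
        new Regex(@"<(?<close>/?)(?<name>[a-zA-Z][\w:]*)(?<attrs>[^>]*)>");

    // A quoted href attribute inside a tag's attribute text.
    private static readonly Regex Href =
        new Regex(@"(?<![\w-])href\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)')",
                  RegexOptions.IgnoreCase);

    public static string Clean(string html)
    {
        html = Declarations.Replace(html, "");
        return Tag.Replace(html, m =>
        {
            string name = m.Groups["name"].Value.ToLowerInvariant();
            if (!AllowedTags.Contains(name)) return "";   // drop the tag, keep its inner text
            if (m.Groups["close"].Value == "/") return "</" + name + ">";
            if (name == "br") return "<br />";
            if (name == "a")
            {
                Match href = Href.Match(m.Groups["attrs"].Value);
                if (href.Success)
                {
                    // Re-quote the URL so the rebuilt attribute stays well-formed.
                    string url = href.Groups["url"].Value.Replace("\"", "&quot;");
                    return "<a href=\"" + url + "\">";
                }
            }
            return "<" + name + ">";                      // every other attribute is dropped
        });
    }
}
```

Called on a typical Word fragment such as <p class=MsoNormal><b style='x'>Hi</b> <a href="http://example.com" target=_blank>link</a><o:p></o:p></p>, this returns <p><b>Hi</b> <a href="http://example.com">link</a></p>. The same approach extends to src on img by adding a second attribute pattern.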

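But FCKeditor is, as the name and the website suggest, a text editor. To me that means it only displays the characters in the file.

You can't have bold and italic formatting without some extra characters.

Edit: Ah, I see. Looking more closely at the FCKeditor website, it is an HTML editor, not one of the simple text editors I am used to.

It lists "Paste from Word cleanup with autodetection" as a feature.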

Source

2009-08-28 23:20:35 pavium

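pavium, FCKeditor is a RICH TEXT editor; it abstracts away the messy details of using an editable DIV and adds a nice toolbar. Under the hood it stores HTML, which means that when someone pastes from Word, Word hands it all manner of HTML evils. – richardtallent 2009-08-28 23:39:59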

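I understand this problem well. When you copy out of MS Word (or any word processor or rich text editing area) and then paste into FCKEditor (TinyMCE has the same problem), the original markup is included in the clipboard content and gets processed. That markup is not always compatible with the markup it is embedded into at the target of the paste operation.

I don't know of a solution other than becoming a contributor to FCKEditor, studying the code, and modifying it. What I usually do is instruct users to perform a two-stage clipboard operation:

Copy from MS Word
Paste into Notepad
Select all in Notepad
Copy
Paste into FCKEditor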
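The Notepad round trip above discards all markup, including the bold and italics the asker wanted to keep. If plain text is acceptable and you would rather not ask users to do this by hand, a server-side equivalent might look like the sketch below. The names PlainTextPaste and ToPlainText are hypothetical, and the choice of which closing tags become line breaks is an assumption.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class PlainTextPaste
{
    public static string ToPlainText(string html)
    {
        // Remove HTML comments, which is where Word hides most of its extras.
        string text = Regex.Replace(html, @"<!--.*?-->", "", RegexOptions.Singleline);
        // Turn <br> and the end of common block elements into newlines.
        text = Regex.Replace(text, @"<br\s*/?>|</(?:p|div|li|h[1-6])\s*>", "\n",
                             RegexOptions.IgnoreCase);
        // Drop every remaining tag.
        text = Regex.Replace(text, @"<[^>]+>", "");
        // Decode entities such as &amp; and trim surrounding whitespace.
        return WebUtility.HtmlDecode(text).Trim();
    }
}
```

For example, ToPlainText("<p class=MsoNormal><b>Hello</b> &amp; bye</p><p>Next</p>") yields "Hello & bye" and "Next" on two lines.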

Source

2009-08-28 23:29:37 Glenn

In case anyone wants a C# version of the accepted answer:

using System.Text.RegularExpressions;

public string CleanHtml(string html)
{
    // Cleans all manner of evils from the rich text editors in IE, Firefox, Word, and Excel
    // Only returns acceptable HTML, and converts line breaks to <br />
    // Acceptable HTML includes HTML-encoded entities.
    html = html.Replace("&" + "nbsp;", " ").Trim(); // concat here due to SO formatting

    // Does this have HTML tags?
    if (html.IndexOf("<") >= 0)
    {
        // Make all tags lowercase (note: this lowercases attribute values as well)
        html = Regex.Replace(html, "<[^>]+>", delegate(Match m) {
            return m.ToString().ToLower();
        });
        // Filter out anything except allowed tags
        // Problem: this strips attributes, including href from <a>
        // http://stackoverflow.com/questions/307013/how-do-i-filter-all-html-tags-except-a-certain-whitelist
        string AcceptableTags = "i|b|u|sup|sub|ol|ul|li|br|h2|h3|h4|h5|span|div|p|a|img|blockquote";
        // The \b keeps short names such as "b" or "i" from also whitelisting <body> or <input>
        string WhiteListPattern = "</?(?(?=(?:" + AcceptableTags + @")\b)notag|[a-zA-Z0-9]+)(?:\s[a-zA-Z0-9\-]+=?(?:([""']?).*?\1?)?)*\s*/?>";
        html = Regex.Replace(html, WhiteListPattern, "", RegexOptions.Compiled);
        // Make all BR/br tags look the same, and trim them of whitespace before/after
        html = Regex.Replace(html, @"\s*<br[^>]*>\s*", "<br />", RegexOptions.Compiled);
    }

    // No CRs
    html = html.Replace("\r", "");
    // Convert remaining LFs to line breaks
    html = html.Replace("\n", "<br />");
    // Trim BRs at the end of any string, and spaces on either side
    return Regex.Replace(html, "(<br />)+$", "", RegexOptions.Compiled).Trim();
}

Source

2011-04-26 15:18:53

I tried the accepted solution, but it did not clean up the Word-generated tags.

But this code worked for me:

using System.Collections.Specialized;
using System.Text.RegularExpressions;

static string CleanWordHtml(string html)
{
    StringCollection sc = new StringCollection();
    // get rid of unnecessary tag spans (comments and title)
    sc.Add(@"<!--(\w|\W)+?-->");
    sc.Add(@"<title>(\w|\W)+?</title>");
    // Get rid of classes and styles
    sc.Add(@"\s?class=\w+");
    sc.Add(@"\s+style='[^']+'");
    // Get rid of unnecessary tags
    sc.Add(
        @"<(meta|link|/?o:|/?style|/?div|/?st\d|/?head|/?html|body|/?body|/?span|!\[)[^>]*?>");
    // Get rid of empty paragraph tags
    sc.Add(@"(<[^>]+>)+&nbsp;(</\w+>)+");
    // remove bizarre v: element attached to <img> tag
    sc.Add(@"\s+v:\w+=""[^""]+""");
    // remove extra lines
    sc.Add(@"(\n\r){2,}");
    // Apply each pattern in turn, deleting whatever it matches
    foreach (string s in sc)
    {
        html = Regex.Replace(html, s, "", RegexOptions.IgnoreCase);
    }
    return html;
}

Source

2013-09-19 08:36:35
