2011-10-11 49 views
1

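HttpModule issue: replacing text in the web page render

I'm writing an HttpModule that searches a web page for all mailto links, obfuscates the email address and any trailing parameters, and then puts the newly obfuscated string back into the HTML document. I then use a little JavaScript to un-obfuscate the mailto links in the browser, so that they behave normally when the user clicks them.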

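So far I have been able to obfuscate and un-obfuscate the information without any problems. The issue I'm running into is placing the obfuscated strings back into the stream. If a mailto link appears only once in a document, the obfuscated string is placed perfectly in place of the mailto link, but if there is more than one mailto link, the placement of the strings is seemingly random. I'm pretty sure it has to do with the regex match indexes: as the function loops through the matches, each replacement changes the length of the HTML coming through the stream, so the later indexes no longer line up. I'm posting some strategically edited code here to see if anyone has ideas on how to target the placement of the obfuscated strings correctly.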

I'm also posting the work I did to obfuscate the string, in the hope that it helps someone trying to do the same thing.

public override void Write(byte[] buffer, int offset, int count) 
    { 
     byte[] data = new byte[count]; 
     Buffer.BlockCopy(buffer, offset, data, 0, count); 
     string html = System.Text.Encoding.Default.GetString(data); 

     //--- Work on the HTML from the page. We want to pass it through the 
     //--- obfuscation function before it is sent to the browser. 
     html = obfuscate(html); 

     byte[] outdata = System.Text.Encoding.Default.GetBytes(html); 
     _strmHTML.Write(outdata, 0, outdata.GetLength(0)); 
    } 


protected string obfuscate(string input) 
    { 

     //--- Declarations 
     string email = string.Empty; 
     string obsEmail = string.Empty; 
     string matchedEMail = string.Empty; 
     int matchIndex = 0; 
     int matchLength = 0; 

     //--- This is a REGEX to grab any "a href=mailto" tags in the document. 
     MatchCollection matches = Regex.Matches(input, @"<a href=""mailto:[a-zA-Z0-9\.,|\-|[email protected]?= &]*"">", RegexOptions.Singleline | RegexOptions.IgnoreCase); 

     //--- Because of the nature of doing a match search with regex, we must now loop through the results 
     //--- of the MatchCollection. 
     foreach (Match match in matches) 
     { 

      //--- Get the match string 
      matchedEMail = match.ToString(); 
      matchIndex = match.Index; 
      matchLength = match.Length; 

      //--- Obfuscate the matched string. 
      obsEmail = obfusucateEmail(@match.Value.ToString()); 

      //--- Rebuild the entire HTML stream. This has to be added back in at the right point. 
      input = input.Substring(0, matchIndex) + obsEmail + input.Substring(matchIndex + matchLength);     
     } 

     //--- Return the obfuscated result. 
     return input; 
    } 



protected string obfusucateEmail(string input) 
    { 

     //--- Declarations 
     string email = string.Empty; 
     string obsEmail = string.Empty; 

     //--- Reset these value, in case we find more than one match. 
     email = string.Empty; 
     obsEmail = string.Empty; 

     //--- Get the email address out of the array 
     email = @input; 

     //--- Clean up the string. We need to get rid of the beginning of the tag, and the end >. First, 
     //--- let's flush out all quotes. 
     email = email.Replace("\"", ""); 

     //--- Now, let's replace the beginning of the tag. 
     email = email.Replace("<a href=mailto:", ""); 

     //--- Finally, let's get rid of the closing tag. 
     email = email.Replace(">", ""); 


     //--- Now, we have a cleaned mailto string. Let's obfuscate it. 
     Array matcharray = email.ToCharArray(); 

     //--- Loop through the CharArray and encode each letter. 
     foreach (char letter in matcharray) 
     { 
      //Convert each letter of the address to the corresponding ASCII code. 
      //Add XX to each value to break the direct ASCII code to letter mapping. We'll deal 
      // with subtracting XX from each number on the JavaScript side. 
      obsEmail += Convert.ToInt32((letter) + 42).ToString() + "~"; 
     } 

     //--- Before we return the obfuscated value, we need to rebuild the tag. 
     //--- Remember, up above, we stripped all this out. Well now, we need 
     //--- to add it again. 
     obsEmail = "<a href=\"mailto:" + obsEmail + "\">"; 

     return obsEmail; 
    } 
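
For readers who want to check the encoding before writing the browser-side script, here is a minimal round-trip sketch in C#. It assumes the scheme shown above: each character code is shifted by a fixed amount and terminated with `~`. The class name `MailtoCodec`, the `Decode` method, and the shift of 42 are illustrative assumptions, not part of the original module; the real decoding in the post happens in JavaScript.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class MailtoCodec
{
    // Assumed shift value; it must match whatever the decoding script subtracts.
    private const int Shift = 42;

    // Encode: each character becomes (character code + Shift) followed by '~'.
    public static string Encode(string plain)
    {
        var sb = new StringBuilder();
        foreach (char letter in plain)
        {
            sb.Append((letter + Shift).ToString(CultureInfo.InvariantCulture));
            sb.Append('~');
        }
        return sb.ToString();
    }

    // Decode: split on '~', subtract Shift, and turn each number back into a character.
    public static string Decode(string encoded)
    {
        var sb = new StringBuilder();
        string[] tokens = encoded.Split(new[] { '~' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            int code = int.Parse(token, CultureInfo.InvariantCulture) - Shift;
            sb.Append((char)code);
        }
        return sb.ToString();
    }
}
```

If `Decode(Encode(s))` returns `s` for your test addresses, the browser-side script only has to mirror `Decode`.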

I'd appreciate any ideas!

Thanks, Mike

+0

Hi Mike, could you share your complete code here? –

Answers

1

Another thing you can do is use a match evaluator with your regular expression:

protected string ObfuscateUsingMatchEvaluator(string input) 
{ 
      var re = new Regex(@"<a href=""mailto:[a-zA-Z0-9\.,|\-|[email protected]?= &]*"">",   RegexOptions.IgnoreCase | RegexOptions.Multiline); 
      return re.Replace(input, DoObfuscation); 

} 

protected string DoObfuscation(Match match) 
{ 
     return obfusucateEmail(match.Value); 
} 
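
Why this helps: in the loop from the question, `match.Index` is computed against the original string but applied to a string that has already been modified. After the first replacement changes the length, every later index is off by the accumulated difference. The self-contained sketch below shows two ways around that. The pattern, the class name `MailtoRewriter`, and the `Transform` stand-in are assumptions for illustration; they are not the pattern or the obfuscation routine from the post.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class MailtoRewriter
{
    // Assumed, simplified pattern for illustration only.
    private static readonly Regex MailtoTag =
        new Regex(@"<a href=""mailto:[^""]*"">", RegexOptions.IgnoreCase);

    // Hypothetical stand-in for obfusucateEmail; it only needs to change the length.
    public static string Transform(string tag)
    {
        return "[" + tag.Length + "]";
    }

    // Option 1: Regex.Replace splices each result in, so no index bookkeeping is needed.
    public static string WithEvaluator(string input)
    {
        return MailtoTag.Replace(input, m => Transform(m.Value));
    }

    // Option 2: keep the loop, but always read from the untouched input and
    // write to a separate StringBuilder, so match indexes stay valid.
    public static string WithBuilder(string input)
    {
        var sb = new StringBuilder(input.Length);
        int last = 0;
        foreach (Match m in MailtoTag.Matches(input))
        {
            sb.Append(input, last, m.Index - last); // text between matches
            sb.Append(Transform(m.Value));          // replacement for this match
            last = m.Index + m.Length;
        }
        sb.Append(input, last, input.Length - last); // tail after the last match
        return sb.ToString();
    }
}
```

Both methods should produce the same output for any input; the evaluator version is shorter, while the builder version makes the bookkeeping explicit.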
+0

And that works perfectly. Thanks! – Mike

0

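Depending on your performance requirements (which depend on the size of your documents, among other things), you might consider using the HTML Agility Pack instead of a regex to parse and manipulate your HTML. You can use LINQ to Objects or XPath to identify all the mailto tags.

You should be able to modify the following example (from the codeplex wiki page) to find the mailto tags: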

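HtmlDocument doc = new HtmlDocument(); 
doc.Load("file.htm"); 
// SelectNodes returns null when nothing matches, so guard against that. 
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]"); 
if (links != null) 
{ 
    foreach (HtmlNode link in links) 
    { 
        HtmlAttribute att = link.Attributes["href"]; 
        // EncryptValue is a placeholder for your own obfuscation routine. 
        if (att.Value.StartsWith("mailto:")) EncryptValue(att); 
    } 
} 
doc.Save("file.htm"); 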
+0

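Hey Philip, I did look at that product. There wasn't enough documentation or working samples for me to try bringing it into our environment. We have large client sites, and I can't take the risk. – Mike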

+0

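@Mike Added an example. The code lives in a single assembly that you can deploy and reference from your project. It is widely used for these kinds of tasks. –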