2015-11-05 203 views

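Translate an entire web page with the Microsoft Translator API (SOAP) in C#

I want to translate my website, but the Translator widget does not work for me because I need Google to crawl the translated pages. So I need to translate each page before sending it to the browser.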

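As far as I can tell there is no API (I looked and could not find one; if you know of one, please mention it) that takes a URL and returns the translated page, the way this does: http://www.microsofttranslator.com/bv.aspx?from=&to=nl&a=http%3A%2F%2Fwww.imdb.com%2F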

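These are my attempts so far:

1. Download the page as a string from the URL and pass it to Client.Translate(..). This fails with:

The formatter threw an exception while trying to deserialize the message: Error in deserializing body of request message for operation 'Translate'. The maximum string content length quota (30720) has been exceeded while reading XML data. This quota may be increased by changing the MaxStringContentLength property on the XmlDictionaryReaderQuotas object used when creating the XML reader. Line 516, position 48.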

2. Translate each non-empty text node individually:

private static void processDocument(HtmlAgilityPack.HtmlDocument html, LanguageServiceClient Client)
{
    // Select every text node that is not just whitespace.
    HtmlNodeCollection coll = html.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']");
    if (coll == null) return; // SelectNodes returns null when nothing matches
    foreach (HtmlNode node in coll)
    {
        if (node.InnerText == node.InnerHtml)
        {
            //node.InnerHtml = translateText(node.InnerText);
            // One service call per text node.
            node.InnerHtml = Client.Translate("", node.InnerText, "en", "fr", "text/html", "general");
        }
    }
}

This approach took far too long, and in the end I got a Bad Request (400) exception.
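Both attempts hit opposite ends of the same trade-off: one request that is too large, or very many requests that are too small. A minimal sketch of a middle path is to group the text fragments into batches that stay under a size budget and send one request per batch. The names TranslationBatcher, MakeBatches, TranslateAll and the translateBatch callback are hypothetical, and the limits of 10000 characters and 100 items are assumptions rather than documented service limits. If the generated SOAP proxy exposes an array operation such as TranslateArray, the callback could wrap it; verify that against your own proxy.

```csharp
using System;
using System.Collections.Generic;

public static class TranslationBatcher
{
    // Groups texts into batches whose combined length stays within maxChars
    // and whose item count stays within maxItems. Both limits are assumptions;
    // check the service documentation for the real values.
    public static List<List<string>> MakeBatches(IEnumerable<string> texts, int maxChars, int maxItems)
    {
        var batches = new List<List<string>>();
        var current = new List<string>();
        int currentChars = 0;

        foreach (string text in texts)
        {
            if (text.Length > maxChars)
                throw new ArgumentException("A single text exceeds maxChars; split it first.");

            // Close the current batch if adding this text would break a limit.
            bool full = current.Count >= maxItems || currentChars + text.Length > maxChars;
            if (full && current.Count > 0)
            {
                batches.Add(current);
                current = new List<string>();
                currentChars = 0;
            }
            current.Add(text);
            currentChars += text.Length;
        }
        if (current.Count > 0) batches.Add(current);
        return batches;
    }

    // translateBatch is a hypothetical callback standing in for one service call
    // that translates several strings and returns them in the same order.
    public static List<string> TranslateAll(
        IEnumerable<string> texts,
        Func<IList<string>, IList<string>> translateBatch,
        int maxChars = 10000,
        int maxItems = 100)
    {
        var results = new List<string>();
        foreach (List<string> batch in MakeBatches(texts, maxChars, maxItems))
        {
            IList<string> translated = translateBatch(batch);
            if (translated.Count != batch.Count)
                throw new InvalidOperationException("Service returned a different number of items.");
            results.AddRange(translated);
        }
        return results;
    }
}
```

The results come back in input order, so they can be written back to the text nodes by index after all batches complete.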

What is the best way to solve this? I also plan to save the translated files so that I do not have to translate them every time.
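For the saving part, one possible approach is a file cache keyed by the target language and a hash of the source HTML, so that a page is only sent for translation when its content changes. This is a sketch under assumptions: TranslationCache, CachePath, GetOrTranslate and the translate callback are hypothetical names, and the callback stands in for whatever performs the real translation.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class TranslationCache
{
    // Builds a file name from the target language and a SHA-256 hash of the
    // source HTML, so an edited page automatically gets a fresh cache entry.
    public static string CachePath(string cacheDir, string sourceHtml, string toLanguage)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(sourceHtml));
            string hex = BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
            return Path.Combine(cacheDir, toLanguage + "_" + hex + ".html");
        }
    }

    // translate is a hypothetical callback that performs the real (slow) translation.
    public static string GetOrTranslate(string cacheDir, string sourceHtml, string toLanguage, Func<string, string> translate)
    {
        string path = CachePath(cacheDir, sourceHtml, toLanguage);
        if (File.Exists(path))
            return File.ReadAllText(path, Encoding.UTF8); // cache hit: no service call

        string translated = translate(sourceHtml);
        Directory.CreateDirectory(cacheDir); // no-op if the directory already exists
        File.WriteAllText(path, translated, Encoding.UTF8);
        return translated;
    }
}
```

Hashing the content rather than the URL means stale translations are never served after an edit, at the cost of leaving old cache files behind until they are cleaned up.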

Answer


This C# sample translates the HTML in a local file:

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 
using System.Threading.Tasks; 
using System.IO; 
using HtmlAgilityPack; 

namespace TranslationAssistant.Business 
{ 
class HTMLTranslationManager 
{ 
    public static int DoTranslation(string htmlfilename, string fromlanguage, string tolanguage) 
    { 
     string htmldocument = File.ReadAllText(htmlfilename); 
     string htmlout = string.Empty; 

     HtmlDocument htmlDoc = new HtmlDocument(); 
     htmlDoc.LoadHtml(htmldocument); 
     htmlDoc.DocumentNode.SetAttributeValue("lang", TranslationServices.Core.TranslationServiceFacade.LanguageNameToLanguageCode(tolanguage)); 
     var title = htmlDoc.DocumentNode.SelectSingleNode("//head//title"); 
     if (title != null) title.InnerHtml = TranslationServices.Core.TranslationServiceFacade.TranslateString(title.InnerHtml, fromlanguage, tolanguage, "text/html"); 
     var body = htmlDoc.DocumentNode.SelectSingleNode("//body"); 
     if (body != null) 
     { 
      if (body.InnerHtml.Length < 10000) 
      { 
       body.InnerHtml = TranslationServices.Core.TranslationServiceFacade.TranslateString(body.InnerHtml, fromlanguage, tolanguage, "text/html"); 
      } 
      else 
      { 
       List<HtmlNode> nodes = new List<HtmlNode>(); 
       AddNodes(body.FirstChild, ref nodes); 

       Parallel.ForEach(nodes, (node) => 
        { 
         if (node.InnerHtml.Length > 10000) 
         { 
          throw new Exception("Child node with a length of more than 10000 characters encountered."); 
         } 
         node.InnerHtml = TranslationServices.Core.TranslationServiceFacade.TranslateString(node.InnerHtml, fromlanguage, tolanguage, "text/html"); 
        }); 
      } 
     } 
     htmlDoc.Save(htmlfilename, Encoding.UTF8); 
     return 1; 
    } 

    /// <summary> 
    /// Add nodes of size smaller than 10000 characters to the list, and recurse into the bigger ones. 
    /// </summary> 
    /// <param name="rootnode">The node to start from</param> 
    /// <param name="nodes">Reference to the node list</param> 
    private static void AddNodes(HtmlNode rootnode, ref List<HtmlNode> nodes) 
    { 
     string[] DNTList = { "script", "#text", "code", "col", "colgroup", "embed", "em", "#comment", "image", "map", "media", "meta", "source", "xml"}; //DNT - Do Not Translate - these nodes are skipped. 
     HtmlNode child = rootnode; 
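     // Walk every sibling starting at rootnode. rootnode.LastChild is a child of rootnode, not its last sibling, so it cannot serve as the end marker.
     while (child != null) 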
     { 
      if (!DNTList.Contains(child.Name.ToLowerInvariant())) { 
       if (child.InnerHtml.Length > 10000) 
       { 
        AddNodes(child.FirstChild, ref nodes); 
       } 
       else 
       { 
        if (child.InnerHtml.Trim().Length != 0) nodes.Add(child); 
       } 
      } 
      child = child.NextSibling; 
     } 
    } 

} 
} 

This is HTMLTranslationManager.cs from http://github.com/microsofttranslator/documenttranslator. It uses the helper function TranslateString() in TranslationServiceFacade.cs. You can simplify it and insert your own translation service call in place of TranslateString().
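As a rough illustration of that substitution, the sketch below shows a small stand-in with the same four-argument shape as the TranslateString() calls in the sample. SimpleTranslationFacade and its Translator property are hypothetical names, not part of the linked repository; the property is where your own service call would be plugged in.

```csharp
using System;

public static class SimpleTranslationFacade
{
    // Hypothetical hook: assign a delegate that calls your translation service.
    // Arguments are (text, fromLanguage, toLanguage, contentType).
    public static Func<string, string, string, string, string> Translator { get; set; }

    public static string TranslateString(string text, string from, string to, string contentType)
    {
        // Skip the service call for empty or whitespace-only input.
        if (string.IsNullOrWhiteSpace(text)) return text;
        if (Translator == null)
            throw new InvalidOperationException("Assign SimpleTranslationFacade.Translator before translating.");
        return Translator(text, from, to, contentType);
    }
}
```

With the SOAP client from the question, the hook might be assigned as `SimpleTranslationFacade.Translator = (text, from, to, contentType) => Client.Translate("", text, from, to, contentType, "general");`, which reuses the argument order shown in the question's own call.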