Google simply ignores the Accept-Charset
header and returns the response encoded in ISO-8859-1,
as you can see from this shortened response:
HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1
Content-Language: en
Content-Length: 64202
<!DOCTYPE html><html><head><meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
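So when you decode the response as UTF-8, you get invalid characters. If you just want it to work quickly, I found that when a User-Agent
header is added to the request, Google returns a UTF-8 response, so you can leave the rest unmodified: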
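using System;
using System.Net;
using System.Text;

private static string translate(string input, string languagePair)
{
    // Escape the values so spaces, '|' and non-ASCII characters survive in the query string.
    string url = String.Format("https://translate.google.com/?hl=en&ie=UTF8&text={0}&langpair={1}",
        Uri.EscapeDataString(input), Uri.EscapeDataString(languagePair));
    using (WebClient wc = new WebClient())
    {
        wc.Headers.Add(HttpRequestHeader.AcceptCharset, "utf-8");
        // With a browser-like User-Agent, Google answers in UTF-8, so decoding as UTF-8 is correct.
        wc.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/55.0");
        wc.Encoding = Encoding.UTF8;
        string result = wc.DownloadString(url);
        // Scrape the translation: the text after the last '>' before the first "</span>" following "result_box".
        int start = result.IndexOf("result_box");
        if (start < 0) return null; // page markup changed or no translation was returned
        string sub = result.Substring(start);
        sub = sub.Substring(0, sub.IndexOf("</span>"));
        start = sub.LastIndexOf(">");
        sub = sub.Substring(start + 1);
        return sub;
    }
}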
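The invalid characters come from a plain byte-level mismatch, which can be reproduced without any network access. This is a minimal sketch, not part of the original answer; the class and method names (`EncodingMismatchDemo`, `EncodeAsLatin1`, `Decode`, `Show`) are hypothetical. It encodes "würde" as ISO-8859-1 (as the server does) and then decodes the same bytes with the correct charset and with UTF-8.

```csharp
using System;
using System.Text;

// Hypothetical demo class: the same bytes decoded with the right and the wrong charset.
public static class EncodingMismatchDemo
{
    // ISO-8859-1 (Latin-1) is built into .NET Core/.NET 5+ without extra encoding providers.
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Encodes text as the server in the answer does: one byte per character.
    public static byte[] EncodeAsLatin1(string text) => Latin1.GetBytes(text);

    // Decodes with whatever encoding the client assumes.
    public static string Decode(byte[] body, Encoding assumed) => assumed.GetString(body);

    public static void Show()
    {
        // "würde": the 'ü' becomes the single byte 0xFC in Latin-1.
        byte[] body = EncodeAsLatin1("w\u00FCrde");

        // Correct: decode with the charset the server actually used.
        Console.WriteLine(Decode(body, Latin1));

        // Wrong: 0xFC is not a valid UTF-8 byte, so Encoding.UTF8 substitutes U+FFFD for it.
        Console.WriteLine(Decode(body, Encoding.UTF8));
    }
}
```

How the replacement character is rendered by `Show()` depends on the console's output encoding, so it may appear as a box, a question mark, or nothing at all.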
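A better approach is to detect the encoding used for the response and use it for decoding. WebClient
has no built-in detection for this, so you can either use the solution described here or use HttpClient
instead, which does this for you automatically: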
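using System;
using System.Net.Http;
using System.Threading.Tasks;

private static async Task<string> translate(string input, string languagePair)
{
    string url = String.Format("https://translate.google.com/?hl=en&ie=UTF8&text={0}&langpair={1}",
        Uri.EscapeDataString(input), Uri.EscapeDataString(languagePair));
    using (var hc = new HttpClient())
    {
        // GetStringAsync decodes the body using the charset from the response's Content-Type header.
        var result = await hc.GetStringAsync(url).ConfigureAwait(false);
        int start = result.IndexOf("result_box");
        if (start < 0) return null; // page markup changed or no translation was returned
        string sub = result.Substring(start);
        sub = sub.Substring(0, sub.IndexOf("</span>"));
        start = sub.LastIndexOf(">");
        sub = sub.Substring(start + 1);
        return sub;
    }
}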
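If you stay with WebClient, the manual route is to download the raw bytes, read the charset from the Content-Type response header, and decode with it. The following is a minimal sketch under that assumption; `CharsetDecoding.DecodeBody` is a hypothetical helper, not necessarily the solution linked above, and it only looks at the HTTP header, not at a `<meta>` charset tag inside the HTML.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical helper: choose the decoder from the Content-Type charset instead of assuming UTF-8.
public static class CharsetDecoding
{
    public static string DecodeBody(byte[] body, string contentType, Encoding fallback)
    {
        string charset = null;
        if (!string.IsNullOrEmpty(contentType) &&
            MediaTypeHeaderValue.TryParse(contentType, out MediaTypeHeaderValue parsed))
        {
            // CharSet may be quoted (charset="utf-8"), so strip quotes defensively.
            charset = parsed.CharSet?.Trim('"');
        }

        Encoding encoding = fallback;
        if (!string.IsNullOrEmpty(charset))
        {
            try
            {
                encoding = Encoding.GetEncoding(charset);
            }
            catch (Exception ex) when (ex is ArgumentException || ex is NotSupportedException)
            {
                // Unknown or unsupported charset name: keep the fallback encoding.
            }
        }
        return encoding.GetString(body);
    }
}
```

With WebClient this would be used roughly as `byte[] data = wc.DownloadData(url);` followed by `string html = CharsetDecoding.DecodeBody(data, wc.ResponseHeaders[HttpResponseHeader.ContentType], Encoding.UTF8);`, i.e. the header is read only after the download has completed.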
Also note that Google has a Translation API, which may be a better choice than parsing the translation out of the HTML page.
Could you give an example input ('input' and 'languagePair')? –
Sample input: would, lp: en|de returns "wrde" instead of "würde" – koin