2010-11-23 56 views

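Android: page fetched over HTTPS is truncated

I'm fetching a web page over HTTPS on Android (ignoring the certificate, because it is both self-signed and expired, as explained here; don't ask, it's not my server :)).

I defined my own HTTP client class: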

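import org.apache.http.conn.ClientConnectionManager;
import org.apache.http.conn.params.ConnManagerParams;
import org.apache.http.conn.scheme.PlainSocketFactory;
import org.apache.http.conn.scheme.Scheme;
import org.apache.http.conn.scheme.SchemeRegistry;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.SingleClientConnManager;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;

public class MyHttpClient extends DefaultHttpClient {

    // REGISTRATION_TIMEOUT is defined elsewhere; it is not shown in this snippet.

    public MyHttpClient() {
        super();
        final HttpParams params = getParams();
        // Same timeout for connecting, for reading, and for obtaining a connection.
        HttpConnectionParams.setConnectionTimeout(params, REGISTRATION_TIMEOUT);
        HttpConnectionParams.setSoTimeout(params, REGISTRATION_TIMEOUT);
        ConnManagerParams.setTimeout(params, REGISTRATION_TIMEOUT);
    }

    @Override
    protected ClientConnectionManager createClientConnectionManager() {
        SchemeRegistry registry = new SchemeRegistry();
        registry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
        // HTTPS goes through the socket factory that skips certificate checks.
        registry.register(new Scheme("https", new UnsecureSSLSocketFactory(), 443));
        return new SingleClientConnManager(getParams(), registry);
    }
}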

where UnsecureSSLSocketFactory is based on the suggestion given in the topic linked above.

I then use this class to fetch the page:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;

import android.util.Log;

public class HTTPHelper {

    private final static String TAG = "HTTPHelper";
    private final static String CHARSET = "ISO-8859-1";

    public static final String USER_AGENT = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)";
    public static final String ACCEPT_CHARSET = "ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    public static final String ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

    /**
     * Sends an HTTP GET request and returns the response body as a string.
     * @param url  the URL to fetch
     * @param post not used in this snippet
     * @return the page source
     */
    public String sendRequest(String url, String post) throws ConnectionException {

        MyHttpClient httpclient = new MyHttpClient();

        HttpGet httpget = new HttpGet(url);
        httpget.addHeader("User-Agent", USER_AGENT);
        httpget.addHeader("Accept", ACCEPT);
        httpget.addHeader("Accept-Charset", ACCEPT_CHARSET);

        HttpResponse response;
        try {
            response = httpclient.execute(httpget);
        } catch (Exception e) {
            throw new ConnectionException(e.getMessage());
        }

        HttpEntity entity = response.getEntity();
        if (entity == null) {
            throw new ConnectionException("Response has no body");
        }

        // pageSource was never declared in the snippet as originally posted.
        String pageSource;
        try {
            pageSource = convertStreamToString(entity.getContent());
        } catch (Exception e) {
            throw new ConnectionException(e.getMessage());
        } finally {
            try {
                entity.consumeContent();
            } catch (IOException e) {
                // Log rather than throw, so an earlier exception is not masked.
                Log.d(TAG, "Exception while releasing the connection", e);
            }
        }

        httpclient.getConnectionManager().shutdown();
        return pageSource;
    }

    /**
     * Reads the whole stream as text in CHARSET and closes it.
     * @param is the stream to read
     * @return the stream content, with each line terminated by '\n'
     */
    private static String convertStreamToString(InputStream is) {
        BufferedReader reader;
        try {
            reader = new BufferedReader(new InputStreamReader(is, CHARSET));
        } catch (UnsupportedEncodingException e) {
            throw new Error("Unsupported charset: " + CHARSET);
        }
        StringBuilder stringBuilder = new StringBuilder();
        String line;
        try {
            while ((line = reader.readLine()) != null) {
                stringBuilder.append(line).append('\n');
            }
        } catch (IOException e) {
            Log.d(TAG, "Exception in convertStreamToString", e);
        } finally {
            try {
                is.close();
            } catch (IOException e) {
                // Nothing useful can be done if closing fails.
            }
        }
        return stringBuilder.toString();
    }

}

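The page I get back is truncated after a hundred or so lines. It is cut at a precise point, after a '_' (underscore) character that is followed by an 'r' character. That is not the first underscore in the page.

I thought it might be an encoding problem, so I tried both UTF-8 and ISO-8859-1, but it still gets truncated. If I open the page in Firefox, it reports the encoding as ISO-8859-1.

In case you're wondering, the page is https://ricarichiamoci.dsu.pisa.it/ and it gets truncated at line 169,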

function ChangeOffset(NewOffset) { 
    document.mainForm.last 

where it should instead read

function ChangeOffset(NewOffset) { 
    document.mainForm.last_record.value = NewOffset; 

Does anyone have an idea why the page gets truncated?

Answer


I figured out that the downloaded page is not truncated; it is the function I was using to print it (Log.d) that truncates the string.
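
One way to check this without trusting the log output is to inspect the string itself: its length, its line count, and how it ends. The following is only a rough sketch in plain Java; the class TruncationCheck, its helper methods, and the sample page are all made up for illustration and are not part of the code in the question.

```java
class TruncationCheck {

    /** Returns the last n characters of text (n >= 0), or all of it if it is shorter. */
    static String tail(String text, int n) {
        return text.substring(Math.max(0, text.length() - n));
    }

    /** Counts the line terminators in text, assuming lines end with a newline character. */
    static int countLines(String text) {
        int lines = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up stand-in for the string returned by sendRequest(...).
        String pageSource = "<html>\n<body>\nlast_record\n</body>\n</html>\n";

        // Short summaries like these fit in a single log message,
        // so they are not affected by any limit on message length.
        System.out.println("length: " + pageSource.length());
        System.out.println("lines: " + countLines(pageSource));
        System.out.println("ends with </html>: " + pageSource.trim().endsWith("</html>"));
        System.out.println("tail: " + tail(pageSource, 8));
    }
}
```

If the line count goes well past the point where the printed output stops, the download itself is complete and only the printing is cut short.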

So the method that downloads the page source works fine, but Log.d() is probably not meant to output that much text.
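
If the whole page does need to go to the log, a possible workaround is to split the string into smaller pieces and log each piece separately. This is a minimal sketch, not a definitive implementation: the class LogChunker and the MAX_CHUNK value of 1000 are assumptions chosen for illustration (the actual limit of Log.d is not established in this thread), and the sketch prints with System.out so that it runs outside Android.

```java
import java.util.ArrayList;
import java.util.List;

class LogChunker {

    // Assumed per-message budget; pick a value below whatever limit the logger really has.
    static final int MAX_CHUNK = 1000;

    /** Splits text into consecutive pieces of at most maxChunk characters each. */
    static List<String> split(String text, int maxChunk) {
        if (maxChunk <= 0) {
            throw new IllegalArgumentException("maxChunk must be positive");
        }
        List<String> chunks = new ArrayList<String>();
        for (int start = 0; start < text.length(); start += maxChunk) {
            int end = Math.min(text.length(), start + maxChunk);
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Made-up 2500-character stand-in for a downloaded page.
        StringBuilder page = new StringBuilder();
        for (int i = 0; i < 2500; i++) {
            page.append('x');
        }

        // On Android, each chunk would be passed to Log.d(TAG, chunk) instead.
        for (String chunk : split(page.toString(), MAX_CHUNK)) {
            System.out.println(chunk.length()); // prints 1000, 1000, 500
        }
    }
}
```

Splitting on a fixed character count can break a line of the page in the middle; splitting on newlines first would keep each logged piece readable, at the cost of more log calls.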