2017-03-06

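Java HttpGet redirects to 127.0.0.1

I am developing a Java program that combines data collected from a handful of websites into a single output. I have the API information for these sites and can call them easily from PHP, but when I try the same thing from Java, one of the sites behaves strangely. I have written code that follows redirects, but when I try to access https://www.foo.com it sends me to 127.0.0.1. This happens regardless of which protocol I use and whether or not I include www. If I take out the redirect-following code, I get a generic "Moved Permanently" page.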

Here is the code I am using:

import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class Main {

    public static void main(String[] args) throws IOException {

        String URLString = "http://www.sickw.com/";
        URL url = new URL(URLString);
        URLConnection connection = url.openConnection();
        System.out.println(url.toString()); // See what URL is being used
        String redirect = connection.getHeaderField("Location");
        while (redirect != null) {
            System.out.println(redirect); // Follow the redirects
            connection = new URL(redirect).openConnection();
            redirect = connection.getHeaderField("Location");
        }
        System.out.println("new " + connection.getURL().toString()); // Print the final destination

        // Print the response body one character at a time
        InputStreamReader inputStreamReader = new InputStreamReader(connection.getInputStream());
        int temp = inputStreamReader.read();
        while (temp != -1) {
            System.out.print((char) temp);
            temp = inputStreamReader.read();
        }
    }
}

Perhaps the site has bot detection and redirects you based on your User-Agent string.
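One way to test this theory is to send a browser-like User-Agent header and look at the raw response. The following is a minimal sketch, assuming the site really does key on that header; the class name, the helper name and the header value are illustrative and do not come from the original post.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class UserAgentExample {

    // Illustrative value (an assumption); any browser-like string could be tried.
    static final String BROWSER_UA = "Mozilla/5.0";

    // Prepares a request that carries an explicit User-Agent.
    // Nothing is sent over the network until the response is read.
    static HttpURLConnection openWithUserAgent(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("User-Agent", BROWSER_UA);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = openWithUserAgent("http://www.sickw.com/");
        // Show the raw redirect instead of letting the JDK follow it.
        conn.setInstanceFollowRedirects(false);
        System.out.println(conn.getResponseCode() + " " + conn.getHeaderField("Location"));
    }
}
```

If the Location header changes when the User-Agent changes, the redirect to 127.0.0.1 is likely a server-side reaction to the default Java client string rather than a problem in the redirect loop.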

Answers

package main; 

import java.io.BufferedReader; 
import java.io.IOException; 
import java.io.InputStreamReader; 
import java.net.HttpURLConnection; 
import java.net.URL; 

public class Main { 

    public static void main(String[] args) { 

     try { 

      String url = "https://sickw.com"; 

      URL obj = new URL(url); 
      HttpURLConnection conn = (HttpURLConnection) obj.openConnection(); 
      conn.setReadTimeout(5000); 
      conn.addRequestProperty("Accept-Language", "en-US,en;q=0.8"); 
      conn.addRequestProperty("User-Agent", "Mozilla"); 
      conn.addRequestProperty("Referer", "google.com"); 

      System.out.println("Requested URL -> " + url); 

      boolean redirect = false; 

      int status = conn.getResponseCode(); 
      if (status != HttpURLConnection.HTTP_OK) { 
       if (status == HttpURLConnection.HTTP_MOVED_TEMP 
         || status == HttpURLConnection.HTTP_MOVED_PERM 
         || status == HttpURLConnection.HTTP_SEE_OTHER) { 
        redirect = true; 
       } 
      } 

      System.out.println("Response Code -> " + status); 

      if (redirect) { 

       String newUrl = conn.getHeaderField("Location"); 

       String cookies = conn.getHeaderField("Set-Cookie"); 

       conn = (HttpURLConnection) new URL(newUrl).openConnection(); 
       if (cookies != null) { // the redirect response may not set a cookie 
        conn.setRequestProperty("Cookie", cookies); 
       } 
       conn.addRequestProperty("Accept-Language", "en-US,en;q=0.8"); 
       conn.addRequestProperty("User-Agent", "Mozilla"); 
       conn.addRequestProperty("Referer", "google.com"); 

       System.out.println("Redirect to URL -> " + newUrl); 

      } 

      StringBuilder html; 
      try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) { 
       String inputLine; 
       html = new StringBuilder(); 
       while ((inputLine = in.readLine()) != null) { 
        html.append(inputLine); 
        html.append("\n"); 
       } 
      } 

      System.out.println("URL Content -> \n" + html.toString()); 
      System.out.println("Completed"); 

     } catch (IOException e) { 
      e.printStackTrace(); // report failures instead of silently swallowing them 
     } 

    } 

} 

This works, but it uses "https://foo.com".
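The code above follows a single redirect. If the site chains several redirects, the same idea can be wrapped in a loop. This is a rough sketch rather than a drop-in replacement: the class name, the helper names and the hop limit of 5 are assumptions, and cookie forwarding is left out for brevity.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class RedirectFollower {

    // Assumed limit so that a redirect loop cannot run forever.
    static final int MAX_HOPS = 5;

    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM  // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP  // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER   // 303
            || status == 307
            || status == 308;
    }

    // Resolves a Location value, which may be relative, against the current URL.
    static URL resolve(URL current, String location) throws IOException {
        return new URL(current, location);
    }

    // Returns the first connection whose response is not a redirect.
    static HttpURLConnection open(String start) throws IOException {
        URL url = new URL(start);
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setInstanceFollowRedirects(false); // handle every hop ourselves
            conn.setRequestProperty("User-Agent", "Mozilla");
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            if (!isRedirect(status) || location == null) {
                return conn;
            }
            url = resolve(url, location);
            System.out.println("Redirect to URL -> " + url);
            conn.disconnect();
        }
        throw new IOException("Too many redirects starting from " + start);
    }
}
```

Handling each hop manually also covers redirects that switch between http and https, which `HttpURLConnection` does not follow on its own.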


Thank you, that works exactly the way I need it to.


You're welcome. If it works for you, you can mark it as the accepted answer! – RKJ


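I copied, pasted, and ran your code on my PC and got the correct page content (http://www.foo.com/). I don't think there is anything wrong with your code, so please check the following: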

  • Test the host with ping on the command line, e.g. ping www.foo.com
  • Check whether your hosts file contains an entry that redirects the host, e.g. 127.0.0.1 www.foo.com
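The same check can be made from inside Java, which normally resolves names through the operating system, so a hosts-file entry would be expected to show up here as well. A small sketch, with the class and method names chosen only for illustration:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class ResolveCheck {

    // True if the host name resolves to a loopback address such as 127.0.0.1.
    static boolean resolvesToLoopback(String host) throws UnknownHostException {
        return InetAddress.getByName(host).isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "www.foo.com";
        // Print every address the JVM sees for this host name.
        for (InetAddress address : InetAddress.getAllByName(host)) {
            System.out.println(host + " -> " + address.getHostAddress()
                    + (address.isLoopbackAddress() ? " (loopback)" : ""));
        }
    }
}
```

If the host resolves to a normal public address here, the 127.0.0.1 target is coming from the server's Location header rather than from local name resolution.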

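I pinged the host and it seems to work fine. One thing I noticed is that whether I ping sickw.com or www.sickw.com, the result shows sickw.com. I checked this by doing the same with google.com and www.google.com, and those gave two different results. I think my input stream reader is still reading the document sent along with the redirect code, but it throws a FileNotFoundException when it tries to read. Is this just an odd website, or is there something I'm missing?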
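On the FileNotFoundException: in the standard JDK implementation, `HttpURLConnection.getInputStream()` throws it when the server answers 404 or 410, and the body of such a response is available from `getErrorStream()` instead. The sketch below picks the stream by status code; the class and method names are illustrative, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

class ErrorBodyReader {

    // Picks the stream that carries the body for this status code.
    static InputStream bodyStream(HttpURLConnection conn) throws IOException {
        int status = conn.getResponseCode();
        InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        // getErrorStream() may return null when the server sent no body.
        return in != null ? in : new ByteArrayInputStream(new byte[0]);
    }

    // Reads the whole stream as text, assuming UTF-8, and closes it.
    static String readAll(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }
}
```

Printing `conn.getResponseCode()` next to the body should show whether the final hop is an error page rather than the expected content.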