Java: reading text files from a directory on the Internet

Does anyone know how to recursively read files from a specific directory in Java? I want to read all the text files in this web directory: http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/

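I know how to read multiple files from a folder on my computer, and how to read a single file from the Internet. But how can I read multiple files from the Internet without hard-coding the URLs?

Things I've tried: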

// List the files on my Desktop (needs: import java.io.File;)
final File folder = new File("/Users/crystal/Desktop");
File[] listOfFiles = folder.listFiles(); // null if the path is not a readable directory

if (listOfFiles != null) {
    for (File fileEntry : listOfFiles) {
        if (!fileEntry.isDirectory()) {
            System.out.println(fileEntry.getName());
        }
    }
}

Another thing I tried:

// Reading data from the web
// (needs java.io.BufferedReader, java.io.IOException, java.io.InputStreamReader,
//  java.net.MalformedURLException, java.net.URL)
try {
    // Create a URL object
    URL url = new URL("http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/5_1_1.txt");

    // Read all of the text returned by the HTTP server;
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
        String htmlText; // holds the current line of the file

        // Read through the file one line at a time and print each line
        while ((htmlText = in.readLine()) != null) {
            System.out.println(htmlText);
        }
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    // If another I/O exception is generated, print a stack trace
    e.printStackTrace();
}

Thanks!

Source

2011-05-29 Crystal

Parse the HTML and read the file URLs from it. HtmlUnit might help. – Endophage 2011-05-29 03:16:12

[Looking for a simple Java spider](http://stackoverflow.com/questions/4903363/looking-for-a-simple-java-spider) – 2011-05-29 03:19:59

"http://www.cs.ucdavis...170-S11/Female/" Wow, are guys who call themselves 'Crystal' now so desperate that they need to trawl for Females (or rather, for directories on servers)? ;) – 2011-05-29 03:52:00

You're in luck, because the URL you mention has directory indexing enabled. You have a couple of options here.

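Parse the HTML with SAX2 or any other XML parser to find the href attribute of each a tag. HtmlUnit would also work, I think.
Use a little regex magic to match every string between <a href=" and ">, and use those strings as the URLs to read from.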

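Once you have the list of all the URLs you need, your second piece of code should work fine. Just iterate over the list and construct a URL from each entry.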
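The iteration step might look roughly like the sketch below. It assumes the hrefs on the index page are relative file names such as 5_1_1.txt, so each one is resolved against the directory URL before being opened. The class name RemoteFileReader, the helper names resolveAll and printFile, the hard-coded list in main, and the UTF-8 encoding are all assumptions made for illustration, not something the question specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoteFileReader {

    // Resolve each href (possibly relative, e.g. "5_1_1.txt") against the directory URL.
    static List<URL> resolveAll(URL base, List<String> hrefs) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        for (String href : hrefs) {
            urls.add(new URL(base, href));
        }
        return urls;
    }

    // Same read loop as the single-file snippet in the question, wrapped in a method.
    // UTF-8 is an assumption about how the files are encoded.
    static void printFile(URL url) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        URL base = new URL("http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/");
        // Hypothetical list; in practice it would come from parsing the index page.
        List<String> hrefs = Arrays.asList("5_1_1.txt");
        for (URL url : resolveAll(base, hrefs)) {
            printFile(url);
        }
    }
}
```

Resolving against the base URL means nothing beyond the directory address itself has to be hard-coded.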

Here is a sample regex that should match what you want. It does capture a few extra links, but you should be able to filter those out.

<a\ href="(.+?)">
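As a rough sketch, that pattern could be applied in Java as shown below. The class name IndexLinkExtractor, the method name extractTextFileLinks, and the HTML fragment in main are made up for illustration; a real index page may be laid out differently. Keeping only hrefs that end in ".txt" is one assumed way to filter out the extra links.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexLinkExtractor {

    // The regex from the answer, escaped as a Java string literal.
    private static final Pattern HREF = Pattern.compile("<a\\ href=\"(.+?)\">");

    // Collect every captured href value, keeping only those that end in ".txt".
    static List<String> extractTextFileLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            if (href.endsWith(".txt")) {
                links.add(href);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up fragment standing in for the server's index page.
        String html = "<a href=\"?C=N;O=D\">Name</a>\n"
                + "<a href=\"/~davidson/courses/170-S11/\">Parent Directory</a>\n"
                + "<a href=\"5_1_1.txt\">5_1_1.txt</a>\n";
        System.out.println(extractTextFileLinks(html)); // prints [5_1_1.txt]
    }
}
```

The sorting and parent-directory links are the kind of extras the pattern picks up, and the suffix check drops them.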

Source

2011-05-29 03:32:44

Thanks! I think this is exactly what I need. – Crystal 2011-05-29 03:45:58

No problem. Glad to help. – 2011-05-29 03:59:25

Obligatory ["don't parse HTML with regex"](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) comment. Although in this case I'm sure it's fine, since it's just one page :) – luke 2011-05-29 06:20:00
