Parsing a text file in Java to get a HashMap of fields


I am trying to parse multiple files and split each one into a set of fields stored in a HashMap. Here is a sample file.

COCONUT OIL CONTRACT TO CHANGE - DUTCH TRADERS 

    ROTTERDAM, March 18 - Contract terms for trade in coconut 
oil are to be changed from long tons to tonnes with effect from 
the Aug/Sep contract onwards, Dutch vegetable oil traders said. 
    Operators have already started to take account of the 
expected change and reported at least one trade in tonnes for 
Aug/Sept shipment yesterday.

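I need the program to parse this document into a custom Document class with the fields key, file name, title, place, date, author, content, and category.

Here is what I have tried.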

public static Document parse(String filename) throws IOException {
    File f = new File(filename);

    if (f.isFile()) {
        // Default to the full name so fileId is always assigned.
        String fileId = filename;
        if (filename.indexOf(".") > 0) {
            fileId = filename.substring(0, filename.lastIndexOf("."));
        }
        String category = f.getParent();

        // try-with-resources closes the stream even if reading fails.
        try (InputStream in = new FileInputStream(f)) {
            byte[] buf = new byte[1024];
            int len = in.read(buf);
            while (len > 0) {
                ..........
            }
        }
    }

    return null;
}

Asked 2014-09-19 by Umar Gul

Sorry, what are you trying to accomplish here? :O – 2014-09-19 19:18:44

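Well, it's a start, but it will be hard to continue in the same way. If I were you, I would stop writing code for now and first work out the high-level steps that need to be taken. Write those steps down on a piece of paper: '1. Read the file fully into a string. 2. Extract the document title...' and so on. Then you can start coding step by step, testing the result after each step. – biziclop 2014-09-19 19:20:17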
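The first two steps listed in the comment above could be sketched as follows. This is only a minimal illustration: the class name ParseSteps, the method names, the UTF-8 encoding, and the rule "the title is the first non-blank line" are all assumptions, not something stated in the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; names are illustrative only.
class ParseSteps {

    // Step 1: read the file fully into a string (UTF-8 is an assumption).
    static String readAll(String filename) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filename));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Step 2: treat the first non-blank line as the title (assumed rule).
    static String extractTitle(String text) {
        for (String line : text.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                return line.trim();
            }
        }
        return ""; // no non-blank line found
    }

    public static void main(String[] args) throws IOException {
        String text = readAll(args[0]);
        System.out.println("Title: " + extractTitle(text));
    }
}
```

Each step is a separate method, so it can be tested on its own before the next one is written.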

The following code may help you:

// Needs: import java.io.BufferedReader; import java.io.FileReader;
// try-with-resources closes the reader automatically.
try (BufferedReader br = new BufferedReader(new FileReader("myFile.txt"))) {
    StringBuilder contentBuffer = new StringBuilder();
    String line;
    boolean foundTitle = false;
    boolean foundPlaceAndDate = false;
    String date = "";
    while ((line = br.readLine()) != null) {
        if (line.matches("^[a-zA-Z0-9].*") && !foundTitle) {
            // The first line that starts with a letter or digit is the title
            System.out.println("Title: " + line);
            foundTitle = true;
        } else if (line.matches("^[ \\t].*") && !foundPlaceAndDate) {
            // The first line that starts with a space or tab opens the first paragraph, which holds place and date
            String trimmed = line.trim();
            System.out.println("Place: " + trimmed.substring(0, trimmed.indexOf(",")));
            date = trimmed.substring(trimmed.indexOf(",") + 1, trimmed.indexOf("-")).trim();
            System.out.println("Date: " + date);
            foundPlaceAndDate = true;
        }
        // Join lines with a space so words at line ends do not run together
        contentBuffer.append(line).append(' ');
    }

    // Everything after the date and the " -" separator is the content
    String all = contentBuffer.toString();
    String content = all.substring(all.indexOf(date) + date.length() + 2).trim();
    System.out.println("Content: " + content);
} catch (Exception e) {
    // Also catches index errors from lines without a comma or hyphen
    System.err.println("Oh no! I got the following error: " + e.getMessage());
}

The output will be:

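Title: COCONUT OIL CONTRACT TO CHANGE - DUTCH TRADERS

Place: ROTTERDAM

Date: March 18

Content: Contract terms for trade in coconut oil are to be changed from long tons to tonnes with effect from the Aug/Sep contract onwards, Dutch vegetable oil traders said. Operators have already started to take account of the expected change and reported at least one trade in tonnes for Aug/Sept shipment yesterday.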
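As an alternative to the chained indexOf and substring calls, a regular expression could pull the place, date, and opening text out of the dateline in one step. This is a sketch under the assumption that every dateline looks like "PLACE, Month day - text", with no comma in the place and no hyphen in the date; the class name Dateline and its parse method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes datelines of the form "PLACE, DATE - text".
class Dateline {

    // group 1: place (no comma), group 2: date (no hyphen), group 3: rest of the line
    private static final Pattern DATELINE =
            Pattern.compile("^\\s*([^,]+),\\s*([^-]+?)\\s+-\\s+(.*)$");

    // Returns {place, date, rest}, or null when the line is not a dateline.
    static String[] parse(String line) {
        Matcher m = DATELINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group(1).trim(), m.group(2).trim(), m.group(3).trim()
        };
    }
}
```

Returning null for a non-matching line lets the caller skip lines that are not datelines instead of failing with an index error.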

Answered 2014-09-19 19:46:57 by shimatai

This did get me started, but I need to parse the file into the Document class, which looks like this: 'public class Document { private HashMap<FieldNames, String[]> map; public Document() { map = new HashMap<FieldNames, String[]>(); } public void setField(FieldNames fn, String... o) { map.put(fn, o); } public String[] getField(FieldNames fn) { return map.get(fn); } }' – 2014-09-19 19:52:27

Now you just need to fill in the fields of the Document class. For example: 'Document document = new Document(); document.setField("title", title);' – shimatai 2014-09-22 18:10:59
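Filling the Document could look roughly like the sketch below. The Document class mirrors the one quoted in the comment above. The FieldNames constants (TITLE, PLACE, DATE, CONTENT) and the FillDocument helper are assumptions, since the thread never shows the enum. Because setField takes a FieldNames key, the key has to be an enum constant rather than a string such as "title".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical constants; the real FieldNames enum is not shown in the thread.
enum FieldNames { TITLE, PLACE, DATE, CONTENT }

// Mirrors the Document class quoted in the comment above.
class Document {
    private final Map<FieldNames, String[]> map = new HashMap<>();

    void setField(FieldNames fn, String... o) {
        map.put(fn, o);
    }

    String[] getField(FieldNames fn) {
        return map.get(fn);
    }
}

// Hypothetical helper that copies already-parsed values into a Document.
class FillDocument {
    static Document build(String title, String place, String date, String content) {
        Document document = new Document();
        document.setField(FieldNames.TITLE, title);
        document.setField(FieldNames.PLACE, place);
        document.setField(FieldNames.DATE, date);
        document.setField(FieldNames.CONTENT, content);
        return document;
    }
}
```

Each value is stored as a String[] because setField uses varargs, so a single-valued field is read back with getField(...)[0].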
