2010-08-31

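I'm using Apache POI HWPF to extract text from a .doc file, and the extracted text does not contain the chapter numbers. Can POI extract the chapter numbers along with the text? How can I get the chapter numbers of a .doc file into the extracted text?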

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public class DocReader {

    public void readDocFile() {
        File docFile = new File("C:\\Documents and Settings\\Administrator\\Desktop\\Topo6.doc");
        // try-with-resources closes the FileInputStream even if parsing fails
        try (FileInputStream fis = new FileInputStream(docFile)) {
            // HWPFDocument parses the binary Word 97-2003 (.doc) format from the stream
            HWPFDocument doc = new HWPFDocument(fis);
            WordExtractor docExtractor = new WordExtractor(doc);

            // getText() returns the whole document body as a single String
            String text = docExtractor.getText();
            System.out.println(text);
        } catch (IOException e) {
            // Report the failure instead of continuing with a null extractor
            e.printStackTrace();
        }
    }
}

Answer


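OK, I figured it out.

Chapter numbers in a .doc file created by Microsoft Word are dynamic: Word generates them from the list (outline) numbering when it displays the document, so they are not part of the stored text that WordExtractor returns. I therefore have to read the level of each paragraph and compute the chapter numbers myself.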
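A minimal sketch of the "compute the numbers yourself" step, assuming you already have a 0-based level for each heading paragraph. In HWPF that level could come from `Paragraph.getIlvl()` on paragraphs where `Paragraph.isInList()` is true, or from `Paragraph.getLvl()` (outline level); which one matches depends on how the document's headings are numbered. The class name `ChapterNumberer` is hypothetical, and the sketch ignores list-template details such as custom start values or Roman-numeral formats.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: turns a sequence of heading levels (0 = top level)
 * into hierarchical chapter numbers such as "1", "1.1", "1.2", "2".
 */
public class ChapterNumberer {
    // Word list templates define at most 9 levels (0..8)
    private static final int MAX_LEVELS = 9;
    private final int[] counters = new int[MAX_LEVELS];

    /** Returns the chapter number for the next heading at the given 0-based level. */
    public String next(int level) {
        if (level < 0 || level >= MAX_LEVELS) {
            throw new IllegalArgumentException("level out of range: " + level);
        }
        counters[level]++;
        // A new heading restarts the numbering of every deeper level
        Arrays.fill(counters, level + 1, MAX_LEVELS, 0);

        // Join the counters of this level and all its ancestors with dots
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(counters[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ChapterNumberer numberer = new ChapterNumberer();
        // Levels as they might be read from consecutive heading paragraphs
        int[] levels = {0, 1, 1, 2, 0, 1};
        for (int level : levels) {
            System.out.println(numberer.next(level));
        }
    }
}
```

For the levels in `main` this prints 1, 1.1, 1.2, 1.2.1, 2, 2.1. If a deeper level appears without a parent heading, the missing parent shows up as 0 (for example "0.1"), which is one reason to check the output against how Word actually renders the document.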