2017-07-06

How to extract text from a docx file using Node.js

I want to extract the text from a docx file, and I have been using mammoth:

var mammoth = require("mammoth");

mammoth.extractRawText({path: "./doc.docx"})
    .then(function (result) {
        var text = result.value; // The raw text
        var messages = result.messages; // Any warnings raised during extraction

        // This prints all the text of the docx file
        console.log(text);

        for (var i = 0; i < text.length; i++) {
            // This prints the text one character per line
            console.log(text[i]);
        }
    })
    .catch(function (error) {
        console.error(error);
    });

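I tried this, but the problem is that this for loop goes through the data character by character, and I want it line by line instead. Please help me here, or do you know of another way?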


What do you mean by line by line? The individual lines of the Word document, or paragraphs separated by newlines? –


The individual lines of the document @ExplosionPills – iwayankit


One way is to split the text on "\n"! – tashakori
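A minimal sketch of tashakori's suggestion, assuming the text is already in a string. The sample string is made up; it mimics the output of mammoth's extractRawText, which (per mammoth's documentation) follows each paragraph with two newlines. Keep in mind that a .docx file does not store where lines wrap on the page, so what you can split out of the raw text are paragraphs, not visual lines.

```javascript
// Hypothetical sample standing in for result.value from mammoth.extractRawText:
// each paragraph is followed by two newlines.
const text = "First paragraph\n\nSecond paragraph\n\n";

// Indexing a string visits one character at a time, which is why the
// original for loop printed each character on its own line.
const firstChars = [];
for (let i = 0; i < 5; i++) {
  firstChars.push(text[i]);
}

// Splitting on "\n" produces an array of lines instead; the filter drops the
// empty strings that the double newlines leave between paragraphs.
const lines = text.split("\n").filter((line) => line.trim() !== "");

// Same loop shape as in the question, but over lines rather than characters.
for (let i = 0; i < lines.length; i++) {
  console.log(lines[i]);
}
```

Without the filter, `text.split("\n")` on this sample would also return empty strings between and after the paragraphs.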

Answer


One way is to get the whole text and then split it on '\n':

import superagent from 'superagent'; 
import mammoth from 'mammoth'; 

const url = 'http://www.ojk.ee/sites/default/files/respondus-docx-sample-file_0.docx'; 

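const main = async () => {
    // Download the file; the image parser makes superagent collect the
    // binary response into a Buffer in response.body
    const response = await superagent.get(url)
        .parse(superagent.parse.image)
        .buffer();

    const buffer = response.body;

    const text = (await mammoth.extractRawText({ buffer })).value;
    // extractRawText follows each paragraph with "\n\n", so drop the empty entries
    const lines = text.split('\n').filter(line => line.trim() !== '');

    console.log(lines);
};

main().catch(error => console.error(error));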