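python-docx: extracting a table from a Word docx

I know this is a duplicate question, but the existing answers don't work for me. I have a Word file that contains a table, and I want that table as the output of my Python program. I'm using Python 3.6 and have python-docx installed. Here is my data extraction code: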

from docx.api import Document

document = Document('test_word.docx')
table = document.tables[0]

data = []

keys = None
for i, row in enumerate(table.rows):
    text = (cell.text for cell in row.cells)

    if i == 0:
        keys = tuple(text)
        continue
    row_data = dict(zip(keys, text))
    data.append(row_data)
    print(data)

I want the result to look the way the table looks in the Word docx file. Thanks in advance.

Source

2017-10-07 Arun Baskar

What is the problem? Is there an error? –

I tried your code and it works for me. –

Your code works for me. How about loading it into a DataFrame?

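import pandas as pd
from docx.api import Document

document = Document('test_word.docx')
table = document.tables[0]

data = []

keys = None
for i, row in enumerate(table.rows):
    # Lazily read the text of every cell in this row
    text = (cell.text for cell in row.cells)

    if i == 0:
        # The first row holds the column names
        keys = tuple(text)
        continue
    # Map each later row onto the column names
    row_data = dict(zip(keys, text))
    data.append(row_data)

# Print once, after all rows have been collected
print(data)

df = pd.DataFrame(data)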
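
The header-then-rows logic in the loop above can also be pulled into a small helper, which makes it easy to try out without a Word file. This is only a minimal sketch: `rows_to_records` and `sample_rows` are hypothetical names, and stripping whitespace from each cell is an added assumption that the code above does not make.

```python
from typing import Dict, Iterable, List, Sequence


def rows_to_records(rows: Iterable[Sequence[str]]) -> List[Dict[str, str]]:
    """Treat the first row as the header and map every later row onto it."""
    iterator = iter(rows)
    try:
        # Assumption: surrounding whitespace in a cell is not meaningful.
        header = [name.strip() for name in next(iterator)]
    except StopIteration:
        return []  # an empty table yields no records
    return [
        dict(zip(header, (value.strip() for value in row)))
        for row in iterator
    ]


# Stand-in for: [[cell.text for cell in row.cells] for row in table.rows]
sample_rows = [
    ["1", "2", "3", "4"],
    ["5", "6", "7", "8"],
    ["9", "10", "11", "12"],
]
records = rows_to_records(sample_rows)
print(records[0])  # {'1': '5', '2': '6', '3': '7', '4': '8'}
```

With python-docx, the input would come from the list comprehension shown in the comment, and the returned list can be passed to pd.DataFrame in the same way as data.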

How do you display specific rows and columns of that table? You can extract them by row and column position with iloc:

# iloc[row, column]; cell.text is a str, so every value is a string
df.iloc[0, :].tolist()   # ['5', '6', '7', '8'] - row index 0
df.iloc[:, 0].tolist()   # ['5', '9', '13', '17'] - column index 0
df.iloc[0, 0]            # '5' - cell (0, 0)
df.iloc[1:, 2].tolist()  # ['11', '15', '19'] - column index 2, skipping the first row

and so on...

If your columns have names (in this case the names are numbers), you can also select a column by name:

# df["name"].tolist()
# The header text is read as a string, so the column name is "1", not 1
df["1"].tolist()  # ['5', '9', '13', '17'] - column named "1"

print(df)

prints the following, which is how the table looks in my sample document.

    1   2   3   4
0   5   6   7   8
1   9  10  11  12
2  13  14  15  16
3  17  18  19  20
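
One thing the printed table does not show: python-docx returns cell.text as a string, so the values only look like numbers. If you need real numbers, a conversion step along these lines might help. It is a rough sketch only; `to_number` and `convert_records` are hypothetical helper names, and leaving non-numeric text unchanged is an assumption about what you would want.

```python
from typing import Dict, List, Union

Value = Union[int, float, str]


def to_number(text: str) -> Value:
    """Return an int or float when the text parses as one, else the text unchanged."""
    cleaned = text.strip()
    for cast in (int, float):
        try:
            return cast(cleaned)
        except ValueError:
            continue  # try the next numeric type
    return text


def convert_records(records: List[Dict[str, str]]) -> List[Dict[str, Value]]:
    """Apply to_number to every value; the column names stay as strings."""
    return [
        {key: to_number(value) for key, value in record.items()}
        for record in records
    ]


# Records shaped like the ones built from the table rows
records = [{"1": "5", "2": "6.5"}, {"1": "9", "2": "n/a"}]
converted = convert_records(records)
print(converted)  # [{'1': 5, '2': 6.5}, {'1': 9, '2': 'n/a'}]
```

If every column is purely numeric, df.apply(pd.to_numeric) is a common way to do the same thing directly on the DataFrame.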

Source

2017-10-07 09:41:03

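Thank you, nice work. I have another question: how do I display specific rows and columns of that table? –

@ArunBaskar I'll update the answer –

Could you paste the complete code, as you did before? I'm confused by this code. For example, how do I edit this code: from docx.api import Document import pandas as pd document = Document('test_word.docx') table = document.tables[0] data = [] keys = None for i, row in enumerate(table.rows): text = (cell.text for cell in row.cells) if i == 0: keys = tuple(text) continue row_data = dict(zip(keys, text)) data.append(row_data) print(data) df = pd.DataFrame(data) print(df) –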
