事實上,正如@MaxU指出,Excel文件是格式錯誤但有趣的是,在正確保存爲.xls時會解決問題x文件。可能地,通過簡單地將擴展名改爲.xlsx,試圖從先前的.xls版本升級無效文件。這兩種文件格式不是簡單的文本文件,可以更改擴展名而不會造成危害,但二進制格式非常不同
考慮使用wn32com
模塊運行COM接口,以使用Excel的Workbook.SaveAs方法將格式錯誤的文件正確保存到實際的OpenXML工作簿中。注意:此解決方案僅適用於安裝了MS Excel的Windows for Python。
import pandas as pd
import glob
import win32com.client as win32
xlsxfiles = glob.glob("C:\\Path\\To\\Workbooks\\*.xlsx")
def save_xlsx(srcfile):
try:
newfile = srcfile.replace('.xlsx', '_new.xlsx')
print('Malformed file saved as {}'.format(newfile))
xlApp = win32.gencache.EnsureDispatch('Excel.Application')
wb = xlApp.Workbooks.Open(srcfile)
wb.SaveAs(newfile, 51)
except Exception as e:
print(e)
finally:
wb.Close(True); wb = None
xlApp.Quit; xlApp = None
return newfile
def xl_read():
dfs = []
for f in xlsxfiles:
try:
df = pd.read_excel(f)
except Exception as e:
df = pd.read_excel(save_xlsx(f))
print('File: {}, Shape: {}'.format(f, df.shape))
dfs.append(df)
return pd.concat(dfs)
print('Final dataframe shape: {}'.format(xl_read().shape))
輸出
File: C:\Path\To\Workbooks\2016-06-20–2016-06-26.xlsx, Shape: (5912, 27)
File: C:\Path\To\Workbooks\2016-06-27–2016-07-03.xlsx, Shape: (5362, 27)
File: C:\Path\To\Workbooks\2016-07-04–2016-07-10.xlsx, Shape: (5387, 27)
File: C:\Path\To\Workbooks\2016-07-11–2016-07-17.xlsx, Shape: (5331, 28)
File: C:\Path\To\Workbooks\2016-08-01–2016-08-07.xlsx, Shape: (4965, 28)
Malformed file saved as C:\Path\To\Workbooks\2016-08-15–2016-08-21_new.xlsx
File: C:\Path\To\Workbooks\2016-08-15–2016-08-21.xlsx, Shape: (5315, 27)
File: C:\Path\To\Workbooks\2016-08-22–2016-08-28.xlsx, Shape: (5179, 27)
File: C:\Path\To\Workbooks\2016-08-29–2016-09-04.xlsx, Shape: (5855, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-09-05–2016-09-11_new.xlsx
File: C:\Path\To\Workbooks\2016-09-05–2016-09-11.xlsx, Shape: (5838, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-09-12–2016-09-18_new.xlsx
File: C:\Path\To\Workbooks\2016-09-12–2016-09-18.xlsx, Shape: (5729, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-09-19–2016-09-25_new.xlsx
File: C:\Path\To\Workbooks\2016-09-19–2016-09-25.xlsx, Shape: (6401, 27)
File: C:\Path\To\Workbooks\2016-09-26–2016-10-02.xlsx, Shape: (7018, 27)
File: C:\Path\To\Workbooks\2016-09.xlsx, Shape: (23874, 27)
File: C:\Path\To\Workbooks\2016-10-03–2016-10-09.xlsx, Shape: (6587, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-10-10–2016-10-12_new.xlsx
File: C:\Path\To\Workbooks\2016-10-10–2016-10-12.xlsx, Shape: (2883, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-10-10–2016-10-13_new.xlsx
File: C:\Path\To\Workbooks\2016-10-10–2016-10-13.xlsx, Shape: (4174, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-10-17–2016-10-20_new.xlsx
File: C:\Path\To\Workbooks\2016-10-17–2016-10-20.xlsx, Shape: (4560, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-10-17–2016-10-23_new.xlsx
File: C:\Path\To\Workbooks\2016-10-17–2016-10-23.xlsx, Shape: (7111, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-10-24–2016-10-27_new.xlsx
File: C:\Path\To\Workbooks\2016-10-24–2016-10-27.xlsx, Shape: (4921, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-10-24–2016-10-30_new.xlsx
File: C:\Path\To\Workbooks\2016-10-24–2016-10-30.xlsx, Shape: (8005, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-10-31–2016-11-06_new.xlsx
File: C:\Path\To\Workbooks\2016-10-31–2016-11-06.xlsx, Shape: (7029, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-10_new.xlsx
File: C:\Path\To\Workbooks\2016-10.xlsx, Shape: (28098, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-11-07–2016-11-13_new.xlsx
File: C:\Path\To\Workbooks\2016-11-07–2016-11-13.xlsx, Shape: (7076, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-11-14–2016-11-20_new.xlsx
File: C:\Path\To\Workbooks\2016-11-14–2016-11-20.xlsx, Shape: (7758, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-11-21_new.xlsx
File: C:\Path\To\Workbooks\2016-11-21.xlsx, Shape: (1689, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-11-21–2016-11-23_new.xlsx
File: C:\Path\To\Workbooks\2016-11-21–2016-11-23.xlsx, Shape: (4711, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-11-28–2016-12-04_new.xlsx
File: C:\Path\To\Workbooks\2016-11-28–2016-12-04.xlsx, Shape: (9286, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-11_new.xlsx
File: C:\Path\To\Workbooks\2016-11.xlsx, Shape: (30505, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-12-05–2016-12-11_new.xlsx
File: C:\Path\To\Workbooks\2016-12-05–2016-12-11.xlsx, Shape: (8802, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-12-12–2016-12-18_new.xlsx
File: C:\Path\To\Workbooks\2016-12-12–2016-12-18.xlsx, Shape: (8333, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-12-16–2016-12-22_new.xlsx
File: C:\Path\To\Workbooks\2016-12-16–2016-12-22.xlsx, Shape: (8592, 27)
Malformed file saved as C:\Path\To\Workbooks\2016-12-26–2016-12-31_new.xlsx
File: C:\Path\To\Workbooks\2016-12-26–2016-12-31.xlsx, Shape: (5362, 27)
Malformed file saved as C:\Path\To\Workbooks\2017-01-01–2017-01-08_new.xlsx
File: C:\Path\To\Workbooks\2017-01-01–2017-01-08.xlsx, Shape: (4322, 27)
Malformed file saved as C:\Path\To\Workbooks\2017-01-09–2017-01-15_new.xlsx
File: C:\Path\To\Workbooks\2017-01-09–2017-01-15.xlsx, Shape: (7608, 27)
Malformed file saved as C:\Path\To\Workbooks\2017-01-23–2017-01-29_new.xlsx
File: C:\Path\To\Workbooks\2017-01-23–2017-01-29.xlsx, Shape: (8903, 27)
Malformed file saved as C:\Path\To\Workbooks\2017-01-30–2017-02-05_new.xlsx
File: C:\Path\To\Workbooks\2017-01-30–2017-02-05.xlsx, Shape: (9173, 27)
Malformed file saved as C:\Path\To\Workbooks\2017-02-13–2017-02-12_new.xlsx
File: C:\Path\To\Workbooks\2017-02-13–2017-02-12.xlsx, Shape: (9144, 27)
Malformed file saved as C:\Path\To\Workbooks\2017-02-13–2017-02-19_new.xlsx
File: C:\Path\To\Workbooks\2017-02-13–2017-02-19.xlsx, Shape: (9911, 27)
File: C:\Path\To\Workbooks\test.xlsx, Shape: (5315, 27)
Malformed file saved as C:\Path\To\Workbooks\Выгрузка 12-15.12_new.xlsx
File: C:\Path\To\Workbooks\Выгрузка 12-15.12.xlsx, Shape: (4818, 27)
Malformed file saved as C:\Path\To\Workbooks\Выгрузка 21-27_new.xlsx
File: C:\Path\To\Workbooks\Выгрузка 21-27.xlsx, Shape: (8876, 27)
File: C:\Path\To\Workbooks\Выгрузка 26-29.12.xlsx, Shape: (4539, 27)
Final dataframe shape: (330257, 30)
考慮甚至通過pyodbc
使用Windows的ACE引擎數據庫引擎的方式來查詢與相應的工作簿(的330257行和30列的最終數據幀)熊貓read_sql
由於每個分享相同的名稱,TDSheet。
#...same as above
import pyodbc
def sql_read():
dfs = []
for f in xlsxfiles:
try:
conn = pyodbc.connect('Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};'+\
'DBQ=C:\\Path\\To\\Workbooks\\{};'.format(f), autocommit=True)
df = pd.read_sql('SELECT * FROM [TDSheet$];', conn)
except Exception as e:
conn.close()
conn = pyodbc.connect('Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};'+\
'DBQ=C:\\Path\\To\\Workbooks\\{};'.format(save_xlsx(f)), autocommit=True)
df = pd.read_excel('SELECT * FROM [TDSheet$];', conn)
conn.close()
print('File: {}, Shape: {}'.format(f, df.shape))
dfs.append(df)
我無法重現您的錯誤。 'x = pd.read_excel('D:\\ download \\ 2016-08-152016-08-21.xlsx')' - 對我來說工作得很好(Pandas v 0.19.2)。你的Pandas版本是什麼? – MaxU
嘗試了相同的文件。完美加載,沒有錯誤。 但是,當我加載一堆文件錯誤發生。 我使用木星。 –
爲了能夠幫助你,我們需要一個__reproducible__ excel文件(s)... – MaxU