Why is win32com so much slower than xlrd?

I have the same code written with win32com and with xlrd. xlrd completes the algorithm in less than a second, while win32com takes minutes. Why is win32com so much slower than xlrd?

Here is the win32com version:

def makeDict(ws):
    """makes dict with key as header name,
    value as tuple of column begin and column end (inclusive)"""
    wsHeaders = {} # key is header name, value is column begin and end inclusive
    for cnum in xrange(9, find_last_col(ws)):
        if ws.Cells(7, cnum).Value:
            wsHeaders[str(ws.Cells(7, cnum).Value)] = (cnum, find_last_col(ws))
            for cend in xrange(cnum + 1, find_last_col(ws)): # finds end column
                if ws.Cells(7, cend).Value:
                    wsHeaders[str(ws.Cells(7, cnum).Value)] = (cnum, cend - 1)
                    break
    return wsHeaders

And here is the xlrd version:

def makeDict(ws):
    """makes dict with key as header name,
    value as tuple of column begin and column end (inclusive)"""
    wsHeaders = {} # key is header name, value is column begin and end inclusive
    for cnum in xrange(8, ws.ncols):
        if ws.cell_value(6, cnum):
            wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, ws.ncols)
            for cend in xrange(cnum + 1, ws.ncols): # finds end column
                if ws.cell_value(6, cend):
                    wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, cend - 1)
                    break
    return wsHeaders


2010-06-03 Josh

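(0) You asked "Why is win32com so much slower than xlrd?" ... this question is a bit like "Have you stopped beating your wife?": it is based on a presumption that may not be true. win32com was written in C by a brilliant programmer, whereas xlrd was written in pure Python by an average programmer. The real difference is that win32com has to call COM, which involves inter-process communication and was written by you-know-who, whereas xlrd reads the Excel file directly. Moreover, there is a fourth party in this scenario: YOU. Please read on.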

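(1) You don't show us the source of the find_last_col() function that you call repeatedly in the COM code. In the xlrd code, you are happy to use the same value (ws.ncols) throughout. So in the COM code, you should call find_last_col(ws) ONCE and use the returned result from then on. Update: see the answer to your separate question on how to get the equivalent of xlrd's Sheet.ncols from COM.

(2) Accessing each cell value TWICE is slowing down both versions of the code. Instead of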

if ws.cell_value(6, cnum): 
    wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, ws.ncols)

try

value = ws.cell_value(6, cnum) 
if value: 
    wsHeaders[str(value)] = (cnum, ws.ncols)

Note: there are 2 cases of this in each code snippet.
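To make the cost of the double access concrete, here is a small sketch that uses a stand-in sheet object which counts reads. `CountingSheet` and the two `headers_*` functions are hypothetical helpers invented for this illustration; they are not part of xlrd or win32com and only mimic the shape of xlrd's `cell_value(rowx, colx)` call. The sketch is written for Python 3, unlike the Python 2 snippets in this thread.

```python
class CountingSheet:
    """Hypothetical stand-in for a worksheet; counts every cell read."""

    def __init__(self, rows):
        self.rows = rows
        self.ncols = len(rows[0])
        self.reads = 0

    def cell_value(self, rowx, colx):
        self.reads += 1
        return self.rows[rowx][colx]


def headers_double_read(ws, rowx=0):
    """Reads every non-empty header cell twice, as in the question."""
    found = {}
    for cnum in range(ws.ncols):
        if ws.cell_value(rowx, cnum):
            found[str(ws.cell_value(rowx, cnum))] = cnum
    return found


def headers_single_read(ws, rowx=0):
    """Reads every header cell exactly once and reuses the value."""
    found = {}
    for cnum in range(ws.ncols):
        value = ws.cell_value(rowx, cnum)
        if value:
            found[str(value)] = cnum
    return found


row = ["A", "", "C", "", "E"]
slow_sheet = CountingSheet([row])
fast_sheet = CountingSheet([row])
same_result = headers_double_read(slow_sheet) == headers_single_read(fast_sheet)
print(same_result, slow_sheet.reads, fast_sheet.reads)  # True 8 5
```

With xlrd the extra read is a cheap in-memory lookup, but over COM every read is a round trip to the Excel process, so the saving matters far more there.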

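(3) The purpose of your nested loops is not obvious, but there does seem to be some redundant computation, involving redundant fetches from COM. If you care to tell us, with examples, what you are trying to achieve, we may be able to help you make it run much faster. At the very least, extracting the values from COM once and then processing them in nested loops in Python should be faster. How many columns are there?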

Update 2: Meanwhile, the pixies went over your code with a proctoscope and came up with the following script:

tests = [
    "A/B/C/D",
    "A//C//",
    "A//C//E",
    "A///D",
    "///D",
    ]
for test in tests:
    print "\nTest:", test
    row = test.split("/")
    ncols = len(row)
    # modelling the OP's code
    # (using xlrd-style 0-relative column indexes)
    d = {}
    for cnum in xrange(ncols):
        if row[cnum]:
            k = row[cnum]
            v = (cnum, ncols) #### BUG; should be ncols - 1 ("inclusive")
            print "outer", cnum, k, '=>', v
            d[k] = v
            for cend in xrange(cnum + 1, ncols):
                if row[cend]:
                    k = row[cnum]
                    v = (cnum, cend - 1)
                    print "inner", cnum, cend, k, '=>', v
                    d[k] = v
                    break
    print d
    # modelling a slightly better algorithm
    d = {}
    prev = None
    for cnum in xrange(ncols):
        key = row[cnum]
        if key:
            d[key] = [cnum, cnum]
            prev = key
        elif prev:
            d[prev][1] = cnum
    print d
    # if tuples are really needed (can't imagine why)
    for k in d:
        d[k] = tuple(d[k])
    print d

It produces this output:

Test: A/B/C/D 
outer 0 A => (0, 4) 
inner 0 1 A => (0, 0) 
outer 1 B => (1, 4) 
inner 1 2 B => (1, 1) 
outer 2 C => (2, 4) 
inner 2 3 C => (2, 2) 
outer 3 D => (3, 4) 
{'A': (0, 0), 'C': (2, 2), 'B': (1, 1), 'D': (3, 4)} 
{'A': [0, 0], 'C': [2, 2], 'B': [1, 1], 'D': [3, 3]} 
{'A': (0, 0), 'C': (2, 2), 'B': (1, 1), 'D': (3, 3)} 

Test: A//C// 
outer 0 A => (0, 5) 
inner 0 2 A => (0, 1) 
outer 2 C => (2, 5) 
{'A': (0, 1), 'C': (2, 5)} 
{'A': [0, 1], 'C': [2, 4]} 
{'A': (0, 1), 'C': (2, 4)} 

Test: A//C//E 
outer 0 A => (0, 5) 
inner 0 2 A => (0, 1) 
outer 2 C => (2, 5) 
inner 2 4 C => (2, 3) 
outer 4 E => (4, 5) 
{'A': (0, 1), 'C': (2, 3), 'E': (4, 5)} 
{'A': [0, 1], 'C': [2, 3], 'E': [4, 4]} 
{'A': (0, 1), 'C': (2, 3), 'E': (4, 4)} 

Test: A///D 
outer 0 A => (0, 4) 
inner 0 3 A => (0, 2) 
outer 3 D => (3, 4) 
{'A': (0, 2), 'D': (3, 4)} 
{'A': [0, 2], 'D': [3, 3]} 
{'A': (0, 2), 'D': (3, 3)} 

Test: ///D 
outer 3 D => (3, 4) 
{'D': (3, 4)} 
{'D': [3, 3]} 
{'D': (3, 3)}
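The "slightly better algorithm" from the script can be wrapped in a reusable function that works on a header row fetched in one go. This is only a sketch, assuming Python 3; `header_spans` and its `first_col` parameter are names made up here. With xlrd, the input could come from `ws.row_values(6, 8)`, which returns the cells of row 6 from column 8 onward as a list.

```python
def header_spans(row_values, first_col=0):
    """Map each non-empty header to its inclusive (start, end) columns.

    row_values: header cells, already fetched, left to right.
    first_col:  sheet column index of row_values[0].
    Empty cells extend the span of the nearest header to their left;
    empty cells before the first header are ignored.
    """
    spans = {}
    current = None
    for offset, value in enumerate(row_values):
        col = first_col + offset
        if value:
            current = str(value)
            spans[current] = [col, col]
        elif current is not None:
            spans[current][1] = col
    return {key: tuple(span) for key, span in spans.items()}


print(header_spans("A//C//".split("/")))             # {'A': (0, 1), 'C': (2, 4)}
print(header_spans("///D".split("/"), first_col=8))  # {'D': (11, 11)}
```

Each cell is looked at exactly once, so the work is linear in the number of columns. If a header name occurs more than once, the later occurrence overwrites the earlier one, as in the original code.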


2010-06-03 23:46:22

+1 I agree - the slowdown isn't caused by COM itself, but by the way the OP uses it, which makes it look slower. – Cam 2010-06-04 01:55:36

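Precisely because of the IPC overhead, COM offers a number of ways to read and write multiple values at once via safearrays. You can do this transparently through win32com by specifying a range rather than individual cells. – 2010-06-04 03:38:46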
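As a sketch of that suggestion: asking Excel for the `Value` of a multi-cell `Range` returns the whole block in one COM call, as a tuple of row tuples. `read_header_row` is a name made up for this illustration, `last_col` is assumed to have been computed once beforehand, and the code assumes the range spans more than one cell (a single-cell range yields a bare value rather than a tuple). It is written for Python 3.

```python
def read_header_row(ws, row, first_col, last_col):
    """Fetch one row of cells from an Excel worksheet in a single COM call.

    ws is expected to be a win32com Worksheet object; row and column
    numbers are 1-based, as usual in the Excel object model.
    """
    block = ws.Range(ws.Cells(row, first_col), ws.Cells(row, last_col)).Value
    return list(block[0])  # block is a tuple of rows; keep the only row
```

For the layout in the question this would be called as `read_header_row(ws, 7, 9, last_col)`, after which the resulting list can be processed entirely in Python without further calls into Excel. Empty cells come back as `None`.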

COM has to talk to another process, which is what actually handles the requests. xlrd works directly on the data structures themselves.


2010-06-03 19:48:17

So is it impossible to do this in a reasonable time with win32com? – Josh 2010-06-03 22:26:48

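I thought about this as I was going to sleep last night and ended up using the following. It is far better than my original version: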

def makeDict(ws):
    """makes dict with key as header name,
    value as tuple of column begin and column end (inclusive)"""
    wsHeaders = {} # key is header name, value is column begin and end inclusive
    last_col = find_last_col(ws)

    for cnum in xrange(9, last_col):
        if ws.Cells(7, cnum).Value:
            value = ws.Cells(7, cnum).Value
            cstart = cnum
        if ws.Cells(7, cnum + 1).Value:
            wsHeaders[str(value)] = (cstart, cnum) # cnum is last in range
    return wsHeaders


2010-06-04 14:48:34 Josh

(1) You are still accessing the same cell twice. (2) The previous code hard-coded column 8 and row 6; this one uses 9 and 7 respectively. Why? – 2010-06-04 21:51:22
