列表vs內置排序（）函數的Python排序（）方法

我知道__builtin__ sorted（）函數適用於任何迭代。但是有人可以解釋一下anylist.sort（）與sorted（anylist）之間的巨大（10倍）性能差異嗎？另外，請指出我是否以這種測量方式做錯了什麼。列表vs內置排序（）函數的Python排序（）方法

 
""" 
Example Output: 
$ python list_sort_timeit.py 
Using sort method: 20.0662879944 
Using sorted builin method: 259.009809017 
""" 

import random 
import timeit 

print 'Using sort method:', 
x = min(timeit.Timer("test_list1.sort()","import random;test_list1=random.sample(xrange(1000),1000)").repeat()) 
print x 

print 'Using sorted builin method:', 
x = min(timeit.Timer("sorted(test_list2)","import random;test_list2=random.sample(xrange(1000),1000)").repeat()) 
print x

正如標題所說，我很感興趣，比較list.sort（）與排序（列表）。上面的代碼片段顯示了一些有趣的事情，python的排序函數對於已排序的數據表現得非常好。正如Anurag所指出的那樣，在第一種情況下，排序方法正在處理已經排序的數據，而在第二種排序方法中，它正在處理新鮮事物以便一次又一次地工作。

所以我寫了這個測試，是的，他們非常接近。

 
""" 
Example Output: 
$ python list_sort_timeit.py 
Using sort method: 19.0166599751 
Using sorted builin method: 23.203567028 
""" 

import random 
import timeit 

print 'Using sort method:', 
x = min(timeit.Timer("test_list1.sort()","import random;test_list1=random.sample(xrange(1000),1000);test_list1.sort()").repeat()) 
print x 

print 'Using sorted builin method:', 
x = min(timeit.Timer("sorted(test_list2)","import random;test_list2=random.sample(xrange(1000),1000);test_list2.sort()").repeat()) 
print x

哦，我看到亞歷克斯·馬爾泰利有反應，因爲我打字這一個。（我將離開這個編輯，因爲它可能是有用的）。

來源

2009-09-17 Senthil Kumaran

順便說一句，因爲這是一個合法答案的問題，它可能不應該是社區wiki。 – 2009-09-17 06:15:28

好的，我會記住的，丹尼爾。這是一個很好的指針。 – 2009-09-17 06:23:36

你的測量誤差如下：您的test_list1.sort()第一個電話後，說列表對象IS排序 - 和Python的排序，又名timsort，是邪惡地快速已經排序的列表！這是使用timeit時最常出現的錯誤 - 無意中得到了副作用，而沒有考慮到它們。

這裏有一個很好的一套測量，使用timeit在命令行，因爲它是最適合用於：

$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' ' 
y=list(x); y.sort()' 
1000 loops, best of 3: 452 usec per loop 
$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' ' 
x.sort()' 
10000 loops, best of 3: 37.4 usec per loop 
$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' ' 
sorted(x)' 
1000 loops, best of 3: 462 usec per loop

正如你看到的，y.sort()和sorted(x)並駕齊驅，但x.sort()感謝副作用漲幅超過一個數量級的優勢 - 只是因爲你的測量誤差，儘管如此：這告訴你沒有任何關於sort vs sorted本身！ - ）

來源

2009-09-17 06:23:30

謝謝Alex！你能解釋爲什麼就地排序的時間要比sorted（）返回的列表稍微少一些？丹尼爾提到複製，但我們還沒有複製任何東西，對吧？我假設sort（）生成的排序列表是一個內存中操作，當返回時，分配一個新的列表對象。 – 2009-09-17 11:32:22

@Senthil：列表中的對象不應該被複制，但列表本身是（例如在'y = sorted（x）'後面有兩個列表的副本，'x'和'y'）。所以'sort（）'調用至少需要一個'malloc（n * sizeof（PyObject *））'，並且'n'指針要從第一個列表複製到新列表中。我的猜測是，複製這些「n」指針可以彌補Alex基準測試顯示的時間差約2％。 – 2009-09-17 14:21:54

@Daniel，yep，malloc本身加上指針的副本。只需要一個簡單的列表淺拷貝操作（我建議在命令行上這樣做 - ），以查看佔多少時間。 – 2009-09-17 15:04:09

好吧，列表的.sort()方法將列表排序，而sorted()創建一個新的列表。所以如果你有一個大的列表，你的性能差異的一部分將由於複製。

儘管如此，一個數量級的差異似乎比我預期的要大。也許list.sort()有一些特殊優化，sorted()不能使用。例如，由於list類已經有一個正確大小的內部Py_Object*[]數組，因此它可能會更有效地執行交換。

編輯：亞歷克斯和阿努拉格是正確的，幅度差的順序是因爲你不小心分揀你的測試用例已排序列表。但是，正如Alex的基準測試所示，list.sort()比sorted()快大約2％，由於複製開銷，這很有意義。

來源

2009-09-17 06:13:47

這是一個有用的指針。我剛開始挖掘並試圖理解就地排序與返回操作之間差異的原因。我已經評論了亞歷克斯的迴應。謝謝。 – 2009-09-17 11:33:45

因爲list.sort確實排序，所以第一次排序，但下一次排序排序列表。

例如試試這個，你會在大部分的時間都花在被複制timeit情況下獲得相同的結果和分類也確實多了一個複製

import time 
import random 
test_list1=random.sample(xrange(1000),1000) 
test_list2=random.sample(xrange(1000),1000) 

s=time.time() 
for i in range(100): 
    test_list1.sort() 
print time.time()-s 

s=time.time() 
for i in range(100): 
    test_list2=sorted(test_list2) 
print time.time()-s

來源

2009-09-17 06:20:30

是的，你指出了正確的。我確實知道[] .sort（）的就地排序，但不知怎麼的繼續查看兩者實現的算法是否有區別（正在出現）。看起來，[] .sort（）在已經排序的列表中工作得很好。順便說一句，由於很多原因[1]，我會用timeit來顯示比time.time更多的片段。使用timeit進行單次執行的結果相同。 http://diveintopython.org/performance_tuning/timeit.html – 2009-09-17 06:38:10

列表vs內置排序（）函數的Python排序（）方法

回答

相關問題