2012-07-07 69 views
1

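This code works, but I can't help feeling that it is a hack, especially the "offset" part. I had to put it there because otherwise every index stored in the deletes list is shifted by one after each deletion. Is there a better way to remove statistical outliers than this?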

# remove outliers > devs # of std deviations
devs = 1
deletes = []
for num, duration in enumerate(durations):
    if (duration > (mean_duration + (devs * std_dev_one_test))) or \
       (duration < (mean_duration - (devs * std_dev_one_test))):
        deletes.append(num)
offset = 0
for delete in deletes:
    del durations[delete - offset]
    del dates[delete - offset]
    offset += 1

Any ideas on how to make it better?

+0

'(duration > (mean_duration + (devs * std_dev_one_test))) or (duration < (mean_duration - (devs * std_dev_one_test)))' simplifies to 'abs(duration - mean_duration) > devs * std_dev_one_test' without losing any readability. – PaulMcG 2012-07-07 07:22:05

Answers

4

Build up a list of keepers as you iterate over the list:

def isKeeper(duration):
    if (duration > (mean_duration + (devs * std_dev_one_test))) or \
       (duration < (mean_duration - (devs * std_dev_one_test))):
        return False
    return True

durations = [duration for duration in durations if isKeeper(duration)]
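
The question also maintains a parallel `dates` list, which a comprehension over `durations` alone leaves untouched. The following is a minimal sketch of one way to filter both lists together, assuming the two lists are index-aligned. The function name `drop_outlier_pairs`, the sample values, and the use of the population standard deviation are assumptions made for illustration, not part of the original code.

```python
import statistics


def drop_outlier_pairs(durations, dates, devs=1):
    """Return (durations, dates) keeping only rows within devs std deviations."""
    mean_duration = statistics.mean(durations)
    # Assumption: population standard deviation; use statistics.stdev for the sample one.
    std_dev = statistics.pstdev(durations)
    kept = [
        (duration, date)
        for duration, date in zip(durations, dates)
        if abs(duration - mean_duration) <= devs * std_dev
    ]
    if not kept:
        return [], []
    kept_durations, kept_dates = zip(*kept)
    return list(kept_durations), list(kept_dates)


# Hypothetical sample data, not taken from the question.
durations = [10, 12, 11, 95, 9]
dates = ["d1", "d2", "d3", "d4", "d5"]
durations, dates = drop_outlier_pairs(durations, dates)
```

Pairing the values with `zip` before filtering keeps each duration attached to its date, so no index bookkeeping is required.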
1

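Are you removing items from the list, which causes the indices to shift, and using the offset to compensate?

If so, simply delete from the back of the list toward the front. That way, removing an item does not affect the positions of the items you have yet to process.

So start iterating at the last item and work toward the front of the list.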
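
Applied to the question's code, this could mean walking the collected indices from highest to lowest so that no offset is needed. A minimal sketch, where the helper name `delete_indices` and the sample data are hypothetical:

```python
def delete_indices(items, indices):
    """Delete the given positions from items in place, highest index first."""
    for index in sorted(indices, reverse=True):
        # Deleting from the back leaves all smaller indices valid.
        del items[index]


# Hypothetical sample data; `deletes` plays the role of the index list in the question.
durations = [10, 12, 95, 11, 1]
dates = ["d1", "d2", "d3", "d4", "d5"]
deletes = [2, 4]

delete_indices(durations, deletes)
delete_indices(dates, deletes)
```

Sorting the indices in reverse also makes the helper safe when the index list was not built in ascending order.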

These SO questions may be of interest: "Delete many elements of list (python)" and "Python: Removing list element while iterating over list".

Another good SO discussion can be found here: "Remove items from a list while iterating" (thanks to @PaulMcGuire for the suggestion in the comments).

+0

Here is another good discussion on this topic: http://stackoverflow.com/questions/1207406/remove-items-from-a-list-while-iterating-in-python, especially Alex Martelli's additional comments. – PaulMcG 2012-07-07 07:25:25

+0

@PaulMcGuire Thanks, that is a good link. If you don't mind, I will add it to my answer in case someone skips the comments. – Levon 2012-07-07 10:32:45

0

If the data set is small, you can reverse your logic and keep values instead of deleting them:

# keep values within devs # of std deviations
devs = 1
keeps = []
for duration in durations:
    if (duration <= (mean_duration + (devs * std_dev_one_test))) and \
       (duration >= (mean_duration - (devs * std_dev_one_test))):
        keeps.append(duration)
3

Maybe something like this:

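import numpy as np

myList = [1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5, 99]

mean_duration = np.mean(myList)
std_dev_one_test = np.std(myList)

def drop_outliers(x):
    # Return a boolean so that a legitimate value of 0 is not filtered out
    return abs(x - mean_duration) <= std_dev_one_test

# list() is needed on Python 3, where filter() returns an iterator
myList = list(filter(drop_outliers, myList))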

Result:

>>> myList 
[1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5] 
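
Since NumPy is already in use here, the same test could also be expressed as a boolean mask, which has the side benefit of filtering a parallel array with the same condition. This is only a sketch: the `dates` array below is a hypothetical stand-in for the question's `dates` list, and `devs = 1` mirrors the question's setting.

```python
import numpy as np

durations = np.array([1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5, 99])
# Hypothetical labels standing in for the question's `dates` list.
dates = np.arange(len(durations))

devs = 1
# True where the value lies within devs (population) standard deviations of the mean.
mask = np.abs(durations - durations.mean()) <= devs * durations.std()

durations = durations[mask]
dates = dates[mask]
```

Indexing both arrays with the same mask keeps them aligned without any explicit loop or index arithmetic.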