2015-09-25 577 views
7

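Fast calculation of Pareto front in Python

I have a set of points in 3D space, from which I need to find the Pareto frontier. Speed of execution is very important here, and the run time grows very quickly as I add points to test.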

The set of points looks like this:

[[0.3296170319979843, 0.0, 0.44472108843537406], [0.3296170319979843,0.0, 0.44472108843537406], [0.32920760896951373, 0.0, 0.4440408163265306], [0.32920760896951373, 0.0, 0.4440408163265306], [0.33815192743764166, 0.0, 0.44356462585034007]] 

Right now, I'm using this algorithm:

def dominates(row, candidateRow):
    return sum([row[x] >= candidateRow[x] for x in range(len(row))]) == len(row)

def simple_cull(inputPoints, dominates):
    paretoPoints = set()
    candidateRowNr = 0
    dominatedPoints = set()
    while True:
        candidateRow = inputPoints[candidateRowNr]
        inputPoints.remove(candidateRow)
        rowNr = 0
        nonDominated = True
        while len(inputPoints) != 0 and rowNr < len(inputPoints):
            row = inputPoints[rowNr]
            if dominates(candidateRow, row):
                # If it is worse on all features remove the row from the array
                inputPoints.remove(row)
                dominatedPoints.add(tuple(row))
            elif dominates(row, candidateRow):
                nonDominated = False
                dominatedPoints.add(tuple(candidateRow))
                rowNr += 1
            else:
                rowNr += 1

        if nonDominated:
            # add the non-dominated point to the Pareto frontier
            paretoPoints.add(tuple(candidateRow))

        if len(inputPoints) == 0:
            break
    return paretoPoints, dominatedPoints

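Found here: http://code.activestate.com/recipes/578287-multidimensional-pareto-front/

What is the fastest way to find the set of non-dominated solutions in an ensemble of solutions? Or, in short, can Python do better than this algorithm?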

Answers

6

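If you're worried about actual speed, you definitely want to use numpy (clever algorithmic tweaks will probably have far less effect than the gains from using array operations). Here are two solutions. The "dumb" solution is slower in most situations, but becomes the faster one as the number of costs increases: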

import numpy as np


def is_pareto_efficient_dumb(costs):
    """
    :param costs: An (n_points, n_costs) array
    :return: A (n_points,) boolean array, indicating whether each point is Pareto efficient
    """
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        # c is dominated if some point is no worse on every cost and strictly better on one
        dominates_c = np.all(costs <= c, axis=1) & np.any(costs < c, axis=1)
        is_efficient[i] = not np.any(dominates_c)
    return is_efficient


def is_pareto_efficient(costs):
    """
    :param costs: An (n_points, n_costs) array
    :return: A (n_points,) boolean array, indicating whether each point is Pareto efficient
    """
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            remaining = costs[is_efficient]
            # Keep points that beat c on at least one cost, plus exact copies of c (c itself included)
            is_efficient[is_efficient] = np.any(remaining < c, axis=1) | np.all(remaining == c, axis=1)
    return is_efficient

Profiling tests:

With 10000 samples, 2 costs:

dumb: Elapsed time is 0.9168s 
smart: Elapsed time is 0.004274s 

With 5000 samples, 15 costs:

dumb: Elapsed time is 1.394s 
smart: Elapsed time is 1.982s 
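Timings like the ones above could be collected with a small harness along the following lines. This is only a sketch, not the script that produced those numbers: the `time_it` helper, the fixed seed, the uniform random costs, and the scaled-down sizes are all assumptions made for illustration, and `pareto_mask` is a compact stand-in for the pruning loop so that the block runs on its own.

```python
import time
import numpy as np


def pareto_mask(costs):
    """Boolean mask of non-dominated rows; lower cost is better (illustrative stand-in)."""
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            remaining = costs[is_efficient]
            # Keep rows that beat c on some cost, plus exact copies of c (c itself included).
            keep = np.any(remaining < c, axis=1) | np.all(remaining == c, axis=1)
            is_efficient[is_efficient] = keep
    return is_efficient


def time_it(func, costs):
    """Hypothetical helper: return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(costs)
    return result, time.perf_counter() - start


rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
# Sizes are scaled down so the sketch finishes quickly; raise them to match the runs above.
for n_points, n_costs in [(2000, 2), (1000, 15)]:
    costs = rng.random((n_points, n_costs))
    mask, elapsed = time_it(pareto_mask, costs)
    print(f"{n_points} samples, {n_costs} costs: {mask.sum()} efficient, {elapsed:.4g}s")
```

Absolute numbers depend on the machine and on how many points end up on the front, so only the relative trend between the two settings is meaningful.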
+1

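Wow, I missed that, thanks Peter! I'm not sure I understand how to get the costs array; could you give a simple example? Thanks again, this looks awesome. – Rodolphe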

+1

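The costs array is just a 2D array where costs[i, j] is the j-th cost of the i-th data point. I think it's the same as your inputPoints array. You can see the [tests here](https://github.com/QUVA-Lab/artemis/blob/master/artemis/general/test_pareto_efficiency.py), which demonstrate its use. – Peter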
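As a small illustration of that layout, the sketch below turns the point list from the question into an (n_points, n_costs) array. One assumption is worth stating loudly: the question's `dominates` treats larger values as better, while a cost is something to minimize, so the points are negated here before being used as costs. The variable names are made up for this example.

```python
import numpy as np

# The five points from the question, one row per point.
input_points = [
    [0.3296170319979843, 0.0, 0.44472108843537406],
    [0.3296170319979843, 0.0, 0.44472108843537406],
    [0.32920760896951373, 0.0, 0.4440408163265306],
    [0.32920760896951373, 0.0, 0.4440408163265306],
    [0.33815192743764166, 0.0, 0.44356462585034007],
]

points = np.array(input_points)  # shape (n_points, n_costs)
costs = -points                  # assumption: larger is better in the question, so negate

print(costs.shape)               # (5, 3)

# Which rows does the first point dominate? (no worse everywhere, strictly better somewhere)
c = costs[0]
dominated_by_first = np.all(costs >= c, axis=1) & np.any(costs > c, axis=1)
print(dominated_by_first.tolist())
```

If the values are already costs to be minimized, the negation should simply be dropped.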

5

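I took a shot at rewriting the same algorithm with a couple of tweaks. I think most of your problems come from inputPoints.remove(row). That call has to search through the list of points; removing by index would be much more efficient. I also modified the dominates function to avoid some redundant comparisons, which could come in handy in higher dimensions.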

def dominates(row, rowCandidate):
    return all(r >= rc for r, rc in zip(row, rowCandidate))

def cull(pts, dominates):
    dominated = []
    cleared = []
    remaining = pts
    while remaining:
        candidate = remaining[0]
        new_remaining = []
        for other in remaining[1:]:
            # points beaten by the candidate are dropped; the rest stay for the next pass
            if dominates(candidate, other):
                dominated.append(other)
            else:
                new_remaining.append(other)
        if not any(dominates(other, candidate) for other in new_remaining):
            cleared.append(candidate)
        else:
            dominated.append(candidate)
        remaining = new_remaining
    return cleared, dominated
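The remark about `inputPoints.remove(row)` can be made concrete with a tiny example. `list.remove` walks the list comparing elements until it finds the first one equal to its argument, whereas `del lst[i]` goes straight to a position. With duplicate points, as in the question's data, that also means `remove` may drop a different row than the one intended. The names below are made up for this sketch.

```python
points = [[1.0, 2.0], [3.0, 4.0], [1.0, 2.0]]

# Removing by value: the intent is to drop the row at index 2.
by_value = [list(p) for p in points]
target = by_value[2]
by_value.remove(target)  # drops the FIRST element equal to target, i.e. index 0
target_still_present = any(p is target for p in by_value)

# Removing by index: no search, and exactly the requested row goes away.
by_index = [list(p) for p in points]
del by_index[2]

print(by_value)              # [[3.0, 4.0], [1.0, 2.0]]
print(target_still_present)  # True
print(by_index)              # [[1.0, 2.0], [3.0, 4.0]]
```

The rewritten `cull` above sidesteps both issues by building a fresh `new_remaining` list on each pass instead of deleting from the list it is iterating over.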
+0

Thanks, I'm trying it out now. Any idea how it would compare to the first answer here: http://stackoverflow.com/questions/21294829/fast-calculations-of-the-pareto-front-in-r? – Rodolphe

+1

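I'm not sure. I tried something similar to that in my first attempt at a solution. For each dimension, I sorted the points by value and collected pairs of indices. Taking the intersection of all those pairs gives all the domination relations. However, I couldn't get my Python code to run as fast. –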
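One possible reading of that sort-and-intersect idea is sketched below. It is a guess at the approach described in the comment, not the commenter's code: the function names are hypothetical, larger values are assumed to be better (as in the thread's `dominates`), and ties are handled by recording the relation in both directions. It enumerates O(n^2) pairs per dimension, which fits the remark that it was hard to make fast in pure Python.

```python
def weak_domination_pairs(points):
    """Set of (i, j) index pairs where point i is >= point j in every dimension."""
    if not points:
        return set()
    n = len(points)
    common = None
    for d in range(len(points[0])):
        # Indices ordered from best to worst in dimension d.
        order = sorted(range(n), key=lambda i: points[i][d], reverse=True)
        pairs = set()
        for a in range(n):
            for b in range(a + 1, n):
                hi, lo = order[a], order[b]
                pairs.add((hi, lo))  # hi is at least as good as lo in dimension d
                if points[hi][d] == points[lo][d]:
                    pairs.add((lo, hi))  # tie: the relation holds in both directions
        # A pair survives only if it holds in every dimension seen so far.
        common = pairs if common is None else common & pairs
    return common if common is not None else set()


def pareto_indices(points):
    """Indices of points that no other point strictly dominates."""
    pairs = weak_domination_pairs(points)
    # i strictly dominates j when (i, j) holds but (j, i) does not.
    dominated = {j for (i, j) in pairs if (j, i) not in pairs}
    return [i for i in range(len(points)) if i not in dominated]


example = [(1, 2), (2, 1), (0, 0), (1, 2)]
print(pareto_indices(example))  # [0, 1, 3]
```

Exact duplicates end up in the relation in both directions, so neither copy counts as dominated by the other.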

1

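The definition of dominates is incorrect. A dominates B if and only if A is better than or equal to B in all dimensions and strictly better in at least one dimension.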
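A minimal sketch of a `dominates` function that follows this definition is shown below. It assumes, as the other `dominates` functions in this thread do, that larger values are better; for costs the comparisons would be flipped.

```python
def dominates(row, candidate_row):
    """True if row is >= candidate_row in every dimension and > in at least one."""
    at_least_as_good = all(r >= c for r, c in zip(row, candidate_row))
    strictly_better = any(r > c for r, c in zip(row, candidate_row))
    return at_least_as_good and strictly_better


print(dominates((1, 2), (1, 1)))  # True: equal in one dimension, better in the other
print(dominates((1, 2), (1, 2)))  # False: identical points do not dominate each other
print(dominates((1, 2), (2, 1)))  # False: each point is better in one dimension
```

With this version, two identical points are both kept on the front instead of one being reported as dominated by the other.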