查找使矩陣中的行有唯一性的列的最小子集

什麼是通用高效算法，用於查找離散值矩陣中使列的唯一性最小的列子集。查找使矩陣中的行有唯一性的列的最小子集

例如，考慮這個矩陣（名爲列）：

矩陣中的每一行都是唯一的。但是，如果我們刪除列a和d我們維護相同的屬性。

我可以枚舉列的所有可能的組合，但是，隨着矩陣的增長，這將很快變得棘手。有沒有更快，最佳的算法來做到這一點？

來源

2016-04-06 Tim Hopper

每個列本身不會爲每行構成一個具有唯一值的矩陣嗎？ – vmg

@vmg否，例如在列「a」2中重複。 – asmeurer

我們在（1）列數和（2）行數方面討論了什麼樣的數量級？ –

其實，我原來的配方不是很好。作爲一套封面，這更好。

import pulp 

# Input data 
A = [ 
    [2, 1, 0, 0], 
    [2, 0, 0, 0], 
    [2, 1, 2, 2], 
    [1, 2, 2, 2], 
    [2, 1, 1, 0] 
] 

# Preprocess the data a bit. 
# Bikj = 1 if Aij != Akj, 0 otherwise 
B = [] 
for i in range(len(A)): 
    Bi = [] 
    for k in range(len(A)): 
     Bik = [int(A[i][j] != A[k][j]) for j in range(len(A[i]))] 
     Bi.append(Bik) 
    B.append(Bi) 

model = pulp.LpProblem('Tim', pulp.LpMinimize) 

# Variables turn on and off columns. 
x = [pulp.LpVariable('x_%d' % j, cat=pulp.LpBinary) for j in range(len(A[0]))] 

# The sum of elementwise absolute difference per element and row. 
for i in range(len(A)): 
    for k in range(i + 1, len(A)): 
     model += sum(B[i][k][j] * x[j] for j in range(len(A[i]))) >= 1 

model.setObjective(pulp.lpSum(x)) 
assert model.solve() == pulp.LpStatusOptimal 
print([xi.value() for xi in x])

來源

2016-04-06 13:07:55

這是我的貪婪解決方案。（是的，這不符合你的「最佳」標準。）隨機挑選一排可以安全扔掉並丟棄的行。繼續前進，直到沒有更多這樣的行。我相信is_valid可以優化。

rows = [ 
    [2, 1, 0, 0], 
    [2, 0, 0, 0], 
    [2, 1, 2, 2], 
    [1, 2, 2, 2], 
    [2, 1, 1, 0] 
] 

col_names = [0, 1, 2, 3] 

def is_valid(rows, col_names): 
    # it's valid if every row has a distinct "signature" 
    signatures = { tuple(row[col] for col in col_names) for row in rows } 
    return len(signatures) == len(rows) 

import random  
def minimal_distinct_columns(rows, col_names): 
    col_names = col_names[:] 
    random.shuffle(col_names) 
    for i, col in enumerate(col_names): 
     fewer_col_names = col_names[:i] + col_names[(i+1):] 
     if is_valid(rows, fewer_col_names): 
      return minimal_distinct_columns(rows, fewer_col_names) 
    return col_names

由於它是貪婪的，它沒有得到最好的答案總是，但它應該是比較快速的（簡單）。

來源

2016-04-06 14:28:56 Joel

'繼續'是否應該跳回到'真正'？它不會只是繼續'for'循環（即什麼都不做）？ – asmeurer

呃，你是對的，讓我解決這個問題 – Joel

一個觀察：如果M有唯一的行，沒有列i和j，那麼它具有獨立的行，沒有列i和獨立的列j（換句話說，將一列添加到具有唯一行的矩陣中，行不是唯一的）。因此，您應該能夠通過使用深度優先搜索來找到最小（而非最小）的解決方案。

def has_unique_rows(M): 
    return len(set([tuple(i) for i in M])) == len(M) 

def remove_cols(M, cols): 
    ret = [] 
    for row in M: 
     new_row = [] 
     for i in range(len(row)): 
      if i in cols: 
       continue 
      new_row.append(row[i]) 
     ret.append(new_row) 
    return ret 


def minimum_unique_rows(M): 
    if not has_unique_rows(M): 
     raise ValueError("M must have unique rows") 

    cols = list(range(len(M[0]))) 

    def _cols_to_remove(M, removed_cols=(), max_removed_cols=()): 
     for i in set(cols) - set(removed_cols): 
      new_removed_cols = removed_cols + (i,) 
      new_M = remove_cols(M, new_removed_cols) 
      if not has_unique_rows(new_M): 
       continue 
      if len(new_removed_cols) > len(max_removed_cols): 
       max_removed_cols = new_removed_cols 
      return _cols_to_remove(M, new_removed_cols, max_removed_cols) 
     return max_removed_cols 

    removed_cols = _cols_to_remove(M) 
    return remove_cols(M, removed_cols), removed_cols

（請注意，我的變量命名是可怕的）

這是它在其上顯示如下

⎡0 1 0 1 0 1 1⎤ 
⎢     ⎥ 
⎢0 1 1 2 0 0 2⎥ 
⎢     ⎥ 
⎢1 0 1 1 1 0 0⎥ 
⎢     ⎥ 
⎢1 2 2 1 1 2 2⎥ 
⎢     ⎥ 
⎢2 0 0 0 0 1 1⎥ 
⎢     ⎥ 
⎢2 0 2 2 1 1 0⎥ 
⎢     ⎥ 
⎢2 1 2 1 1 0 1⎥ 
⎢     ⎥ 
⎢2 2 1 2 1 0 1⎥ 
⎢     ⎥ 
⎣2 2 2 1 1 2 1⎦

你的矩陣

In [172]: rows = [ 
    .....:  [2, 1, 0, 0], 
    .....:  [2, 0, 0, 0], 
    .....:  [2, 1, 2, 2], 
    .....:  [1, 2, 2, 2], 
    .....:  [2, 1, 1, 0] 
    .....: ] 

In [173]: minimum_unique_rows(rows) 
Out[173]: ([[1, 0], [0, 0], [1, 2], [2, 2], [1, 1]], (0, 3))

我產生一個隨機矩陣（使用sympy.randMatrix）

（注意排序M的行有助於checki很多納克用手這些東西）

In [224]: M1 = [[0, 1, 0, 1, 0, 1, 1], [0, 1, 1, 2, 0, 0, 2], [1, 0, 1, 1, 1, 0, 0], [1, 2, 2, 1, 1, 2, 2], [2, 0, 0, 0, 0, 1, 1], [2, 0, 2, 2, 1, 1, 0], [2, 1, 2, 1, 1, 0 
, 1], [2, 2, 1, 2, 1, 0, 1], [2, 2, 2, 1, 1, 2, 1]] 

In [225]: minimum_unique_rows(M1) 
Out[225]: ([[1, 1, 1], [2, 0, 2], [1, 0, 0], [1, 2, 2], [0, 1, 1], [2, 1, 0], [1, 0, 1], [2, 0, 1], [1, 2, 1]], (0, 1, 2, 4))

這裏有一個強力的支票，這是最低的答案（實際上有相當多的最小值）。

In [229]: from itertools import combinations 

In [230]: print([has_unique_rows(remove_cols(M1, r)) for r in combinations(range(7), 6)]) 
[False, False, False, False, False, False, False] 

In [231]: print([has_unique_rows(remove_cols(M1, r)) for r in combinations(range(7), 5)]) 
[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False] 

In [232]: print([has_unique_rows(remove_cols(M1, r)) for r in combinations(range(7), 4)]) 
[False, True, False, False, False, False, False, False, False, False, True, False, False, False, False, False, True, False, False, False, False, False, False, False, True, False, False, True, False, False, False, False, False, True, True]

來源

2016-04-06 17:40:23 asmeurer

雖然我敢肯定有更好的方法，這種深情讓我想起了一些遺傳算法的東西，我在90年代一樣。我使用R的GA軟件包編寫了一個快速版本。

library(GA) 

matrix_to_minimize <- matrix(c(2,2,1,1,2, 
           1,0,1,2,1, 
           0,0,2,2,1, 
           0,0,2,2,0), ncol=4) 

evaluate <- function(indices) { 
    if(all(indices == 0)) { 
    return(0) 
    } 
    selected_cols <- matrix_to_minimize[, as.logical(indices), drop=FALSE] 

    are_unique <- nrow(selected_cols) == nrow(unique(selected_cols)) 
    if (are_unique == FALSE) { 
    return(0) 
    } 

    retval <- (1/sum(as.logical(indices))) 
    return(retval) 
} 

ga_results <- ga("binary", evaluate, 
      nBits=ncol(matrix_to_minimize), 
      popSize=10 * ncol(matrix_to_minimize), #why not 
      maxiter=1000, 
      run=10) #probably want to play with this 

print("Best Solution: ") 
print([email protected])

我不知道這是好還是最佳，但我敢打賭它會在合理的時間內提供合理的好答案？ :)

來源

2016-04-06 18:51:24 earino

查找使矩陣中的行有唯一性的列的最小子集

回答

相關問題