2014-10-27

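Selecting k corresponding rows from a matrix and a vector

I have a matrix X (shape m x n), a vector y (shape m x 1), and a probability vector p (shape m x 1).

I want to sample k rows from X, together with the corresponding rows of y, according to the probabilities in p.

How can I implement this in Python? Is there already a built-in function for it?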

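Answer

You need to build a cumulative distribution function (either with numpy or by writing your own), then zip the matrix rows and the vector entries together so that each sampled row stays paired with its y value.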
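The CDF idea can be sketched with numpy roughly as follows. This is an illustration only: the function name sample_rows_cdf and its seed parameter are hypothetical, X, y and p are assumed to be numpy arrays with the same number of rows, and sampling is with replacement.

```python
import numpy as np

def sample_rows_cdf(X, y, p, k, seed=None):
    """Draw k row indices with replacement via the CDF of p (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float).ravel()
    cdf = np.cumsum(p)
    # Scale the uniform draws by the total so p need not sum exactly to 1.
    u = rng.random(k) * cdf[-1]
    # side="right" returns the first index whose cumulative sum exceeds the draw.
    idx = np.searchsorted(cdf, u, side="right")
    idx = np.minimum(idx, len(p) - 1)  # guard against floating-point edge cases
    # Indexing both arrays with the same idx keeps rows of X and y paired.
    return X[idx], y[idx], idx
```

Because the same index array is applied to both X and y, the pairing is preserved without zipping the two together.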

Implementation

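import random
from bisect import bisect

def sample(population, k, prob=None):
    """Return k items; weighted by prob (with replacement) when prob is given."""
    population = list(population)  # also accepts iterators such as zip(X, Y)
    if prob is None:
        # No weights given: uniform sampling without replacement.
        return random.sample(population, k)
    # Cumulative distribution: _cumm[i] = prob[0] + ... + prob[i].
    _cumm = [prob[0]]
    for i in range(1, len(prob)):
        _cumm.append(_cumm[-1] + prob[i])
    last = len(population) - 1
    # bisect maps a uniform draw to the index of the interval it falls in;
    # min() guards against rounding pushing the index past the end.
    return [population[min(bisect(_cumm, random.random() * _cumm[-1]), last)]
            for _ in range(k)]

def gen_sample_data(m, n):
    """Generate an m x n matrix X, a length-m vector Y and length-m probabilities P."""
    X = [random.sample(range(100), n) for _ in range(m)]
    Y = random.sample(range(100), m)
    P = random.sample(range(100), m)
    total = sum(P)
    P = [1. * e / total for e in P]  # normalise so the weights sum to 1
    return X, Y, P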


>>> import pprint
>>> X, Y, P = gen_sample_data(10, 5)
>>> pprint.pprint(X) 
[[29, 14, 95, 4, 83], 
[80, 73, 34, 70, 49], 
[67, 25, 94, 46, 83], 
[78, 24, 80, 38, 91], 
[90, 22, 53, 20, 71], 
[91, 0, 64, 90, 59], 
[82, 66, 22, 33, 93], 
[25, 34, 7, 5, 2], 
[87, 0, 91, 8, 78], 
[17, 30, 73, 14, 63]] 
>>> pprint.pprint(Y) 
[83, 61, 62, 59, 41, 72, 56, 23, 36, 97] 
>>> pprint.pprint(P) 
[0.015424164524421594, 
0.002570694087403599, 
0.2544987146529563, 
0.02570694087403599, 
0.10796915167095116, 
0.033419023136246784, 
0.08483290488431877, 
0.20565552699228792, 
0.2236503856041131, 
0.04627249357326478] 
>>> pprint.pprint(list(zip(*sample(list(zip(X, Y)), 5, prob=P))))
[([67, 25, 94, 46, 83],
  [87, 0, 91, 8, 78],
  [82, 66, 22, 33, 93],
  [87, 0, 91, 8, 78],
  [87, 0, 91, 8, 78]),
 (62, 36, 56, 36, 36)]
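On the question of a built-in: numpy's random Generator has a choice method that accepts a probability vector, so one option is to sample row indices and use them to index both arrays. A minimal sketch, assuming X, y and p are numpy arrays and p sums to 1; the helper name sample_rows and its parameters are made up for illustration.

```python
import numpy as np

def sample_rows(X, y, p, k, replace=True, seed=None):
    """Sample k rows of X and the matching rows of y with probabilities p (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float).ravel()  # choice expects a 1-D probability vector
    # Sample indices 0..m-1 rather than rows, so X and y stay aligned.
    idx = rng.choice(len(p), size=k, replace=replace, p=p)
    return X[idx], y[idx]
```

With replace=False the k rows are distinct, which requires k to be no larger than the number of nonzero entries in p.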
Is "for i in range(1, len(P)):" a typo? What is P? – Fraz 2014-10-28 06:22:51

Yes, that is a typo. It should actually be prob. – Abhijit 2014-10-28 06:44:20