2017-04-08 149 views

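Plotting multiple density curves on the same plot: weighting the subset categories in Python 3

I am trying to recreate this density plot in Python 3: math.stackexchange.com/questions/845424/the-expected-outcome-of-a-random-game-of-chess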

End Goal: I need my density plot to look like this

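The area under the blue curve equals the combined area under the red, green, and purple curves, because the different outcomes (draw, Black wins, and White wins) are subsets of the total (All).

How do I get Python to account for this and plot the curves accordingly?

Here are the results of 1000 simulations (the .csv file of results_df): pastebin.com/YDVMx2DL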

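from matplotlib import pyplot as plt
import seaborn as sns

# results_df (with 'outcome' and 'moves' columns) and workers (used as the
# sample size in the title) come from the simulation code, which is not shown.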

black = results_df.loc[results_df['outcome'] == 'Black'] 
white = results_df.loc[results_df['outcome'] == 'White'] 
draw = results_df.loc[results_df['outcome'] == 'Draw'] 
win = results_df.loc[results_df['outcome'] != 'Draw'] 

Total = len(results_df.index) 
Wins = len(win.index) 

PercentBlack = "Black Wins ≈ %s" %('{0:.2%}'.format(len(black.index)/Total)) 
PercentWhite = "White Wins ≈ %s" %('{0:.2%}'.format(len(white.index)/Total)) 
PercentDraw = "Draw ≈ %s" %('{0:.2%}'.format(len(draw.index)/Total)) 
AllTitle = 'Distribution of Moves by All Outcomes (nSample = %s)' %(workers) 

sns.distplot(results_df.moves, hist=False, label = "All") 
sns.distplot(black.moves, hist=False, label=PercentBlack) 
sns.distplot(white.moves, hist=False, label=PercentWhite) 
sns.distplot(draw.moves, hist=False, label=PercentDraw) 
plt.title(AllTitle) 
plt.ylabel('Density') 
plt.xlabel('Number of Moves') 
plt.legend() 
plt.show() 

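The code above produces density curves without weights. What I really need to figure out is how to weight the density curves accordingly while keeping my labels in the legend.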

density curves, no weights; help

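I also tried a frequency histogram, which scales the heights correctly, but I would rather keep the 4 curves overlaid on top of each other for a "cleaner" look. I don't like this frequency plot, but it is my current fix.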

results_df.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = "All") 
draw.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentDraw) 
white.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentWhite) 
black.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentBlack) 
plt.title(AllTitle) 
plt.ylabel('Frequency') 
plt.xlabel('Number of Moves') 
plt.legend() 
plt.show() 

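If anyone can write Python 3 code that outputs the first plot with the 4 density curves weighted correctly by subset, while keeping the custom legend that shows the percentages, that would be much appreciated.

Once the density curves are plotted with the correct subset weights, I am also interested in Python 3 code that finds the coordinates of the highest point of each density curve, to show the number of moves with the highest frequency once I scale this up to 500,000 iterations.

Thanks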

Answer


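You need to be careful here. The plot you produced is correct: all the curves shown are probability density functions of the underlying distributions.

In the plot you want to reproduce, only the curve labeled "All" is a probability density function. The other curves are not, because each one is scaled so that its area equals that outcome's share of the games rather than 1.

In any case, if you want the curves scaled as in the desired plot, you will need to compute the kernel density estimates yourself. This can be done with scipy.stats.gaussian_kde().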
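To see why the scaled curves are no longer probability density functions, here is a minimal sketch on synthetic data. The arrays `subset` and `rest` and their Gumbel parameters are made up for illustration and are not taken from the question. It uses `gaussian_kde.integrate_box_1d` to check that the unscaled KDE has unit area, while the scaled curve has an area equal to the subset's share of the samples.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
# Hypothetical stand-ins for one outcome and for all remaining games
subset = rng.gumbel(80, 25, 1000)
rest = rng.gumbel(200, 46, 4000)

kde = scipy.stats.gaussian_kde(subset)
share = len(subset) / (len(subset) + len(rest))  # 1000 of 5000 samples

# Area under the unscaled KDE: it is a proper PDF, so the area is 1
area_pdf = kde.integrate_box_1d(-np.inf, np.inf)
# Multiplying the curve by the share multiplies its area by the same factor
area_scaled = share * area_pdf

print("area of KDE:       ", round(area_pdf, 6))
print("area of scaled KDE:", round(area_scaled, 6))
```

The scaled curve is therefore a density only in the loose sense that the scaled curves of all outcomes together account for an area of 1.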

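To reproduce the desired plot, I see two options.

Compute the KDE for each case involved (including the combined data) and scale the subset curves by their share of the total number of samples.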

import numpy as np; np.random.seed(0) 
import matplotlib.pyplot as plt 
import scipy.stats 

a = np.random.gumbel(80, 25, 1000).astype(int) 
b = np.random.gumbel(200, 46, 4000).astype(int) 

kdea = scipy.stats.gaussian_kde(a) 
kdeb = scipy.stats.gaussian_kde(b) 

both = np.hstack((a,b)) 
kdeboth = scipy.stats.gaussian_kde(both) 
grid = np.arange(500) 

#weighted kde curves 
wa = kdea(grid)*(len(a)/float(len(both))) 
wb = kdeb(grid)*(len(b)/float(len(both))) 

# with a grid spacing of 1, each sum approximates the area under the curve
print("a.sum ", wa.sum())
print("b.sum ", wb.sum())
print("total.sum ", kdeboth(grid).sum())

fig, ax = plt.subplots() 
ax.plot(grid, wa, lw=1, label = "weighted a") 
ax.plot(grid, wb, lw=1, label = "weighted b") 
ax.plot(grid, kdeboth(grid), color="crimson", lw=2, label = "pdf") 

plt.legend() 
plt.show() 


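Compute the KDE for each individual case, scale each by its share of the samples, and sum the scaled curves to obtain the total.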

import numpy as np; np.random.seed(0) 
import matplotlib.pyplot as plt 
import scipy.stats 

a = np.random.gumbel(80, 25, 1000).astype(int) 
b = np.random.gumbel(200, 46, 4000).astype(int) 

kdea = scipy.stats.gaussian_kde(a) 
kdeb = scipy.stats.gaussian_kde(b) 

grid = np.arange(500) 


#weighted kde curves 
wa = kdea(grid)*(len(a)/float(len(a)+len(b))) 
wb = kdeb(grid)*(len(b)/float(len(a)+len(b))) 

total = wa+wb 

fig, ax = plt.subplots(figsize=(5,3)) 
ax.plot(grid, wa, lw=1, label = "weighted a") 
ax.plot(grid, wb, lw=1, label = "weighted b") 
ax.plot(grid, total, color="crimson", lw=2, label = "pdf") 

plt.legend() 
plt.show() 
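Applied to the question's data, the same scaling can be done per outcome while building the percentage labels for the legend. The following is a rough sketch rather than a drop-in solution: `weighted_curves` and `plot_weighted` are hypothetical helper names, and the synthetic `moves` and `outcome` arrays are assumptions standing in for `results_df.moves` and `results_df.outcome`.

```python
import numpy as np
import scipy.stats


def weighted_curves(moves, outcome, grid):
    """Return {legend label: KDE scaled by that outcome's share of all games}."""
    moves = np.asarray(moves, dtype=float)
    outcome = np.asarray(outcome)
    curves = {}
    for name in np.unique(outcome):
        subset = moves[outcome == name]
        share = len(subset) / len(moves)
        label = "{} ≈ {:.2%}".format(name, share)
        curves[label] = scipy.stats.gaussian_kde(subset)(grid) * share
    return curves


def plot_weighted(grid, curves):
    """Plot the scaled curves plus their sum (the "All" curve)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    fig, ax = plt.subplots()
    ax.plot(grid, sum(curves.values()), lw=2, label="All")
    for label, curve in curves.items():
        ax.plot(grid, curve, lw=1, label=label)
    ax.set_xlabel("Number of Moves")
    ax.set_ylabel("Density")
    ax.legend()
    return fig, ax


# Synthetic stand-in for results_df.moves / results_df.outcome (assumption)
rng = np.random.default_rng(0)
moves = np.concatenate([rng.gumbel(80, 25, 150),
                        rng.gumbel(120, 30, 100),
                        rng.gumbel(200, 46, 750)])
outcome = np.array(["Black"] * 150 + ["White"] * 100 + ["Draw"] * 750)
grid = np.arange(500)

curves = weighted_curves(moves, outcome, grid)
print(list(curves))
```

Calling `fig, ax = plot_weighted(grid, curves)` draws the figure, which can then be displayed with matplotlib's usual `show()` call. Here the "All" curve is built as the sum of the scaled curves, as in the second option.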

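As for the follow-up about the highest point of each curve: the weighted curves are plain arrays evaluated on `grid`, so one simple approach is `np.argmax`. Below is a minimal sketch with a hypothetical helper `curve_peak` and synthetic data in place of the real simulation results. The precision of the peak position is limited by the spacing of `grid`.

```python
import numpy as np
import scipy.stats


def curve_peak(grid, curve):
    """Return (x, y) of the highest point of a curve sampled on grid."""
    i = np.argmax(curve)
    return grid[i], curve[i]


rng = np.random.default_rng(0)
a = rng.gumbel(80, 25, 1000)    # synthetic stand-in for one outcome
b = rng.gumbel(200, 46, 4000)   # synthetic stand-in for another outcome
grid = np.arange(500)

# Scale each KDE by its share of all samples, as in the options above
n = len(a) + len(b)
wa = scipy.stats.gaussian_kde(a)(grid) * len(a) / n
wb = scipy.stats.gaussian_kde(b)(grid) * len(b) / n

for name, curve in [("a", wa), ("b", wb), ("total", wa + wb)]:
    x, y = curve_peak(grid, curve)
    print("{}: peak at {} moves, density {:.5f}".format(name, x, y))
```

The returned coordinates could be passed to `ax.annotate` or `ax.plot` to mark the peaks on the figure.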