2017-09-05 83 views
1

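Following the plotly directions for plotting distributions, I would like to plot something similar to the code below, but with groups of uneven length: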

import plotly.plotly as py 
import plotly.figure_factory as ff 

import numpy as np 

# Add histogram data 
x1 = np.random.randn(200) - 2 
x2 = np.random.randn(200) 
x3 = np.random.randn(200) + 2 
x4 = np.random.randn(200) + 4 


# Group data together 
hist_data = [x1, x2, x3, x4] 

group_labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4'] 

# Create distplot with custom bin_size 
fig = ff.create_distplot(hist_data, group_labels, bin_size = [.1, .25, .5, 1]) 

# Plot! 
py.iplot(fig, filename = 'Distplot with Multiple Bin Sizes') 

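However, my real-world dataset has uneven sample sizes (i.e. the count in group 1 differs from the count in group 2, and so on). It is also in name-value pair format.

Here is some fake data to illustrate: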

import numpy as np
import pandas as pd

# Build fake name-value data with a different sample size per group
x1 = pd.DataFrame(np.random.randn(100)) 
x1['name'] = 'x1' 

x2 = pd.DataFrame(np.random.randn(200) + 1) 
x2['name'] = 'x2' 

x3 = pd.DataFrame(np.random.randn(300) - 1) 
x3['name'] = 'x3' 

df = pd.concat([x1, x2, x3]) 
df = df.reset_index(drop = True) 
df.columns = ['value', 'names'] 

df 

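As you can see, each name (x1, x2, x3) has a different count, and the "names" column is what I want to use for the color.

Does anyone know how I can plot this?

FYI, in R this is very simple: I just call ggplot and set aes(fill = names).

Any help would be greatly appreciated, thanks!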
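The uneven counts can be checked directly from the name-value data. The following is a minimal sketch that rebuilds the fake data above; the fixed seed and the `make_fake_data` helper are illustrative additions, not part of the original post.

```python
import numpy as np
import pandas as pd


def make_fake_data(seed=0):
    """Rebuild the fake name-value data (the seed is an added assumption)."""
    rng = np.random.RandomState(seed)
    parts = [
        pd.DataFrame({"value": rng.randn(size) + shift, "names": name})
        for name, size, shift in [("x1", 100, 0), ("x2", 200, 1), ("x3", 300, -1)]
    ]
    return pd.concat(parts, ignore_index=True)


df = make_fake_data()

# Number of rows per name, sorted by name
counts = df["names"].value_counts().sort_index()
print(counts)
```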

Answers

2

You can try slicing your dataframe and then passing the slices to Plotly.

fig = ff.create_distplot([df[df.names == a].value for a in df.names.unique()], df.names.unique(), bin_size=[.1, .25, .5, 1]) 


import numpy as np
import pandas as pd
import plotly
import plotly.figure_factory as ff

plotly.offline.init_notebook_mode()
x1 = pd.DataFrame(np.random.randn(100)) 
x1['name']='x1' 

x2 = pd.DataFrame(np.random.randn(200)+1) 
x2['name']='x2' 

x3 = pd.DataFrame(np.random.randn(300)-1) 
x3['name']='x3' 

df=pd.concat([x1,x2,x3]) 
df=df.reset_index(drop=True) 
df.columns = ['value','names'] 
fig = ff.create_distplot([df[df.names == a].value for a in df.names.unique()], df.names.unique(), bin_size=[.1, .25, .5, 1]) 
plotly.offline.iplot(fig, filename='Distplot with Multiple Bin Sizes') 
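The same slicing can also be written with `groupby`, which walks the dataframe once and keeps each label paired with its values. This is only a sketch under stated assumptions: `split_by_name` and `make_distplot` are hypothetical helper names, the column names follow the fake data from the question, and a single scalar `bin_size` is used for every group. `create_distplot` also needs SciPy installed for its default KDE curve.

```python
import numpy as np
import pandas as pd


def split_by_name(df, value_col="value", name_col="names"):
    """Return (hist_data, group_labels), one array per group, in order of first appearance."""
    hist_data, group_labels = [], []
    for name, group in df.groupby(name_col, sort=False):
        hist_data.append(group[value_col].to_numpy())
        group_labels.append(str(name))
    return hist_data, group_labels


def make_distplot(df, bin_size=0.25):
    """Build a distplot figure from name-value data (hypothetical helper)."""
    # Imported here so split_by_name can be used without plotly installed.
    import plotly.figure_factory as ff

    hist_data, group_labels = split_by_name(df)
    return ff.create_distplot(hist_data, group_labels, bin_size=bin_size)


# Fake data shaped like the question's (sizes 100, 200, 300); the seed is an added assumption.
rng = np.random.RandomState(0)
df = pd.concat(
    [
        pd.DataFrame({"value": rng.randn(n) + shift, "names": name})
        for name, n, shift in [("x1", 100, 0), ("x2", 200, 1), ("x3", 300, -1)]
    ],
    ignore_index=True,
)

hist_data, group_labels = split_by_name(df)
print(group_labels, [len(x) for x in hist_data])
```

Calling `make_distplot(df)` then returns a figure that can be passed to `plotly.offline.iplot` or `plotly.offline.plot` as in the answer above.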

Thank you for a perfect solution.

1

The example in plotly's documentation works out of the box for uneven sample sizes too:

#!/usr/bin/env python 

import plotly 
import plotly.figure_factory as ff 
plotly.offline.init_notebook_mode() 
import numpy as np 

# data with different sizes 
x1 = np.random.randn(300)-2 
x2 = np.random.randn(200) 
x3 = np.random.randn(4000)+2 
x4 = np.random.randn(50)+4 

# Group data together 
hist_data = [x1, x2, x3, x4] 

# use custom names 
group_labels = ['x1', 'x2', 'x3', 'x4'] 

# Create distplot with custom bin_size 
fig = ff.create_distplot(hist_data, group_labels, bin_size=.2) 

# change that if you don't want to plot offline 
plotly.offline.plot(fig, filename='Distplot with Multiple Datasets') 

Running the script above produces a distplot with all four groups.