2016-12-30

How to do waffle charts in Python? (square piechart)

Something like this: [image: example waffle chart]

There is a very nice package to do it in R. In Python, the best I could figure out is this, using the squarify package (inspired by a post on how to do treemaps):

import numpy as np 
import pandas as pd 
import matplotlib as mpl 
import matplotlib.pyplot as plt 
import seaborn as sns # just to have better line color and width 
import squarify 
# for those using jupyter notebooks 
%matplotlib inline 


df = pd.DataFrame({ 
        'v1': np.ones(100), 
        'v2': np.random.randint(1, 4, 100)}) 
df.sort_values(by='v2', inplace=True) 

# color scale 
cmap = mpl.cm.Accent 
mini, maxi = df['v2'].min(), df['v2'].max() 
norm = mpl.colors.Normalize(vmin=mini, vmax=maxi) 
colors = [cmap(norm(value)) for value in df['v2']] 

# figure 
fig = plt.figure() 
ax = fig.add_subplot(111, aspect="equal") 
ax = squarify.plot(df['v1'], color=colors, ax=ax) 
ax.set_xticks([]) 
ax.set_yticks([]); 

[image: resulting waffle chart]

But when I create not 100 but 200 elements (or any other non-square number), the squares become misaligned.

[image: misaligned squares with 200 elements]

Another problem is that if I change v2 to a categorical variable (e.g., a hundred As, Bs, Cs and Ds), I get this error:

could not convert string to float: 'a'

So, could anyone help me with these two questions:

  • How can I solve the alignment problem for non-square numbers of observations?
  • How can I use categorical variables in v2?
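For the second question, the error appears because `Normalize` and the colormap expect numbers, while v2 now holds strings. A minimal sketch, assuming the categories are plain strings such as 'a' to 'd' (the data below is hypothetical), maps each category to an integer code with `pd.factorize` and normalizes the codes instead:

```python
import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.colors as mcolors

# Hypothetical data: 100 tiles whose v2 holds string categories instead of numbers.
df = pd.DataFrame({'v1': np.ones(100),
                   'v2': ['a', 'b', 'c', 'd'] * 25})
df = df.sort_values(by='v2')

# Map each category to an integer code (a -> 0, b -> 1, ...), keeping the sorted labels.
codes, uniques = pd.factorize(df['v2'], sort=True)

# Normalize the integer codes, not the strings, before looking up colors.
cmap = cm.Accent
norm = mcolors.Normalize(vmin=0, vmax=len(uniques) - 1)
colors = [cmap(norm(code)) for code in codes]

print(list(uniques))  # ['a', 'b', 'c', 'd']
```

The resulting `colors` list can be passed to `squarify.plot(..., color=colors)` just as in the code above, and `uniques` supplies the labels for a legend.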

Beyond that, I'm very open to other Python packages that can create waffle charts more efficiently.


[Here](http://bokeh.pydata.org/en/latest/docs/gallery/unemployment.html) is an example using 'bokeh'... you would have to tweak it a bit to get your proportional view, but it's definitely possible to do it in Python. – blacksite


Thanks a lot @not_a_robot, I'll try bokeh this week. – lincolnfrias


200 is not a square number –
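Building on that comment: a waffle chart does not need a square number of tiles, only a fixed grid. squarify computes a treemap layout, which happens to come out as a regular grid for 100 equal tiles but not for 200. A minimal sketch of the alternative, assuming you choose the number of rows yourself (`grid_positions` is a hypothetical helper), gives every tile a fixed (column, row) cell for any count:

```python
import math

def grid_positions(n_tiles, rows):
    """Return the column count and a (col, row) cell for each tile, filled column by column."""
    # Enough columns to hold every tile; the last column may be only partly filled.
    cols = math.ceil(n_tiles / rows)
    positions = [(i // rows, i % rows) for i in range(n_tiles)]
    return cols, positions

cols, positions = grid_positions(200, rows=8)
print(cols)            # 25
print(positions[:3])   # [(0, 0), (0, 1), (0, 2)]
print(positions[-1])   # (24, 7)
```

Because each tile's cell depends only on its index and the row count, 200 or 203 tiles line up just as neatly as 100; only the last column may be shorter.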

Answers


I spent a few days building a more general solution, PyWaffle.

You can install it with

pip install pywaffle 

Source code: https://github.com/ligyxy/PyWaffle

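PyWaffle does not use the matshow() method; it builds the squares one by one, which makes customization easier. It also provides a custom Figure class that returns a figure object, so by updating the figure's attributes you can control basically everything in the chart.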
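The idea of building each square individually can be illustrated in plain matplotlib. This is not PyWaffle's code; it is a minimal sketch that assumes squares are placed column by column starting at the lower-left corner, and `draw_waffle` and `gap` are hypothetical names chosen for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def draw_waffle(counts, colors, rows, gap=0.1):
    """Draw one Rectangle per unit, column by column (illustrative sketch, not PyWaffle)."""
    fig, ax = plt.subplots()
    i = 0
    for count, color in zip(counts, colors):
        for _ in range(count):
            col, row = divmod(i, rows)
            # Each square is 1 unit wide, minus a small gap so neighbours stay separated.
            ax.add_patch(Rectangle((col, row), 1 - gap, 1 - gap, color=color))
            i += 1
    cols = -(-i // rows)  # ceiling division: columns needed for all squares
    ax.set_xlim(0, cols)
    ax.set_ylim(0, rows)
    ax.set_aspect("equal")
    ax.axis("off")
    return fig, ax

fig, ax = draw_waffle([48, 46, 3], ["#983D3D", "#232066", "#DCB732"], rows=5)
print(len(ax.patches))  # 97
```

Because every square is its own patch, per-square styling (edge color, spacing, a different shape) only requires changing the arguments in the inner loop.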

Some examples:

Colored or transparent background:

import matplotlib.pyplot as plt 
from pywaffle import Waffle 

data = {'Democratic': 48, 'Republican': 46, 'Libertarian': 3} 
fig = plt.figure(
    FigureClass=Waffle, 
    rows=5, 
    values=data, 
    colors=("#983D3D", "#232066", "#DCB732"), 
    title={'label': 'Vote Percentage in 2016 US Presidential Election', 'loc': 'left'}, 
    labels=["{0} ({1}%)".format(k, v) for k, v in data.items()], 
    legend={'loc': 'lower left', 'bbox_to_anchor': (0, -0.4), 'ncol': len(data), 'framealpha': 0} 
) 
fig.gca().set_facecolor('#EEEEEE') 
fig.set_facecolor('#EEEEEE') 
plt.show() 

[image: waffle chart on a gray background]

Replacing the squares with icons:

data = {'Democratic': 48, 'Republican': 46, 'Libertarian': 3} 
fig = plt.figure(
    FigureClass=Waffle, 
    rows=5, 
    values=data, 
    colors=("#232066", "#983D3D", "#DCB732"), 
    legend={'loc': 'upper left', 'bbox_to_anchor': (1, 1)}, 
    icons='child', icon_size=18, 
    icon_legend=True 
) 

[image: waffle chart drawn with child icons]

Multiple subplots in one chart:

import pandas as pd 
data = pd.DataFrame(
    { 
     'labels': ['Hillary Clinton', 'Donald Trump', 'Others'], 
     'Virginia': [1981473, 1769443, 233715], 
     'Maryland': [1677928, 943169, 160349], 
     'West Virginia': [188794, 489371, 36258], 
    }, 
).set_index('labels') 

fig = plt.figure(
    FigureClass=Waffle, 
    plots={ 
     '311': { 
      'values': data['Virginia']/30000, 
      'labels': ["{0} ({1})".format(n, v) for n, v in data['Virginia'].items()], 
      'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 8}, 
      'title': {'label': '2016 Virginia Presidential Election Results', 'loc': 'left'} 
     }, 
     '312': { 
      'values': data['Maryland']/30000, 
      'labels': ["{0} ({1})".format(n, v) for n, v in data['Maryland'].items()], 
      'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.2, 1), 'fontsize': 8}, 
      'title': {'label': '2016 Maryland Presidential Election Results', 'loc': 'left'} 
     }, 
     '313': { 
      'values': data['West Virginia']/30000, 
      'labels': ["{0} ({1})".format(n, v) for n, v in data['West Virginia'].items()], 
      'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.3, 1), 'fontsize': 8}, 
      'title': {'label': '2016 West Virginia Presidential Election Results', 'loc': 'left'} 
     }, 
    }, 
    rows=5, 
    colors=("#2196f3", "#ff5252", "#999999"), # Default argument values for subplots 
    figsize=(9, 5) # figsize is a parameter of plt.figure 
) 

[image: three stacked waffle subplots]
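In the example above, dividing by 30000 means each square stands for roughly 30,000 votes. A quick pandas check of what that scaling implies, assuming each value is rounded to the nearest whole square (PyWaffle's own rounding behaviour may differ between versions) and rows=5 as in the call:

```python
import math
import pandas as pd

data = pd.DataFrame(
    {
        'labels': ['Hillary Clinton', 'Donald Trump', 'Others'],
        'Virginia': [1981473, 1769443, 233715],
        'Maryland': [1677928, 943169, 160349],
        'West Virginia': [188794, 489371, 36258],
    },
).set_index('labels')

# Assumption: one square per ~30,000 votes, rounded to the nearest whole square.
squares = (data / 30000).round().astype(int)
print(squares)

# Columns each subplot needs when every subplot has 5 rows.
columns = {state: math.ceil(squares[state].sum() / 5) for state in squares.columns}
print(columns)  # {'Virginia': 27, 'Maryland': 19, 'West Virginia': 5}
```

This shows why the three subplots end up with different widths: the states have very different vote totals, so the same per-square scale yields 133, 92 and 23 squares respectively.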


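I've put together a working example below that I think meets your needs. Some work would be needed to fully generalize the approach, but I think you'll find it a good start. The trick is to use matshow() to solve your non-square problem, and to build a custom legend so that categorical values are easy to interpret.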

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# Let's make a default data frame with categories and values.
df = pd.DataFrame({'categories': ['cat1', 'cat2', 'cat3', 'cat4'],
                   'values': [84911, 14414, 10062, 8565]})
# Now, we define a desired height and width.
waffle_plot_width = 20
waffle_plot_height = 7

classes = df['categories']
values = df['values']

def waffle_plot(classes, values, height, width, colormap):

    # Compute the portion of the total assigned to each class.
    class_portion = [float(v) / sum(values) for v in values]

    # Compute the number of tiles for each category.
    total_tiles = width * height
    tiles_per_class = [round(p * total_tiles) for p in class_portion]

    # Make a dummy matrix for use in plotting.
    plot_matrix = np.zeros((height, width))

    # Populate the dummy matrix with integer values (1 for the first class, 2 for the second, ...).
    class_index = 0
    tile_index = 0

    # Iterate over each tile, column by column.
    for col in range(width):
        for row in range(height):
            tile_index += 1

            # If the number of tiles populated is sufficient for this class,
            # move on to the next class (never past the last one, in case
            # rounding left a few spare tiles).
            if (tile_index > sum(tiles_per_class[0:class_index])
                    and class_index < len(classes)):
                class_index += 1

            # Set the class value to an integer, which increases with class.
            plot_matrix[row, col] = class_index

    # Create a new figure and axes.
    fig, ax = plt.subplots()

    # Using matshow solves your "non-square" problem. Fixing the color scale to
    # the class range (1..n) keeps the tile colors in sync with the legend below.
    norm = mpl.colors.Normalize(vmin=1, vmax=len(classes))
    image = ax.matshow(plot_matrix, cmap=colormap, norm=norm)
    fig.colorbar(image, ax=ax)

    # Minor ticks
    ax.set_xticks(np.arange(-.5, width, 1), minor=True)
    ax.set_yticks(np.arange(-.5, height, 1), minor=True)

    # Gridlines based on minor ticks
    ax.grid(which='minor', color='w', linestyle='-', linewidth=2)

    # Manually constructing a legend solves your "categorical" problem.
    legend_handles = []
    for i, c in enumerate(classes):
        label_str = c + " (" + str(values[i]) + ")"
        color_val = colormap(norm(i + 1))
        legend_handles.append(mpatches.Patch(color=color_val, label=label_str))

    # Add the legend. Still a bit of work to do here, to perfect centering.
    ax.legend(handles=legend_handles, loc=1, ncol=len(classes),
              bbox_to_anchor=(0., -0.1, 0.95, .10))

    # Hide the major tick labels; the minor ticks still drive the gridlines.
    ax.set_xticks([])
    ax.set_yticks([])

# Call the plotting function.
waffle_plot(classes, values, waffle_plot_height, waffle_plot_width,
            plt.cm.coolwarm)
plt.show()

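Below is an example of the output this script produces. As you can see, it works quite well for me and meets all of your requirements. Just let me know if it gives you any trouble. Enjoy!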

[image: waffle_plot output]
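The nested loop in waffle_plot can also be written with NumPy. A sketch, assuming you want the tile counts to always add up to exactly height × width: rounding each share independently can leave the grid a tile short or over, so this version swaps in a largest-remainder allocation (a different technique from the answer's round()) and then fills column by column with `np.repeat` and `reshape(order='F')`. `tile_matrix` is a hypothetical helper name.

```python
import numpy as np

def tile_matrix(values, height, width):
    """Class numbers 1..n laid out column by column on a height x width grid."""
    values = np.asarray(values, dtype=float)
    total = height * width
    # Largest-remainder allocation: floor each share, then give the leftover
    # tiles to the classes with the largest fractional parts, so counts sum to total.
    exact = values / values.sum() * total
    counts = np.floor(exact).astype(int)
    leftover = total - counts.sum()
    counts[np.argsort(exact - counts)[::-1][:leftover]] += 1
    # Repeat class numbers 1..n by their counts and fill column-major (order='F').
    flat = np.repeat(np.arange(1, len(values) + 1), counts)
    return flat.reshape((height, width), order='F'), counts

matrix, counts = tile_matrix([84911, 14414, 10062, 8565], height=7, width=20)
print(counts)        # [101  17  12  10]
print(matrix[:, 0])  # [1 1 1 1 1 1 1]
```

The returned matrix has the same layout as plot_matrix in the answer, so it can be passed to `ax.matshow()` in its place.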


Yes, this is a really nice approach! I hope you can fix the legend issue :) – lincolnfrias


@lincolnfrias, I've fixed and edited the answer. It should now do everything you want. –


Thanks so much, Justin, for such a timely and excellent answer. Congrats! – lincolnfrias