2016-05-17

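Error - Plotting PDF and CDF in Bokeh: unsupported operand type(s) for /: 'list' and 'int'

I am trying to read a CSV file and plot the PDF and CDF using Bokeh, but I get an error. The input file has the format keyword,freq, and it is the distribution of the frequencies that should be plotted. The input below is a few lines taken from a file of more than 50k rows.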

Input:

#sportsnews,8 
#mashupradiomx,1 
#arrestobama,2 
#alemanha,1 
#bizeskiden,1 
#musicnews,4 
#costumedesign,2 
#champain,1 
#pacer,1 
#brunner,1 
#fotoviajera,1 
#itsjihadstupid,1 
#lesdernierssurvivants,1 
#sainsburycentre,1 
#alanalwaysinourheart,1 
#runinapp,1 
#foroporlavida,1 
#kidsday,1 
#momentofart,2 

Code:

# -*- coding: utf-8 -*- 
import numpy as np 
import scipy.special 
import pandas as pd 

from bokeh.plotting import figure, show, output_file, vplot 

df = pd.read_csv('keyword.csv', header = None) 

df.columns = ['keyword','freq'] 

p5 = figure(title="Weibull Distribution (λ=1, k=1.25)", tools="save", 
      background_fill_color="#E8DDCB") 

lam, k = 1, 1.25 

#measured = lam*(-np.log(np.random.uniform(0, 1, 1000)))**(1/k) 
#hist, edges = np.histogram(measured, density=True, bins=50) 

x = df['freq'] 
pdf = (k/lam)*(x/lam)**(k-1) * np.exp(-(x/lam)**k) 
cdf = 1 - np.exp(-(x/lam)**k) 

p5.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], 
     fill_color="#036564", line_color="#033649") 

p5.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend="PDF") 
p5.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend="CDF") 

p5.legend.location = "top_left" 
p5.xaxis.axis_label = 'x' 
p5.yaxis.axis_label = 'Pr(x)' 

output_file('histogram.html', title="histogram.py example") 

show(vplot(p5)) 

I only want to plot the two line plots.

Error:

Traceback (most recent call last): 
    File "pdf_bokeh.py", line 21, in <module> 
    pdf = (k/lam)*(x/lam)**(k-1) * np.exp(-(x/lam)**k) 
TypeError: unsupported operand type(s) for /: 'list' and 'int' 
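
The traceback is consistent with `x` being a plain Python list, not a pandas column, when line 21 ran. A minimal sketch of the difference follows; it assumes a few rows from the input above inlined as a string in place of the real `keyword.csv`, and the names `x_list`, `x_series`, and `scaled` are only illustrative.

```python
import io
import pandas as pd

# A few rows from the question's input, inlined so the sketch is self-contained.
csv_text = "#sportsnews,8\n#mashupradiomx,1\n#arrestobama,2\n"
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["keyword", "freq"])

lam = 1

# A list holding the column *name* cannot be divided by a number.
x_list = ["freq"]
try:
    x_list / lam
    list_error = None
except TypeError as exc:
    list_error = str(exc)

# The column itself is a pandas Series, which supports elementwise arithmetic.
x_series = df["freq"]
scaled = x_series / lam

print(list_error)
print(scaled.tolist())
```

Selecting the column with `df['freq']` should therefore make this particular `TypeError` go away, although it does not by itself make the resulting plot meaningful.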

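Edit 1: After changing to x = df['freq'], I am getting strange output. Full input file: Dropbox. The data is discrete in nature, but even so a distribution plot should not look like the output below.

Output: this is nowhere near what it should be.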

enter image description here

+0

What is 'x' meant to be, given that you have defined it as 'x = ['freq']'? – EdChum

+0

@EdChum 'x' is the 'freq' column that is to be plotted –

+0

I assume you want x = df['freq'] – oystein

Answers

0

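Your problem is the x-axis of the theoretical distribution: you do not want to sample it at x=df['freq'], but at some evenly spaced coordinates. I downloaded your dataset and was able to get something sensible with: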

x = np.linspace(0, 5, 100) # 100 points between 0 and 5 

You could do some fancy statistics to figure out that most of the data is below 5.
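
One way to read this suggestion, sketched under stated assumptions: the Weibull formulas from the question are evaluated on an evenly spaced grid, and the grid's upper limit is taken from a percentile of the data instead of being hard-coded. The `freq` array holds only the handful of values shown in the question, and the 90th percentile is an arbitrary choice for illustration, not something the answer specifies.

```python
import numpy as np

# Frequencies from the sample rows in the question (the real file has 50k+ rows).
freq = np.array([8, 1, 2, 1, 1, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2])

lam, k = 1, 1.25  # Weibull parameters from the question

# Assumption: use the 90th percentile of the data as the grid's upper limit.
upper = float(np.percentile(freq, 90))

# Evenly spaced coordinates for the theoretical curves.
x = np.linspace(0, upper, 100)

# Same formulas as in the question, now evaluated on the grid.
pdf = (k / lam) * (x / lam) ** (k - 1) * np.exp(-(x / lam) ** k)
cdf = 1 - np.exp(-(x / lam) ** k)
```

Because `x` is sorted and evenly spaced, passing `x`, `pdf`, and `cdf` to a line glyph should give smooth curves, whereas the unsorted, repeated values in `df['freq']` make the line jump back and forth.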

+0

If I set x = linspace(0, 5, 10), then where do I give my input? –
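
A possible answer to this follow-up, offered as a sketch and not as the answerer's own reply: the measured frequencies would feed the histogram (the `hist` and `edges` that the question's `p5.quad(...)` call expects), while the `linspace` grid is used only for the theoretical curves. The `freq` array stands in for `df['freq']` and contains just the sample rows from the question; the bin count of 10 is arbitrary.

```python
import numpy as np

# Stand-in for df['freq']; only the sample rows shown in the question.
freq = np.array([8, 1, 2, 1, 1, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2])

# The input data goes here: a normalised histogram of the observed frequencies.
hist, edges = np.histogram(freq, density=True, bins=10)

# The grid is independent of the data and only drives the theoretical curves.
x = np.linspace(0, 5, 100)

# With Bokeh, these arrays would then be used as in the question's code:
#   p5.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:])
#   p5.line(x, pdf) and p5.line(x, cdf), with pdf/cdf evaluated on x
```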
