Pandas: trouble understanding how merge works

I'm doing something wrong with merge and I can't understand what it is. I've done the following to estimate a histogram of a series of integer values:

import pandas as pnd 
import numpy as np 

series = pnd.Series(np.random.poisson(5, size = 100)) 
tmp = {"series" : series, "count" : np.ones(len(series))} 
hist = pnd.DataFrame(tmp).groupby("series").sum() 
freq = (hist/hist.sum()).rename(columns = {"count" : "freq"})

If I print hist and freq, this is what I get:

> print(hist)
        count
series
0           2
1           4
2          13
3          15
4          12
5          16
6          18
7           7
8           8
9           3
10          1
11   1 

> print(freq)
        freq
series
0       0.02
1       0.04
2       0.13
3       0.15
4       0.12
5       0.16
6       0.18
7       0.07
8       0.08
9       0.03
10      0.01
11      0.01

They are both indexed by "series", but if I try to merge:

> df = pnd.merge(freq, hist, on = "series")

I get a KeyError: 'no item named series' exception. If I omit on = "series", I get an IndexError: list index out of range exception.

I don't understand what I'm doing wrong. Could it be that "series" is an index and not a column, so I have to do it differently?


2012-04-13 Rafael S. Calsaverini

From the docs:

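on: Columns (names) to join on. Must be found in both the left and right DataFrame objects. If not passed and left_index and right_index are False, the intersection of the columns in the DataFrames will be inferred to be the join keys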

I don't know why this isn't in the docstring, but it explains your problem.
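A quick way to confirm this diagnosis is to check where "series" actually lives. The sketch below uses a small hand-made stand-in for hist (the values are made up for illustration, they are not the question's data) built with the same groupby pattern. As far as I know, newer pandas releases also let on refer to index level names, so the original call may no longer raise there.

```python
import pandas as pnd

# Hand-made stand-in for the question's `hist`; the values are illustrative only.
raw = pnd.DataFrame({"series": [0, 1, 1, 2, 2, 2], "count": 1.0})
hist = raw.groupby("series").sum()

# After groupby(...).sum(), the grouping key becomes the index, not a column.
in_columns = "series" in hist.columns
is_index_name = hist.index.name == "series"

print(list(hist.columns))  # the remaining data columns
print(hist.index.name)     # the name carried by the index
print(in_columns, is_index_name)
```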

You can either pass left_index and right_index:

In : pnd.merge(freq, hist, right_index=True, left_index=True)
Out:
        freq  count
series
0       0.01      1
1       0.04      4
2       0.14     14
3       0.12     12
4       0.21     21
5       0.14     14
6       0.17     17
7       0.07      7
8       0.05      5
9       0.01      1
10      0.01      1
11      0.03      3

Or you can turn your indexes into columns and use on:

In : freq2 = freq.reset_index() 

In : hist2 = hist.reset_index() 

In : pnd.merge(freq2, hist2, on='series')
Out:
    series  freq  count
0        0  0.01      1
1        1  0.04      4
2        2  0.14     14
3        3  0.12     12
4        4  0.21     21
5        5  0.14     14
6        6  0.17     17
7        7  0.07      7
8        8  0.05      5
9        9  0.01      1
10      10  0.01      1
11      11  0.03      3

Or, more simply, DataFrame has a join method that does exactly what you want:

In : freq.join(hist)
Out:
        freq  count
series
0       0.01      1
1       0.04      4
2       0.14     14
3       0.12     12
4       0.21     21
5       0.14     14
6       0.17     17
7       0.07      7
8       0.05      5
9       0.01      1
10      0.01      1
11      0.03      3
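To see that the three approaches agree, here is a small self-contained sketch with hand-picked counts (illustrative values, not the output shown above). The names by_index, by_column and by_join are made up for this example.

```python
import pandas as pnd

# Illustrative counts, indexed by "series" like the frames in the question.
counts = [2, 4, 13, 1]
hist = pnd.DataFrame({"count": counts},
                     index=pnd.Index([0, 1, 2, 3], name="series"))
freq = (hist / hist.sum()).rename(columns={"count": "freq"})

# 1) Merge on the index of both frames.
by_index = pnd.merge(freq, hist, left_index=True, right_index=True)

# 2) Turn the index into a column, merge on it, then restore the index.
by_column = pnd.merge(freq.reset_index(), hist.reset_index(),
                      on="series").set_index("series")

# 3) join aligns on the index by default.
by_join = freq.join(hist)

for label, frame in [("index", by_index), ("column", by_column), ("join", by_join)]:
    print(label, list(frame.columns), frame["count"].tolist())
```

For two frames that share the same index, join is the shortest spelling. merge is the more general tool when the keys live in ordinary columns.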


2012-04-13 19:22:11 Avaris

Time to improve the merge docstring! – 2012-04-13 22:23:10

@WesMcKinney: Nice :) – Avaris 2012-04-13 23:11:16
