2017-09-04 156 views
1

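Pandas: merging two DataFrames

I'm new to the pandas module and have a small question about its merge method. Suppose I have two separate tables, as shown below: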

Original_DataFrame

machine weekNum Percent 
M1  2  75 
M1  5  80 
M1  8  95 
M1  10  90 

New_DataFrame

machine weekNum Percent 
M1  1  100 
M1  2  100 
M1  3  100 
M1  4  100 
M1  5  100 
M1  6  100 
M1  7  100 
M1  8  100 
M1  9  100 
M1  10  100 

I used the pandas merge method as follows:

pd.merge(orig_df, new_df, on='weekNum', how='left') 

I get the following:

machine weekNum Percent_x Percent_y 
0 M1   2  75   100 
1 M1   5  80   100 
2 M1   8  95   100 
3 M1   10  90   100 

However, I expected the skipped weekNums to be filled in with 100 in those rows, giving the desired result below.

machine weekNum Percent 
M1  1  100 
M1  2  75 
M1  3  100 
M1  4  100 
M1  5  80 
M1  6  100 
M1  7  100 
M1  8  95 
M1  9  100 
M1  10  90 

Can anyone please guide me on how to proceed?

Answers

1

I think you need combine_first on the common columns, but call set_index first:

df11 = df1.set_index(['machine','weekNum']) 
df22 = df2.set_index(['machine','weekNum']) 

df = df11.combine_first(df22).astype(int).reset_index() 
print (df) 
    machine weekNum Percent 
0  M1  1  100 
1  M1  2  75 
2  M1  3  100 
3  M1  4  100 
4  M1  5  80 
5  M1  6  100 
6  M1  7  100 
7  M1  8  95 
8  M1  9  100 
9  M1  10  90 
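
For readers who want to run this end to end, here is a minimal sketch that rebuilds both tables from the question and applies the same set_index + combine_first steps. The names `orig_df` and `new_df` come from the question's `pd.merge` call; constructing the frames inline and the helper name `keys` are assumptions made for illustration.

```python
import pandas as pd

# Data copied from the two tables in the question (built inline as an assumption).
orig_df = pd.DataFrame({
    "machine": ["M1"] * 4,
    "weekNum": [2, 5, 8, 10],
    "Percent": [75, 80, 95, 90],
})
new_df = pd.DataFrame({
    "machine": ["M1"] * 10,
    "weekNum": list(range(1, 11)),
    "Percent": [100] * 10,
})

keys = ["machine", "weekNum"]  # hypothetical helper name for the shared key columns

# Values from orig_df win; rows that exist only in new_df fill the gaps.
combined = (
    orig_df.set_index(keys)
    .combine_first(new_df.set_index(keys))
    .astype(int)      # only Percent is left as a column here, so the cast is safe
    .reset_index()
)
print(combined)
```

Because both frames are indexed by (machine, weekNum), the result is aligned on those keys rather than on row position.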


df.plot.bar('weekNum', 'Percent') 

graph

Edit:

For the labels:

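import matplotlib.pyplot as plt

# df.plot.bar creates its own figure, so pass figsize here instead of calling plt.figure first
ax = df.plot.bar(x='weekNum', y='Percent', figsize=(12, 8))
rects = ax.patches

# write each Percent value just above its bar
for rect, label in zip(rects, df['Percent']):
    height = rect.get_height()
    ax.text(rect.get_x() + rect.get_width()/2, height + 1, str(label), ha='center', va='bottom')

# leave headroom above the tallest bar for the labels
ax.set_ylim(top=120)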

graph2

+0

After running the latest code I get the following error: ValueError: invalid literal for int() with base 10: 'M1' – SalN85

+0

Sorry, I had a typo in the first version of the code. You need 'df11' and 'df22' - 'df = df11.combine_first(df22).astype(int).reset_index()' – jezrael

+0

Still the same error. ValueError: invalid literal for int() with base 10: 'M1' :( – SalN85
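
The thread does not say what caused this ValueError. One plausible explanation, offered here only as an assumption, is that `astype(int)` ran while `machine` was still an ordinary column (for example, if `set_index` was skipped or its result was not used), so pandas tried to convert 'M1' to an integer. The sketch below reproduces that situation with the question's data and then casts only the Percent column, which avoids the problem either way. The variable names `error_message` and `fixed` are hypothetical.

```python
import pandas as pd

orig_df = pd.DataFrame({
    "machine": ["M1"] * 4,
    "weekNum": [2, 5, 8, 10],
    "Percent": [75, 80, 95, 90],
})
new_df = pd.DataFrame({
    "machine": ["M1"] * 10,
    "weekNum": list(range(1, 11)),
    "Percent": [100] * 10,
})

# Assumed failure mode: no set_index, so 'machine' is a regular column
# and astype(int) is applied to the string 'M1' as well.
try:
    orig_df.combine_first(new_df).astype(int)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Safer variant: index by the keys, combine, then cast only Percent.
keys = ["machine", "weekNum"]
fixed = (
    orig_df.set_index(keys)
    .combine_first(new_df.set_index(keys))
    .reset_index()
)
fixed["Percent"] = fixed["Percent"].astype(int)
```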

0

Not as elegant as the other solution, but it works anyway:

# join 
merged = pd.merge(data1, data2, on=['machine','weekNum'], how='outer') 
# combine percent columns 
merged['Percent'] = merged['Percent_x'].fillna(merged['Percent_y']) 
# remove extra columns 
result = merged[['machine','weekNum', 'Percent']] 

Result:

machine weekNum Percent 
M1 2 75 
M1 5 80 
M1 8 95 
M1 10 90 
M1 1 100 
M1 3 100 
M1 4 100 
M1 6 100 
M1 7 100 
M1 9 100 
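
The rows in the result shown here are not in week order, and the exact order may depend on the pandas version. A short sketch that reuses this answer's names `data1` and `data2` (filled with the question's data as an assumption), then sorts by the keys and restores the integer dtype:

```python
import pandas as pd

# data1 = original table, data2 = new table (values copied from the question).
data1 = pd.DataFrame({
    "machine": ["M1"] * 4,
    "weekNum": [2, 5, 8, 10],
    "Percent": [75, 80, 95, 90],
})
data2 = pd.DataFrame({
    "machine": ["M1"] * 10,
    "weekNum": list(range(1, 11)),
    "Percent": [100] * 10,
})

merged = pd.merge(data1, data2, on=["machine", "weekNum"], how="outer")

# Percent_x comes from data1; where it is missing, fall back to data2's Percent_y.
merged["Percent"] = merged["Percent_x"].fillna(merged["Percent_y"]).astype(int)

# Keep the three wanted columns and put the weeks in ascending order.
result = (
    merged[["machine", "weekNum", "Percent"]]
    .sort_values(["machine", "weekNum"])
    .reset_index(drop=True)
)
print(result)
```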
+0

That's true, but I want the records for weekNum 2, 5, 8 and 10 to be overwritten with the original data. – SalN85

+0

It works! Thanks derline – SalN85

0

You can try this. Depending on your overall goal, it may not be "programmatic" enough.

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"machine":["M1"]*4, "weekNum": [2,5,8,10], "Percent":[75,80,95,90]})
df2 = pd.DataFrame({"machine":["M1"]*10,"weekNum":np.arange(1,11,1),"Percent":[100]*10})
# Percent_y holds df1's values; weeks missing from df1 are NaN and get 100
newcol = df2.merge(df1, on = "weekNum", how = "outer")["Percent_y"].fillna(100)
df2["Percent"] = newcol
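
Assigning the merged column back to `df2` relies on the merged rows lining up with `df2` row by row. As a hedged variation, assuming every (machine, weekNum) pair in `df1` appears at most once and also exists in `df2`, a left merge from `df2` keeps `df2`'s row order, and explicit suffixes make it clearer which Percent column is which. The column name `weekNum` follows the question, and the suffix names are hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"machine": ["M1"] * 4, "weekNum": [2, 5, 8, 10], "Percent": [75, 80, 95, 90]})
df2 = pd.DataFrame({"machine": ["M1"] * 10, "weekNum": np.arange(1, 11), "Percent": [100] * 10})

# A left merge keeps one output row per df2 row, in df2's order
# (given the uniqueness assumption stated above).
merged = df2.merge(df1, on=["machine", "weekNum"], how="left", suffixes=("_new", "_orig"))

# Prefer the original value; fall back to the new frame's value where it is missing.
# to_numpy() makes the assignment positional rather than index-aligned.
df2["Percent"] = merged["Percent_orig"].fillna(merged["Percent_new"]).astype(int).to_numpy()
print(df2)
```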