2016-06-08

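Pandas: how to combine matrices with different indexes and columns?

I am using Python, pandas and NumPy to read some data.

I have two data frames:

Input 1 - cost matrix (the cost per season and region): index = regions, columns = seasons
Input 2 - binary matrix (value 1 when month "a" belongs to season "b"): index = seasons, columns = months

The output I want is a matrix C that holds the cost for each region and month: index = regions, columns = months.

Can anyone please help me? I have searched Google a lot, but I could not find a solution.

Update with my code: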

import pandas as pd 
import numpy as np 
from xlwings import Workbook, Range 
import os 
print(os.getcwd()) 
link = (os.getcwd() + '/test.xlsx') 
print(link) 

#Open the Workbook 
wb = Workbook(link) 
# 
#Reading data 

regions=np.array(Range('Sheet1','regions').value) 
#[u'Region A' u'Region B' u'Region C' u'Region D'] 

seasons=np.array(Range('Sheet1','seasons').value) 
#[u'Season A' u'Season B' u'Season C' u'Season D'] 

months=np.array(Range('Sheet1','months').value) 
#[u'Jan' u'Feb' u'Mar' u'Apr' u'May' u'Jun' u'Jul' u'Aug'] 

#read relationship between season and month 
data=Range('Sheet1','rel').table.value 
relationship=pd.DataFrame(data[0:], index = regions, columns=months) 
#   Jan Feb Mar Apr May Jun Jul Aug 
#Region A 1 1 0 0 0 0 0 0 
#Region B 0 0 1 1 0 0 0 0 
#Region C 0 0 0 0 1 1 0 0 
#Region D 0 0 0 0 0 0 1 1 

#read the cost per region 
data=Range('Sheet1','cost').table.value 
cost=pd.DataFrame(data[0:], index = regions, columns=seasons) 
#   Season A Season B Season C Season D 
#Region A   1   9   7   2 
#Region B   7   0   3   3 
#Region C   4   0   7   5 
#Region D   3  10   3  10 


#What I want: 
#  Jan Feb Mar Apr May Jun Jul Aug 
#Region A 1 1 9 9 7 7 2 2 
#Region B 7 7 0 0 3 3 3 3 
#Region C 4 4 0 0 7 7 5 5 
#Region D 3 3 10 10 3 3 10 10 

Can you provide sample data for your data frames?

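Answer

I believe there is an error in the relationship data frame in your example: you explicitly state that it should describe the relationship between seasons (not regions) and months, so I changed it accordingly.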

import pandas as pd 
import numpy as np 

regions = ['Region A', 'Region B', 'Region C', 'Region D'] 
seasons = ['Season A', 'Season B', 'Season C', 'Season D'] 
cost_data = np.array([[1, 9, 7, 2], [7, 0, 3, 3], [4, 0, 7, 5], [3, 10, 3, 10]]) 

cost = pd.DataFrame(data=cost_data, index=regions, columns=seasons) 

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug'] 
rel_data = np.array([[1, 1, 0, 0, 0, 0, 0, 0], 
        [0, 0, 1, 1, 0, 0, 0, 0], 
        [0, 0, 0, 0, 1, 1, 0, 0], 
        [0, 0, 0, 0, 0, 0, 1, 1]]) 

rel = pd.DataFrame(data=rel_data, index=seasons, columns=months) 

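c = pd.DataFrame(index=regions, columns=months)
for region in regions:
    for month in months:
        for season in seasons:
            # rel is 1 when the month belongs to the season
            if rel.loc[season, month]:
                # a single .loc[row, col] call avoids chained assignment
                c.loc[region, month] = cost.loc[region, season]

print(c)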

#   Jan Feb Mar Apr May Jun Jul Aug 
#Region A 1 1 9 9 7 7 2 2 
#Region B 7 7 0 0 3 3 3 3 
#Region C 4 4 0 0 7 7 5 5 
#Region D 3 3 10 10 3 3 10 10 
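
As a possible alternative to the triple loop, the same table can be obtained with a single matrix product. This is only a sketch, and it assumes that every month belongs to exactly one season (each column of rel contains a single 1) and that the columns of cost carry the same labels as the index of rel. The name c_dot is introduced here for illustration and is not part of the original code.

```python
import pandas as pd

regions = ['Region A', 'Region B', 'Region C', 'Region D']
seasons = ['Season A', 'Season B', 'Season C', 'Season D']
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug']

# cost: regions x seasons
cost = pd.DataFrame(
    [[1, 9, 7, 2], [7, 0, 3, 3], [4, 0, 7, 5], [3, 10, 3, 10]],
    index=regions, columns=seasons)

# rel: seasons x months, 1 when the month belongs to the season
rel = pd.DataFrame(
    [[1, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 1, 1]],
    index=seasons, columns=months)

# Assumption: each month is assigned to exactly one season
assert (rel.sum(axis=0) == 1).all()

# (regions x seasons) . (seasons x months) -> regions x months;
# DataFrame.dot aligns cost's columns with rel's index by label
c_dot = cost.dot(rel)
print(c_dot)
```

Because each column of rel selects exactly one season, the product simply copies that season's cost into the month. If a month were assigned to several seasons, the product would sum their costs, which may or may not be what is wanted.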

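Hey, I updated my question with my code... I tried to merge, but I think my approach is wrong... How can I do this starting from the example I added to my original question?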