2017-07-03 100 views
1

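Replace values in a pandas column using another pandas df with the corresponding replacements

I have a pandas df named inventory with a column containing part numbers (alphanumeric). Some of those part numbers have been superseded, and I have another df named replace_with that contains two columns, 'old part numbers' and 'new part numbers'. For example: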

inventory has values like:

* 123AAA 
* 123BBB 
* 123CCC 
...... 

and replace_with has values like:

**oldPartnumbers** .....  **newPartnumbers** 

* 123AAA  ............   123ABC 
* 123CCC   ...........   123DEF 

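So I need to replace the corresponding values in inventory with the new numbers. After the replacement, inventory would look like this: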

* 123ABC 
* 123BBB 
* 123DEF 

Is there a simple way to do this in Python? Thanks!

+0

Is `df['part_numbers'] = df['new_part_numbers']` enough? – glegoux

Answers

1

Say you have two dfs as follows:

import pandas as pd 
df1 = pd.DataFrame([[1,3],[5,4],[6,7]], columns = ['PN','name']) 
df2 = pd.DataFrame([[2,22],[3,33],[4,44],[5,55]], columns = ['oldname','newname']) 

df1:

   PN  name
0   1     3
1   5     4
2   6     7

df2:

oldname newname 
0 2  22 
1 3  33 
2 4  44 
3 5  55 

Run a left join between them:

temp = df1.merge(df2, how='left', left_on='name', right_on='oldname')

temp:

PN  name  oldname newname 
0 1  3   3.0  33.0 
1 5  4   4.0  44.0 
2 6  7   NaN  NaN 

Then compute the new name column and replace the old one with it:

df1['name'] = temp.apply(lambda row: row['newname'] if pd.notnull(row['newname']) else row['name'], axis=1) 

df1:

PN name 
0 1 33.0 
1 5 44.0 
2 6 7.0 

Or, as a one-liner:

df1['name'] = df1.merge(df2, how='left', left_on='name', right_on='oldname').apply(lambda row: row['newname'] if pd.notnull(row['newname']) else row['name'], axis=1)
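The row-wise apply above can probably be swapped for a vectorized fillna. This is only a sketch of the same left-join idea, not the answer's own code, and it assumes that every oldname appears at most once in df2 (otherwise the join would produce extra rows) and that it is fine for the result to become float, as in the output above.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 3], [5, 4], [6, 7]], columns=['PN', 'name'])
df2 = pd.DataFrame([[2, 22], [3, 33], [4, 44], [5, 55]],
                   columns=['oldname', 'newname'])

# Left join keeps every row of df1; rows without a match get NaN in 'newname'.
temp = df1.merge(df2, how='left', left_on='name', right_on='oldname')

# Assumption: 'oldname' is unique in df2, so the join is one row per df1 row.
assert len(temp) == len(df1)

# Use 'newname' where it exists, otherwise fall back to the original 'name'.
# to_numpy() assigns by position, relying on the left join keeping df1's row order.
df1['name'] = temp['newname'].fillna(temp['name']).to_numpy()

print(df1)
```

The values match the apply version (33.0, 44.0, 7.0), without calling a Python lambda per row.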
2

Setup

Consider the dataframes inventory and replace_with:

inventory = pd.DataFrame(dict(Partnumbers=['123AAA', '123BBB', '123CCC'])) 

replace_with = pd.DataFrame(dict(
     oldPartnumbers=['123AAA', '123BBB', '123CCC'], 
     newPartnumbers=['123ABC', '123DEF', '123GHI'] 
    )) 

Option 1
map
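
The code for this option did not survive on this page, so what follows is a guess at what a map-based version could look like, not the answer's original code. The helper name map_with_fallback is made up for illustration. Since Series.map returns NaN for values that are missing from the mapping, the sketch fills those back in from the original column.

```python
import pandas as pd

inventory = pd.DataFrame(dict(Partnumbers=['123AAA', '123BBB', '123CCC']))
replace_with = pd.DataFrame(dict(
    oldPartnumbers=['123AAA', '123BBB', '123CCC'],
    newPartnumbers=['123ABC', '123DEF', '123GHI'],
))

# Series indexed by the old part number, holding the new part number.
d = replace_with.set_index('oldPartnumbers')['newPartnumbers']


def map_with_fallback(values, mapping):
    """Hypothetical helper: map values, keeping the original where no match exists."""
    return values.map(mapping).fillna(values)


inventory['Partnumbers'] = map_with_fallback(inventory['Partnumbers'], d)
print(inventory)
```

Without the fillna step, a part number that has no replacement would end up as NaN, which is the main difference from replace.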


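Option 2
replace

d = replace_with.set_index('oldPartnumbers').newPartnumbers 
# assign the result back rather than calling replace(..., inplace=True) on a selected column 
inventory['Partnumbers'] = inventory['Partnumbers'].replace(d) 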

inventory 

    Partnumbers 
0  123ABC 
1  123DEF 
2  123GHI 
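
For what it's worth, here is the same replace idea run on the data from the question, where 123BBB has no replacement. It is a small sketch only; the names inv and superseded are made up for the example, and a plain dict is used in place of the indexed Series.

```python
import pandas as pd

# Data from the question: only two of the three part numbers are superseded.
inv = pd.DataFrame({'Partnumbers': ['123AAA', '123BBB', '123CCC']})
superseded = {'123AAA': '123ABC', '123CCC': '123DEF'}

# replace() swaps values found in the dict and leaves the others untouched.
inv['Partnumbers'] = inv['Partnumbers'].replace(superseded)

print(inv['Partnumbers'].tolist())  # ['123ABC', '123BBB', '123DEF']
```

So with replace there is no need for a fallback step: unmatched part numbers simply stay as they are.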
1

This solution is relatively fast: it uses pandas index alignment and numpy's copyto function.

import pandas as pd 
import numpy as np 

df1 = pd.DataFrame({'partNumbers': ['123AAA', '123BBB', '123CCC', '123DDD']}) 
df2 = pd.DataFrame({'oldPartnumbers': ['123AAA', '123BBB', '123CCC'], 
        'newPartnumbers': ['123ABC', '123DEF', '123GHI']}) 

# assign index in each dataframe to original part number columns 
# (faster than set_index method, but use set_index if original index must be preserved) 
df1.index = df1.partNumbers 
df2.index = df2.oldPartnumbers 
# use pandas index data alignment 
df1['updatedPartNumbers'] = df2.newPartnumbers 
# use numpy to copy in old part num when a new part num is not found 
np.copyto(df1.updatedPartNumbers.values, 
      df1.partNumbers.values, 
      where=pd.isnull(df1.updatedPartNumbers)) 
# reset index 
df1.reset_index(drop=True, inplace=True) 

df1:

partNumbers updatedPartNumbers 
0  123AAA    123ABC 
1  123BBB    123DEF 
2  123CCC    123GHI 
3  123DDD    123DDD
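
If writing into the column's underlying array with np.copyto is not possible in your setup (as far as I understand, pandas can hand back a read-only array when copy-on-write is active), a variant that builds a new array might do instead. This is a sketch under that assumption, not the answer's original method: it swaps np.copyto for np.where and uses reindex for the alignment, which requires oldPartnumbers to be unique. The names lookup and aligned are made up for the example.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'partNumbers': ['123AAA', '123BBB', '123CCC', '123DDD']})
df2 = pd.DataFrame({'oldPartnumbers': ['123AAA', '123BBB', '123CCC'],
                    'newPartnumbers': ['123ABC', '123DEF', '123GHI']})

# Lookup Series: old part number -> new part number (index must be unique).
lookup = df2.set_index('oldPartnumbers')['newPartnumbers']

# Align the lookup to df1's part numbers; missing ones come back as NaN.
aligned = lookup.reindex(df1['partNumbers']).to_numpy()

# Build a new array: keep the old number wherever no new number was found.
df1['updatedPartNumbers'] = np.where(pd.isnull(aligned),
                                     df1['partNumbers'].to_numpy(),
                                     aligned)
print(df1)
```

This leaves the index of df1 and df2 alone, so no reset_index is needed afterwards.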