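Python - Add new columns with mapped values from a dictionary containing lists of values

I am trying to add at least one, and possibly several, columns to a DataFrame from a mapped dictionary. I have a dictionary keyed by product catalog number, where each value is a list holding the standardized hierarchical nomenclature for that product number. An example is below.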

import pandas as pd

dict = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})
df['catagory'] = df['product'].map(dict)
print(df)

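I get the following result:

   product      catagory
0        1  [a, b, c, d]
1        2  [w, x, y, z]
2        3           NaN

I would like to get the following:

   product cat1 cat2 cat3 cat4
0        1    a    b    c    d
1        2    w    x    y    z
2        3  NaN  NaN  NaN  NaN

Or even better:

   product category
0        1        d
1        2        z
2        3      NaN

I have been struggling just to parse the list held in one of our dictionary entries and append it to the DataFrame, but the only suggestions I could find, such as this example, cover mapping a dictionary whose values each contain a single item.

Any help is appreciated.

Source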

2017-07-14 Rudabagle

This might help: https://stackoverflow.com/questions/32468402/how-to-explode-a-list-inside-a-dataframe-cell-into-separate-rows/32470490#32470490 – Alexander

Use set_index, apply, add_prefix, and reset_index:

df_out = (df.set_index('product')['catagory'] 
    .apply(lambda x:pd.Series(x))) 

df_out.columns = df_out.columns + 1 

df_out.add_prefix('cat').reset_index()

Output:

product cat1 cat2 cat3 cat4 
0  1 a b c d 
1  2 w x y z 
2  3 NaN NaN NaN NaN

To go on toward the "even better" layout, with one row per category value:

(df.set_index('product')['catagory'] 
    .apply(lambda x:pd.Series(x)) 
    .stack(dropna=False) 
    .rename('category') 
    .reset_index() 
    .drop('level_1',axis=1) 
    .drop_duplicates() 
)

Output:

product category 
0  1  a 
1  1  b 
2  1  c 
3  1  d 
4  2  w 
5  2  x 
6  2  y 
7  2  z 
8  3  NaN
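
For comparison, the same long layout can probably be reached more directly with DataFrame.explode. This is only a sketch and not part of the answer above: it assumes pandas 0.25 or later (where explode is available), and the names d and long_df are chosen here for illustration.

```python
import pandas as pd

# Same sample data as in the question; "d" avoids shadowing the built-in dict.
d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})

long_df = (
    df.assign(category=df['product'].map(d))  # list per product, NaN if unmapped
      .explode('category')                    # one row per list element
      .reset_index(drop=True)
)
print(long_df)
```

A product with no entry in the dictionary stays as a single row with NaN, so no drop_duplicates step is needed.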

Source

2017-07-15 06:26:01

Note:

Don't use names such as list, type, dict ... as variable names, because doing so masks the built-in functions.

So if you use:

# dict is now a variable name that masks the built-in
dict = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
# creating a dictionary with dict(...) is no longer possible, because dict now refers to the dictionary above
print (dict(a=1, b=2))

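you get the error:

TypeError: 'dict' object is not callable

and debugging it can be very confusing. (Tested after restarting the IDE.)

So please use other variable names, such as d or categories: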

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']} 
print (dict(a=1, b=2)) 
{'a': 1, 'b': 2}
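
The masking described in this note can be reproduced in isolation. The sketch below keeps the shadowing inside a function so it does not leak into the rest of a session; the function name shadowing_demo is hypothetical and only used for illustration.

```python
import builtins


def shadowing_demo():
    # Local variable named "dict" masks the built-in inside this function.
    dict = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
    try:
        dict(a=1, b=2)  # tries to call the dictionary object
    except TypeError as exc:
        return str(exc)
    return None


message = shadowing_demo()
print(message)

# The built-in itself is untouched and still reachable via the builtins module.
restored = builtins.dict(a=1, b=2)
print(restored)
```

In an interactive session where dict was reassigned at the top level, running del dict should remove the shadowing variable and make the built-in visible again.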

I think you need DataFrame.from_dict with join:

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']} 
df = pd.DataFrame({"product": [1, 2, 3]}) 
print (df) 
    product 
0  1 
1  2 
2  3 

df1 = pd.DataFrame.from_dict(d, orient='index') 
df1.columns = ['cat' + (str(i+1)) for i in df1.columns] 
print(df1) 
    cat1 cat2 cat3 cat4 
1 a b c d 
2 w x y z 

df2 = df.join(df1, on='product') 
print (df2) 
    product cat1 cat2 cat3 cat4 
0  1 a b c d 
1  2 w x y z 
2  3 NaN NaN NaN NaN

Then you can use melt or stack:

df3 = df2.melt('product', value_name='category').drop('variable', axis=1) 
print (df3) 
    product category 
0   1  a 
1   2  w 
2   3  NaN 
3   1  b 
4   2  x 
5   3  NaN 
6   1  c 
7   2  y 
8   3  NaN 
9   1  d 
10  2  z 
11  3  NaN

# the chained calls are wrapped in parentheses so they can span several lines
df2 = (df.set_index('product').join(df1)
         .stack(dropna=False)
         .reset_index(level=1, drop=True)
         .rename('category')
         .reset_index())
print (df2) 
    product category 
0   1  a 
1   1  b 
2   1  c 
3   1  d 
4   2  w 
5   2  x 
6   2  y 
7   2  z 
8   3  NaN 
9   3  NaN 
10  3  NaN 
11  3  NaN

If the category column is already in df, the solution is similar; it is only necessary to remove the rows with NaN using DataFrame.dropna:

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']} 
df = pd.DataFrame({"product": [1, 2, 3]}) 
df['category'] = df['product'].map(d) 
print(df) 

df1 = df.dropna(subset=['category']) 
df1 = pd.DataFrame(df1['category'].values.tolist(), index=df1['product']) 
df1.columns = ['cat' + (str(i+1)) for i in df1.columns] 
print(df1) 
     cat1 cat2 cat3 cat4 
product      
1   a b c d 
2   w x y z 

df2 = df[['product']].join(df1, on='product') 
print (df2) 
    product cat1 cat2 cat3 cat4 
0  1 a b c d 
1  2 w x y z 
2  3 NaN NaN NaN NaN

Source

2017-07-15 07:26:24 jezrael

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']} 

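# Split the mapped list into 4 columns (assumes "import numpy as np")
# rename(columns=...) relabels the columns; passing a function to
# rename_axis only worked in older pandas versions
df[['product']].join(
    df.apply(lambda x: pd.Series(d.get(x['product'], [np.nan])), axis=1)
      .rename(columns=lambda x: 'cat{}'.format(x + 1))
    )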
Out[187]: 
    product cat1 cat2 cat3 cat4 
0  1 a b c d 
1  2 w x y z 
2  3 NaN NaN NaN NaN 

#only take the last element 
df['catagory'] = df.apply(lambda x: d.get(x['product'],[np.nan])[-1],axis=1) 

df 
Out[171]: 
    product catagory 
0  1  d 
1  2  z 
2  3  NaN
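
The last-element result can likely also be reached without a row-wise apply, by first reducing the dictionary to its final items and then mapping it. This is a minimal sketch, assuming every list in the dictionary is non-empty; the name last_item is introduced here for illustration.

```python
import pandas as pd

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})

# Keep only the last entry of each list (assumes no list is empty).
last_item = {key: values[-1] for key, values in d.items()}

# Products missing from the dictionary are mapped to NaN.
df['category'] = df['product'].map(last_item)
print(df)
```

Series.map with a plain dictionary is vectorised over the column, so this tends to scale better than calling a lambda once per row.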

Source

2017-07-15 08:23:51 Allen
