2016-09-27 86 views
-1

I have this sample data and want to change the structure of the DataFrame:

import pandas as pd 

from io import StringIO

stock_list="""EAN code, name, stock 
, MONIN Syrups, 
12345, Monin Mojito Mint Syrup 250 ml, 100 
, BONNE MAMAN, 
7890, Bonne Maman Strawberry Preserve 370g, 200
6543, Bonne Maman Raspberry 370g, 150""" 

audit = pd.read_csv(StringIO(stock_list), sep=",") 

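If the EAN code is not numeric (NaN), the row is actually a product type rather than a product. So the name on that row, for example "MONIN Syrups", should be moved into a type column for the products that follow, until the next NaN. The final list would look like this: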

expected_list="""type, EAN code, name, stock 
MONIN Syrups, 12345, Monin Mojito Mint Syrup 250 ml, 100 
BONNE MAMAN, 7890, Bonne Maman Strawberry Preserve 370g, 200 
BONNE MAMAN, 6543, Bonne Maman Raspberry 370g, 150""" 

pd.read_csv(StringIO(expected_list), sep=",") 

How can I take the current stock_list DataFrame and transform it so that it looks like expected_list?

Answer

3

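Copy the name column into a new type column, set type to NaN on every row that has an EAN code (the real product rows), and then ffill() it so that each product inherits the type from the header row above it: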

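import numpy as np
import pandas as pd

from io import StringIO

stock_list = """EAN code, name, stock
, MONIN Syrups,
12345, Monin Mojito Mint Syrup 250 ml, 100
, BONNE MAMAN,
7890, Bonne Maman Strawberry Preserve 370g, 200
6543, Bonne Maman Raspberry 370g, 150"""

# skipinitialspace=True strips the blank after each comma
audit = pd.read_csv(StringIO(stock_list), sep=",", skipinitialspace=True)

# Start with a copy of the name column
audit["type"] = audit["name"]

# True on product rows (rows that have an EAN code)
mask = ~audit["EAN code"].isnull()
# Blank out the type on product rows, leaving it only on the header rows
audit.loc[mask, "type"] = np.nan
# Carry each header name down to the products below it
audit["type"] = audit["type"].ffill()
# Keep only the product rows
res = audit.loc[mask].reset_index(drop=True)
print(res)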

Output:

   EAN code                                  name  stock          type
0   12345.0        Monin Mojito Mint Syrup 250 ml  100.0  MONIN Syrups
1    7890.0  Bonne Maman Strawberry Preserve 370g  200.0   BONNE MAMAN
2    6543.0            Bonne Maman Raspberry 370g  150.0   BONNE MAMAN
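
The result above keeps EAN code and stock as floats and puts type last, whereas expected_list has integers and type as the first column. A minimal sketch of one way to close that gap, using the same forward-fill idea: the helper name add_type_column is hypothetical, and the integer cast assumes every product row has both an EAN code and a stock value.

```python
from io import StringIO

import pandas as pd

stock_list = """EAN code, name, stock
, MONIN Syrups,
12345, Monin Mojito Mint Syrup 250 ml, 100
, BONNE MAMAN,
7890, Bonne Maman Strawberry Preserve 370g, 200
6543, Bonne Maman Raspberry 370g, 150"""


def add_type_column(df):
    """Hypothetical helper: turn header rows (no EAN code) into a 'type' column."""
    is_header = df["EAN code"].isna()
    # Keep the name only on header rows, then carry it down to the rows below.
    types = df["name"].where(is_header).ffill()
    out = df.loc[~is_header].copy()
    # Put the type first, matching the column order of expected_list.
    out.insert(0, "type", types[~is_header])
    # The empty cells on header rows made these columns float; cast them back.
    # Assumption: no product row is missing its EAN code or stock.
    out = out.astype({"EAN code": int, "stock": int})
    return out.reset_index(drop=True)


audit = pd.read_csv(StringIO(stock_list), skipinitialspace=True)
res = add_type_column(audit)
print(res)
```

If some products can legitimately lack a stock value, pandas' nullable "Int64" dtype would be a safer target for the cast than plain int.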