2017-07-15 64 views
2

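Python Pandas: creating a DataFrame from a text file

I am trying to create a DataFrame from a raw text file using Pandas. The file contains 3 categories, and the items belonging to each category are listed after the category name. I was able to create a Series from the categories, but I don't know how to associate each item with its respective category and build a DataFrame. Below are my initial code and the desired DataFrame output. Could you help guide me toward the right way to do this?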

category = ['Fruits', 'Vegetables', 'Meats'] 

items='''Fruits 
apple 
orange 
pear 
Vegetables 
broccoli 
squash 
carrot 
Meats 
chicken 
beef 
lamb''' 

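import pandas as pd

Category = pd.Series(dtype=object)

i = 0
for item in items.splitlines():
    if item.strip() in category:
        # Series.set_value was removed in pandas 1.0; assign by label instead
        Category.loc[i] = item.strip()
        i += 1
df = pd.DataFrame(Category)
print(df)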

Desired DataFrame output:

Category    Item
Fruits      apple
            orange
            pear
Vegetables  broccoli
            squash
            carrot
Meats       chicken
            beef
            lamb

Answers

0

Consider iteratively appending to a dictionary of lists rather than to a Series. Then convert the dictionary to a DataFrame.

from io import StringIO 
import pandas as pd 

txtobj = StringIO('''Fruits 
apple 
orange 
pear 
Vegetables 
broccoli 
squash 
carrot 
Meats 
chicken 
beef 
lamb''') 

items = {'Category':[], 'Item':[]} 

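categories = ['Fruits', 'Vegetables', 'Meats']
curr_category = None

for line in txtobj:
    curr_line = line.strip()        # drop the newline and any stray whitespace
    if curr_line in categories:
        curr_category = curr_line   # header line: remember the current category
    elif curr_line:
        # item line: record it under the most recent category
        items['Category'].append(curr_category)
        items['Item'].append(curr_line)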

df = pd.DataFrame(items).assign(key=1) 
print(df) 
#      Category      Item  key
# 0      Fruits     apple    1
# 1      Fruits    orange    1
# 2      Fruits      pear    1
# 3  Vegetables  broccoli    1
# 4  Vegetables    squash    1
# 5  Vegetables    carrot    1
# 6       Meats   chicken    1
# 7       Meats      beef    1
# 8       Meats      lamb    1

print(df['key'].groupby([df['Category'], df['Item']]).count())  
# Category    Item
# Fruits      apple       1
#             orange      1
#             pear        1
# Meats       beef        1
#             chicken     1
#             lamb        1
# Vegetables  broccoli    1
#             carrot      1
#             squash      1
# Name: key, dtype: int64
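
Note that the grouped output above lists the categories alphabetically (Fruits, Meats, Vegetables), not in file order. As a hedged sketch rather than part of the original answer, passing `sort=False` to `groupby` keeps groups in order of first appearance, and `size()` counts rows directly, so no helper column is needed. The small frame below is illustrative data only.

```python
import pandas as pd

# Illustrative data with the same shape as the parsed frame
df = pd.DataFrame({
    "Category": ["Fruits", "Fruits", "Vegetables", "Vegetables", "Meats"],
    "Item": ["apple", "orange", "broccoli", "squash", "chicken"],
})

# size() counts rows per group, so no helper column is required;
# sort=False keeps groups in order of first appearance
nested = df.groupby(["Category", "Item"], sort=False).size()
print(nested)
```
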
+0

This works great. Thank you! – MBasith

1

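Here is a solution that uses pandas without loops. A numeric key is used below, since this kind of grouping needs one to produce the desired result.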

import pandas as pd 
category = ['Fruits', 'Vegetables', 'Meats'] 

items='''Fruits 
apple 
orange 
pear 
Vegetables 
broccoli 
squash 
carrot 
Meats 
chicken 
beef 
lamb''' 

in_df = pd.DataFrame(items.splitlines()) 

Create groups based on whether each row is a category name.

in_df = in_df.assign(group=in_df[0].isin(category).cumsum())
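
To see why this produces group numbers, here is a minimal sketch on a shortened, illustrative list: `isin` marks the header rows as `True`, and the running sum increases by one at each header, so every item row inherits the number of the header above it.

```python
import pandas as pd

category = ["Fruits", "Meats"]
lines = pd.Series(["Fruits", "apple", "pear", "Meats", "beef"])

# True on header rows; the running sum steps up by one at each header
is_header = lines.isin(category)
group = is_header.cumsum()
print(group.tolist())  # [1, 1, 1, 2, 2]
```
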

Create a DataFrame from the first row of each group.

cat_df = in_df.groupby('group').first() 

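Join the remaining rows of each group (from the second row onward) back to the group's first row, creating the category-to-fruit relationship.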

df_out = in_df.groupby('group').apply(lambda x: x[1:]).reset_index(drop = True).merge(cat_df, left_on='group', right_index=True) 

Drop the grouping key and rename the columns.

df_out = df_out.drop('group',axis=1).rename(columns={'0_x':'Fruit','0_y':'Category'}) 
print(df_out) 

Output:

      Fruit    Category
0     apple      Fruits
1    orange      Fruits
2      pear      Fruits
3  broccoli  Vegetables
4    squash  Vegetables
5    carrot  Vegetables
6   chicken       Meats
7      beef       Meats
8      lamb       Meats
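
How `groupby(...).apply` treats the grouping column has varied between pandas versions, so the "skip the first row of each group" step can also be written without `apply`. The following is a sketch under that assumption, not the answer's own code; the column names `line` and `Item` and the shortened data are illustrative.

```python
import pandas as pd

category = ["Fruits", "Meats"]
in_df = pd.DataFrame({"line": ["Fruits", "apple", "pear", "Meats", "beef"]})
in_df["group"] = in_df["line"].isin(category).cumsum()

# The first row of each group is its category header
cat_df = in_df.groupby("group")["line"].first().rename("Category")

# cumcount() numbers rows within each group from 0, so > 0 skips the header
rows = in_df[in_df.groupby("group").cumcount() > 0]

df_out = rows.merge(cat_df, left_on="group", right_index=True)
df_out = (df_out.drop(columns="group")
                .rename(columns={"line": "Item"})
                .reset_index(drop=True))
print(df_out)
```
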
2

Use:

import pandas as pd

category = ['Fruits', 'Vegetables', 'Meats']

items='''Fruits 
apple 
orange 
pear 
Vegetables 
broccoli 
squash 
carrot 
Meats 
chicken 
beef 
lamb''' 

df = pd.DataFrame({'Fruit':items.splitlines()}) 

mask = df['Fruit'].isin(category) 
df.insert(0,'Category', df['Fruit'].where(mask).ffill()) 
df = df[df['Category'] != df['Fruit']].reset_index(drop=True) 
print (df) 
     Category     Fruit
0      Fruits     apple
1      Fruits    orange
2      Fruits      pear
3  Vegetables  broccoli
4  Vegetables    squash
5  Vegetables    carrot
6       Meats   chicken
7       Meats      beef
8       Meats      lamb
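
The key step above is the combination of `where` and `ffill`. A minimal sketch of the intermediate values, using a shortened, illustrative list:

```python
import pandas as pd

category = ["Fruits", "Meats"]
s = pd.Series(["Fruits", "apple", "pear", "Meats", "beef"])

# where() keeps the header values and replaces everything else with NaN
headers_only = s.where(s.isin(category))

# ffill() carries each header down over the item rows that follow it
filled = headers_only.ffill()
print(filled.tolist())  # ['Fruits', 'Fruits', 'Fruits', 'Meats', 'Meats']
```

The header rows themselves still pair a category with its own name, which is why the answer then filters out rows where the two columns are equal.
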

Finally, if you need counts per Category and Fruit, use groupby with size:

What is the difference between size and count in pandas?

df1 = df.groupby(['Category','Fruit']).size() 
print (df1) 
Category    Fruit
Fruits      apple       1
            orange      1
            pear        1
Meats       beef        1
            chicken     1
            lamb        1
Vegetables  broccoli    1
            carrot      1
            squash      1
dtype: int64
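
On the size versus count question linked above: `size` counts every row in a group, while `count` counts only non-missing values. A small sketch with an illustrative frame containing one missing entry shows the difference.

```python
import pandas as pd

# Illustrative data: one missing item in the Fruits group
df = pd.DataFrame({
    "Category": ["Fruits", "Fruits", "Meats"],
    "Fruit": ["apple", None, "beef"],
})

grouped = df.groupby("Category")["Fruit"]
sizes = grouped.size()    # counts all rows in each group
counts = grouped.count()  # counts only non-missing values
print(sizes["Fruits"], counts["Fruits"])  # 2 1
```

With no missing values, as in the data in this thread, the two give the same numbers.
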