Explode one row into multiple rows in a pandas DataFrame

I have a DataFrame with the following header:

id, type1, ..., type10, location1, ..., location10

and I would like to convert it to:

id, type, location

I was able to do this with nested for loops, but it is very slow:

import pandas as pd

new_format_columns = ['ID', 'type', 'location']
new_format_dataframe = pd.DataFrame(columns=new_format_columns)

print(data.head())
new_index = 0
for index, row in data.iterrows():
    ID = row["ID"]

    for i in range(1, 11):
        # NaN never compares equal to anything (not even itself),
        # so test for missing values with pd.isna() instead of == np.nan
        if pd.isna(row["type" + str(i)]):
            continue
        else:
            new_row = pd.Series([ID, row["type" + str(i)], row["location" + str(i)]])
            new_format_dataframe.loc[new_index] = new_row.values
            new_index += 1

Any suggestions for improving this with native pandas functionality?

Source

2016-10-04 MedAli

How large is your dataset? – MMF

@MMF A few GB at the moment – MedAli

You can use lreshape:

types = [col for col in df.columns if col.startswith('type')] 
location = [col for col in df.columns if col.startswith('location')] 

print(pd.lreshape(df, {'Type':types, 'Location':location}, dropna=False))

Sample:

import pandas as pd 

df = pd.DataFrame({ 
'type1': {0: 1, 1: 4}, 
'id': {0: 'a', 1: 'a'}, 
'type10': {0: 1, 1: 8}, 
'location1': {0: 2, 1: 9}, 
'location10': {0: 5, 1: 7}}) 

print (df) 
  id  location1  location10  type1  type10
0  a          2           5      1       1
1  a          9           7      4       8

types = [col for col in df.columns if col.startswith('type')] 
location = [col for col in df.columns if col.startswith('location')] 

print(pd.lreshape(df, {'Type':types, 'Location':location}, dropna=False)) 
  id  Location  Type
0  a         2     1
1  a         9     4
2  a         5     1
3  a         7     8
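
The question's loop skips slots whose type is NaN. A minimal sketch of how lreshape can do the same, assuming its default dropna=True and made-up sample data with one missing type (the sample values and the lowercase target names 'type' and 'location' are assumptions for illustration, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: row "a" has no value in slot 10
df = pd.DataFrame({
    "id": ["a", "b"],
    "type1": [1.0, 4.0],
    "type10": [np.nan, 8.0],
    "location1": [2, 9],
    "location10": [5, 7],
})

types = [col for col in df.columns if col.startswith("type")]
locations = [col for col in df.columns if col.startswith("location")]

# With the default dropna=True, any stacked row that has NaN in one of the
# target columns is removed, which mirrors the `continue` in the loop
result = pd.lreshape(df, {"type": types, "location": locations})
print(result)
```

Passing dropna=False, as in the answer above, keeps those rows instead.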

Another solution with a double melt:

print (pd.concat([pd.melt(df, id_vars='id', value_vars=types, value_name='type'), 
        pd.melt(df, value_vars=location, value_name='Location')], axis=1) 
     .drop('variable', axis=1)) 

  id  type  Location
0  a     1         2
1  a     4         9
2  a     1         5
3  a     8         7

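EDIT:

lreshape is now undocumented, but it is possible that it will be removed in the future (with pd.wide_to_long too).

A possible solution is merging all 3 functions into one, maybe melt, but that is not implemented yet. Maybe it will come in some new version of pandas. Then my answer will be updated.

Source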
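
For reference, a rough sketch of the pd.wide_to_long route mentioned above. It assumes the id column is unique per row (wide_to_long requires this, so the sample data here uses ids 'a' and 'b' rather than the duplicated 'a' from the sample earlier), and the name 'slot' for the numeric suffix is made up for illustration:

```python
import pandas as pd

# Hypothetical sample with unique ids
df = pd.DataFrame({
    "id": ["a", "b"],
    "type1": [1, 4],
    "type10": [1, 8],
    "location1": [2, 9],
    "location10": [5, 7],
})

# Columns named <stub><number> are stacked; the number ends up in "slot"
long_df = (
    pd.wide_to_long(df, stubnames=["type", "location"], i="id", j="slot")
    .reset_index()
    .sort_values(["id", "slot"])
    .reset_index(drop=True)
)
print(long_df[["id", "type", "location"]])
```

If the real data has repeated ids, one option would be to reshape on a fresh unique row key instead and keep id as an ordinary column.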

2016-10-04 13:14:59 jezrael
