2017-04-09 44 views
2

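I have a DataFrame whose index column is "Country". I want to change the names of several countries, and I have the old/new values in a dictionary, as shown below.

I also tried splitting the values into separate lists, and that did not work either. The code raises no error, but the values in my DataFrame do not change.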

`import pandas as pd 
import numpy as np 

# skipfooter is the current spelling of the older skip_footer argument
energy = (pd.read_excel('Energy Indicators.xls',
                        skiprows=17,
                        skipfooter=38))

energy = (energy.drop(energy.columns[[0, 1]], axis=1)) 
energy.columns = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']   
energy['Energy Supply'] = energy['Energy Supply'].apply(lambda x: x*1000000) 

#This code isn't working properly 
energy['Country'] = energy['Country'].replace({'China, Hong Kong Special Administrative Region':'Hong Kong', 'United Kingdom of Great Britain and Northern Ireland':'United Kingdom', 'Republic of Korea':'South Korea', 'United States of America':'United States', 'Iran (Islamic Republic of)':'Iran'})` 

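Solved: it was a data problem that I had not noticed.

energy['Country'] = (energy['Country'].str.replace(r'\s*\(.*?\)\s*', '', regex=True).str.replace(r'\d+', '', regex=True))

This line was sitting below the "problem" line, when in fact the cleanup had to run before the replace could work. For example, the Excel file actually contains "United States of America20", so the replace skipped it.

Thanks for your help!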
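To illustrate why the replace silently skipped those rows, here is a minimal sketch. The `country` Series and its values are hypothetical stand-ins for the column read from the Excel file. It shows that `Series.replace` with a dict only matches whole cell values, and that comparing the dict keys against the column is one way to spot entries that never matched.

```python
import pandas as pd

# Hypothetical stand-in for the 'Country' column; note the stray
# footnote digits left on the first entry.
country = pd.Series(['United States of America20', 'Republic of Korea', 'France'])

mapping = {'United States of America': 'United States',
           'Republic of Korea': 'South Korea'}

# With a dict, Series.replace matches whole cell values exactly, so the
# entry with trailing digits is left unchanged and no error is raised.
replaced = country.replace(mapping)

# Keys that never occur in the column hint at entries that need cleaning.
unmatched = sorted(set(mapping) - set(country))

print(replaced.tolist())
print(unmatched)
```

Any key reported in `unmatched` is worth inspecting with `repr()` in the raw data, since trailing digits or whitespace are easy to miss when the frame is printed.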

+0

Can you add some sample data? I tested it and it works perfectly. – jezrael

+0

I don't have the password :( – jezrael

+0

Please check the edited answer. – jezrael

Answer

3

You need to remove the superscript digits with `str.replace` before calling `replace`:

d = {'China, Hong Kong Special Administrative Region':'Hong Kong', 
    'United Kingdom of Great Britain and Northern Ireland':'United Kingdom', 
    'Republic of Korea':'South Korea', 'United States of America':'United States', 
    'Iran (Islamic Republic of)':'Iran'} 

# regex=True is required on recent pandas, where str.replace is literal by default
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True).replace(d)
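As a quick check of the ordering, the following sketch runs the same two steps on a small in-memory Series. The sample values, including the trailing digits, are assumptions made for illustration and are not taken from the actual Excel file.

```python
import pandas as pd

# Hypothetical sample of raw country names with footnote digits attached.
country = pd.Series(['United States of America20',
                     'Iran (Islamic Republic of)',
                     'China, Hong Kong Special Administrative Region3'])

d = {'China, Hong Kong Special Administrative Region': 'Hong Kong',
     'United States of America': 'United States',
     'Iran (Islamic Republic of)': 'Iran'}

# Step 1: strip the digits (regex pattern, so regex=True is passed explicitly).
cleaned = country.str.replace(r'\d+', '', regex=True)

# Step 2: exact, whole-value replacement now finds every key.
result = cleaned.replace(d)

print(result.tolist())
```

Running the dict replacement first would leave the first and third entries untouched, because their raw values do not equal any key.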

You can also improve your solution: use the `usecols` parameter to filter the columns and `names` to set the new column names:

names = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable'] 

# skipfooter is the current spelling of the older skip_footer argument
energy = pd.read_excel('Energy Indicators.xls',
                       skiprows=17,
                       skipfooter=38,
                       usecols=range(2, 6),
                       names=names)


d = {'China, Hong Kong Special Administrative Region':'Hong Kong', 
    'United Kingdom of Great Britain and Northern Ireland':'United Kingdom', 
    'Republic of Korea':'South Korea', 'United States of America':'United States', 
    'Iran (Islamic Republic of)':'Iran'} 

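# for multiplication, the vectorized * operator is faster than apply
energy['Energy Supply'] = energy['Energy Supply'] * 1000000
# strip the digits first, then map the cleaned names through the dictionary
energy['Country'] = energy['Country'].str.replace(r'\d', '', regex=True).replace(d)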
#print (energy) 
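For the multiplication step, a small sketch with made-up numbers suggests that the vectorized form gives the same result as the `apply` version from the question. The values below are hypothetical, and no timing is measured here.

```python
import pandas as pd

# Hypothetical 'Energy Supply' values, including one missing entry.
supply = pd.Series([321.0, 102.0, None])

# Element-wise version from the question.
via_apply = supply.apply(lambda x: x * 1000000)

# Vectorized version from the answer.
vectorized = supply * 1000000

# Series.equals treats NaN in the same position as equal.
print(via_apply.equals(vectorized))
```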
+0

Just found it and added it to the top. Thanks :) –

+0

Thanks. I also tried to improve your solution a bit, please check it. If my answer was helpful, don't forget to [accept](http://meta.stackexchange.com/a/5235/295067) it. Thanks. – jezrael

+0

Great, thank you –