2017-09-14 73 views
0

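Please help me figure out how to do this. I have a dataframe. The "Indicator" column contains a bunch of different parameters (strings), but I only need "Life Satisfaction". I don't know how to delete the other indicators, such as "Dwellings without basic facilities", along with their corresponding values and countries.
Delete rows with string values, and their corresponding values, from other columns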

import numpy as np 
import pandas as pd 

oecd_bli = pd.read_csv("/Users/vladelec/Desktop/Life.csv") 
df = pd.DataFrame(oecd_bli) 
df.drop(df.columns[[0,2,4,5,6,7,8,9,10,11,12,13,15,16]], axis=1, inplace=True) 
# dropped the other columns that I do not need

Here is a screenshot of my dataframe:

Example of Dataframe

+0

You don't need to do 'oecd_bli = pd.read_csv("/Users/vladelec/Desktop/Life.csv") df = pd.DataFrame(oecd_bli)', only the first line. pd.read_csv() already returns a DataFrame. – GiantsLoveDeathMetal

+0

Possible duplicate of [Deleting DataFrame row in Pandas based on column value](https://stackoverflow.com/questions/18172851/deleting-dataframe-row-in-pandas-based-on-column-value) – GiantsLoveDeathMetal

Answers

1

You can load your data like this:

file_name = "/Users/vladelec/Desktop/Life.csv" 

# Columns you want to load 
keep_cols = ['Country', 'Indicator'] 

# pd.read_csv() will load the data into a pd.DataFrame 
oecd_bli = pd.read_csv(file_name, usecols=keep_cols) 

If you only want the "Life Satisfaction" indicator, you can do the following:

oecd_bli = oecd_bli[oecd_bli['Indicator'] == "Life Satisfaction"] 

If there are more indicators you want to keep, you can do this:

keep_indicators = [ 
    "Life Satisfaction", 
    "Crime Indicator", 
] 

oecd_bli = oecd_bli[oecd_bli['Indicator'].isin(keep_indicators)] 
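
As a rough, self-contained sketch of the two steps above combined: the inline CSV text below is a made-up stand-in for Life.csv (its `Value` and `Unit` columns and all numbers are assumptions for illustration only), so the real file's layout may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for Life.csv; column names and values are illustrative only.
csv_text = """Country,Indicator,Value,Unit
Australia,Dwellings without basic facilities,1.1,Percentage
Australia,Life Satisfaction,7.3,Average score
Austria,Dwellings without basic facilities,1.0,Percentage
Austria,Life Satisfaction,7.0,Average score
"""

# Load only the columns of interest; the "Unit" column is never read.
keep_cols = ["Country", "Indicator", "Value"]
oecd_bli = pd.read_csv(io.StringIO(csv_text), usecols=keep_cols)

# Keep the rows for one indicator and renumber the index from 0.
life = oecd_bli[oecd_bli["Indicator"] == "Life Satisfaction"].reset_index(drop=True)
print(life)
```

Filtering this way keeps each country and value attached to its indicator, because whole rows are kept or dropped together.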
+0

Thank you, man, for your answer! –

+0

Don't forget to accept the answer – GiantsLoveDeathMetal

0

@GiantsLoveDeathMetal makes a good point. In principle, you can read the raw data into oecd_bli and then select the subset of the DataFrame that satisfies some condition.

Demo

import pandas as pd 


# Given a DataFrame of raw data 
d = { 
    "Country": pd.Series(["Australia", "Austria", "Fiji", "Japan"]), 
    "Indicator": pd.Series(["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."]), 
    "Value": pd.Series([1.1, 1.0, 2.2, 2.9]), 
} 

oecd_bli = pd.DataFrame(d, columns=["Country", "Indicator", "Value"]) 
oecd_bli 

# Select rows starting with "Life" in column "Indicator" 
condition = oecd_bli["Indicator"].str.startswith("Life") 
subset = oecd_bli[condition] 
subset 

Alternatively, select the subset with .loc, which uses label-based indexing:

subset = oecd_bli.loc[condition, :] 

Here loc expects [<rows>, <columns>]. Thus, the rows that meet the condition are shown, with all columns.
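
A small sketch of the same idea with an explicit column list in the second slot of .loc, reusing the toy data from the demo (the choice of columns here is only an illustration):

```python
import pandas as pd

# Same toy data as in the demo above.
oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

condition = oecd_bli["Indicator"].str.startswith("Life")

# Rows where the condition is True, restricted to two of the columns.
subset = oecd_bli.loc[condition, ["Country", "Value"]]
print(subset)
```

The selected rows keep their original index labels (2 and 3 here), which is the usual behaviour of boolean selection.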


Details

Notice that the DataFrame shows only the rows for which the condition is True. This is because DataFrames respond to boolean arrays. An example of a boolean array:

>>> condition = oecd_bli["Indicator"].str.startswith("Life") 
>>> condition 

0    False
1    False
2     True
3     True
Name: Indicator, dtype: bool

Other ways to set the condition:

>>> condition = oecd_bli["Indicator"] == "Life ..." 
>>> condition = ~oecd_bli["Indicator"].str.startswith("Dwell") 
>>> condition = oecd_bli["Indicator"].isin(["Life ...", "Crime ..."]) 
>>> condition = (oecd_bli["Indicator"] == "Life ...") | (oecd_bli["Indicator"] == "Crime ...") 
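  1. Direct equality (==)
  2. Exclusion (~) of unwanted occurrences
  3. Inclusion by whitelisting values (isin)
  4. Combination with logical bitwise operators (|, &, etc.)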
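
To see how these four conditions relate, here is a minimal sketch on the same toy "Indicator" column; the variable names are only illustrative. Since the toy data has no "Crime ..." rows, all four masks happen to select the same rows.

```python
import pandas as pd

indicator = pd.Series(
    ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."], name="Indicator"
)

by_equality = indicator == "Life ..."                              # 1. direct equality
by_exclusion = ~indicator.str.startswith("Dwell")                  # 2. exclusion
by_isin = indicator.isin(["Life ...", "Crime ..."])                # 3. whitelist
by_or = (indicator == "Life ...") | (indicator == "Crime ...")     # 4. bitwise OR

for name, mask in [
    ("equality", by_equality),
    ("exclusion", by_exclusion),
    ("isin", by_isin),
    ("or", by_or),
]:
    print(name, mask.tolist())
```

The parentheses around each comparison in the last condition are needed because | and & bind more tightly than == in Python.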