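Boolean equivalent of pandas to_numeric()

I am looking for a boolean equivalent of pandas' to_numeric(). I want a function that converts a column to True/False/NaN if at all possible, and raises an error if not.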

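My motivation is that I need to automatically identify and convert the boolean columns in a dataset with about 1000 columns. I can do something similar for floats/integers with the following code: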

df = df_raw.apply(pd.to_numeric, errors='ignore')
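
A note for readers on newer pandas: `errors='ignore'` is deprecated in recent releases (2.2 onward, to the best of my knowledge). A minimal sketch of the same "convert if possible, otherwise keep" behaviour, using a hypothetical helper `to_numeric_or_keep` and made-up sample data:

```python
import pandas as pd


def to_numeric_or_keep(col):
    """Hypothetical helper: convert a column if possible, else return it unchanged."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col


# Made-up sample: column 'a' is fully numeric, column 'b' is not.
df_raw = pd.DataFrame({'a': ['1', '2'], 'b': ['x', '3']})
df = df_raw.apply(to_numeric_or_keep)
print(df.dtypes)
```

The question asks for the same pattern with booleans as the target type instead of numbers.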

2017-05-04 Selah

Do you want strings that look like 'True' and 'False' converted to True and False, and then to determine whether the whole column consists only of True and False? – piRSquared

astype is a more specific version of pd.to_numeric:

df = df_raw.astype('bool')

2017-05-04 17:52:40 ayhan

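That's what I was thinking... but bool(3.14) is True... is that what the OP wants? I'm not sure. Plus one because you rock and this is useful. – piRSquared

Yes, I guess I didn't read the question carefully. Thanks. :) – ayhan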
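
To make the caveat from these comments concrete, here is a small sketch showing that `astype(bool)` applies Python truthiness rather than parsing the values. The sample Series are made up for illustration, and `dtype=object` is set explicitly so the behaviour does not depend on the default string dtype of the installed pandas version.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
strings = pd.Series(['True', 'False', ''], dtype=object)
floats = pd.Series([3.14, 0.0])

# Truthiness: every non-empty string is True, so the text 'False'
# does NOT become False. Only the empty string is False.
strings_as_bool = strings.astype(bool).tolist()

# Likewise, every non-zero number is True.
floats_as_bool = floats.astype(bool).tolist()

print(strings_as_bool)
print(floats_as_bool)
```

So `astype('bool')` never fails and never yields NaN, which is why it does not meet the "convert if possible, error otherwise" requirement.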

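Since pd.to_numeric is mainly used to convert strings to numeric values, I will work under the assumption that you want to convert strings that spell out boolean values.

Consider the dataframe df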

df = pd.DataFrame([
    ['1', None, 'True'],
    ['False', 2, True]
])

print(df)

       0    1     2
0      1  NaN  True
1  False  2.0  True

My choice
This is what I suggest. Further down, I break it into steps to explain what is going on.

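from ast import literal_eval

import numpy as np
import pandas as pd

def try_eval2(x):
    # Strings are parsed as Python literals; unparsable strings become NaN.
    if type(x) is str:
        try:
            x = literal_eval(x)
        except Exception:
            x = np.nan

    # Anything that is not a genuine bool (ints, floats, None, ...) becomes NaN.
    if type(x) is not bool:
        x = np.nan

    return x

# Work on the flattened NumPy values, then rebuild the frame with the original labels.
vals = df.values
v = vals.ravel()
a = np.array([try_eval2(x) for x in v.tolist()], dtype=object)
pd.DataFrame(a.reshape(vals.shape), df.index, df.columns)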

       0    1     2
0    NaN  NaN  True
1  False  NaN  True
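
The question also asks for an error when a column cannot be converted, and for automatic detection across many columns. Below is a minimal sketch of one way to layer that on top of the same literal_eval idea. The helpers `to_bool_column` and `convert_bool_columns`, and the `errors` argument, are hypothetical names introduced here for illustration; they are not part of pandas.

```python
from ast import literal_eval

import numpy as np
import pandas as pd


def to_bool_column(col, errors='raise'):
    """Hypothetical helper: parse one column into True/False/NaN.

    errors='raise' rejects any value that is neither boolean-like nor
    missing; errors='coerce' turns such values into NaN instead.
    """
    out = []
    for x in col.tolist():
        if isinstance(x, str):
            try:
                x = literal_eval(x)
            except (ValueError, SyntaxError):
                pass  # not a Python literal; handled by the checks below
        if isinstance(x, (bool, np.bool_)):
            out.append(bool(x))
        elif x is None or (isinstance(x, float) and np.isnan(x)):
            out.append(np.nan)  # keep missing values missing
        elif errors == 'coerce':
            out.append(np.nan)
        else:
            raise ValueError(f'cannot interpret {x!r} as a boolean')
    return pd.Series(out, index=col.index, dtype=object)


def convert_bool_columns(df):
    """Hypothetical helper: convert the columns that parse cleanly, keep the rest."""
    result = df.copy()
    for name in df.columns:
        try:
            result[name] = to_bool_column(df[name])
        except ValueError:
            pass  # not a boolean column; leave it untouched
    return result


df = pd.DataFrame([['1', None, 'True'], ['False', 2, True]])
converted = convert_bool_columns(df)
print(converted)
```

With the sample frame, only column 2 is converted; columns 0 and 1 contain non-boolean values and are left as they were. Note that under this sketch a column holding only missing values also counts as convertible.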

Timing
On this small sample frame, my proposed solution is very fast:

%%timeit 
vals = df.values 
v = vals.ravel() 
a = np.array([try_eval2(x) for x in v.tolist()], dtype=object) 
pd.DataFrame(a.reshape(vals.shape), df.index, df.columns) 
10000 loops, best of 3: 149 µs per loop 

%timeit df.astype(str).applymap(to_boolean) 
1000 loops, best of 3: 1.28 ms per loop 

%timeit df.astype(str).stack().map({'True':True, 'False':False}).unstack() 
1000 loops, best of 3: 1.27 ms per loop

Explanation

Step 1
First, I create a function that uses ast.literal_eval to convert strings to values

from ast import literal_eval

def try_eval(x):
    try:
        x = literal_eval(x)
    except Exception:
        pass  # leave anything that is not a valid literal unchanged
    return x

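Step 2
Simply applymap my new function. The printed frame looks the same, but the strings have been evaluated (for example, '1' is now the integer 1).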

d1 = df.applymap(try_eval) 
print(d1) 

       0    1     2
0      1  NaN  True
1  False  2.0  True

Step 3
Use where and applymap again to keep only the values that are actually bool

d2 = d1.where(d1.applymap(type).eq(bool)) 
print(d2) 

       0    1     2
0    NaN  NaN  True
1  False  NaN  True

Step 4
You can drop the columns that are entirely NaN

print(d2.dropna(axis=1, how='all'))

       0     2
0    NaN  True
1  False  True
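
For readers on recent pandas, where `DataFrame.applymap` is deprecated in favour of `DataFrame.map` (since pandas 2.1), steps 1 to 4 can be sketched with a per-column `Series.map`, which is available in both old and new versions. This is an adaptation for illustration, not the answer's original code; the `isinstance` check replaces `type(x) == bool` so that NumPy booleans are accepted too.

```python
from ast import literal_eval

import numpy as np
import pandas as pd


def try_eval(x):
    # Non-strings and non-literal strings are returned unchanged.
    try:
        return literal_eval(x)
    except (ValueError, SyntaxError, TypeError):
        return x


def is_bool(x):
    return isinstance(x, (bool, np.bool_))


df = pd.DataFrame([['1', None, 'True'], ['False', 2, True]])

# Steps 1-2: evaluate every cell, one column at a time.
d1 = df.apply(lambda col: col.map(try_eval))

# Step 3: keep only the cells that hold real booleans.
mask = d1.apply(lambda col: col.map(is_bool))
d2 = d1.where(mask)

# Step 4: drop the columns that contain no booleans at all.
d3 = d2.dropna(axis=1, how='all')
print(d3)
```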

2017-05-04 18:14:10 piRSquared

I thought I checked that later and found the timings were nearly equivalent. I'll try to circle back and demonstrate it. – piRSquared

You need replace together with where, which replaces everything that is not boolean with NaN:

df = df.replace({'True':True,'False':False}) 
df = df.where(df.applymap(type) == bool)
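
The frame produced this way still has object dtype. As a follow-up sketch, assuming pandas 1.0 or later, the nullable `boolean` extension dtype can hold True/False/missing directly. The input frame below is rebuilt by hand to match the shape of the converted result on piRSquared's sample; it is not produced by the snippet above.

```python
import numpy as np
import pandas as pd

# Object-dtype frame holding True/False/NaN, written out by hand
# to mirror the converted sample frame.
converted = pd.DataFrame(
    {0: [np.nan, False], 1: [np.nan, np.nan], 2: [True, True]},
    dtype=object,
)

# Drop the columns with no booleans, then cast to the nullable dtype.
typed = converted.dropna(axis=1, how='all').astype('boolean')

print(typed.dtypes)
print(typed)
```

Missing entries show up as `<NA>` in the result, and the columns support boolean operations without falling back to object dtype.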

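Old solution (very slow):

If some boolean values are already in df, you can cast everything to string with astype, then applymap a custom function that uses ast.literal_eval for the conversion: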

from ast import literal_eval

def to_boolean(x):
    try:
        x = literal_eval(x)
        if type(x) == bool:
            return x
        else:
            return np.nan
    except Exception:
        # not a valid Python literal
        x = np.nan
    return x

print(df.astype(str).applymap(to_boolean))
# using the sample borrowed from piRSquared
       0    1     2
0    NaN  NaN  True
1  False  NaN  True

Timings:

In [76]: %timeit (jez(df)) 
1 loop, best of 3: 488 ms per loop 

In [77]: %timeit (jez2(df)) 
1 loop, best of 3: 527 ms per loop 

#piRSquared fastest solution 
In [78]: %timeit (pir(df)) 
1 loop, best of 3: 5.42 s per loop 

#maxu solution 
In [79]: %timeit df.astype(str).stack().map({'True':True, 'False':False}).unstack() 
1 loop, best of 3: 1.88 s per loop 

#jezrael old solution
In [80]: %timeit df.astype(str).applymap(to_boolean) 
1 loop, best of 3: 13.3 s per loop

Code for the timings:

df = pd.DataFrame([ 
     ['True', False, '1', 0, None, 5.2], 
     ['False', True, '0', 1, 's', np.nan]]) 

#[20000 rows x 60 columns] 
df = pd.concat([df]*10000).reset_index(drop=True) 
df = pd.concat([df]*10, axis=1).reset_index(drop=True) 
df.columns = pd.RangeIndex(len(df.columns)) 
#print (df)

def to_boolean(x):
    try:
        x = literal_eval(x)
        if type(x) == bool:
            return x
        else:
            return np.nan
    except Exception:
        x = np.nan
    return x


def try_eval2(x):
    if type(x) is str:
        try:
            x = literal_eval(x)
        except Exception:
            x = np.nan

    if type(x) is not bool:
        x = np.nan

    return x

def pir(df):
    vals = df.values
    v = vals.ravel()
    a = np.array([try_eval2(x) for x in v.tolist()], dtype=object)
    df2 = pd.DataFrame(a.reshape(vals.shape), df.index, df.columns)
    return df2

def jez(df):
    df = df.replace({'True':True,'False':False})
    df = df.where(df.applymap(type) == bool)
    return df

def jez2(df):
    df = df.replace({'True':True,'False':False})
    df = df.where(df.applymap(type).eq(bool))
    return df

2017-05-04 18:38:32 jezrael

I used @piRSquared's sample DF:

In [39]: df
Out[39]:
       0    1     2
0      1  NaN  True
1  False  2.0  True

In [40]: df.astype(str).stack().map({'True':True, 'False':False}).unstack()
Out[40]:
       0    1     2
0    NaN  NaN  True
1  False  NaN  True
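
As a sketch of why this one-liner works: `astype(str)` turns the real boolean True into the text 'True', so a single dictionary covers both spellings, and `Series.map` with a dictionary yields NaN for every key it does not contain. The variant below applies the same mapping column by column instead of using stack/unstack; the name `mapping` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame([['1', None, 'True'], ['False', 2, True]])
mapping = {'True': True, 'False': False}

# Every cell becomes text first, so True and 'True' are treated alike.
# Text that is not a key of `mapping` (such as '1' or '2.0') becomes NaN.
result = df.astype(str).apply(lambda col: col.map(mapping))
print(result)
```

Unlike the literal_eval approaches, this only recognises the exact spellings listed in the dictionary, which may or may not be what you want.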

2017-05-04 19:08:02 MaxU
