Pandas: conditionally concatenate

Given the following DataFrame:

import pandas as pd

df = pd.DataFrame({'bar': ['[a', '[b]'],
                   'foo': ['[a]', '[b']})
df
   bar  foo
0   [a  [a]
1  [b]   [b

I want to add a closing bracket "]" to the cells whose values lack one. The desired result is:

   bar  foo
0  [a]  [a]
1  [b]  [b]

However, I don't know how many columns there will be, so I want to apply this to the entire DataFrame.

I started with this, but had no luck:

df2 = df(lambda x: str(x)+"]" if (len(x)<3))
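The attempt above fails for two reasons: a DataFrame cannot be called like a function, and a conditional expression inside a lambda needs an else branch. A minimal working version of the same idea, assuming every cell is a short string such as '[a', maps the lambda over each column with Series.map:

```python
import pandas as pd

df = pd.DataFrame({'bar': ['[a', '[b]'], 'foo': ['[a]', '[b']})

# A conditional expression needs an else branch; apply it cell by cell,
# one column at a time, via Series.map
df2 = df.apply(lambda col: col.map(lambda x: x + "]" if len(x) < 3 else x))
print(df2)
```

The len(x) < 3 test only holds while every value is a single letter in brackets; checking the last character, as the answers below do, is more robust.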

Thanks in advance!

Update: I'm actually working with a table that looks like this:

       0       1      2
0  b [r]    None   None
1   c [d   d [r]   f[d]
2  g [r]   h [d]   None
3   m [r   p [d]   None
4  b [r]    n [d
5  m [d]   a [r]   None


2016-03-15 Dance Party2

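Thanks for the great answers. I noticed that when some cells contain None or are blank, I get the following error: IndexError: string index out of range. I should have included this table in the original question, but I didn't think it mattered. I'll post it above.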

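You can loop over the columns, because the .str string methods work on a Series. Get the last character using indexing with .str and assign through loc; ~ inverts a boolean mask where needed: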

print(df)
   bar  foo
0   [a  [a]
1  [b]   [b

for col in df.columns:
    # Mask: True where the last character is not ']'
    df.loc[df[col].str[-1] != ']', col] = df[col] + ']'
print(df)
   bar  foo
0  [a]  [a]
1  [b]  [b]
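If you prefer to avoid the explicit loop, the same last-character test can be built for every column at once with DataFrame.apply, and DataFrame.mask then swaps in the concatenated value wherever the test is true. This is a sketch assuming every cell is a non-empty string; the name lacks_bracket is illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'bar': ['[a', '[b]'], 'foo': ['[a]', '[b']})

# One boolean column per original column: True where ']' is missing
# (lacks_bracket is an illustrative name)
lacks_bracket = df.apply(lambda s: s.str[-1] != ']')

# Where the mask is True, take the value from df + ']' instead
result = df.mask(lacks_bracket, df + ']')
print(result)
```

Unlike the loop, this returns a new DataFrame and leaves df unchanged.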

Or use contains with an inverted mask:

for col in df.columns:
    # Test the last character; regex=False treats ']' as a literal
    df.loc[~df[col].str[-1].str.contains(']', regex=False), col] = df[col] + ']'
print(df)
   bar  foo
0  [a]  [a]
1  [b]  [b]
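Instead of extracting the last character first, str.contains can test the whole string with a regular expression anchored at the end: r'\]$' is an escaped bracket followed by end of string. A minimal sketch on the question's data (lacks_bracket is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'bar': ['[a', '[b]'], 'foo': ['[a]', '[b']})

for col in df.columns:
    # r'\]$' matches only when ']' is the final character
    lacks_bracket = ~df[col].str.contains(r'\]$')
    df.loc[lacks_bracket, col] = df[col] + ']'

print(df)
```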

Thanks to root for the comment about using endswith:

for col in df.columns:
    df.loc[~df[col].str.endswith(']'), col] = df[col] + ']'
print(df)
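Unlike str[-1], endswith never indexes into the string, so an empty string simply gives False. Missing values (None/NaN), however, come back as missing rather than as a boolean, so the mask is no longer a clean boolean Series for ~ to invert. The na argument of Series.str.endswith fills those cells with a fixed boolean; a minimal sketch:

```python
import pandas as pd

s = pd.Series(['[a', '[b]', '', None])

# na=True treats missing cells as if they already end with ']',
# so the inverted mask leaves them alone
mask = ~s.str.endswith(']', na=True)
print(mask.tolist())  # [True, False, True, False]
```

Passing na=False instead would mark missing cells as needing a bracket.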

Edit:

If there are empty strings and None values:

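print(df)
   bar   foo
0   [a
1  [b]    [b
2   [a  None

for col in df.columns:
    # na=False: missing cells count as "not ending with ]", so they are
    # selected too, and pandas yields NaN for them in df[col] + ']'
    df.loc[~df[col].str.endswith(']', na=False), col] = df[col] + ']'
    # Empty strings became ']'; turn them back into ''
    df[col] = df[col].replace({']': ''})

print(df)
   bar  foo
0  [a]
1  [b]  [b]
2  [a]  NaN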
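An alternative sketch that avoids the replace-back step: build a mask that excludes missing values and empty strings up front, so those cells are never touched (a None then stays None instead of becoming NaN). The data are the small example from this edit; needs_bracket is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'bar': ['[a', '[b]', '[a'], 'foo': ['', '[b', None]})

for col in df.columns:
    s = df[col]
    # Update only non-missing, non-empty cells that lack a trailing ']'
    needs_bracket = s.notna() & (s != '') & ~s.str.endswith(']', na=True)
    # Concatenate only the selected cells, so None is never added to ']'
    df.loc[needs_bracket, col] = s[needs_bracket] + ']'

print(df)
```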


2016-03-15 17:01:04 jezrael

I think using 'endswith' might be simpler than 'contains', e.g. '~df[cols].str.endswith(']')' – root

Sorry, I should have commented here: what if there is a blank cell? That seems to give me IndexError: string index out of range. I tried this, but no dice: df2.loc[(~df2[cols].str[-1].str.endswith(']')) & (~pd.isnull[cols]), cols] = df2[cols] + ']'

If there are 'empty' strings, is it OK to output 'NaN'? – jezrael

Let's look at the DataFrame.applymap() function:

df.applymap(func_reference)

The line above calls func_reference on every cell of df. Now we can design our func_reference:

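def my_filter(cell):
    # Skip missing values (None/NaN) and empty strings: cell[-1] would
    # fail on them with TypeError or IndexError
    if not isinstance(cell, str) or cell == '' or cell.endswith(']'):
        return cell
    return cell + ']'

# In pandas >= 2.1, applymap is deprecated in favour of df.map(my_filter)
filtered_df = df.applymap(my_filter)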
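Both applymap (pandas 1.2 and later) and its pandas 2.1 replacement DataFrame.map accept na_action='ignore', which skips missing cells so the cell function never sees None or NaN. A sketch under that assumption, using a few cells from the updated table plus a blank one; add_bracket and elementwise are hypothetical names:

```python
import pandas as pd

def add_bracket(cell):
    # Hypothetical helper: only called on non-missing cells because of
    # na_action='ignore'; empty strings still arrive, so keep them as-is
    if cell == '' or cell.endswith(']'):
        return cell
    return cell + ']'

df = pd.DataFrame({0: ['b [r]', 'c [d', 'm [r'],
                   1: [None, 'd [r]', ''],
                   2: [None, 'f[d]', None]})

# DataFrame.map exists from pandas 2.1; fall back to applymap before that
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
fixed = elementwise(add_bracket, na_action='ignore')
print(fixed)
```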

This may not be the most efficient approach, but I think it is quite readable.


2016-03-15 17:06:55 Mai
