Finding mismatches between two column values


user  data                               dep
1     ['dep_78','fg7uy8']                78
2     ['the_dep_45','34_dep','re23u']    45
3     ['fhj56','dep_89','hgjl09']        91

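I want to look at the values in the "data" column that contain the string "dep" and check whether the number attached to that string matches the number in the "dep" column. For example, dep_78 in the data column for user 1 matches 78 in the dep column. I want to output the rows that do not match, so the result should be: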

user  data                       dep
2     ['the_dep_45','34_dep']    45
3     ['dep_89']                 91

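The problem is to pick out only the values in the data column that contain the string "dep", and then compare the numbers attached to those strings with the "dep" column.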


2017-08-07 ComplexData

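The number attached to every string containing "dep" in the "data" column should match the number in the "dep" column. dep_89 in the data does not match 91 in the dep column. – ComplexData

My mistake, I read this on my phone and missed the 'dep' in the first block. Still, I think your first step is to split up the strings in the data? Why do you have a DataFrame in this format in the first place? – roganjosh

Can you give some background for your question? What have you tried? Why not restructure your DataFrame as you were advised [here](https://stackoverflow.com/questions/45552952/extracting-specific-rows-from-a-data-frame/45553169#45553169)? – RagingRoosevelt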


You can do it like this:

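def select(row):
    # The one string that counts as a match for this row, e.g. 'dep_78'
    keystring = 'dep_' + str(row['dep'])
    result = []
    for one in row['data']:
        # Keep strings that mention 'dep' but are not the expected one
        if one != keystring and 'dep' in one:
            result.append(one)
    return result

# Replace each list with its mismatching 'dep' strings only
df['data'] = df.apply(select, axis=1)
df['datalength'] = df['data'].map(len)
# Drop rows with nothing left and keep the original three columns
result = df[df['datalength'] > 0][df.columns[:3]]
print(result)
   user                  data  dep
1     2  [the_dep_45, 34_dep]   45
2     3              [dep_89]   91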
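
The code above compares whole strings against 'dep_<number>'. If the intent is instead to compare only the attached numbers, a minimal sketch could look like the following. It assumes the "data" cells are Python lists and that "attached" means digits directly before "_dep" or directly after "dep_"; the names DEP_NUMBER and mismatched are made up for illustration. Under that reading, 'the_dep_45' counts as a match for 45, so the result differs from the expected output in the question.

```python
import re
import pandas as pd

# Sample frame from the question, assuming 'data' holds lists of strings.
df = pd.DataFrame({
    "user": [1, 2, 3],
    "data": [
        ["dep_78", "fg7uy8"],
        ["the_dep_45", "34_dep", "re23u"],
        ["fhj56", "dep_89", "hgjl09"],
    ],
    "dep": [78, 45, 91],
})

# Hypothetical pattern: digits right before "_dep" or right after "dep_".
DEP_NUMBER = re.compile(r"(\d+)_dep|dep_(\d+)")

def mismatched(items, dep):
    """Return the strings whose number attached to 'dep' differs from dep."""
    bad = []
    for item in items:
        m = DEP_NUMBER.search(item)
        if m is None:
            continue  # no number attached to 'dep' in this string
        number = int(m.group(1) or m.group(2))
        if number != dep:
            bad.append(item)
    return bad

# Build the per-row lists with zip to avoid apply() reshaping list results.
flagged = pd.Series(
    [mismatched(items, dep) for items, dep in zip(df["data"], df["dep"])],
    index=df.index,
)

# Keep only rows that have at least one mismatching string.
result = df.assign(data=flagged)[flagged.map(len) > 0]
print(result)
```

Whether 'the_dep_45' should count as a match is something only the asker can settle; the regex is the place to adjust it.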


2017-08-07 21:31:54

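The '[]' isn't ideal here. Surely the solution is to fix the original DF? I don't understand why everything is put into one column – roganjosh

@roganjosh You can just filter them out. –

OK, but why bother with pandas for this approach? It runs in Python time, so you might as well use a 'for' loop – roganjosh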
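
For what it's worth, the restructuring suggested in the comments could be sketched roughly as below: one row per string instead of a list per cell. This assumes the "data" cells are lists and a pandas version that has DataFrame.explode (0.25 or later); long_df, expected and mismatches are illustrative names, and the exact-string rule 'dep_<number>' is borrowed from the answer above.

```python
import pandas as pd

# Sample frame from the question, assuming 'data' holds lists of strings.
df = pd.DataFrame({
    "user": [1, 2, 3],
    "data": [
        ["dep_78", "fg7uy8"],
        ["the_dep_45", "34_dep", "re23u"],
        ["fhj56", "dep_89", "hgjl09"],
    ],
    "dep": [78, 45, 91],
})

# One row per string; reset the index so that labels are unique again.
long_df = df.explode("data").reset_index(drop=True)

# Keep only the strings that mention 'dep' (plain substring test).
long_df = long_df[long_df["data"].str.contains("dep", regex=False)]

# A string matches only if it is exactly 'dep_<dep>' for its row.
expected = "dep_" + long_df["dep"].astype(str)
mismatches = long_df[long_df["data"] != expected]
print(mismatches)
```

With the data in this shape the comparison is an ordinary column operation and no row-wise apply is needed.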

How about this?

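import re

# Raw string so that \d reaches the regex engine unchanged
r = re.compile(r'\d+')

# Checks str(dep) against the first run of digits found in 'data';
# requires each 'data' cell to be a string, not a list
idx = df.apply(lambda x: str(x['dep']) in r.search(x['data']).group(0), axis=1)

0     True
1     True
2    False
dtype: bool


df[idx]

   user                             data  dep
0     1              ['dep_78','fg7uy8']   78
1     2  ['the_dep_45','34_dep','re23u']   45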


2017-08-07 21:47:18

TypeError: ('expected string or buffer', 'occurred at index 0') – ComplexData

It works on the sample data you provided –
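
The TypeError suggests that the "data" cells in the real frame are lists rather than strings, since re.search only accepts a string. A small sketch of a helper that accepts either form is below; as_list is a hypothetical name, and it assumes string cells look like Python list literals such as "['dep_78','fg7uy8']".

```python
import ast

def as_list(cell):
    """Return the cell as a list, parsing it first if it is a string literal."""
    if isinstance(cell, str):
        # Safely evaluate a literal like "['dep_78','fg7uy8']" into a list.
        return ast.literal_eval(cell)
    return list(cell)

print(as_list("['dep_78','fg7uy8']"))
print(as_list(["dep_89", "hgjl09"]))
```

Applying something like df['data'].map(as_list) first would let a list-based approach work on either kind of input.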
