Python transitivity

I have a list like the following in Python (the real one is huge, so I can't do this just by looking at it):

original1=[['email', 'tel', 'fecha', 'descripcion', 'categ'], 
      ['[email protected]', '1', '2014-08-06 00:00:06', 'MySpace a', 'animales'], 
      ['[email protected]', '1', '2014-08-01 00:00:06', 'My Space a', 'ropa'], 
      ['[email protected]', '2', '2014-08-06 00:00:06', 'My Space b', 'electronica'], 
      ['[email protected]', '3', '2014-08-10 00:00:06', 'Myace c', 'animales'], 
      ['[email protected]', '4', '2014-08-10 00:00:06', 'Myace c', 'animales']]

I split it into column names and data so that I can work with the data:

datos=original1[-(len(original1)-1):len(original1)]

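What I need is a dictionary that puts all the duplicates together, considering both email and phone, and I need to apply transitivity: row 0 = row 2 if we consider the email, but row 0 = row 1 if we consider the phone, and row 1 = row 3 if we consider the email again. So in this case rows 0, 1, 2 and 3 belong together, and row 4 is on its own.

I wrote the following code: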

from collections import defaultdict 
email_to_indices = defaultdict(list) 
phone_to_indices = defaultdict(list) 

for idx, row in enumerate(datos): 
    email = row[0].lower() 
    phone = row[1] 
    email_to_indices[email].append(idx) 
    phone_to_indices[phone].append(idx)

So now I need to apply the transitivity rule to group rows 0-3 together and leave row 4 alone.

If you print

print 'email', email_to_indices 
print 'phone', phone_to_indices

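you get:

email defaultdict(<type 'list'>, {'[email protected]': [0, 2], '[email protected]': [1, 3], '[email protected]': [4]})

phone defaultdict(<type 'list'>, {'1': [0, 1], '3': [3], '2': [2], '4': [4]})

I don't know how to get the union of these groups while taking transitivity into account. I need something like this:

first_group: [0, 1, 2, 3]
second_group: [4]

Thanks!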


2014-08-27 GabyLP

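What should the keys of the output dictionary be? Should there be one key for each unique email and phone, each referring to the same list of data, or should there be some kind of merged key made up of all the overlapping emails and numbers? What is your expected output? – 2014-08-27 18:16:40

It seems to me the natural course of action would be a data structure like [[line0, line1, line2, line3], [line4]] – 2014-08-27 18:28:21

Adam, the thing is that this is just an example; the real table is huge. That's why I wrote the code. – GabyLP 2014-08-27 18:29:55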

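What you have here is a graph or, more precisely, a bipartite graph. There are two types of nodes: emails and phones. Two nodes are connected if there is a record with that email and that phone. We could even say that the record itself is the edge connecting the two nodes.

The task is to find the connected components of this graph. There are well-known algorithms that do this in linear time.

Of course, a quick and dirty solution can also be devised, and it may even be considered appropriate if your dataset is small enough.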
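One possible quick and dirty version, as a rough sketch only: take the index lists already collected in email_to_indices and phone_to_indices from the question and keep merging any that share a row. The helper name merge_overlapping and the example.com addresses are made up for illustration (the real addresses in the post are obscured), and the code is written for Python 3, unlike the Python 2 snippets in this thread.

```python
from collections import defaultdict

# Placeholder rows (email, phone); the addresses are invented for illustration.
datos = [
    ['a@example.com', '1'],
    ['b@example.com', '1'],
    ['a@example.com', '2'],
    ['b@example.com', '3'],
    ['c@example.com', '4'],
]

# Same indexing step as in the question.
email_to_indices = defaultdict(list)
phone_to_indices = defaultdict(list)
for idx, row in enumerate(datos):
    email_to_indices[row[0].lower()].append(idx)
    phone_to_indices[row[1]].append(idx)


def merge_overlapping(index_lists):
    """Merge index lists that share any element, so the final groups are disjoint."""
    groups = []  # invariant: the sets in here never overlap each other
    for indices in index_lists:
        current = set(indices)
        remaining = []
        for group in groups:
            if group & current:
                current |= group  # absorb every existing group that overlaps
            else:
                remaining.append(group)
        remaining.append(current)
        groups = remaining
    return sorted(sorted(group) for group in groups)


groups = merge_overlapping(
    list(email_to_indices.values()) + list(phone_to_indices.values())
)
print(groups)
```

Every incoming list is compared against all groups found so far, so this is simple but may get slow on a huge table.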

You can find some Python implementations here: Python connected components

UPDATE: here is an example of how you can build the graph:

graph = {}
EMAIL = "email"
PHONE = "phone"

for rec in datos:
    graph.setdefault((EMAIL, rec[0]), set()).add((PHONE, rec[1]))
    graph.setdefault((PHONE, rec[1]), set()).add((EMAIL, rec[0]))

print "\n".join("%s: %s" % (str(node), str(linkedNodes)) for (node, linkedNodes) in graph.iteritems())

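So each node has a type (EMAIL or PHONE; these could really just be integers such as 0 and 1, I only made them strings so that they print nicely) and a value. The graph is a dictionary with nodes as keys and sets of connected nodes as values.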
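Given a graph in that shape, the connected components could be found with a breadth-first search, and each record then placed in the component of its email node. The following is only a sketch under some assumptions: the function name label_components is hypothetical, the example.com addresses are placeholders for the obscured ones in the question, and it targets Python 3 rather than the Python 2 used above.

```python
from collections import deque

EMAIL = "email"
PHONE = "phone"

# Placeholder rows (email, phone); the addresses are invented for illustration.
datos = [
    ['a@example.com', '1'],
    ['b@example.com', '1'],
    ['a@example.com', '2'],
    ['b@example.com', '3'],
    ['c@example.com', '4'],
]

# Build the graph exactly as described: node -> set of connected nodes.
graph = {}
for rec in datos:
    graph.setdefault((EMAIL, rec[0]), set()).add((PHONE, rec[1]))
    graph.setdefault((PHONE, rec[1]), set()).add((EMAIL, rec[0]))


def label_components(graph):
    """Breadth-first search over every node; returns {node: component number}."""
    component_of = {}
    next_label = 0
    for start in graph:
        if start in component_of:
            continue  # already reached from an earlier start node
        component_of[start] = next_label
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in component_of:
                    component_of[neighbour] = next_label
                    queue.append(neighbour)
        next_label += 1
    return component_of


component_of = label_components(graph)

# A record is an edge, so both of its endpoints lie in the same component;
# looking up the email node is enough to place the row.
rows_by_component = {}
for idx, rec in enumerate(datos):
    rows_by_component.setdefault(component_of[(EMAIL, rec[0])], []).append(idx)

groups = list(rows_by_component.values())
print(groups)
```

Each node and each edge is visited a bounded number of times, which is where the linear running time comes from.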


2014-08-27 18:37:01

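Hi Anton, that would be a good idea, but is there a Python module that does this? How would it work? – GabyLP 2014-08-27 18:41:52

@GabyP I don't think so, but a [quick search](https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=python%20connected%20components) turns up a lot of information, including related questions on SO. – 2014-08-27 18:44:30

Here is another approach:

While building the email_to_indices dictionary, you can store the phone number of each row as the value, and let phone_to_indices hold the row indices. This way we create a mapping from email_to_indices to phone_to_indices to row indices.

With this modification and basic set operations, I was able to get exactly what you want: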

from collections import defaultdict 

email_to_indices = defaultdict(list) 
phone_to_indices = defaultdict(list) 
combined = defaultdict(set) 

original=[['email', 'tel', 'fecha', 'descripcion', 'categ'], 
      ['[email protected]', '1', '2014-08-06 00:00:06', 'MySpace a', 'animales'], 
      ['[email protected]', '1', '2014-08-01 00:00:06', 'My Space a', 'ropa'], 
      ['[email protected]', '2', '2014-08-06 00:00:06', 'My Space b', 'electronica'], 
      ['[email protected]', '3', '2014-08-10 00:00:06', 'Myace c', 'animales'], 
      ['[email protected]', '4', '2014-08-10 00:00:06', 'Myace c', 'animales']] 


for idx, row in enumerate(original[1:], start=1): 
    email = row[0].lower() 
    phone = row[1] 
    email_to_indices[email].append(phone) # Here is what I changed 
    phone_to_indices[phone].append(idx) 

random_key = 0  # key of the group currently being built
for idx, row in enumerate(original[1:], start=1):
    # Collect every row that has one of the phones seen with this row's email
    grouped_rows = []
    if row[0].lower() in email_to_indices:
        for phone_no in email_to_indices[row[0].lower()]:
            grouped_rows.extend(phone_to_indices[phone_no])

    if len(combined[random_key]) > 0 and len(set(grouped_rows).intersection(combined[random_key])) > 0:
        # Overlaps the current group: merge into it
        combined[random_key].update(set(grouped_rows))
    elif len(combined[random_key]) > 0:
        # No overlap: start a new group
        random_key += 1
        combined[random_key].update(set(grouped_rows))
    else:
        # The current group is still empty: fill it
        combined[random_key].update(set(grouped_rows))

print combined

This gives:

defaultdict(<type 'set'>, {0: set([1, 2, 3, 4]), 1: set([5])})
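For data where related rows are not next to each other, a disjoint-set (union-find) structure is one common way to merge groups regardless of row order. The sketch below is an alternative technique, not the code above: the names find and group_rows are hypothetical, the example.com addresses are placeholders for the obscured ones, row indices are 0-based over the data rows (not 1-based as in the output above), and it is written for Python 3.

```python
def find(parent, x):
    """Return the representative of x, shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def group_rows(rows):
    """Group row indices that share an email or a phone, directly or indirectly."""
    parent = list(range(len(rows)))  # every row starts as its own group
    first_seen = {}  # (field, value) -> index of the first row holding it
    for idx, row in enumerate(rows):
        for key in (("email", row[0].lower()), ("phone", row[1])):
            if key in first_seen:
                # Union this row's group with the group of the earlier row
                root_a = find(parent, idx)
                root_b = find(parent, first_seen[key])
                parent[root_a] = root_b
            else:
                first_seen[key] = idx
    groups = {}
    for idx in range(len(rows)):
        groups.setdefault(find(parent, idx), []).append(idx)
    return sorted(groups.values())


# Placeholder rows (email, phone); the addresses are invented for illustration.
rows = [
    ['a@example.com', '1'],
    ['b@example.com', '1'],
    ['a@example.com', '2'],
    ['b@example.com', '3'],
    ['c@example.com', '4'],
]
print(group_rows(rows))
```

Because each row is linked to the first row that carried the same email or phone, the result does not depend on where in the table the matching rows appear.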


2014-08-27 19:38:52 shaktimaan
