Iterating over multiple values in a dictionary?

I have a list of words:

word_list = ["it's","they're","there's","he's"]

And a dictionary containing information about how frequently the words in word_list appear in several documents:

dict = [('document1',{"it's": 0,"they're": 2,"there's": 5,"he's": 1}), 
('document2',{"it's": 4,"they're": 2,"there's": 3,"he's": 0}), 
('document3',{"it's": 7,"they're": 0,"there's": 4,"he's": 1})]

I want to develop a data structure (a DataFrame, perhaps?) that looks like the following:

file       word     count
document1  it's     0
document1  they're  2
document1  there's  5
document1  he's     1
document2  it's     4
document2  they're  2
document2  there's  3
document2  he's     0
document3  it's     7
document3  they're  0
document3  there's  4
document3  he's     1

I am trying to find the documents in which these words are used most often. I have over 900 documents.

I was thinking of something like the following:

res = {} 
for i in words_list: 
    count = 0 
    for j in dict.items(): 
     if i == j: 
       count = count + 1 
       res[i,j] = count

Where can I go from here?

Source

2015-11-04 blacksite

That's not a dictionary by any stretch. – user2357112

You should use the Python pandas library to create the kind of DataFrame you show in your post. –

Where do I start? Are there any methods I should look at? – blacksite

Well, first things first: your dictionary is not a dictionary, and it should instead be built like this:

d = {'document1':{"it's": 0,"they're": 2,"there's": 5,"he's": 1}, 
    'document2':{"it's": 4,"they're": 2,"there's": 3,"he's": 0}, 
    'document3':{"it's": 7,"they're": 0,"there's": 4,"he's": 1}}
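If the data already exists as a list of (name, counts) tuples, as in the question, one way to get the nested dictionary shown above is to pass that list to dict(). This is only a minimal sketch; the variable name pairs is illustrative and not from the original post.

```python
# List of (document name, word counts) tuples, as posted in the question.
pairs = [('document1', {"it's": 0, "they're": 2, "there's": 5, "he's": 1}),
         ('document2', {"it's": 4, "they're": 2, "there's": 3, "he's": 0}),
         ('document3', {"it's": 7, "they're": 0, "there's": 4, "he's": 1})]

# dict() accepts any iterable of (key, value) pairs.
d = dict(pairs)

print(d['document1']["there's"])  # 5
```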

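With that, we can actually build a DataFrame from a dictionary with pandas, but to get it in the shape you want we will have to build a list of lists from the dictionary. Then we create the DataFrame, label the columns, and sort: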

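import pandas as pd 

d = {'document1':{"it's": 0,"they're": 2,"there's": 5,"he's": 1}, 
    'document2':{"it's": 4,"they're": 2,"there's": 3,"he's": 0}, 
    'document3':{"it's": 7,"they're": 0,"there's": 4,"he's": 1}} 

# Flatten the nested dict into [file, word, count] rows and label the columns.
d = pd.DataFrame([[k,k1,v1] for k,v in d.items() for k1,v1 in v.items()], columns = ['File','Words','Count']) 
# DataFrame.sort() was removed from pandas; sort_values() is its replacement.
# The index labels in the output depend on the dict's iteration order.
print(d.sort_values(['File','Count'], ascending=[True,True])) 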

         File    Words  Count
1   document1     it's      0
0   document1     he's      1
3   document1  they're      2
2   document1  there's      5
4   document2     he's      0
7   document2  they're      2
6   document2  there's      3
5   document2     it's      4
11  document3  they're      0
8   document3     he's      1
10  document3  there's      4
9   document3     it's      7

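If you want the top n occurrences, you can use groupby() and then either head() or tail() after sorting. Note that with an ascending sort, head(2) keeps the two lowest counts per file, while tail(2) keeps the two highest:

d = d.sort_values(['File','Count'], ascending=[True,True]).groupby('File').head(2) 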

         File    Words  Count
1   document1     it's      0
0   document1     he's      1
4   document2     he's      0
7   document2  they're      2
11  document3  they're      0
8   document3     he's      1
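Because the sort in this answer is ascending, head(2) keeps the smallest counts in each file. To keep the largest counts instead, one option is to sort Count in descending order before grouping. The following is a sketch that assumes a recent pandas version (where sort_values() is available); the names rows, df and top2 are illustrative.

```python
import pandas as pd

d = {'document1': {"it's": 0, "they're": 2, "there's": 5, "he's": 1},
     'document2': {"it's": 4, "they're": 2, "there's": 3, "he's": 0},
     'document3': {"it's": 7, "they're": 0, "there's": 4, "he's": 1}}

# Flatten the nested dict into [file, word, count] rows.
rows = [[k, k1, v1] for k, v in d.items() for k1, v1 in v.items()]
df = pd.DataFrame(rows, columns=['File', 'Words', 'Count'])

# Sort files ascending and counts descending, then keep the first two rows per file.
top2 = (df.sort_values(['File', 'Count'], ascending=[True, False])
          .groupby('File')
          .head(2))
print(top2)
```

With 900+ documents the same call applies unchanged; only the 2 passed to head() needs adjusting to the desired n.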

The list comprehension returns a list of lists that looks like this:

d = [['document1', "he's", 1], ['document1', "it's", 0], ['document1', "there's", 5], ['document1', "they're", 2], ['document2', "he's", 0], ['document2', "it's", 4], ['document2', "there's", 3], ['document2', "they're", 2], ['document3', "he's", 1], ['document3', "it's", 7], ['document3', "there's", 4], ['document3', "they're", 0]]

To build the dictionary properly, you would just use something like:

d['document1']['it\'s'] = 1

If for some reason you are stuck with a list of tuples of str and dict, you can use this list comprehension instead:

[[i[0],k1,v1] for i in d for k1,v1 in i[1].items()]
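Putting the pieces together for the list-of-tuples input from the question, the sketch below builds the long-format table and then picks, for each word, the document in which it is most frequent. The column names follow the question's desired output; rows, df and best are illustrative names, and the idxmax() step is one possible approach rather than part of the original answer.

```python
import pandas as pd

frequencies = [('document1', {"it's": 0, "they're": 2, "there's": 5, "he's": 1}),
               ('document2', {"it's": 4, "they're": 2, "there's": 3, "he's": 0}),
               ('document3', {"it's": 7, "they're": 0, "there's": 4, "he's": 1})]

# One [file, word, count] row per (document, word) combination.
rows = [[name, word, count]
        for name, counts in frequencies
        for word, count in counts.items()]
df = pd.DataFrame(rows, columns=['file', 'word', 'count'])

# For each word, idxmax() gives the row label of its largest count
# (the first such row when several documents tie).
best = df.loc[df.groupby('word')['count'].idxmax()]
print(best)
```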

Source

2015-11-04 21:19:45 SirParselot

Nice answer. One question: 'd.sort(['File','Count'], ascending=[1,1])' also changes the order of the index. Is there any particular reason you did that? –

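@JoeR I just changed it so that the files are ordered from low to high, and then the counts are ordered the same way. It isn't necessary, but I think it looks a bit better. – SirParselot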

How about something like this?

word_list = ["it's","they're","there's","he's"] 

frequencies = [('document1',{"it's": 0,"they're": 2,"there's": 5,"he's": 1}), 
('document2',{"it's": 4,"they're": 2,"there's": 3,"he's": 0}), 
('document3',{"it's": 7,"they're": 0,"there's": 4,"he's": 1})] 

result = [] 
for document in frequencies: 
    for word in word_list: 
        # document is a (name, counts) tuple; look up this word's count.
        result.append({"file":document[0], "word":word,"count":document[1][word]}) 

print(result)
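The list of dictionaries produced by the loop above can also be reduced without pandas. As a sketch, the following keeps, for each word, the entry with the highest count; the name best_by_word is hypothetical, and on a tie the first document encountered wins.

```python
word_list = ["it's", "they're", "there's", "he's"]

frequencies = [('document1', {"it's": 0, "they're": 2, "there's": 5, "he's": 1}),
               ('document2', {"it's": 4, "they're": 2, "there's": 3, "he's": 0}),
               ('document3', {"it's": 7, "they're": 0, "there's": 4, "he's": 1})]

# Same flattening as the nested loop, written as a list comprehension.
result = [{"file": name, "word": word, "count": counts[word]}
          for name, counts in frequencies
          for word in word_list]

# For each word, keep the entry with the highest count seen so far.
best_by_word = {}
for entry in result:
    current = best_by_word.get(entry["word"])
    if current is None or entry["count"] > current["count"]:
        best_by_word[entry["word"]] = entry

for word, entry in best_by_word.items():
    print(word, entry["file"], entry["count"])
```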

Source

2015-11-04 20:53:12 Jephron

I get the following error: 'TypeError: string indices must be integers, not str'. I can't use the word itself as an index. – blacksite

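Are you running the code with the same data as me? The only place it could fail is 'document[1][word]', and all the keys in 'document[1]' are strings in the data provided. It shouldn't fail. Edit: on second thought, that error means you are trying to index a string with another string. Does your frequencies list contain any raw strings? – Jephron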

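I don't think so. Admittedly, this is much simpler than the actual data I'm using. It follows exactly the same structure, but "frequencies" is just much easier to talk about. – blacksite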
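One way to investigate the TypeError discussed in these comments is to scan the real data for entries whose second element is not a dictionary, since indexing a plain string with a word raises that kind of error. This is a sketch with a deliberately malformed entry; the sample data and the name bad_entries are made up for illustration.

```python
frequencies = [
    ('document1', {"it's": 0, "they're": 2}),
    ('document2', "it's"),  # malformed on purpose: a raw string instead of a dict
]

# Collect the positions of entries that are not (str, dict) pairs.
bad_entries = [
    index for index, entry in enumerate(frequencies)
    if not (isinstance(entry, tuple)
            and len(entry) == 2
            and isinstance(entry[0], str)
            and isinstance(entry[1], dict))
]
print(bad_entries)  # [1]
```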
