2017-06-02

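Counting based on a column and strings in pandas

Suppose I have several documents and a df column containing specific words I need to search for. How can I count how many times each word appears in each document?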

An example makes this clearer:

doc1 = "I am a cat that barks. I like dog food instead of cat food. Roff" 

doc2 = "Frog that barks. Frog like cats." 

df['words'] = ["dog","cat","frog"] 

I want to turn this into a DataFrame of counts (the expected table was shown as an image in the original post).

My attempt looks like this, but I realized it just keeps looping over the same cell, so I always get zero.

for i in range(len(doc)): 
    for key, value in doc.items(): 
     for word in df['word']: 
      df['doc_' + str(i)] = value.count(word) 

Answer

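import numpy as np
import pandas as pd

doc1 = "I am a cat that barks. I like dog food instead of cat food. Roff"
doc2 = "Frog that barks. Frog like cats."
strings = [doc1, doc2]
words = ["dog", "cat", "frog"]

def count_occ(word, sentence):
    # Count exact, case-insensitive matches among whitespace-separated tokens,
    # so "cats." is not counted as "cat".
    return sentence.lower().split().count(word)

def counts_df(strings, words):
    # Keep the counts in a local list so repeated calls start fresh.
    cts = []
    for w in words:
        for s in strings:
            cts.append(count_occ(w, s))
    # One row per word, one column per document.
    df = pd.DataFrame(np.array(cts).reshape((len(words), len(strings))),
                      index=words,
                      columns=['doc' + str(i) for i in range(1, len(strings) + 1)])
    return df

counts_df(strings, words)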
Out[61]: 
      doc1  doc2
dog      1     0
cat      2     0
frog     0     2
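
The loop in the question assigns a single number to the whole `df['doc_' + str(i)]` column on every pass, so each column ends up holding only the last value computed. A minimal sketch of a per-column alternative follows. It assumes the words sit in a `words` column, as in the question. The names `add_doc_counts` and `word_col` are hypothetical, and the regex tokenization is a swapped-in choice, not what `count_occ` does.

```python
import re
from collections import Counter

import pandas as pd


def add_doc_counts(df, docs, word_col="words"):
    """Return a copy of df with one count column per document (hypothetical helper)."""
    out = df.copy()
    for i, text in enumerate(docs, start=1):
        # \w+ drops punctuation, so "food." is tokenized as "food".
        tokens = Counter(re.findall(r"\w+", text.lower()))
        # Assign the whole column at once: one count per row of word_col.
        out["doc" + str(i)] = [tokens[w.lower()] for w in out[word_col]]
    return out


doc1 = "I am a cat that barks. I like dog food instead of cat food. Roff"
doc2 = "Frog that barks. Frog like cats."

df = pd.DataFrame({"words": ["dog", "cat", "frog"]})
result = add_doc_counts(df, [doc1, doc2])
print(result)
```

For the sample documents this gives the same counts as the table above. The two approaches differ for words that touch punctuation: "food" would count twice in doc1 here, but only once with `split()`, because `split()` keeps "food." as a separate token.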