How do I count the unique words in the text files of a specific directory in Python?


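I'm writing a report, and I need to count the unique words in some text files. How do I count the unique words in the text files of a specific directory in Python?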

My texts are in D:\shakeall, and there are 42 files in total...

I know a little Python, but I don't know what to do next.

This is how I think it should work:

Read the directory
Build a list of words from the text of the files
Count the total / unique words

That is all I know. I also know a little about while loops, lists and indexes, variables...

What I want to do is write my own library of functions and use it to get the result.
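The three steps listed above could be split into small functions, one per step. This is only a minimal sketch: the names `list_text_files`, `read_words`, and `count_words` are hypothetical, and it assumes the files have a `.txt` extension, are UTF-8 text, and that words are separated by whitespace (letter case and punctuation are not normalized).

```python
import os


def list_text_files(directory):
    """Step 1: return the paths of the .txt files directly inside directory."""
    paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Assumption: the text files are the regular files ending in .txt
        if name.lower().endswith(".txt") and os.path.isfile(path):
            paths.append(path)
    return paths


def read_words(path):
    """Step 2: return the whitespace-separated words of one file as a list."""
    # Assumption: the files are UTF-8 text; undecodable bytes are replaced
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read().split()


def count_words(directory):
    """Step 3: return (total word count, unique word count) for directory."""
    total = 0
    unique = set()
    for path in list_text_files(directory):
        words = read_words(path)
        total += len(words)
        unique.update(words)  # a set keeps only one copy of each word
    return total, len(unique)
```

For the directory in the question, the call would be `total, unique = count_words("D:\\shakeall")`.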

I would really appreciate any advice on my question.

------ P.S.

I know almost nothing about Python. I can only do simple math or print the words in a list. The topic I was given is too hard for me. Sorry.

Source

2012-08-07 rocksland

Can you post any code you have written here, so we can see what you have tried? – 2012-08-07 09:05:40

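Create an empty [`set`](http://docs.python.org/library/stdtypes.html#set-types-set-frozenset) and fill it in a loop with the words from the files. The `len` of that set will then be the number of unique words. See [`os.listdir`](http://docs.python.org/library/os.html#os.listdir) for iterating over the files. – 2012-08-07 09:16:56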
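The suggestion in this comment could look roughly like the sketch below. The function name `count_unique_words` is hypothetical, and splitting on whitespace is an assumption, so `Word`, `word` and `word,` would count as three different words.

```python
import os


def count_unique_words(directory):
    """Return the number of distinct whitespace-separated words in the files of directory."""
    unique_words = set()  # start with an empty set
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # os.listdir also returns subdirectories; skip them
        # Assumption: undecodable bytes are replaced rather than raising an error
        with open(path, errors="replace") as f:
            for line in f:
                # Adding a word that is already in the set has no effect
                unique_words.update(line.split())
    return len(unique_words)
```

For the directory in the question, the call would be `count_unique_words("D:\\shakeall")`.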

# Read the whole file and split it into words on whitespace
with open('somefile.txt', 'r') as textfile:
    text_list = textfile.read().split()
# A set keeps only one copy of each word
unique_words = set(text_list)
print(len(unique_words))

That's the general gist of it.

Source

2012-08-07 09:15:12 Snaaa

import os

uniquewords = set()

# Walk the directory tree and collect every whitespace-separated word
for root, dirs, files in os.walk("D:\\shakeall"):
    for name in files:
        with open(os.path.join(root, name)) as f:
            uniquewords.update(f.read().split())

print(list(uniquewords))
print(len(uniquewords))

Source

2012-08-07 09:15:35
