Problem removing punctuation/numbers from text

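I had some code that worked fine for removing punctuation/numbers in Python using regular expressions. I had to change the code a bit so that a stop list would work, which isn't particularly important. Anyway, now the punctuation isn't being removed, and quite frankly I'm stumped as to why.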

import re 
import nltk 

# Quran subset 
filename = raw_input('Enter name of file to convert to ARFF with extension, eg. name.txt: ') 

# create list of lower case words 
word_list = re.split('\s+', file(filename).read().lower()) 
print 'Words in text:', len(word_list) 
# punctuation and numbers to be removed 
punctuation = re.compile(r'[-.?!,":;()|0-9]') 
for word in word_list: 
    word = punctuation.sub("", word) 
print word_list

Any pointers on why it isn't working would be great. I'm no expert in Python, so it's probably something ridiculously stupid. Thanks.

Source

2011-04-01 Alex

Change

for word in word_list: 
    word = punctuation.sub("", word)

to

word_list = [punctuation.sub("", word) for word in word_list]

Assigning to word inside the for-loop above simply changes the value referenced by that temporary variable. It does not change word_list.
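To make the difference concrete, here is a minimal sketch that runs both versions side by side. The sample words are made up for illustration and stand in for the contents of the file; the pattern is the one from the question. It is written so that it also runs on Python 3, hence print() as a function.

```python
import re

# Same pattern as in the question: punctuation and digits to strip.
punctuation = re.compile(r'[-.?!,":;()|0-9]')

# Hypothetical sample input, standing in for the words read from the file.
word_list = ['hello,', 'world!', 'verse(12)']

# Version from the question: only the loop variable is rebound,
# so the list itself is left untouched.
for word in word_list:
    word = punctuation.sub("", word)
unchanged = list(word_list)

# Suggested fix: build a new list of cleaned words.
cleaned = [punctuation.sub("", word) for word in word_list]

print(unchanged)  # ['hello,', 'world!', 'verse(12)']
print(cleaned)    # ['hello', 'world', 'verse']
```

In the real script the result of the list comprehension would be assigned back to word_list, as in the one-liner above.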

Source

2011-04-01 11:37:05 unutbu

You are not updating your word list. Try

for i, word in enumerate(word_list): 
    word_list[i] = punctuation.sub("", word)

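Remember that although word starts out as a reference to a string object in word_list, the assignment rebinds the name word to the new string object returned by the sub function. It does not change the object originally referenced.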
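The rebinding can be observed directly. The sketch below uses made-up sample data and a second name, alias, which is only there for illustration. It checks that the list slot still holds the old string after word has been rebound, and that writing through the index updates the same list object in place.

```python
import re

punctuation = re.compile(r'[-.?!,":;()|0-9]')

word_list = ['one.', 'two;', '3three']  # hypothetical sample data
alias = word_list                       # second name for the same list object

for i, word in enumerate(word_list):
    original = word
    word = punctuation.sub("", word)    # rebinds the name `word` to a new string
    assert word_list[i] is original     # the list slot still holds the old string
    word_list[i] = word                 # storing by index is what updates the list

print(word_list)           # ['one', 'two', 'three']
print(alias is word_list)  # True
```

Because the list is modified in place, any other name that refers to it sees the cleaned words too. A list comprehension instead creates a new list, which matters only if something else still holds a reference to the old one.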

Source

2011-04-01 11:35:18
