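I am using Python 3.6 and want to check whether certain strings occur in my .txt files. The problem is that the .txt files are written in such a way that a sentence may be cut and continued on the next line, and f.read() does not seem to match across the line break. For example: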
bla bla bla internal control over financial reporting or an attestation
report of our auditors
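The .txt file cuts the sentence after the word "attestation" and continues on the next line, starting with "report". I want to find the whole sentence in the file regardless of which lines it falls on (if the sentence is in the file, set var1 = 1, otherwise 0).
I use the code below to parse the files (and it seems I don't know how to tell it to ignore line breaks):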
import re

string1 = 'internal control over financial reporting or an attestation report of our auditors'
exemptions = []
for eachfile in file_list:  # file_list holds the many .txt files in my directory
    with open(eachfile, 'r', encoding='utf-8') as f:
        line2 = f.read()  # line2 should hold the whole contents of the .txt file
    var1 = re.findall(string1, line2, re.I)  # find string1 in line2, ignoring case
    if var1:
        exemptions.append('1')  # if anything is found, record var1 = 1
    else:
        exemptions.append('0')  # otherwise record var1 = 0
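Any idea how to do this? I thought that by using line2 = f.read() I was checking the whole .txt file regardless of line breaks, but apparently that is not the case...
Thanks anyway!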
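As @asongtoruin said, when the file is read, the sentence comes back with a '\n' between 'attestation' and 'report'. That means you need to replace the '\n' with a space; otherwise the text contains a '\n' where your regular expression expects a space, and it does not match.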
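A minimal sketch of what that comment suggests, assuming it is acceptable to treat any run of whitespace (spaces, tabs, line breaks) as a single space. The helper name contains_phrase and the sample string are hypothetical and only serve the illustration; they are not part of the original code.

```python
import re

def contains_phrase(text, phrase):
    """Return True if phrase occurs in text, ignoring case and line breaks."""
    # Collapse every run of whitespace (spaces, tabs, '\n', '\r\n') into one space.
    normalized = re.sub(r'\s+', ' ', text)
    # re.escape keeps characters such as '.' in the phrase from acting as regex syntax.
    return re.search(re.escape(phrase), normalized, re.I) is not None

string1 = 'internal control over financial reporting or an attestation report of our auditors'

# Hypothetical sample mimicking a file in which the sentence is split over two lines.
sample = ('bla bla bla internal control over financial reporting or an attestation\n'
          'report of our auditors')

print(contains_phrase(sample, string1))  # True
```

Inside the loop from the question, the same check could be applied to the result of f.read(), for example exemptions.append('1' if contains_phrase(line2, string1) else '0').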