Q

How do I apply a regular expression to the contents of a file?

python
regex

2011-02-07 68 views 2 likes

2

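I want to apply a regular expression to the contents of a file without loading the whole file into a string. RegexObject takes a string or a buffer as its first argument. Is there a way to convert a file into a buffer?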

2011-02-07 Candy Chiu

+0

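Are you trying to apply the regex to the entire file, i.e. match the whole file against your regex, or are you trying to match the file line by line or in chunks of some other size? – 2011-02-07 19:29:05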

A

Answers

2

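Quoting the Python documentation:

Buffer objects are not directly supported by Python syntax, but can be created by calling the built-in function buffer().

And some other interesting parts:

buffer(object[, offset[, size]])

The object argument must be an object that supports the buffer call interface (such as strings, arrays, and buffers).

File objects do not implement the buffer interface, so you have to convert the file's contents either into a string (f.read()) or into an array (use mmap for that).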

2011-02-07 19:27:52

4

Yes! Try mmap:

You can use the re module to search through a memory-mapped file.
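A minimal sketch of that idea, assuming Python 3, where the pattern has to be a bytes pattern to search a memory-mapped file. The helper name `find_all_in_file` and the sample file contents are made up for illustration.

```python
import mmap
import os
import re
import tempfile


def find_all_in_file(path, pattern):
    """Return every match of a bytes pattern in the file at `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # The regex engine reads straight from the mapping;
            # only the matched bytes are copied into the result list.
            return re.findall(pattern, mm)
        finally:
            mm.close()


# Build a small sample file so the sketch runs on its own.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as sample:
    sample.write(b"alpha 12\nbeta 345\ngamma 6\n")

numbers = find_all_in_file(sample_path, rb"\d+")
os.remove(sample_path)
print(numbers)
```

The operating system pages the file in on demand, so the whole file never has to be read into a Python string first. Note that mapping an empty file raises an error, so a real script would need to check the size beforehand.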

2011-02-07 19:23:31

+1

Wow, imagine what backtracking would do in that case. – sln 2011-02-07 19:49:55

1

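Read the file in one line at a time and apply the regexp to each line. The re module seems geared towards handling strings. The Python documentation at http://docs.python.org/library/re.html has more details, but I could not find anything there about buffers.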
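Roughly, the line-by-line approach could look like the sketch below (Python 3 assumed). The function name `search_lines`, the sample text, and the pattern are illustrative only; `io.StringIO` stands in for a real file object, which can be iterated the same way.

```python
import io
import re


def search_lines(lines, pattern):
    """Yield (line_number, match) for each line that contains the pattern."""
    regex = re.compile(pattern)
    for line_number, line in enumerate(lines, start=1):
        match = regex.search(line)
        if match:
            yield line_number, match


# io.StringIO stands in for a file opened with open(...).
sample = io.StringIO("error: disk full\nok\nerror: timeout\n")
hits = [(n, m.group(1)) for n, m in search_lines(sample, r"error: (.+)")]
print(hits)
```

Only one line is held in memory at a time, which works as long as no match needs to span more than one line.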

2011-02-07 19:25:55 Bassdread

+0

The only problem is if the regex matches across lines (`/foo\nbar/`)... – ircmaxell 2011-02-07 20:00:26
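A tiny illustration of that caveat, using a made-up two-line string: the same pattern finds nothing when each line is searched on its own, but matches when the text is searched as a whole.

```python
import re

text = "foo\nbar\n"
pattern = re.compile(r"foo\nbar")

# Searching each line separately: neither line contains the full pattern.
per_line = [pattern.search(line) is not None for line in text.splitlines(True)]

# Searching the whole text at once: the match spans the line break.
whole = pattern.search(text) is not None

print(per_line, whole)
```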

0

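Do the buffering yourself. If the regex matches part of a chunk, remove that part from the chunk, carry on with the unused part, read the next chunk, and repeat.

If the regex is designed around a specific theoretical maximum match length, then when nothing matches and the buffer has grown to that maximum, clear the buffer and read in the next chunk. In general, regular expressions are not meant to handle very large blocks of data. The more complex the regex, the more backtracking it does.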
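One possible way to sketch this in Python 3 is shown below. It is a variation on the description above, not a definitive implementation: instead of clearing the buffer outright, it keeps the last `max_match_len` characters so a match that straddles two chunks is not lost. The names `iter_matches`, `max_match_len`, and `chunk_size` are hypothetical, and the sketch assumes that no match is longer than `max_match_len` and that the pattern uses no anchors or lookarounds that depend on text before the buffer.

```python
import io
import re


def iter_matches(stream, pattern, max_match_len, chunk_size=4096):
    """Yield matched strings from a text stream read chunk_size characters at a time."""
    regex = re.compile(pattern)
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        at_eof = not chunk
        buffer += chunk
        # Before EOF, a match that starts in the last max_match_len characters
        # might continue into the next chunk, so it is deferred.
        cutoff = len(buffer) if at_eof else len(buffer) - max_match_len
        consumed = 0
        for match in regex.finditer(buffer):
            if match.start() >= cutoff:
                break
            yield match.group(0)
            consumed = match.end()
        if at_eof:
            return
        # Drop everything already decided; keep only the undecided tail.
        buffer = buffer[max(consumed, cutoff, 0):]


text = "id=1234 id=56 id=789012 id=3"
found = list(iter_matches(io.StringIO(text), r"id=\d+", max_match_len=10, chunk_size=5))
print(found)
```

With these assumptions the buffer stays bounded by roughly `chunk_size` plus `max_match_len` characters, regardless of the file size.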

2011-02-07 19:56:41 sln

0

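The following code demonstrates:

opening a file
seeking within a file
reading only part of the file
using a regular expression to match a pattern

Assumption: all sentences are the same length.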

# import random for randomly choosing in a list 
import random 
# import re for regular expression matching 
import re 

# open a new file for reading and writing ("w+" creates it if it is missing)
file = open("TEST", "w+")

# some strings to put in the sentence 
typesOfSentences = ["test", "flop", "bork", "flat", "pork"] 
# number of types of sentences 
numTypes = len(typesOfSentences) 

# for i values 0 to 99 
for i in range(100): 
    # Create a random sentence for example 
    # "This is a test sentence 01" 
    sentence = "This is a %s sentence %02d\n" % (random.choice(typesOfSentences), i) 
    # write the sentence to the file 
    file.write(sentence) 

# Go back to beginning of file 
file.seek(0) 

# print out the whole file
for line in file:
    # each line already ends with "\n", so strip it before printing
    print(line.rstrip("\n"))

# Determine the length of the sentence 
length = len(sentence) 

# go to 20th sentence from the beginning 
file.seek(length * 20) 

# create a regex matching the type and the number at the end
# (raw string so that \d reaches the regex engine unchanged)
pathPattern = re.compile(r"This is a (.*?) sentence (\d\d)")

# print the next ten types and numbers 
for i in range(10): 
    # read the next line 
    line = file.readline() 
    # match the regex 
    match = pathPattern.match(line) 
    # if there was a match 
    if match:
        # NOTE: match.group(0) is always the entire sentence
        # Print the type of sentence it was and its number
        print("Sentence %02d is of type %s" % (int(match.group(2)), match.group(1)))

2011-02-07 20:08:15 manifest
