Perl: searching a text file for keywords from an array

How can I use the keywords in an array inside a regular expression to search a file?

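I want to go through a text file and see whether, and where, any of the keywords occur. There are two files:

keywords.txt
word1
word2
word3

filetosearchon.txt
a lot of words that go on and on and contain linebreaks and linebreaks (up to 100000 characters)

I want to find the keywords and the positions where they match. This works for a single word, but I can't figure out how to iterate over the keywords in the regex.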

#!/usr/bin/perl

# open profanity list
open(FILE, "keywords.txt") or die("Unable to open file");
@keywords = <FILE>;
close(FILE);

# open text file and slurp it into a single string
local $/ = undef;
open(TXT, "filetosearchon.txt") or die("Unable to open file");
$txt = <TXT>;
close(TXT);

$regex = "keyword";

# record [offset, length] of every match
push @section, [length($`), length($&)]
    while ($txt =~ m/$regex/g);

foreach $element (@section)
{
    print(join(", ", @$element, $regex), "\n");
}

How do I iterate over the keywords in a loop so that I get each matching keyword and its position?

Any help is appreciated. Thanks!

Source

2012-04-22 kleqkleq

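If you only need to match whole words from keywords.txt against whole words in filetosearchon.txt, you may not need a regex at all. I would just build a hash with the keywords as keys and 1 as the values, then look up each word of filetosearchon.txt in the hash. If the lookup succeeds, you have a match. – 2012-04-22 19:04:20

@BrianSwift: Probably not the most efficient solution, since it needs one pass over the string per keyword. A finite-automaton approach (i.e. a regex) needs only one pass. – 2012-04-22 19:34:05

@Li-aung Yip: My approach needs only one pass over the input string/file: it parses the input into words and looks each word up in a hash keyed by the keywords. The benefit of your approach is that the keywords can be regexes rather than just fixed strings. With a regex, however, you may need extra syntax to match whole words only, so that "sex" does not match "misexplain". – 2012-04-22 20:32:03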
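To illustrate the whole-word concern from the last comment, here is a minimal sketch. The helper name has_whole_word is hypothetical and not from the thread; it assumes keywords are plain strings made of word characters, so \b anchors and \Q...\E quoting are enough.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns 1 if $keyword occurs
# in $text as a whole word, 0 otherwise.
sub has_whole_word {
    my ($text, $keyword) = @_;
    # \b anchors the match at word boundaries; \Q...\E keeps any regex
    # metacharacters in the keyword literal.
    return $text =~ /\b\Q$keyword\E\b/ ? 1 : 0;
}

for my $text ("do not misexplain this", "sex is a keyword here") {
    printf "%-24s substring: %-3s  whole word: %s\n",
        $text,
        ($text =~ /sex/              ? "yes" : "no"),
        (has_whole_word($text, "sex") ? "yes" : "no");
}
```

The unanchored pattern matches inside "misexplain", while the \b version only accepts the standalone word.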

One approach is to build a single regex that contains every word:

(alpha|bravo|charlie|delta|echo|foxtrot|...|zulu)

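Perl's regex compiler is fairly smart and will smoosh the pattern down as far as it can, so such a regex is more efficient than you might think. See this answer by Tom Christiansen. For example, the following regex: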

(cat|rat|sat|mat)

will compile to roughly the equivalent of:

(c|r|s|m)at

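which runs efficiently. This approach will probably beat the "search for each keyword in turn" approach, because it needs only one pass over the input string; the naive approach needs one pass for every keyword you want to search for.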
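A minimal sketch of how the alternation could be built from the keyword array and used to report each match with its position. The helper name find_keywords and the sample text are made up for illustration; the sketch assumes whole-word matching is wanted and that keywords start and end with word characters (otherwise \b will not behave as intended).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a list of
# [offset, length, keyword] for every whole-word keyword match in $text.
sub find_keywords {
    my ($text, @keywords) = @_;
    chomp @keywords;                          # drop newlines left by <FILE>
    @keywords = grep { length } @keywords;    # ignore blank lines
    return () unless @keywords;

    # Longest keywords first, so a longer keyword wins over its own prefix;
    # quotemeta keeps characters such as "." or "+" literal.
    my $alternation = join '|',
        map { quotemeta($_) } sort { length($b) <=> length($a) } @keywords;
    my $regex = qr/\b($alternation)\b/;

    my @hits;
    while ($text =~ /$regex/g) {
        # $-[1] is the start offset of capture group 1
        push @hits, [ $-[1], length($1), $1 ];
    }
    return @hits;
}

# Sample data, assumed for the demo only
my $text = "the cat sat on the mat\nwith another cat";
for my $hit (find_keywords($text, "cat\n", "mat\n")) {
    print join(", ", @$hit), "\n";
}
```

With the files from the question, the call would be find_keywords($txt, @keywords). Using @- instead of $` and $& also avoids the performance penalty those variables carry on older Perl versions.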

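By the way: if you are building a profanity filter, as your example code suggests, remember to account for deliberate misspellings: "pron", "p0rn", and so on. Then there's the fun you can have with Unicode!

Source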

2012-04-22 19:32:50

Try grep:

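chomp(@keywords);               # keywords read with <FILE> still end in "\n"
@words = split(/\s+/, $txt);

for ($i = 0; $i < scalar(@words); ++$i) {
    # exact string comparison, so partial matches and regex
    # metacharacters in the text cannot cause false hits
    print "word \#$i\n" if grep { $_ eq $words[$i] } @keywords;
}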

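This gives you the word positions in the text at which keywords were found. That may or may not be more useful than character-based positions.

Source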

2012-04-23 11:59:27 mpe

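I'm not sure what output you expect, but something like this might be useful. I store the keywords in a hash, read the second file, split each line into words, and look each word up in the hash.

Contents of script.pl: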

use warnings; 
use strict; 

die qq[Usage: perl $0 <keyword-file> <search-file>\n] unless @ARGV == 2; 

open my $fh, q[<], shift or die $!; 

my %keyword = map { chomp; $_ => 1 } <$fh>; 

while (<>) { 
     chomp; 
     my @words = split; 
     for (my $i = 0; $i <= $#words; $i++) { 
       if ($keyword{ $words[ $i ] }) { 
         printf qq[Line: %4d\tWord position: %4d\tKeyword: %s\n], 
           $., $i, $words[ $i ]; 
       } 
     } 
}

Run it like this:

perl script.pl keyword.txt filetosearchon.txt

and the output should look something like this:

Line: 7  Word position: 7  Keyword: will 
Line: 8  Word position: 8  Keyword: the 
Line: 8  Word position: 10  Keyword: will 
Line: 10  Word position: 4  Keyword: the 
Line: 14  Word position: 1  Keyword: compile 
Line: 18  Word position: 9  Keyword: the 
Line: 20  Word position: 2  Keyword: the 
Line: 20  Word position: 5  Keyword: the 
Line: 22  Word position: 1  Keyword: the 
Line: 22  Word position: 25  Keyword: the

Source

2012-04-24 13:33:36 Birei
