2011-03-06 76 views
3

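Say I have a line in a file: "This is perhaps the easiest place to add new functionality." I want to grep for two words that are close to each other. I do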

grep -ERHn "\beasiest\W+(?:\w+\W+){1,6}?place\b" * 

which works and gives me the line. But when I do

grep -ERHn "\beasiest\W+(?:\w+\W+){1,10}?new\b" * 

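it fails, which defeats the whole point of the {1,10}? This pattern is listed on the regular-expression.info site and in a couple of regex books. They don't describe it with grep, but that shouldn't matter.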

Update

I converted the regex into a Python script. It works, but without the nice grep -C thing...

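#!/usr/bin/env python3
"""Find two words within a given number of words of each other, in either order."""
import os
import re
import sys

word1 = sys.argv[1]
word2 = sys.argv[2]
dist = sys.argv[3]  # maximum number of words allowed between word1 and word2

# The alternation covers both orders: word1 ... word2 and word2 ... word1.
regex_string = (r'\b(?:'
    + word1
    + r'\W+(?:\w+\W+){0,'
    + dist
    + '}?'
    + word2
    + r'|'
    + word2
    + r'\W+(?:\w+\W+){0,'
    + dist
    + '}?'
    + word1
    + r')\b')

regex = re.compile(regex_string)


def findmatches(path):
    """Walk path recursively and print every match found in every file."""
    for root, dirs, files in os.walk(path):
        for filename in files:
            fullpath = os.path.join(root, filename)

            # errors='replace' keeps undecodable bytes from aborting the walk.
            with open(fullpath, 'r', errors='replace') as f:
                for m in regex.findall(f.read()):
                    print("File:", fullpath, "\n\t", m)


if __name__ == "__main__":
    findmatches(sys.argv[4])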

Calling it as

python near.py charlie winning 6 path/to/charlie/sheen 

works for me.
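The missing grep -C behaviour could be approximated by matching line by line and keeping the neighbouring lines. The following is only a minimal sketch: the names near_regex and find_with_context are hypothetical, the words are passed through re.escape (the script above does not do that), and, unlike the script above, a match can never span a line break.

```python
import re


def near_regex(word1, word2, dist):
    """Compile a pattern for word1 and word2 at most `dist` words apart, in either order."""
    w1, w2 = re.escape(word1), re.escape(word2)
    gap = r"\W+(?:\w+\W+){0,%d}?" % dist
    return re.compile(r"\b(?:%s%s%s|%s%s%s)\b" % (w1, gap, w2, w2, gap, w1))


def find_with_context(lines, regex, context=1):
    """Yield (line_number, context_lines) for each matching line.

    Line numbers are 1-based; context_lines holds up to `context` lines
    before and after the matching line, similar to grep -C.
    """
    for i, line in enumerate(lines):
        if regex.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            yield i + 1, lines[start:end]


if __name__ == "__main__":
    sample = [
        "First line.",
        "This is perhaps the easiest place to add new functionality.",
        "Last line.",
    ]
    for lineno, ctx in find_with_context(sample, near_regex("easiest", "new", 6)):
        print("line %d:" % lineno)
        for text in ctx:
            print("\t" + text)
```

Run as a script, this reports line 2 together with the line before and the line after it. Overlapping context windows are not merged the way grep merges them.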

Answers

1

Do you really need the (?:...) construct? Maybe this is enough:

grep -ERHn "\beasiest\W+(\w+\W+){1,10}new\b" * 

Here is what I get:

echo "This is perhaps the easiest place to add new functionality." | grep -EHn "\beasiest\W+(\w+\W+){1,10}new\b" 

(standard input):1:This is perhaps the easiest place to add new functionality.

Edit

As Camille Goudeseune said, to make this easy to use, the following function can be added to a .bashrc:

grepNear() { 
  grep -EHn "\b$1\W+(\w+\W+){1,10}$2\b" 
}

Then, at a bash prompt:

echo "..." | grepNear easiest new
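One thing to keep in mind with grepNear is that $1 and $2 are interpolated into the pattern unchanged, so any regex metacharacter in a search word changes the meaning of the pattern. The sketch below illustrates the same pattern shape in Python with the words escaped first. The function name near_pattern and the sample text are made up for illustration, and it uses Python's re module, not grep.

```python
import re


def near_pattern(word1, word2, max_between=10):
    """Build a regex shaped like grepNear's, escaping both words first."""
    w1, w2 = re.escape(word1), re.escape(word2)
    return re.compile(r"\b%s\W+(?:\w+\W+){1,%d}%s\b" % (w1, max_between, w2))


escaped = near_pattern("v1.0", "released")
# Same pattern with the word pasted in as-is: the dot matches any character.
unescaped = re.compile(r"\bv1.0\W+(?:\w+\W+){1,10}released\b")

text = "v1x0 was finally released"
print(bool(escaped.search(text)))    # False: "v1x0" is not "v1.0"
print(bool(unescaped.search(text)))  # True: "." matched the "x"
```

For plain alphabetic words such as easiest and new, the escaping makes no difference.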

+1

For convenience, put this in your '.bashrc': 'grepNear() { grep -EHn "\b$1\W+(\w+\W+){1,10}$2\b"; }'. Then at a bash prompt: 'echo "..." | grepNear easiest new'. – 2017-01-05 22:09:28

0

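grep (with -E) does not support Python-style non-capturing groups. When you write something like (?:\w+\W+), you are asking grep to match a question mark ?, followed by a colon :, followed by one or more word characters \w+, followed by one or more non-word characters \W+. The ? is a special character in grep regular expressions, but because it comes right after the start of a group it is automatically escaped (just as the regular expression [?] matches a literal question mark).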
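The two readings of (?: can be compared side by side in Python. This is only an emulation under an assumption: the second pattern escapes the question mark by hand to mimic the literal reading described above. It does not run grep, and any other differences between the two regex dialects are ignored.

```python
import re

line_plain = "This is perhaps the easiest place to add new functionality."
line_marked = "This is perhaps the easiest ?:place ?:to ?:add new functionality."

# Python reads (?: ... ) as a non-capturing group.
python_style = re.compile(r"\beasiest\W+(?:\w+\W+){1,10}?new\b")

# Emulated literal reading: "?:" are two ordinary characters inside a plain group.
literal_style = re.compile(r"\beasiest\W+(\?:\w+\W+){1,10}?new\b")

print(bool(python_style.search(line_plain)))    # True
print(bool(literal_style.search(line_plain)))   # False
print(bool(literal_style.search(line_marked)))  # True
```

Under the literal reading, the plain sentence cannot match because no word in it is preceded by ?:, which is the behaviour the answer goes on to demonstrate with grep.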

Let's test it. I have the following file:

$ cat file 
This is perhaps the easiest place to add new functionality. 

grep does not match it with the expression you used:

$ grep -ERHn "\beasiest\W+(?:\w+\W+){1,10}?new\b" file 

Then I created the following file:

$ cat file2 
This is perhaps the easiest ?:place ?:to ?:add new functionality. 

Note that each word between easiest and new is preceded by ?:, and this time grep matches:

$ grep -ERHn "\beasiest\W+(?:\w+\W+){1,10}?new\b" file2 
file2:1:This is perhaps the easiest ?:place ?:to ?:add new functionality. 

The solution is to remove the ?: from the expression. With that change, your expression matches the file:

$ grep -ERHn "\beasiest\W+(\w+\W+){1,10}?new\b" file 
file:1:This is perhaps the easiest place to add new functionality. 

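Since you don't even need the non-capturing group (at least as far as I can see), this shouldn't cause any problems.

Bonus point: you can simplify the expression by changing {1,10} to {0,10} and removing the ? that follows it: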

$ grep -ERHn "\beasiest\W+(\w+\W+){0,10}new\b" file 
file:1:This is perhaps the easiest place to add new functionality.
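What the lower bound of the interval controls can be checked with a short Python sketch. It uses Python's re module on two made-up sample strings, not grep, so it only illustrates the quantifier itself: {1,10} demands at least one word between the two search words, while {0,10} also accepts them side by side.

```python
import re

at_least_one = re.compile(r"\beasiest\W+(\w+\W+){1,10}new\b")
zero_or_more = re.compile(r"\beasiest\W+(\w+\W+){0,10}new\b")

separated = "the easiest place to add new functionality"
adjacent = "the easiest new functionality"

for name, regex in (("{1,10}", at_least_one), ("{0,10}", zero_or_more)):
    print(name, bool(regex.search(separated)), bool(regex.search(adjacent)))
```

This prints "{1,10} True False" followed by "{0,10} True True".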