Regex to remove lines that match on the first string?

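I have a long list of lines in which many lines share the same first word (the first string before a space), while the rest of each line differs. I need to keep only one line per unique first string.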

john jane 
john 123 
john jim jane 
jane john 
jane 123 
jane 456 
jim 
jim 1

This should produce:

john jane 
jane john 
jim

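So, if several lines start with the same first word, delete all of them but one.

I can remove fully duplicated lines, but that leaves lines like the ones in the example above untouched: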

^(.*)(\r?\n\1)+$

This regex removes identical lines, not lines like those in the example. Is there a regex or a Notepad++ macro that solves this?

Source

2016-07-15 Jim8645

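Not the best solution for Notepad++: replace '^((\w+\b).*)\r?\n\2.*' with '$1' and click *Replace All* several times.

Are the lines with the same first "word" always consecutive? If you want a relevant answer, please answer anubhava's question.

With Notepad++ (assuming that lines with the same first word are consecutive):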

Search: ^(\S++).*\K(?:\R\1(?:\h.*|$))+
Replace with: nothing

Pattern details:

^    # start of the line 
(\S++)  # the first "word" (all that isn't a whitespace) captured in group 1 
.*   # all characters until the end of the line 
\K   # remove characters matched before from the match result 
(?: 
    \R  # a newline 
    \1  # reference to the capture group 1 (same first word) 
    (?: 
     \h.* # a horizontal whitespace, then the rest of the line
     |  # OR 
     $  # the end of the line 
    ) 
)+   # repeat one or more times
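
For readers who want to try the same idea outside Notepad++, here is a rough Python sketch. It is not the answerer's pattern: Python's re module has no \K, \R or \h, so the kept line is captured and written back, and a (?!\S) lookahead takes the place of the possessive \S++. The names CONSECUTIVE_DUPES and keep_first_of_each_run are made up for illustration, and the sketch keeps the answer's assumption that lines with the same first word are consecutive.

```python
import re

# Assumed translation of the Notepad++ pattern, not the original itself.
#   ^(\S+)(?!\S)   first word of a line; the lookahead forbids a partial word
#   (.*)           rest of the line that is kept
#   (?:\n\1(?!\S).*)+   one or more following lines starting with the same word
CONSECUTIVE_DUPES = re.compile(r"^(\S+)(?!\S)(.*)(?:\n\1(?!\S).*)+", re.MULTILINE)


def keep_first_of_each_run(text):
    """Drop every line whose first word repeats that of the line kept just above it."""
    # Groups 1 and 2 together are the first line of the run; the rest is discarded.
    return CONSECUTIVE_DUPES.sub(r"\1\2", text)


sample = (
    "john jane\n"
    "john 123\n"
    "john jim jane\n"
    "jane john\n"
    "jane 123\n"
    "jane 456\n"
    "jim\n"
    "jim 1\n"
)
result = keep_first_of_each_run(sample)
print(result, end="")
```

The (?!\S) check after the backreference also keeps a line such as "jimmy 2" from being treated as a duplicate of "jim 1".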

Source

2016-07-15 11:15:05

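Confirmed, it works on my file. It also works in UltraEdit, which helps because Notepad++ can't handle very large files. – Jim8645

@Jim8645: note that if you are on Unix/Linux, spasic's awk approach is attractive for large files, because it doesn't need to load the whole file into memory.

If you have awk: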

awk '!seen[$1]++' infile.txt

Adapted from this thread: Unix: removing duplicate lines without sorting
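
The awk one-liner prints a line only the first time its first field shows up. For anyone without awk at hand, a minimal Python sketch of the same idea follows; first_per_key is a hypothetical helper name, and fields are assumed to be whitespace-separated, as in awk's default mode.

```python
def first_per_key(lines):
    """Yield each line whose first whitespace-separated field has not been seen yet."""
    seen = set()  # first fields encountered so far, like awk's seen[] array
    for line in lines:
        fields = line.split()
        key = fields[0] if fields else ""  # a blank line has an empty key, as $1 does in awk
        if key not in seen:
            seen.add(key)
            yield line


rows = [
    "john jane\n",
    "jane john\n",
    "john 123\n",
    "jim\n",
    "jane 456\n",
    "jim 1\n",
]
kept = list(first_per_key(rows))
```

Because it reads line by line and remembers only the first fields, it can be fed an open file object directly, and unlike the regex approach it does not need the matching lines to be consecutive.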

Source

2016-07-15 10:29:26 Sundeep

In Perl:

s/^((\w+)\b.*)\n(?:\2\b.*\n)*/$1\n/gm

You can give it a try with this:

#!/usr/bin/perl 

use warnings; 
use strict; 

my $file = "john jane 
john 123 
john jim jane 
jane john 
jane 123 
jane 456 
jim 
jim 1 
"; 

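# Keep the first line of each run of lines that share a first word. 
# The \b after the word and after \2 stops "john" from matching "johnny". 
$file =~ s/^((\w+)\b.*)\n(?:\2\b.*\n)*/$1\n/gm; 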

print $file;

Source

2016-07-15 10:34:03
