Combining two regex patterns

I have a text file containing something like this (this is only an excerpt):

Third Doctor 
Season 7 
051 Spearhead from Space 4 3—24 January 1970 
052 Doctor Who and the Silurians 7 31 January—14 March 1970 
053 The Ambassadors of Death 7 21 March—2 May 1970 
054 Inferno 7 9 May—20 June 1970 

Season 8 
055 Terror of the Autons 4 2—23 January 1971 
056 The Mind of Evil 6 30 January—6 March 1971 
057 The Claws of Axos 4 13 March—3 April 1971 
058 Colony in Space 6 10 April—15 May 1971 
059 The Dæmons 5 22 May—19 June 1971

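Note that the basic line pattern is ^###\t.*\t?\t.*$ (i.e. almost every line has 3 tabs, \t).

I want to delete everything after the episode title, so that it looks like this: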

Third Doctor 
Season 7 
051 Spearhead from Space 
052 Doctor Who and the Silurians 
053 The Ambassadors of Death 
054 Inferno 

Season 8 
055 Terror of the Autons 
056 The Mind of Evil 
057 The Claws of Axos 
058 Colony in Space 
059 The Dæmons

Currently I am testing the following pattern in gedit:

([^\t]*)$ # replaces not only everything after the last `\t',
      # incl. that `\t', but also lines that *do not* contain any `\t'

Then I tried to 'select' the lines, which should be (?=(?=^(?:(?!Season).)*$)(?=^(?:(?!Series).)*$)(?=^(?:(?!Doctor$).)*$)(?=^(?:(?!Title).)*$)(?=^(?:(?!Specials$).)*$)(?=^(?:(?!Mini).)*$)(?=^(?:(?!^\t).)*$)(?=^(?:(?!Anim).)*$)).*$ - this works fine, but I don't know how to combine it with ([^\t]*)$.

2014-11-02 tukusejssirs

Which language? – vks 2014-11-02 20:34:30

@vks: I would say bash, but I really don't know what kind of regex gedit 3.10.4 uses... but a bash (sed) regex would be enough :) – tukusejssirs 2014-11-02 21:47:30

^(\d{3}\s+.*?)(?=\s*\d).*$

Try this. Replace with $1. Use the m or MULTILINE flag, depending on your regex flavor. See the demo.

http://regex101.com/r/jI8lV7/8
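
For readers who want to try the pattern outside regex101, here is a minimal Python sketch. It assumes the fields are tab-separated; the function name `strip_after_title` and the sample lines are illustrative and not part of the original answer. Python's `re.sub` writes the group reference as `\1` rather than `$1`.

```python
import re

# Pattern from the answer above: keep the 3-digit number and the title,
# stopping lazily just before the first whitespace run followed by a digit.
PATTERN = re.compile(r"^(\d{3}\s+.*?)(?=\s*\d).*$", re.MULTILINE)


def strip_after_title(text: str) -> str:
    """Replace each matching line with its first capture group."""
    # Python spells the group reference \1 (or \g<1>), not $1.
    return PATTERN.sub(r"\1", text)


# Hypothetical sample, assuming tab-separated fields.
sample = (
    "Third Doctor\n"
    "Season 7\n"
    "051\tSpearhead from Space\t4\t3—24 January 1970\n"
    "054\tInferno\t7\t9 May—20 June 1970\n"
)

print(strip_after_title(sample))
```

One limitation of this pattern: a title that itself contains a digit would be cut short at that digit, because the lookahead stops at the first digit it finds.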

2014-11-02 20:41:43 vks

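Although this does work in the demo, it does not work in gedit or in the Ubuntu GNOME 14.10 gnome-terminal (default settings) with `sed 's/^(\d{3}\s+.*?)(?=\s*\d).*$//g' file`. ... As for "replace with `$1`" and "use the `m` or `MULTILINE` flag", I need more detailed instructions, because I don't understand :) – tukusejssirs 2014-11-02 21:45:33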

@tukusejssirs You can use Python or Perl. The Python code would look like this: `import re` `p = re.compile(ur'^(\d{3}\s+.*?)(?=\s*\d).*$', re.MULTILINE | re.IGNORECASE)` `test_str = <your test string>` `subst = u"$1"` `result = re.sub(p, subst, test_str)` – vks 2014-11-03 04:55:59

Since the fields are separated by tabs, you only need cut to get the first two fields:

cut -f1,2 drwho.txt

The same thing using awk:

awk -F"\t" '$3{print $1"\t"$2}!$3{print $0}' drwho.txt

Explanation: awk works line by line, and the -F parameter defines the field separator.

$3 {     # if field3 exists 
    print $1"\t"$2  # display field1, a tab, field2 
} 
!$3 {     # if field3 doesn't exist 
    print $0   # display the whole record (the line) 
}
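
The same field-based idea can be sketched in Python for readers who do not have cut or awk at hand. It assumes tab-separated input; the names `keep_first_two_fields` and `process` and the sample text are illustrative and not from the original answer.

```python
def keep_first_two_fields(line: str) -> str:
    """Keep the first two tab-separated fields of a line."""
    fields = line.split("\t")
    # Lines with fewer than three fields (headings, blank lines)
    # come back unchanged, since the slice keeps everything they have.
    return "\t".join(fields[:2])


def process(text: str) -> str:
    """Apply keep_first_two_fields to every line of the text."""
    return "\n".join(keep_first_two_fields(line) for line in text.split("\n"))


# Hypothetical sample, assuming tab-separated fields.
sample = (
    "Season 8\n"
    "055\tTerror of the Autons\t4\t2—23 January 1971\n"
    "059\tThe Dæmons\t5\t22 May—19 June 1971"
)

print(process(sample))
```

Splitting on the delimiter avoids regular expressions entirely, so titles that contain digits are not affected.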

2014-11-02 21:08:30

@Casimir_et_Hippolyte: I see you know Doctor Who :) ... but the first code you provided doesn't work for me, neither in gedit nor with `sed`. I did try it at [regex101.com](http://regex101.com/r/vG0aK9/1), but I'm left with some extra characters... – tukusejssirs 2014-11-02 21:58:58

@tukusejssirs: I wrote a sed version. – 2014-11-02 22:17:50

@Casimir_et_Hippolyte: I don't know why, but that doesn't work either. FYI, I'm using sed (GNU sed) 4.2.2. – tukusejssirs 2014-11-02 23:49:59
