Linux text file manipulation

I have a file in this format:

<a href="http://www.wowhead.com/?search=Superior Mana Oil"> 
<a href="http://www.wowhead.com/?search=Tabard of Brute Force"> 
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord"> 
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">

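I need to select the text after the = but before the closing ", and print it at the end of the line, so that the result becomes, for example: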

<a href="http://www.wowhead.com/?search=Superior Mana Oil">Superior Mana Oil</a> 
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">Tabard of Brute Force</a> 
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">Tabard of the Wyrmrest Accord</a> 
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">Tattered Hexcloth Sack</a>

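I don't know the best way to do this from the Linux command line (I'm guessing sed/awk, but I'm not good with them). Ideally it would be a script that I can just give a filename to, e.g. ./fixlink.sh brokenlinks.txt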


2010-01-20 Darryl at NetHosted

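Try writing a script and running it. When/if you run into errors, post them here and get help. "Please write the script for me" type questions are not really encouraged here. – 2010-01-20 11:41:24

Assuming that you can have one or more spaces after <a, and zero or more spaces around the = signs, the following should work: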

$ cat in.txt 
<a href="http://www.wowhead.com/?search=Superior Mana Oil"> 
<a href="http://www.wowhead.com/?search=Tabard of Brute Force"> 
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord"> 
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack"> 
# 
# The command to do the substitution 
# 
$ sed -e 's#<a[ \t][ \t]*href[ \t]*=[ \t]*".*search[ \t]*=[ \t]*\([^"]*\)">#&\1</a>#' in.txt 
<a href="http://www.wowhead.com/?search=Superior Mana Oil">Superior Mana Oil</a> 
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">Tabard of Brute Force</a> 
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">Tabard of the Wyrmrest Accord</a> 
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">Tattered Hexcloth Sack</a>

If you are sure there are no extra spaces, the pattern simplifies to:

s#<a href=".*search=\([^"]*\)">#&\1</a>#

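In sed, s followed by any character (# in this case) starts a substitution. The pattern to be replaced runs up to the second occurrence of that same character. So, in our second example, the pattern to be replaced is: <a href=".*search=\([^"]*\)">. I used \([^"]*\) to mean any sequence of non-" characters, and to save it in the backreference \1 (the \(\) pair denotes a backreference). Finally, the next token, delimited by #, is the replacement. & in sed stands for "whatever matched", which in this case is the entire opening tag (the whole line in the sample), and \1 holds just the link text.

Here is the pattern again: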
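As a small illustration of those three pieces (the # delimiter, & and \1), here is a sketch that runs the simplified pattern on one line from the question. The function name append_link_text is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads lines on stdin and appends the link text plus </a>.
# '#' is the delimiter, so the '/' characters in the URL and in </a> need no escaping.
append_link_text() {
    sed -e 's#<a href=".*search=\([^"]*\)">#&\1</a>#'
}

# Demo on one line from the question:
#   &  expands to everything the pattern matched (the opening tag)
#   \1 expands to the text captured by \( ... \)
printf '%s\n' '<a href="http://www.wowhead.com/?search=Superior Mana Oil">' | append_link_text
```

Picking a delimiter that does not occur in the pattern or the replacement is what keeps the command readable here; with / as the delimiter, every / in the URL and in </a> would need a backslash.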

's#<a[ \t][ \t]*href[ \t]*=[ \t]*".*search[ \t]*=[ \t]*\([^"]*\)">#&\1</a>#'

And its explanation:

If you are really sure that there will always be search= followed by the text you want, you can do:
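One way to read the long pattern is to rebuild it from named pieces. The sketch below does that in a shell script; the variable names (sp1, sp0, text, pattern) and the function name fix_links are invented for illustration, and it assumes a sed that reads \t inside brackets as a tab, as GNU sed does.

```shell
#!/bin/sh
# Sketch: rebuild the pattern from named pieces (all names here are made up).
sp1='[ \t][ \t]*'      # one or more spaces or tabs (GNU sed reads \t as a tab)
sp0='[ \t]*'           # zero or more spaces or tabs
text='\([^"]*\)'       # group 1: everything up to the next double quote

# <a, whitespace, href, optional whitespace around =, the opening quote,
# anything up to "search", optional whitespace around =, group 1, then ">
pattern="<a${sp1}href${sp0}=${sp0}\".*search${sp0}=${sp0}${text}\">"

# Hypothetical helper: apply the substitution to stdin.
fix_links() {
    sed -e "s#${pattern}#&\\1</a>#"
}

# A deliberately messy variant of one line from the question.
printf '%s\n' '<a  href = "http://www.wowhead.com/?search = Tattered Hexcloth Sack">' | fix_links
```

Assembled this way, the script passed to sed is the same one-liner shown above; the pieces only make it easier to see which part tolerates which whitespace.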

$ sed -e 's#.*search=\(.*\)">#&\1</a>#'

Hope that helps.


2010-01-20 11:49:44

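No downvote, given the valiant effort, but when one line of code needs 14 lines of explanation, the next person may have to be very clever to maintain it. – 2010-01-20 12:05:06

LOL @Adam: I assumed the OP doesn't know regular expressions. That, plus making the pattern "robust", led to the long explanation. Oh well, I tried. Hopefully he learns something (if he isn't bored a third of the way through my post, that is!). :-) – 2010-01-20 12:08:52

When I try to explain something technical at this level of detail, I usually find that I learn something myself, so it is never wasted effort. – 2010-01-20 12:14:14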

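awk 'BEGIN{ FS="=" }      # split each line on "="
{
    o=$NF                 # last field: the link text followed by ">
    gsub(/\042>/,"",o)    # \042 is the double quote; strip the trailing ">
    print $0, o"</a>"     # original line, a space (OFS), then the text and </a>
}' file

Output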

$ ./shell.sh 
<a href="http://www.wowhead.com/?search=Superior Mana Oil"> Superior Mana Oil</a> 
<a href="http://www.wowhead.com/?search=Tabard of Brute Force"> Tabard of Brute Force</a> 
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord"> Tabard of the Wyrmrest Accord</a> 
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack"> Tattered Hexcloth Sack</a>

If you are not good at something, read the documentation. That is always the start of a solution. To learn awk/gawk, see its documentation.

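The output above has a space between "> and the link text, because print $0, o joins its arguments with the output field separator. A sketch that avoids it by concatenating instead is shown below. The function name append_link_text_awk is hypothetical, and stripping trailing whitespace is an added assumption (input lines may end with a space).

```shell
#!/bin/sh
# Hypothetical helper: same idea as the awk answer, but without the separator space.
append_link_text_awk() {
    awk 'BEGIN { FS = "=" }
    {
        line = $0
        sub(/[ \t]+$/, "", line)      # drop trailing whitespace from the line
        text = $NF                    # last =-separated field, e.g. Superior Mana Oil">
        sub(/">[ \t]*$/, "", text)    # strip the closing "> and any trailing whitespace
        print line text "</a>"        # concatenation, so no OFS space is inserted
    }' "$@"
}

printf '%s\n' '<a href="http://www.wowhead.com/?search=Superior Mana Oil"> ' | append_link_text_awk
```

Called with no arguments the function reads standard input; called with a filename (for example brokenlinks.txt from the question) it reads that file.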

2010-01-20 11:40:27 ghostdog74

Then let's do it in sed.

replace.sh

#!/bin/bash 
#<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack"> 
# => 
#<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">Tattered Hexcloth Sack</a> 
sed -r -e 's|(<a href=".*search=(.*))">|\1">\2</a>|' "$1"

./replace.sh input.txt


2010-01-20 11:50:28

With sed:

sed 's/\(.*search=\)\(.*\)\(".*\)/\1\2\3\2<\/a>/' brokenlinks.txt


2010-01-20 11:51:17 codeape

Nice awk! But

sed -n 's|=\([^"].*\)">|&\1</a>|p'

is shorter, and automatically drops lines that don't match.

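To see the effect of -n together with the p flag, here is a sketch that feeds the command one link line and one unrelated line; only the first is printed. The function name only_fixed_links and the second input line are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines where the substitution succeeded.
# -n suppresses the default output; the trailing p prints a line after a successful s.
only_fixed_links() {
    sed -n 's|=\([^"].*\)">|&\1</a>|p' "$@"
}

printf '%s\n' \
    '<a href="http://www.wowhead.com/?search=Tabard of Brute Force">' \
    'this line has no link' |
    only_fixed_links
```

The [^"] right after = is what skips the = in href=", since that one is followed by a double quote; the match therefore starts at the = of search=.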

2010-01-20 12:15:22 martinwguy

+1 for using '&'. – 2010-01-20 12:17:28
