How do I remove all diacritics from a file?

I have a file containing many vowels with diacritics. I need to make these replacements:

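Replace ā, á, ǎ, and à with a.
Replace ē, é, ě, and è with e.
Replace ī, í, ǐ, and ì with i.
Replace ō, ó, ǒ, and ò with o.
Replace ū, ú, ǔ, and ù with u.
Replace ǖ, ǘ, ǚ, and ǜ with ü.
Replace Ā, Á, Ǎ, and À with A.
Replace Ē, É, Ě, and È with E.
Replace Ī, Í, Ǐ, and Ì with I.
Replace Ō, Ó, Ǒ, and Ò with O.
Replace Ū, Ú, Ǔ, and Ù with U.
Replace Ǖ, Ǘ, Ǚ, and Ǜ with Ü.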

I know I can replace them one at a time with this:

sed -i 's/ā/a/g' ./file.txt

Is there a more efficient way to replace all of these?

2012-04-18 Village

sed may not be the best tool for this job; iconv may be better. See: http://stackoverflow.com/questions/8562354/remove-unicode-characters-from-textfiles-sed-other-bash-shell-methods – geoffspear 2012-04-18 10:26:28

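If you check the man page of iconv:

//TRANSLIT
When the string "//TRANSLIT" is appended to --to-code, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similarly looking characters.

So we can do this: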

kent$ cat test1 
    Replace ā, á, ǎ, and à with a. 
    Replace ē, é, ě, and è with e. 
    Replace ī, í, ǐ, and ì with i. 
    Replace ō, ó, ǒ, and ò with o. 
    Replace ū, ú, ǔ, and ù with u. 
    Replace ǖ, ǘ, ǚ, and ǜ with ü. 
    Replace Ā, Á, Ǎ, and À with A. 
    Replace Ē, É, Ě, and È with E. 
    Replace Ī, Í, Ǐ, and Ì with I. 
    Replace Ō, Ó, Ǒ, and Ò with O. 
    Replace Ū, Ú, Ǔ, and Ù with U. 
    Replace Ǖ, Ǘ, Ǚ, and Ǜ with Ü. 


kent$ iconv -f utf8 -t ascii//TRANSLIT test1 
    Replace a, a, a, and a with a. 
    Replace e, e, e, and e with e. 
    Replace i, i, i, and i with i. 
    Replace o, o, o, and o with o. 
    Replace u, u, u, and u with u. 
    Replace u, u, u, and u with u. 
    Replace A, A, A, and A with A. 
    Replace E, E, E, and E with E. 
    Replace I, I, I, and I with I. 
    Replace O, O, O, and O with O. 
    Replace U, U, U, and U with U. 
    Replace U, U, U, and U with U.
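The same command can be wrapped in a small shell function. This is a minimal sketch that assumes an iconv accepting the //TRANSLIT suffix, such as GNU libc's or GNU libiconv's. The function name to_ascii is hypothetical. The exact replacement chosen for each character depends on the iconv implementation and the current locale. Characters that cannot be approximated are typically written as ?, so the output is plain ASCII but may not always contain the letters you expect.

```shell
#!/bin/sh
# Transliterate UTF-8 input (file arguments or stdin) to ASCII.
# Assumption: iconv supports the //TRANSLIT suffix (GNU libc, GNU libiconv).
to_ascii() {
  iconv -f UTF-8 -t ASCII//TRANSLIT "$@"
}

# Usage (hypothetical file names). Write to a new file: redirecting
# back onto the input file would truncate it before iconv reads it.
#   to_ascii test1 > test1.ascii
```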

2012-04-18 10:35:36 Kent

This works well, except that for ü I only want the tone marks removed, not the umlaut itself. – Village 2012-04-18 11:07:05
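You can meet that requirement without leaving sed. Map ǖ, ǘ, ǚ, ǜ (and their capitals) to ü/Ü explicitly, and strip the remaining tone marks down to the bare vowel. This is a minimal sketch that assumes a sed supporting -E (GNU and BSD sed both do). The function name strip_tones is hypothetical. Each pattern is an alternation of literal strings rather than a [...] bracket expression. As a result, it matches the same characters whether sed runs in a UTF-8 locale or in the C locale.

```shell
#!/bin/sh
# Strip pinyin tone marks but keep the umlaut on ü/Ü.
# Alternations of literal strings (not bracket expressions) also work
# when sed treats the input as bytes, e.g. in the C locale.
strip_tones() {
  sed -E \
    -e 's/ā|á|ǎ|à/a/g' -e 's/ē|é|ě|è/e/g' -e 's/ī|í|ǐ|ì/i/g' \
    -e 's/ō|ó|ǒ|ò/o/g' -e 's/ū|ú|ǔ|ù/u/g' -e 's/ǖ|ǘ|ǚ|ǜ/ü/g' \
    -e 's/Ā|Á|Ǎ|À/A/g' -e 's/Ē|É|Ě|È/E/g' -e 's/Ī|Í|Ǐ|Ì/I/g' \
    -e 's/Ō|Ó|Ǒ|Ò/O/g' -e 's/Ū|Ú|Ǔ|Ù/U/g' -e 's/Ǖ|Ǘ|Ǚ|Ǜ/Ü/g' \
    "$@"
}

# Usage (hypothetical file names):
#   strip_tones file.txt > file.out
```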

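Kent, I wanted to add a direct link to the 'man' page for 'iconv', but I couldn't find one that contains that particular quote. Could you add where you got it from? – usr2564301 2015-05-22 10:04:27

From 'man iconv'. I also mentioned iconv's man page in the answer. My current version is 'iconv (GNU libc) 2.21', but the answer was posted 3 years ago and I don't know which version I was using at the time. @Jongware – Kent 2015-05-22 10:36:44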

This is what the tr(1) command is for. For example:

tr 'āáǎàēéěèīíǐì...' 'aaaaeeeeiii...' <infile >outfile

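You may need to check or change your LANG environment variable to match the character set in use. Note that GNU coreutils tr works on single bytes rather than characters, so it may not translate multibyte UTF-8 characters like these correctly.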
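Before running tr (or sed), you can check which character set the current locale uses with locale charmap. This is a minimal sketch. The function name locale_is_utf8 is hypothetical, and the list of accepted charset spellings is an assumption meant to cover common systems.

```shell
#!/bin/sh
# Succeed only if the current locale's character set is UTF-8.
# `locale charmap` prints e.g. "UTF-8" in a UTF-8 locale; if the
# locale command is missing, the output is empty and this fails.
locale_is_utf8() {
  case $(locale charmap 2>/dev/null) in
    UTF-8 | utf8 | UTF8) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: warn before a multibyte-sensitive command.
#   locale_is_utf8 || echo "warning: locale is not UTF-8; check LANG/LC_ALL" >&2
```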

2012-04-18 10:27:57 ktf

You can use something like this:

sed -e 's/[àâ]/a/g;s/[ọõ]/o/g;s/[íì]/i/g;s/[êệ]/e/g'

Just add more characters inside the [...] brackets as you need.
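As the list grows, you can keep the substitutions in a sed script file and apply them with sed -f, using one literal character per s command. Writing each character separately, instead of a bracket expression such as [àâ], also avoids a locale pitfall. A bracket expression only treats a multibyte character like à as a single character when sed runs in a UTF-8 locale. This is a minimal sketch. The temporary script file and the function name strip_accents are hypothetical.

```shell
#!/bin/sh
# Hypothetical sed script: one literal substitution per line, easy to extend.
script=$(mktemp) || exit 1
trap 'rm -f "$script"' EXIT
cat > "$script" <<'EOF'
s/à/a/g
s/â/a/g
s/ọ/o/g
s/õ/o/g
s/í/i/g
s/ì/i/g
s/ê/e/g
s/ệ/e/g
EOF

# Apply every substitution in the script file to file arguments or stdin.
strip_accents() {
  sed -f "$script" "$@"
}
```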

2012-04-18 10:36:26 hungnv

This might work for you:

sed -i 'y/āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜĀÁǍÀĒÉĚÈĪÍǏÌŌÓǑÒŪÚǓÙǕǗǙǛ/aaaaeeeeiiiioooouuuuüüüüAAAAEEEEIIIIOOOOUUUUÜÜÜÜ/' file

2012-04-18 13:30:43 potong

This is the only one that works out of the box. – ATorras 2016-04-20 13:49:10

Interestingly, if you are on a Mac, you have to add the -e flag to the command line. More info: http://stackoverflow.com/questions/16745988/sed-command-works-fine-on-ubuntu-but-not-mac – MrWashinton 2016-09-15 14:31:01
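The -i option is not portable. GNU sed accepts a bare -i, while BSD/macOS sed requires a backup-suffix argument (-i '' for no backup). This minimal sketch avoids -i altogether: it writes sed's output to a temporary file and then copies the result back. The function name sed_in_place is hypothetical.

```shell
#!/bin/sh
# Portable in-place edit: run sed into a temp file, then copy it back.
# Copying with cat (rather than mv) keeps the original file's permissions.
sed_in_place() {
  script=$1
  file=$2
  tmp=$(mktemp) || return 1
  if sed "$script" "$file" > "$tmp"; then
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage with a y command like the one in the answer
# (GNU sed needs a UTF-8 locale for multibyte characters in y):
#   sed_in_place 'y/āáǎà/aaaa/' file
```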

I like iconv because it handles every accent variation:

cat non-ascii.txt | iconv -f utf8 -t ascii//TRANSLIT//IGNORE > ascii.txt

2013-09-02 15:56:35

This may not work unless your locale is set!

Set the locale with LC_ALL, for example:

export LC_ALL=en_US.iso88591

The full list of available locales can be shown with:

locale -a
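Instead of exporting LC_ALL for the whole session, you can pick an installed locale from locale -a and set it only for the iconv process. This sketch picks the first UTF-8 language locale. Choosing a UTF-8 locale rather than an ISO-8859-1 one like the example is an assumption. Skipping the C.* entries is also an assumption, because on some systems a C.UTF-8 locale may lack the transliteration rules iconv relies on. The function name first_utf8_locale is hypothetical.

```shell
#!/bin/sh
# Read locale names (one per line, as printed by `locale -a`) and print
# the first UTF-8 one that is not a C locale. Prints nothing if none.
first_utf8_locale() {
  grep -i -E '\.utf-?8$' | grep -v '^C\.' | head -n 1
}

# Usage: set the locale only for the iconv process (hypothetical file names).
#   loc=$(locale -a | first_utf8_locale)
#   LC_ALL=$loc iconv -f UTF-8 -t ASCII//TRANSLIT non-ascii.txt > ascii.txt
```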

2013-12-02 16:23:00 Bruno

If, like me, you only need to replace accents in certain places in your file, you can do it with a regular expression like this:

echo '{"doNotReplaceKey":"bábögêjírù","replaceValueKey":"bábögêjírù","anotherNotReplaceKey":"bábögêjírù"}' \
    | sed -e ':a;s/replaceValueKey":"\([^"]*\)[áâàãä]/replaceValueKey":"\1a/g;ta' \
    | sed -e ':a;s/replaceValueKey":"\([^"]*\)[éêèë]/replaceValueKey":"\1e/g;ta' \
    | sed -e ':a;s/replaceValueKey":"\([^"]*\)[íîìï]/replaceValueKey":"\1i/g;ta' \
    | sed -e ':a;s/replaceValueKey":"\([^"]*\)[óôòõö]/replaceValueKey":"\1o/g;ta' \
    | sed -e ':a;s/replaceValueKey":"\([^"]*\)[úûùü]/replaceValueKey":"\1u/g;ta'

Output:

{"doNotReplaceKey":"bábögêjírù","replaceValueKey":"babogejiru","anotherNotReplaceKey":"bábögêjírù"}

2016-06-29 16:05:14

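You can find the octal representation of each diacritic with man iso_8859_1 (or the man page for your character set) or with od -bc, and then do the replacement with gawk:

gawk '{ gsub(/\344/, "a"); print $0 }' infile > outfile

This replaces ä with a.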
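In ISO-8859-1 every accented letter is a single byte, so a byte-oriented tool can apply a whole table of such replacements in one pass. This minimal sketch uses tr instead of gawk (a swapped-in tool, same byte-level idea) and covers lowercase letters only. The function name latin1_strip is hypothetical. It assumes the input really is ISO-8859-1. In UTF-8 text, several of these byte values appear inside multibyte sequences, and replacing them can corrupt the file.

```shell
#!/bin/sh
# Strip accents from ISO-8859-1 text (lowercase letters only).
# Octal codes, as shown by `od -bc`: \340-\344 àáâãä, \350-\353 èéêë,
# \354-\357 ìíîï, \362-\366 òóôõö, \371-\374 ùúûü.
# LC_ALL=C makes tr treat the input strictly as bytes. tr reads stdin only.
latin1_strip() {
  LC_ALL=C tr '\340-\344\350-\353\354-\357\362-\366\371-\374' \
              'aaaaaeeeeiiiiooooouuuu'
}

# Usage (hypothetical file names):
#   latin1_strip < infile > outfile
```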

2016-07-09 21:57:44
