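If you check the man page of the iconv tool:
//TRANSLIT
When the string "//TRANSLIT" is appended to --to-code, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar-looking characters.
So we could do this: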
kent$ cat test1
Replace ā, á, ǎ, and à with a.
Replace ē, é, ě, and è with e.
Replace ī, í, ǐ, and ì with i.
Replace ō, ó, ǒ, and ò with o.
Replace ū, ú, ǔ, and ù with u.
Replace ǖ, ǘ, ǚ, and ǜ with ü.
Replace Ā, Á, Ǎ, and À with A.
Replace Ē, É, Ě, and È with E.
Replace Ī, Í, Ǐ, and Ì with I.
Replace Ō, Ó, Ǒ, and Ò with O.
Replace Ū, Ú, Ǔ, and Ù with U.
Replace Ǖ, Ǘ, Ǚ, and Ǜ with Ü.
kent$ iconv -f utf8 -t ascii//TRANSLIT test1
Replace a, a, a, and a with a.
Replace e, e, e, and e with e.
Replace i, i, i, and i with i.
Replace o, o, o, and o with o.
Replace u, u, u, and u with u.
Replace u, u, u, and u with u.
Replace A, A, A, and A with A.
Replace E, E, E, and E with E.
Replace I, I, I, and I with I.
Replace O, O, O, and O with O.
Replace U, U, U, and U with U.
Replace U, U, U, and U with U.
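
To keep the converted text, the same command can be wrapped in a small function that writes to a separate file. This is only a sketch: `translit_copy` and the output file name are hypothetical and not part of the answer above. The exact replacement characters depend on the iconv implementation and the active locale, so a character such as ü may come out as something other than a plain u.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): write an
# ASCII-transliterated copy of a UTF-8 file to a second file.
translit_copy() {
    src=$1
    dst=$2
    # iconv writes to stdout; redirect to a *different* file, because
    # redirecting onto the input file would truncate it before it is read.
    iconv -f UTF-8 -t ASCII//TRANSLIT "$src" > "$dst"
}
```

For example, `translit_copy test1 test1.ascii` leaves test1 untouched and puts the transliterated text in test1.ascii. Once the result looks right, it can be moved over the original with mv.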
sed is probably not the best tool for this job; iconv might be better. See: http://stackoverflow.com/questions/8562354/remove-unicode-characters-from-textfiles-sed-other-bash-shell-methods – geoffspear 2012-04-18 10:26:28