Unable to replace Unicode characters with sed or vim

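I have a file that I believe contains Unicode characters, and I want to remove them using sed or some other Unix utility. I have tried several options, but for some reason I cannot remove these characters. Test case using a single line (head -n1):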

Attempt 1:

> head -n1 file1.txt | hexdump -C # Hexdump line 1 
output: 
00000000 47 72 6f 75 70 c2 a0 20 20 20 53 69 67 6e 61 6c |Group.. Signal| 
00000010 c2 a0 6e 61 6d 65 c2 a0 20 20 20 20 20 20 20 20 |..name..  | 
00000020 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |    | 
00000030 55 6e 69 74 c2 a0 20 74 79 70 65 c2 a0 44 65 73 |Unit.. type..Des| 
00000040 63 72 69 70 74 69 6f 6e c2 a0 0d 0a    |cription....| 
0000004c
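For reference, the byte pair c2 a0 is the two-byte UTF-8 encoding of U+00A0 (NO-BREAK SPACE). That is why hexdump shows it as ".." in the ASCII column. The small bash arithmetic sketch below applies the standard two-byte UTF-8 rule (110xxxxx 10xxxxxx) to code point 0xA0 and reproduces those bytes. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Encode code point U+00A0 (NO-BREAK SPACE) as UTF-8 by hand.
# Two-byte form 110xxxxx 10xxxxxx covers code points U+0080..U+07FF.
cp=$((0xA0))
b1=$(( 0xC0 | (cp >> 6) ))    # leading byte: bits 10..6 of the code point
b2=$(( 0x80 | (cp & 0x3F) ))  # continuation byte: low 6 bits
printf '%02x %02x\n' "$b1" "$b2"   # prints: c2 a0
```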

Now replacing the "c2 a0" shown above:

> head -n1 file1.txt | sed 's/\xc2\xa0//g' | hexdump -C 
or 
> head -n1 file1.txt | sed 's/\x{c2a0}//g' | hexdump -C 
00000000 47 72 6f 75 70 c2 a0 20 20 20 53 69 67 6e 61 6c |Group.. Signal| 
00000010 c2 a0 6e 61 6d 65 c2 a0 20 20 20 20 20 20 20 20 |..name..  | 
00000020 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |    | 
00000030 55 6e 69 74 c2 a0 20 74 79 70 65 c2 a0 44 65 73 |Unit.. type..Des| 
00000040 63 72 69 70 74 69 6f 6e c2 a0 0d 0a    |cription....|

No replacement happened.
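One plausible reason the sed attempts changed nothing: the perl -v output in the answer shows a macOS (darwin) system. The BSD sed shipped with macOS may not interpret \xHH escapes in a pattern the way GNU sed does, so \xc2\xa0 would never match.

The following is a sketch that sidesteps this, assuming bash or zsh. The shell's $'...' quoting produces the literal bytes, and sed runs under LC_ALL=C so it matches raw bytes. The sample string is hypothetical, modelled on the hexdump above. Each NBSP is replaced with an ordinary space.

```shell
#!/usr/bin/env bash
# Let the shell build the two bytes c2 a0 instead of relying on sed's own \x support.
nbsp=$'\xc2\xa0'

# Hypothetical sample modelled on the hexdump above.
sample=$'Group\xc2\xa0   Signal\xc2\xa0name'

# LC_ALL=C makes sed treat the input as plain bytes, so the two-byte pattern matches.
cleaned=$(printf '%s\n' "$sample" | LC_ALL=C sed "s/${nbsp}/ /g")

printf '%s\n' "$cleaned"   # Group    Signal name
```

To edit file1.txt in place, note that GNU sed accepts plain -i. BSD/macOS sed needs an explicit (possibly empty) backup suffix, for example: LC_ALL=C sed -i '' "s/${nbsp}/ /g" file1.txt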

Attempt 2: using vim

vim file1.txt 
:set nobomb 
:set fileencoding=utf-8 
:wq

Running sed again still did not replace anything. How can I replace or delete these characters (hex "c2a0")?

2017-10-10 Shiva

I ended up using Perl, which removed the Unicode characters successfully.

> perl -v 
This is perl 5, version 18, subversion 2 (v5.18.2) built for darwin-thread-multi-2level 

> perl -pi -e 's/\x{c2}\x{a0}//g' file1.txt 
> head -n1 file1.txt | hexdump -C 
00000000 47 72 6f 75 70 20 20 20 53 69 67 6e 61 6c 6e 61 |Group Signalna| 
00000010 6d 65 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |me    | 
00000020 20 20 20 20 20 20 20 20 20 20 55 6e 69 74 20 74 |   Unit t| 
00000030 79 70 65 44 65 73 63 72 69 70 74 69 6f 6e 0d 0a |ypeDescription..| 
00000040

2017-10-10 21:07:23 Shiva
