2016-11-29

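IO encoding error in Ruby

I know Ruby hit badly encoded data while pulling something from the web, and I'm getting a lot of encoding errors and similar problems. How can I force the array below back into its real form?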

["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"] 

First, I tried encoding to UTF-8:

irb(main):012:0> data = ["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"] 
irb(main):013:0> data.each do |char| 
irb(main):014:1* puts char.encode!("UTF-8", invalid: :replace, undef: :replace) 
irb(main):015:1> end 
0x4E 
0x3C 
0x89 
0x50 
0xC3 
0x47 
0xFF 
0x70 
xFF 
0x2F 
0xA2 
0xB3 
0x98 
=> ["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"] 

So it seems the characters are already UTF-8, so next I tried ISO-8859-1:

irb(main):086:0> data.each { |char| 
irb(main):087:1* puts char.encode!("iso-8859-1", invalid: :replace, undef: :replace) 
irb(main):088:1> } 
x4E 
x3C 
x89 
x50 
xC3 
x47 
xFF 
x70 
xFF 
x2F 
xA2 
xB3 
x98 
=> ["x4E", "x3C", "x89", "x50", "xC3", "x47", "xFF", "x70", "xFF", "x2F", "xA2", "xB3", "x98"] 

That didn't work either; it seems to have dropped the 0s.

So I went out on a limb and tried URI.decode:

irb(main):093:0> require 'uri' 
=> true 
irb(main):094:0> data.each { |char| 
irb(main):095:1* puts URI.decode(char) 
irb(main):096:1> } 
x4E 
x3C 
x89 
x50 
xC3 
x47 
xFF 
x70 
xFF 
x2F 
xA2 
xB3 
x98 
=> ["x4E", "x3C", "x89", "x50", "xC3", "x47", "xFF", "x70", "xFF", "x2F", "xA2", "xB3", "x98"] 

And wouldn't you know it? It didn't work.

Is there a way to get the characters back to their original form? If it helps, this came from a URL, and I no longer have the full URL.

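Your `data` is an array of strings where the first one is `0x4E` (a zero, a lowercase x, a 4 and an E). There are no special characters. Is it possible that you want to check hex values instead? Maybe http://stackoverflow.com/questions/30563697/convert-a-hex-string-to-a-hex-int can help you improve your question. – knut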

You are looking for `puts char.chr.encode('iso-8859-1', invalid: :replace, undef: :replace)` – knut

@knut Why would a website URL pull in hex? – papasmurf

Answer

Your array

["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"] 

is an array of strings, each made of four characters. The first string is "0x4E" (a zero, a lowercase x, a 4 and an E).

You probably want an array of hex values instead, like this:

data = [0x4E, 0x3C, 0x89, 0x50, 0xC3, 0x47, 0xFF, 0x70, 0xFF, 0x2F, 0xA2, 0xB3, 0x98] 
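If the values only exist as strings, as in the question, they have to be parsed into integers first. The following is a minimal sketch, not part of the original answer; the name `raw` is a placeholder, and the regex assumes every entry is an optional `0` followed by `x` and hex digits (one entry in the question, "xFF", lacks the leading zero).

```ruby
# The strings from the question; note that "xFF" is missing its leading 0.
raw = ["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF",
       "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"]

# Strip an optional "0x" or "x" prefix, then parse the rest as base 16.
data = raw.map { |s| s.sub(/\A0?x/i, "").to_i(16) }

p data # => [78, 60, 137, 80, 195, 71, 255, 112, 255, 47, 162, 179, 152]
```

Stripping the prefix by hand keeps the malformed "xFF" entry from being treated differently from the others.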

To get the character values you can use Integer#chr:

p data.map{|c|c.chr} #-> ["N", "<", "\x89", "P", "\xC3", "G", "\xFF", "p", "\xFF", "/", "\xA2", "\xB3", "\x98"] 
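It may help to look at which encoding `Integer#chr` attaches to its result when called without an argument. This small sketch (variable names are placeholders) shows that values below 0x80 come back as US-ASCII strings, while values from 0x80 to 0xFF come back as binary strings with no character meaning attached:

```ruby
low  = 0x4E.chr # a byte below 0x80
high = 0x89.chr # a byte at or above 0x80

p low.encoding  # US-ASCII
p high.encoding # ASCII-8BIT (newer Rubies also label it BINARY)
p high.bytes    # => [137]
```

Because the high bytes are tagged as binary, a later `encode` call has no character mapping for them and can only substitute the replacement character.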

These characters can then be "encoded":

p data.map { |char| 
    char.chr.encode('utf-8', invalid: :replace, undef: :replace) 
} #["N", "<", "\uFFFD", "P", "\uFFFD", "G", "\uFFFD", "p", "\uFFFD", "/", "\uFFFD", "\uFFFD", "\uFFFD"] 


p data.map { |char| 
    char.chr.encode('iso-8859-1', invalid: :replace, undef: :replace) 
} #["N", "<", "?", "P", "?", "G", "?", "p", "?", "/", "?", "?", "?"]
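Every byte at or above 0x80 ends up as a replacement character in both results, because `Integer#chr` tags those bytes as binary. If the bytes are assumed to be ISO-8859-1 text (an assumption; the question does not say what the source encoding was), one possible approach is to pack them into a single string, label it with that encoding, and then transcode. A hedged sketch:

```ruby
data = [0x4E, 0x3C, 0x89, 0x50, 0xC3, 0x47, 0xFF, 0x70, 0xFF, 0x2F, 0xA2, 0xB3, 0x98]

# Pack the integers into one 13-byte binary string.
bytes = data.pack("C*")

# The byte sequence is not valid UTF-8, so relabelling it as UTF-8 would not help.
p bytes.dup.force_encoding("UTF-8").valid_encoding? # => false

# Assumption: the bytes are ISO-8859-1. Label them as such, then transcode.
latin1 = bytes.dup.force_encoding("ISO-8859-1")
utf8   = latin1.encode("UTF-8")

p utf8.valid_encoding? # => true
p utf8.length          # => 13
```

`force_encoding` only changes the label on the existing bytes, whereas `encode` rewrites the bytes for the target encoding. Whether the result is meaningful text depends entirely on whether the ISO-8859-1 assumption holds for this data.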