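I know there are several similar questions about this error, and I have tried many of the suggested fixes without luck. My problem involves the byte \xA1, and String#encode does not fix the "invalid byte sequence in UTF-8" error it raises: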
ArgumentError: invalid byte sequence in UTF-8
I tried the following, without success:
```ruby
"\xA1".encode('UTF-8', :undef => :replace, :invalid => :replace, :replace => "").sub('', '')
"\xA1".encode('UTF-8', :undef => :replace, :invalid => :replace, :replace => "").force_encoding('UTF-8').sub('', '')
"\xA1".encode('UTF-8', :undef => :replace, :invalid => :replace, :replace => "").encode('UTF-8').sub('', '')
```
Each of these lines raises the error for me. What am I doing wrong?
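A likely cause (not confirmed in this thread): in MRI 1.9.x and 2.0, String#encode with the same source and target encoding returns the string without running the converter, so :invalid => :replace never gets a chance to act on a string already tagged UTF-8, and neither force_encoding('UTF-8') nor a second encode('UTF-8') changes that. A minimal sketch of a workaround, assuming those versions, is to transcode through UTF-16LE and back, and to use String#scrub where it exists (Ruby 2.1+). The helper name strip_invalid_utf8 is hypothetical.

```ruby
# Hypothetical helper: drop every byte sequence that is not valid UTF-8,
# keeping valid characters (including multibyte ones) intact.
def strip_invalid_utf8(str)
  # Treat the raw bytes as UTF-8, whatever the string is currently tagged as.
  utf8 = str.dup.force_encoding(Encoding::UTF_8)

  # Ruby 2.1+ has String#scrub, which replaces invalid sequences directly.
  return utf8.scrub('') if utf8.respond_to?(:scrub)

  # Ruby 1.9.x/2.0: encoding UTF-8 -> UTF-8 skips the converter, so go
  # through UTF-16LE instead. Invalid bytes are replaced with '' on the way
  # out, and the second step converts the clean result back to UTF-8.
  utf8.encode('UTF-16LE', :invalid => :replace, :replace => '')
      .encode(Encoding::UTF_8)
end

p strip_invalid_utf8("\xA1")                # => ""
p strip_invalid_utf8("col1\tcol2\tbad\xA1") # => "col1\tcol2\tbad"
```

Going through a Unicode encoding rather than ASCII-8BIT keeps valid multibyte characters such as "é"; a binary round trip would treat every byte above 0x7F as undefined and drop them all.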
UPDATE:
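The lines above fail only in IRB. However, I changed my application to encode each line of a CSV file with the same String#encode call and arguments, and I get the same error when the line is read from a file (note: it works if you operate on the same string without going through IO):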
```ruby
require 'csv'
require 'tempfile'

bad_line = "col1\tcol2\tbad\xa1"
bad_line.sub('', '') # does NOT fail
puts bad_line        # => col1 col2 bad?

tmp = Tempfile.new 'foo' # write the line to a file to emulate the real problem
tmp.puts bad_line
tmp.close

tmp2 = Tempfile.new 'bar'
begin
  IO.foreach tmp.path do |line|
    line.encode!('UTF-8', :undef => :replace, :invalid => :replace, :replace => "")
    line.sub('', '') # fail: invalid byte sequence in UTF-8
    tmp2.puts line
  end
  tmp2.close

  # this would fail if the above error didn't halt execution
  CSV.foreach(tmp2.path) do |row|
    puts row.inspect # fail: invalid byte sequence in UTF-8
  end
ensure
  tmp.unlink
  tmp2.close
  tmp2.unlink
end
```
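Applied to the scenario in the UPDATE, a sketch of the same idea cleans each line as it is read and only then hands the copy to CSV. It assumes the input bytes are meant to be UTF-8 and that the file is tab-separated like bad_line; strip_invalid_utf8 and read_cleaned_rows are hypothetical helpers, not part of Ruby or the csv library.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: drop bytes that are not valid UTF-8, keep everything else.
def strip_invalid_utf8(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8.scrub('') if utf8.respond_to?(:scrub) # Ruby 2.1+
  # Ruby 1.9.x/2.0: UTF-8 -> UTF-8 is a no-op, so round-trip through UTF-16LE.
  utf8.encode('UTF-16LE', :invalid => :replace, :replace => '')
      .encode(Encoding::UTF_8)
end

# Hypothetical helper: copy src_path to a temp file with every line cleaned,
# then parse the cleaned copy as tab-separated CSV.
def read_cleaned_rows(src_path)
  cleaned = Tempfile.new('cleaned')
  begin
    # Read explicitly as UTF-8 instead of relying on Encoding.default_external.
    File.foreach(src_path, :encoding => 'UTF-8') do |line|
      cleaned.write(strip_invalid_utf8(line))
    end
    cleaned.close
    CSV.read(cleaned.path, :col_sep => "\t", :encoding => 'UTF-8')
  ensure
    cleaned.close! # close (if still open) and delete the temp file
  end
end
```

Cleaning happens before CSV sees the data, so the parser only ever receives valid UTF-8.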
None of these lines raise an error on my machine with MRI 1.9.3p125. – 2012-07-07 13:32:35
I get these errors in IRB with MRI 1.9.3p194. – joshm1 2012-07-07 13:52:17
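One possible reason the results differ between IRB and a script, and between machines (not confirmed in the thread): whether the error appears depends on the encoding the string is tagged with, not only on its bytes. IRB tags literals with the locale encoding (UTF-8 here), a 1.9.x source file without a magic encoding comment gives a literal containing "\xa1" the ASCII-8BIT (binary) encoding, and IO.foreach tags lines with Encoding.default_external. The sketch below shows the same bytes under both tags; regexp_safe? is a hypothetical helper.

```ruby
# The same four bytes, tagged two different ways.
raw    = "bad\xA1".dup.force_encoding(Encoding::UTF_8)      # how IRB or IO.foreach (UTF-8 locale) would tag it
binary = "bad\xA1".dup.force_encoding(Encoding::ASCII_8BIT) # how a 1.9.x script without a magic comment tags the literal

# Hypothetical helper: does a Regexp-based String#sub raise on this string?
def regexp_safe?(str)
  str.sub(/x/, '') # Regexp matching raises ArgumentError on a broken UTF-8 string
  true
rescue ArgumentError
  false
end

p raw.valid_encoding?    # => false
p binary.valid_encoding? # => true
p regexp_safe?(raw)      # => false
p regexp_safe?(binary)   # => true
```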