Dealing with Ruby 1.9.1 Encoding Hell in a Rails Application
Dec 20, 2009
There are various indications that Rails 2.3.3 is not quite ready for Ruby 1.9.1 and the new encoding support built into
the String class. Here is an example bug report filed against Rails 2.3:
Encoding error in Ruby1.9 for templates.
You have to read a lot to understand all that can go wrong. Here are some sources to get you started. Go. Read.
The basic issue is that you have to know which encoding each library assigns to the String objects it creates
and passes back to you. Your Rails app can be UTF-8 through and through by using the magic comment in all source files, but if your ActiveRecord mysql adapter returns
ASCII_8BIT strings because one of your tables has a row with bad data, you will start getting encoding errors as soon as you try
to combine those ASCII_8BIT strings with your app's UTF-8 strings. This usually shows up during template rendering
as a 500 Internal Server Error caused by an Encoding::CompatibilityError ("incompatible character encodings") raised in the Rails stack.
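To see why the error depends on the data, here is a small plain-Ruby sketch with no Rails involved. The ASCII_8BIT value stands in for what the adapter might return; the strings and the combine helper are made up for illustration.

```ruby
# encoding: UTF-8

# Hypothetical helper: returns the concatenation, or the exception
# if Ruby refuses to combine the two strings.
def combine(a, b)
  a + b
rescue Encoding::CompatibilityError => e
  e
end

app_string = "Today's special: "   # UTF-8 literal, but ASCII-only
app_label  = "Café special: "      # UTF-8 literal with a non-ASCII character

# Simulated adapter value: valid UTF-8 bytes, labelled ASCII-8BIT.
db_value = "Crème brûlée".dup.force_encoding(Encoding::ASCII_8BIT)

combine(app_string, db_value)  # a String, silently labelled ASCII-8BIT
combine(app_label, db_value)   # an Encoding::CompatibilityError
```

Ruby only refuses the concatenation when both sides contain non-ASCII bytes. If one side is pure ASCII, the result quietly takes the other side's encoding, which is how a binary string can travel a long way before a template finally blows up.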
There is talk of fixes to the mysql ActiveRecord adapter so that it no longer hands back ASCII_8BIT strings,
but until those land, you are out of luck. You either clean your database entirely, or you live with the occasional 500 error
in your app. Or, thanks to the dynamic nature of Ruby, you can monkey patch the errors away. I consider this a valid use of monkey patching:
it can easily be backed out once the external libraries in question publish fixes.
There are two things (*) you have to fix:
1. Template loading
2. Database adapter
The following code, if placed in config/initializers/fix_encoding.rb, will do the trick. It makes sure that when templates
and partials are loaded, the file is read in UTF-8 mode. It also performs a "poor man's" scrub of any strings returned by
ActiveRecord attribute getters: it forces their encoding to UTF-8 and replaces any invalid or undefined characters
with the empty string. It is a decent stopgap until the libraries are fixed to support encoding
better.
```ruby
# encoding: UTF-8

# This monkey patch forces the encoding of all templates loaded by Rails to UTF-8.
# Based on Rails 2.3.3 and (may be) compatible with Rails 2.3.5.
module ActionView
  module Renderable #:nodoc:
    private

    def compile!(render_symbol, local_assigns)
      locals_code = local_assigns.keys.map { |key| "#{key} = local_assigns[:#{key}];" }.join

      source = <<-end_src
        def #{render_symbol}(local_assigns)
          old_output_buffer = output_buffer;#{locals_code};#{compiled_source}
        ensure
          self.output_buffer = old_output_buffer
        end
      end_src

      # Label the compiled source as UTF-8, then drop any bytes that are not valid UTF-8.
      # In Ruby 1.9 a UTF-8 -> UTF-8 encode! is a no-op, so the scrub round-trips
      # through UTF-16BE to make the transcoder actually remove invalid bytes.
      source.force_encoding('UTF-8')
      unless source.valid_encoding?
        source.encode!('UTF-16BE', :invalid => :replace, :undef => :replace, :replace => '')
        source.encode!('UTF-8')
      end

      begin
        ActionView::Base::CompiledTemplates.module_eval(source, filename, 0)
      rescue Errno::ENOENT => e
        raise e # Missing template file, re-raise for Base to rescue
      rescue Exception => e # errors from template code
        if logger = defined?(ActionController) && Base.logger
          logger.debug "ERROR: compiling #{render_symbol} RAISED #{e}"
          logger.debug "Function body: #{source}"
          logger.debug "Backtrace: #{e.backtrace.join("\n")}"
        end

        raise ActionView::TemplateError.new(self, {}, e)
      end
    end
  end

  class Template
    # Read template and partial files as UTF-8 rather than the default external encoding.
    def source
      File.read(filename, :encoding => 'UTF-8')
    end
  end
end

# This monkey patch attempts to force the encoding of all non-UTF-8 strings to UTF-8.
module ActiveRecord
  module AttributeMethods
    module ClassMethods
      private

      def define_read_method(symbol, attr_name, column)
        cast_code = column.type_cast_code('v') if column
        access_code = cast_code ? "(v=@attributes['#{attr_name}']) && #{cast_code}" : "@attributes['#{attr_name}']"

        unless attr_name.to_s == self.primary_key.to_s
          access_code = access_code.insert(0, "missing_attribute('#{attr_name}', caller) unless @attributes.has_key?('#{attr_name}'); ")
        end

        if cache_attribute?(attr_name)
          access_code = "@attributes_cache['#{attr_name}'] ||= (#{access_code})"
        end

        # The generated reader relabels String values as UTF-8 and, only when the
        # bytes are not valid UTF-8, drops the offending bytes via a UTF-16BE round trip.
        evaluate_attribute_method attr_name, <<-end_src
          def #{symbol}
            x = (#{access_code})
            if String === x
              x.force_encoding('UTF-8')
              unless x.valid_encoding?
                x.encode!('UTF-16BE', :invalid => :replace, :undef => :replace, :replace => '')
                x.encode!('UTF-8')
              end
            end
            x
          end
        end_src
      end
    end
  end
end
```
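The heart of the getter patch is the scrub itself, so here is a standalone sketch of the idea in plain Ruby that you can try in irb. poor_mans_scrub is a hypothetical helper name, not part of Rails. It relabels the bytes as UTF-8 first and only transcodes when they are not valid UTF-8, round-tripping through UTF-16BE because in Ruby 1.9 a UTF-8 to UTF-8 encode does not touch invalid bytes. (Ruby 2.1 and later also provide String#scrub.)

```ruby
# encoding: UTF-8

# Hypothetical helper: a "poor man's scrub" that works on Ruby 1.9+.
def poor_mans_scrub(value)
  return value unless value.is_a?(String)

  # Relabel the bytes as UTF-8 without changing them.
  str = value.dup.force_encoding(Encoding::UTF_8)
  return str if str.valid_encoding?

  # Transcode through UTF-16BE so every byte is actually inspected;
  # invalid or undefined sequences are replaced with '' (dropped).
  str.encode(Encoding::UTF_16BE, :invalid => :replace, :undef => :replace, :replace => '')
     .encode(Encoding::UTF_8)
end

poor_mans_scrub("Crème".dup.force_encoding(Encoding::ASCII_8BIT))    # => "Crème" (UTF-8)
poor_mans_scrub("ab\xFFcd".dup.force_encoding(Encoding::ASCII_8BIT)) # => "abcd"
poor_mans_scrub(42)                                                  # => 42
```

Order matters here. Transcoding an ASCII_8BIT string straight to UTF-8 with :undef => :replace treats every byte above 0x7F as undefined, so perfectly good text such as "Crème" would come back as "Crme".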
(*) There is possibly one other thing to watch for. If you use fragment caching, you may also have to monkey patch
the fragment loading and saving code so that it opens its file streams in UTF-8 mode. In most cases, however, the two fixes above
should be enough.
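Both the template fix and any fragment-cache fix come down to telling Ruby which encoding to attach to bytes read from disk. What a plain read gives you otherwise depends on Encoding.default_external, which is derived from the locale. The sketch below is plain Ruby; write_fragment_file and read_fragment_file are hypothetical helpers, not Rails' cache store API, and the temp file is made up for the demo.

```ruby
# encoding: UTF-8
require 'tmpdir'

# Hypothetical helpers: pin the stream's external encoding to UTF-8.
def write_fragment_file(path, content)
  File.open(path, 'w:UTF-8') { |f| f.write(content) }
end

def read_fragment_file(path)
  File.open(path, 'r:UTF-8') { |f| f.read }
end

path = File.join(Dir.tmpdir, "fragment_#{Process.pid}.html")
write_fragment_file(path, "<p>Crème brûlée</p>\n")

fragment = read_fragment_file(path)                # labelled UTF-8
template = File.read(path, :encoding => 'UTF-8')   # same approach as the Template#source patch
raw      = File.open(path, 'rb') { |f| f.read }    # binary read: labelled ASCII-8BIT
File.delete(path)

"Café: " + fragment   # fine: both sides are UTF-8
# "Café: " + raw      # would raise Encoding::CompatibilityError
```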