There are various indications that Rails 2.3.3 is not quite ready for Ruby 1.9.1 and the new encoding support built into the String class. Here is an example bug report filed against Rails 2.3: "Encoding error in Ruby1.9 for templates".
You have to read a lot to understand all that can go wrong. Here are some sources to get you started. Go. Read.
The basic issue is that you have to know which encoding every library uses when it creates String objects and hands them back to the caller. Your Rails app can be UTF-8 through and through by using the magic comment in all source files, but if your ActiveRecord mysql adapter returns ASCII_8BIT strings because one of your tables has a row with bad data, you will start getting encoding errors as soon as you try to combine those ASCII_8BIT strings with your app's UTF-8 strings. This usually shows up during template rendering as a 500 Server Error caused by an Encoding::CompatibilityError ("incompatible character encodings") in the Rails stack.
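The failure described above can be reproduced outside Rails. The following is a minimal sketch in plain Ruby 1.9 or later; the helper name `concat_error` and the sample strings are made up for illustration and stand in for a template string and a value coming back from the database adapter.

```ruby
# encoding: UTF-8

# Hypothetical helper: returns the error message raised when the two
# strings cannot be combined, or nil when concatenation succeeds.
def concat_error(a, b)
  a + b
  nil
rescue Encoding::CompatibilityError => e
  e.message
end

# A string as the app sees it: UTF-8 with a non-ASCII character.
app_string = "caf\u00E9"

# The same bytes, but tagged as binary, as a driver might return them.
db_string = "caf\u00E9".dup.force_encoding('ASCII-8BIT')

puts app_string.encoding
puts db_string.encoding

# Both sides contain non-ASCII bytes, so Ruby refuses to combine them.
puts concat_error(app_string, db_string)

# An ASCII-only binary string is still compatible, so this prints nil.
puts concat_error(app_string, 'plain'.dup.force_encoding('ASCII-8BIT')).inspect
```

This is why the problem can stay hidden for a long time: rows containing only ASCII combine without complaint, and the error appears only once a row with non-ASCII bytes reaches a template.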
There is talk out there of fixes to the mysql ActiveRecord adapter to make sure it doesn't give you back ASCII_8BIT strings, but until that time, you are out of luck. You either clean your database entirely, or you live with the odd 500 error in your app. Or, thanks to the dynamic nature of Ruby, you can monkey patch the errors away. I consider this a valid use of monkey patching. It can easily be backed out once the external libraries in question publish fixes.
There are two things (*) you have to fix.
The code below, if placed in config/initializers/fix_encoding.rb, will do the trick. It makes sure that when templates and partials are loaded, the file is read in UTF-8 mode. It also performs a "poor man's" scrub of any strings returned by ActiveRecord attribute getters: each string is transcoded to UTF-8 with any invalid or undefined characters replaced by the empty string, and its encoding is then forced to UTF-8. It is a decent stopgap until the libraries are fixed to support encoding better.
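As a standalone illustration of the scrub step (this is not the initializer itself, which comes after it), here is a small sketch that applies the same two calls to a single value. The helper name `poor_mans_scrub` is hypothetical and does not appear in the patch.

```ruby
# encoding: UTF-8

# Hypothetical helper mirroring the two calls the attribute-reader patch
# makes on every String it returns. Non-strings pass through untouched.
def poor_mans_scrub(value)
  return value unless String === value
  # Transcode to UTF-8, replacing anything unconvertible with ''.
  value.encode!('UTF-8', :invalid => :replace, :undef => :replace, :replace => '')
  # Make sure the result is tagged as UTF-8.
  value.force_encoding('UTF-8')
  value
end

# Bytes tagged as binary, as they might come back from the adapter.
from_db = "caf\u00E9".dup.force_encoding('ASCII-8BIT')

clean = poor_mans_scrub(from_db)
puts clean.encoding   # UTF-8
puts clean.inspect    # => "caf"

# A string that is already UTF-8 is left as it was.
puts poor_mans_scrub("caf\u00E9".dup) == "caf\u00E9"   # true

# Non-string attribute values are returned unchanged.
puts poor_mans_scrub(42).inspect   # => 42
```

Note what the first case shows: every byte above 0x7F in an ASCII_8BIT string has no defined conversion to UTF-8, so it is dropped, even when the bytes would have been valid UTF-8. That keeps the concatenation errors away at the cost of losing those characters, which is why it is only a stopgap.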
    # encoding: UTF-8

    # This monkey patch forces the encoding of all templates loaded by Rails to UTF-8.
    # Based off Rails 2.3.3 and (may be) compatible with Rails 2.3.5
    module ActionView
      module Renderable #:nodoc:
        private
          def compile!(render_symbol, local_assigns)
            locals_code = local_assigns.keys.map { |key| "#{key} = local_assigns[:#{key}];" }.join

            source = <<-end_src
              def #{render_symbol}(local_assigns)
                old_output_buffer = output_buffer;#{locals_code};#{compiled_source}
              ensure
                self.output_buffer = old_output_buffer
              end
            end_src
            source.encode!('UTF-8', :invalid => :replace, :undef => :replace, :replace => '')
            source.force_encoding('UTF-8')

            begin
              ActionView::Base::CompiledTemplates.module_eval(source, filename, 0)
            rescue Errno::ENOENT => e
              raise e # Missing template file, re-raise for Base to rescue
            rescue Exception => e # errors from template code
              if logger = defined?(ActionController) && Base.logger
                logger.debug "ERROR: compiling #{render_symbol} RAISED #{e}"
                logger.debug "Function body: #{source}"
                logger.debug "Backtrace: #{e.backtrace.join("\n")}"
              end

              raise ActionView::TemplateError.new(self, {}, e)
            end
          end
      end

      class Template
        def source
          File.read(filename, :encoding => 'UTF-8')
        end
      end
    end

    # This monkey patch attempts to force the encoding of all non-UTF-8 strings to UTF-8
    module ActiveRecord
      module AttributeMethods
        module ClassMethods
          private
            def define_read_method(symbol, attr_name, column)
              cast_code = column.type_cast_code('v') if column
              access_code = cast_code ? "(v=@attributes['#{attr_name}']) && #{cast_code}" : "@attributes['#{attr_name}']"

              unless attr_name.to_s == self.primary_key.to_s
                access_code = access_code.insert(0, "missing_attribute('#{attr_name}', caller) unless @attributes.has_key?('#{attr_name}'); ")
              end

              if cache_attribute?(attr_name)
                access_code = "@attributes_cache['#{attr_name}'] ||= (#{access_code})"
              end

              evaluate_attribute_method attr_name, "def #{symbol}; x = (#{access_code}); if String === x then; x.encode!('UTF-8', :invalid => :replace, :undef => :replace, :replace => ''); x.force_encoding('UTF-8'); end; x; end"
            end
        end
      end
    end
(*) There is possibly one other thing you have to be concerned about. If you do fragment caching, you may have to monkey patch the fragment loading and saving code to open the file streams using UTF-8 encoding. For most apps, though, the two fixes above should be enough.
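To show what "open the file streams using UTF-8 encoding" means in practice, here is a standalone sketch. It is not Rails' cache store code and does not patch it; the helpers `write_fragment` and `read_fragment` and the file name are hypothetical, and only illustrate the kind of change such a patch would make.

```ruby
# encoding: UTF-8
require 'tmpdir'

# Hypothetical helper: write a fragment with an explicit external encoding.
def write_fragment(path, content)
  File.open(path, 'w:UTF-8') { |f| f.write(content) }
end

# Hypothetical helper: read it back so the string is tagged UTF-8.
def read_fragment(path)
  File.open(path, 'r:UTF-8') { |f| f.read }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'fragment.cache')
  write_fragment(path, "caf\u00E9")

  # Read with an explicit encoding: the result is a UTF-8 string.
  puts read_fragment(path).encoding

  # Read in binary mode: same bytes, but tagged as binary, which is the
  # kind of string that later clashes with UTF-8 template output.
  puts File.open(path, 'rb') { |f| f.read }.encoding
end
```

The bytes on disk are identical in both reads; only the encoding tag on the returned string differs, and that tag is what decides whether the fragment can be combined with the rest of the page.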