Dealing with Ruby 1.9.1 Encoding Hell in a Rails Application

Dec 20, 2009

There are various indications that Rails 2.3.3 is not quite ready for Ruby 1.9.1 and the new encoding support built into the String class. One example is this bug report filed against Rails 2.3: "Encoding error in Ruby1.9 for templates".

You have to read a lot to understand all that can go wrong. Here are some sources to get you started. Go. Read.

The basic issue is that you have to be aware of which encoding each library uses when it creates String objects and passes them back to the caller. Your Rails app can be UTF-8 through and through by using the magic comment in all source files, but if your ActiveRecord mysql adapter returns ASCII_8BIT strings because one of your tables has a row with bad data, you will start getting encoding errors as soon as you try to combine those ASCII_8BIT strings with your app's UTF-8 strings. This usually shows up during template rendering as a 500 Server Error caused by an Encoding::CompatibilityError ("incompatible character encodings") raised in the Rails stack.
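To make the failure concrete, here is a minimal sketch in plain Ruby, with no Rails involved. The helper name concat_error and the sample strings are made up for illustration; the binary-tagged string only stands in for what a database driver might hand back.

```ruby
# Hypothetical helper: tries to concatenate two strings and returns the
# class of the exception raised, or nil if the concatenation succeeds.
def concat_error(left, right)
  left + right
  nil
rescue StandardError => e
  e.class
end

utf8       = "Caf\u00E9: "                              # UTF-8 string with a non-ASCII character
binary     = "caf\xC3\xA9".force_encoding("ASCII-8BIT") # high bytes, binary tag
ascii_only = "cafe".force_encoding("ASCII-8BIT")        # binary tag, but only 7-bit bytes

puts concat_error(utf8, binary).inspect      # Encoding::CompatibilityError
puts concat_error(utf8, ascii_only).inspect  # nil
```

The second call succeeds because Ruby treats strings containing only 7-bit ASCII bytes as compatible with UTF-8. That is why the error tends to appear only for the occasional row that contains non-ASCII data.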

There is talk of fixing the mysql ActiveRecord adapter so that it no longer gives you back ASCII_8BIT strings, but until that happens you are out of luck. You either clean your database entirely, or you live with the odd 500 error in your app. Or, thanks to the dynamic nature of Ruby, you can monkey patch the errors away. I consider this a valid use of monkey patching, since it can easily be backed out once the external libraries in question publish fixes.

There are two things (*) you have to fix.

Template loading
Database adapter

The following code, if placed in config/initializers/fix_encoding.rb, will do the trick. It makes sure that when templates and partials are loaded, the file is read in UTF-8 mode. It also performs a "poor man's" scrub of any strings returned by ActiveRecord attribute getters: each string is transcoded to UTF-8, with any invalid or undefined characters replaced by the empty string, and then has its encoding forced to UTF-8. It is a decent stopgap until the libraries are fixed to support encoding better.


# encoding: UTF-8
# This monkey patch forces the encoding of all templates loaded by Rails to UTF-8.
# Based off Rails 2.3.3 and (may be) compatible with Rails 2.3.5
module ActionView
  module Renderable #:nodoc:
    private
      def compile!(render_symbol, local_assigns)
        locals_code = local_assigns.keys.map { |key| "#{key} = local_assigns[:#{key}];" }.join
     
        source = <<-end_src
          def #{render_symbol}(local_assigns)
            old_output_buffer = output_buffer;#{locals_code};#{compiled_source}
          ensure
            self.output_buffer = old_output_buffer
          end
        end_src
        # Drop any bytes that cannot be converted, then tag the result as UTF-8.
        source.encode!('UTF-8', :invalid => :replace, :undef => :replace, :replace => '')
        source.force_encoding('UTF-8')
     
        begin
          ActionView::Base::CompiledTemplates.module_eval(source, filename, 0)
        rescue Errno::ENOENT => e
          raise e # Missing template file, re-raise for Base to rescue
        rescue Exception => e # errors from template code
          if logger = defined?(ActionController) && Base.logger
            logger.debug "ERROR: compiling #{render_symbol} RAISED #{e}"
            logger.debug "Function body: #{source}"
            logger.debug "Backtrace: #{e.backtrace.join("\n")}"
          end
     
          raise ActionView::TemplateError.new(self, {}, e)
        end
      end
  end
     
  class Template
    def source
      File.read(filename, :encoding => 'UTF-8')
    end
  end
     
end
     
# This monkey patch attempts to force the encoding of all non-UTF-8 strings to UTF-8
module ActiveRecord
  module AttributeMethods
    module ClassMethods
      private
        def define_read_method(symbol, attr_name, column)
          cast_code = column.type_cast_code('v') if column
          access_code = cast_code ? "(v=@attributes['#{attr_name}']) && #{cast_code}" : "@attributes['#{attr_name}']"
     
          unless attr_name.to_s == self.primary_key.to_s
            access_code = access_code.insert(0,
              "missing_attribute('#{attr_name}', caller) unless @attributes.has_key?('#{attr_name}'); ")
          end
          
          if cache_attribute?(attr_name)
            access_code = "@attributes_cache['#{attr_name}'] ||= (#{access_code})"
          end
          evaluate_attribute_method attr_name, "def #{symbol}; x = (#{access_code}); if String === x then;
             x.encode!('UTF-8', :invalid => :replace, :undef => :replace, :replace => '');
             x.force_encoding('UTF-8'); end; x; end"
        end
    end
  end
end
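
To see what the encode!/force_encoding pair in the patched attribute reader does to a single string, here is a small standalone sketch. The helper name poor_mans_scrub is hypothetical and the input is made up; it only mirrors the two calls used in the patch above.

```ruby
# Hypothetical helper mirroring the two calls made by the patched reader.
# It works on a copy so the caller's string is left untouched.
def poor_mans_scrub(str)
  copy = str.dup
  copy.encode!("UTF-8", :invalid => :replace, :undef => :replace, :replace => "")
  copy.force_encoding("UTF-8")
end

# Valid UTF-8 bytes for "café", but tagged as binary.
binary = "caf\xC3\xA9".force_encoding("ASCII-8BIT")

scrubbed = poor_mans_scrub(binary)
puts scrubbed            # caf
puts scrubbed.encoding   # UTF-8

# For comparison: only relabeling the same bytes keeps the character.
relabeled = binary.dup.force_encoding("UTF-8")
puts relabeled.valid_encoding?   # true
puts relabeled == "caf\u00E9"    # true
```

Note the trade-off: for a string tagged ASCII_8BIT, every byte above 0x7F counts as undefined during transcoding and is dropped, so non-ASCII characters are lost rather than preserved. That is acceptable for a stopgap whose goal is to avoid the 500 error, but it is worth knowing before relying on it.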

(*) There is possibly one other thing to be concerned about. If you do fragment caching, you may have to monkey patch the fragment loading and saving code to open the file streams using UTF-8 encoding. In most cases, however, the two fixes above should be enough.
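
As a rough illustration of what opening file streams using UTF-8 encoding means, the sketch below writes and reads a file with an explicit UTF-8 external encoding. The helpers write_fragment and read_fragment are hypothetical and are not the Rails fragment cache API; an actual patch would have to target whichever cache store your app uses.

```ruby
require "tempfile"

# Hypothetical helpers, not part of Rails: write and read a cache file with
# an explicit UTF-8 external encoding instead of relying on the default.
def write_fragment(path, content)
  File.open(path, "w:UTF-8") { |f| f.write(content) }
end

def read_fragment(path)
  File.open(path, "r:UTF-8") { |f| f.read }
end

Tempfile.open("fragment") do |tmp|
  write_fragment(tmp.path, "caf\u00E9")
  fragment = read_fragment(tmp.path)
  puts fragment.encoding          # UTF-8
  puts fragment.valid_encoding?   # true
end
```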