Dealing with Ruby 1.9.1 Encoding Hell in a Rails Application
There are various indications that Rails 2.3.3 is not quite ready for Ruby 1.9.1 and the new encoding support built into the String class. Here is an example bug report filed against Rails 2.3: Encoding error in Ruby1.9 for templates.
You have to read a lot to understand all that can go wrong. Here are some sources to get you started. Go. Read.
- What Every Ruby Programmer Must Know About Encoding
- Illustration of Ruby 1.9 encoding complexity as a Test::Unit test case
The basic issue is that you have to know which encoding every library uses when it creates the String objects it hands back to the caller. Your Rails app can be UTF-8 through and through by using the magic comment in all source files, but if your ActiveRecord mysql adapter returns ASCII_8BIT strings because one of your tables has a row with bad data, you will start getting encoding errors as soon as you try to combine those ASCII_8BIT strings with your app's UTF-8 strings. This usually manifests itself during template rendering as a 500 Server Error caused by an Encoding::CompatibilityError ("incompatible character encodings") raised in the Rails stack.
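The failure can be reproduced outside Rails. The sketch below is only an illustration: the `combine` helper and the sample strings are made up for this example, and the binary strings are simulated with `force_encoding` rather than fetched from a real database.

```ruby
# encoding: UTF-8

# Hypothetical helper for this illustration: returns the combined string,
# or the exception object if Ruby refuses to combine the two encodings.
def combine(a, b)
  a + b
rescue Encoding::CompatibilityError => e
  e
end

# A UTF-8 string containing a non-ASCII character.
utf8 = "caf\u00E9"

# The same bytes, but tagged as binary, as a driver might return them.
binary = "caf\u00E9".dup.force_encoding(Encoding::ASCII_8BIT)

# A binary string that happens to contain only ASCII bytes.
plain = "abc".dup.force_encoding(Encoding::ASCII_8BIT)

# ASCII-only binary data combines with UTF-8 without complaint.
puts combine(utf8, plain).encoding

# Non-ASCII bytes on both sides make Ruby raise.
puts combine(utf8, binary).class
```

This is also why the errors look sporadic: a binary string that contains only ASCII bytes combines cleanly, so only the rows with non-ASCII data blow up.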
There is talk of fixing the mysql ActiveRecord adapter so that it never gives you back ASCII_8BIT strings, but until that happens you are out of luck. You either clean your database entirely, or you live with the odd 500 error in your app. Or, thanks to the dynamic nature of Ruby, you can monkey patch the errors away. I consider this a valid use of monkey patching, because it can easily be backed out once the external libraries in question publish fixes.
There are two things (*) you have to fix.
- Template loading
- Database adapter
The following code, if placed in config/initializers/fix_encoding.rb, will do the trick. It makes sure that when templates and partials are loaded, the file is read in UTF-8 mode. It also performs a "poor man's" scrub of any strings returned by ActiveRecord attribute getters: it converts them to UTF-8, replacing any invalid or undefined characters with the empty string, and then forces the encoding to UTF-8. It is a decent stopgap until the libraries are fixed to support encoding better.
# encoding: UTF-8
# This monkey patch forces the encoding of all templates loaded by Rails to UTF-8.
# Based off Rails 2.3.3 and (may be) compatible with Rails 2.3.5
module ActionView
  module Renderable #:nodoc:
    private
      def compile!(render_symbol, local_assigns)
        locals_code = local_assigns.keys.map { |key| "#{key} = local_assigns[:#{key}];" }.join
        source = <<-end_src
          def #{render_symbol}(local_assigns)
            old_output_buffer = output_buffer;#{locals_code};#{compiled_source}
          ensure
            self.output_buffer = old_output_buffer
          end
        end_src
        source.encode!('UTF-8', :invalid => :replace, :undef => :replace, :replace => '')
        source.force_encoding('UTF-8')
        begin
          ActionView::Base::CompiledTemplates.module_eval(source, filename, 0)
        rescue Errno::ENOENT => e
          raise e # Missing template file, re-raise for Base to rescue
        rescue Exception => e # errors from template code
          if logger = defined?(ActionController) && Base.logger
            logger.debug "ERROR: compiling #{render_symbol} RAISED #{e}"
            logger.debug "Function body: #{source}"
            logger.debug "Backtrace: #{e.backtrace.join("\n")}"
          end
          raise ActionView::TemplateError.new(self, {}, e)
        end
      end
  end

  class Template
    def source
      File.read(filename, :encoding => 'UTF-8')
    end
  end
end
# This monkey patch attempts to force the encoding of all non-UTF-8 strings to UTF-8
module ActiveRecord
  module AttributeMethods
    module ClassMethods
      private
        def define_read_method(symbol, attr_name, column)
          cast_code = column.type_cast_code('v') if column
          access_code = cast_code ? "(v=@attributes['#{attr_name}']) && #{cast_code}" : "@attributes['#{attr_name}']"
          unless attr_name.to_s == self.primary_key.to_s
            access_code = access_code.insert(0,
              "missing_attribute('#{attr_name}', caller) unless @attributes.has_key?('#{attr_name}'); ")
          end
          if cache_attribute?(attr_name)
            access_code = "@attributes_cache['#{attr_name}'] ||= (#{access_code})"
          end
          evaluate_attribute_method attr_name, "def #{symbol}; x = (#{access_code}); if String === x then;
            x.encode!('UTF-8', :invalid => :replace, :undef => :replace, :replace => '');
            x.force_encoding('UTF-8'); end; x; end"
        end
    end
  end
end
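To see what the attribute scrub does to a value, it can be tried in isolation. The following is a minimal sketch, not part of the patch: `poor_mans_scrub` is a hypothetical helper name, and unlike the patch it works on a copy of the string so that it can be called safely on frozen values.

```ruby
# encoding: UTF-8

# Hypothetical helper, not part of Rails: applies the same encode! and
# force_encoding calls as the attribute getter patch, but on a copy.
def poor_mans_scrub(value)
  # Non-string values (numbers, nil, dates) pass through untouched.
  return value unless value.is_a?(String)

  str = value.dup
  # Transcode to UTF-8, dropping anything that cannot be converted.
  str.encode!('UTF-8', :invalid => :replace, :undef => :replace, :replace => '')
  str.force_encoding('UTF-8')
end

# Bytes of a UTF-8 "e acute", but tagged as binary by the driver.
from_db = "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)

cleaned = poor_mans_scrub(from_db)
puts cleaned.encoding
puts cleaned
```

Note the cost of this approach: when a string arrives tagged as ASCII_8BIT, every byte above 0x7F is treated as undefined and dropped, so non-ASCII characters in that value are lost even if the underlying bytes were valid UTF-8. Strings that are already tagged as valid UTF-8 pass through unchanged.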
(*) There is possibly one other thing to be concerned about. If you do fragment caching, you may have to monkey patch the fragment loading and saving code to open the file streams using UTF-8 encoding. The above two fixes should be enough, however.
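For reference, opening file streams with an explicit UTF-8 encoding looks like the sketch below. It only illustrates the file I/O involved: `write_fragment` and `read_fragment` are hypothetical helpers, not Rails methods, and which methods of the cache store would actually need patching (if any) depends on your Rails version.

```ruby
# encoding: UTF-8
require 'tmpdir'

# Hypothetical helper: write a fragment with an explicit external encoding.
def write_fragment(path, content)
  File.open(path, 'w:UTF-8') { |f| f.write(content) }
end

# Hypothetical helper: read a fragment back so the String is tagged UTF-8
# regardless of the process-wide default external encoding.
def read_fragment(path)
  File.read(path, :encoding => 'UTF-8')
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'fragment.html')
  write_fragment(path, "<p>caf\u00E9</p>")
  puts read_fragment(path).encoding
end
```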