Dealing with Ruby 1.9.1 Encoding Hell in a Rails Application
There are various indications that Rails 2.3.3 is not quite ready for Ruby 1.9.1 and the new encoding support built into the String class. Here is an example bug report filed against Rails 2.3: Encoding error in Ruby1.9 for templates.
You have to read a lot to understand all that can go wrong. Here are some sources to get you started. Go. Read.
- What Every Ruby Programmer Must Know About Encoding
- Illustration of Ruby 1.9 encoding complexity as a Test::Unit test case
The basic issue is that you have to know which encoding each library uses when it creates String objects and passes them back to the caller. Your Rails app can be UTF-8 through and through by using the magic comment in all source files, but if your ActiveRecord mysql adapter is returning ASCII_8BIT strings because one of your tables has a row with bad data, you will start getting encoding errors as soon as you try to combine those ASCII_8BIT strings with your app's UTF-8 strings. This usually shows up during template rendering as a 500 Server Error caused by an Encoding::CompatibilityError ("incompatible character encodings") raised in the Rails stack.
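To make the failure concrete, here is a small standalone sketch in plain Ruby, with no Rails or MySQL involved. The helper name try_concat is made up for illustration, and the binary string only stands in for what an adapter might hand back. It also suggests why the error is intermittent: Ruby only refuses the concatenation when both sides contain non-ASCII bytes, so rows holding plain ASCII never trigger it.

```ruby
# encoding: utf-8

# Hypothetical helper: returns the combined string, or the name of the
# error class if Ruby refuses to mix the two encodings.
def try_concat(left, right)
  left + right
rescue Encoding::CompatibilityError => e
  e.class.name
end

# A UTF-8 string from the app.
utf8 = "caf\u00e9"

# The same bytes tagged as binary, standing in for adapter output.
binary = "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)

# A binary string that happens to contain only ASCII bytes.
ascii_binary = "abc".dup.force_encoding(Encoding::ASCII_8BIT)

puts try_concat(utf8, binary)        # both sides non-ASCII: the error class name
puts try_concat(utf8, ascii_binary)  # ASCII-only binary: concatenation succeeds
```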
There is talk out there of fixes to the mysql ActiveRecord adapter to make sure it doesn't give you back ASCII_8BIT strings, but until that time, you are out of luck. You either clean your database entirely, or you live with the odd 500 error in your app. Or, thanks to the dynamic nature of Ruby, you can monkey patch the errors away. I consider this a valid use of monkey patching. It can easily be backed out once the external libraries in question publish fixes.
There are two things (*) you have to fix.
- Template loading
- Database adapter
The following code, if placed in config/initializers/fix_encoding.rb, will do the trick. It makes sure that when templates and partials are loaded, the file is read in UTF-8 mode. It also performs a "poor man's" scrub of any strings returned by ActiveRecord attribute getters: it forces the encoding to UTF-8 and replaces any invalid or undefined characters with the empty string. It is a decent stopgap until the libraries are fixed to support encoding better.
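As a minimal sketch of those two ideas in plain Ruby (this is not the initializer itself, and it deliberately does not touch any Rails or ActiveRecord internals): scrub_to_utf8 and read_utf8 are hypothetical helper names. The scrub here drops invalid bytes by filtering characters one at a time, which is one simple way to get the "replace with the empty string" behavior; hooking the helpers into template loading and attribute getters is left out because it depends on the Rails version.

```ruby
# Hypothetical helper: tag the bytes as UTF-8, then drop anything that
# is not valid UTF-8. Non-string values are passed through untouched.
def scrub_to_utf8(value)
  return value unless value.is_a?(String)

  # dup first so the caller's string is never mutated.
  text = value.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  # each_char yields invalid bytes as their own (invalid) characters,
  # so filtering on valid_encoding? removes exactly those.
  cleaned = text.each_char.select { |ch| ch.valid_encoding? }.join
  # An all-invalid input joins to an empty string; keep it UTF-8 too.
  cleaned.force_encoding(Encoding::UTF_8)
end

# Hypothetical helper: read a file with an explicit external encoding,
# so the result is tagged UTF-8 regardless of the process defaults.
def read_utf8(path)
  File.open(path, "r:UTF-8") { |file| file.read }
end

# Usage: a binary string with one stray byte (\xFF) in the middle.
dirty = "caf\xC3\xA9 \xFFok".dup.force_encoding(Encoding::ASCII_8BIT)
clean = scrub_to_utf8(dirty)
puts clean.encoding
puts clean.valid_encoding?
```

The scrub silently discards data, which is acceptable for a stopgap but worth remembering when the underlying rows are eventually cleaned.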
(*) There is one other thing you may have to deal with. If you do fragment caching, you may have to monkey patch the fragment loading and saving code to open the file streams using UTF-8 encoding. In most cases, however, the two fixes above should be enough.
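For the fragment cache case, the idea of opening file streams with an explicit encoding can be sketched as a simple round trip. The names write_fragment and read_fragment are hypothetical and are not the Rails cache store API; the sketch only illustrates the "w:UTF-8" and "r:UTF-8" file modes that such a patch would presumably rely on.

```ruby
require "tmpdir"

# Hypothetical helper: save a fragment with an explicit UTF-8 external encoding.
def write_fragment(path, content)
  File.open(path, "w:UTF-8") { |file| file.write(content) }
end

# Hypothetical helper: load a fragment so it comes back tagged as UTF-8.
def read_fragment(path)
  File.open(path, "r:UTF-8") { |file| file.read }
end

# Usage: round-trip a fragment through a temporary directory.
Dir.mktmpdir do |dir|
  path = File.join(dir, "fragment.cache")
  write_fragment(path, "<p>caf\u00e9</p>")
  puts read_fragment(path).encoding
end
```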