active_record, MySQL, and emoji

ruby taskrabbit 
Thu Apr 24 2014 - Originally posted at tech.taskrabbit.com


More and more people are adopting emoji in their online communications. At TaskRabbit, we noticed that our users were starting to use emoji all over the place, from task descriptions to reviews.

There are some problems with supporting emoji on our stack, which includes Rails 4.0 and MySQL. The main problem is that MySQL’s utf8 encoding stores at most three bytes per character, while most emoji need four. MySQL 5.5 introduced the utf8mb4 encoding, which allows up to four bytes per character (the "mb4" in the name)… and therefore emoji work! The MySQL gem introduced support for utf8mb4 about a year ago, but only recently did active_record (and Rails) add support for it, in Rails 4.1.
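To make the byte-width issue concrete, here is a minimal Ruby sketch that checks whether a string would fit in MySQL's three-byte utf8. The helper names (`max_bytes_per_char`, `fits_in_mysql_utf8?`) are made up for this illustration and are not part of any gem.

```ruby
# Hypothetical helpers for illustration only.

# Widest character in the string, measured in UTF-8 bytes.
def max_bytes_per_char(str)
  str.each_char.map(&:bytesize).max || 0
end

# MySQL's utf8 stores at most 3 bytes per character.
def fits_in_mysql_utf8?(str)
  max_bytes_per_char(str) <= 3
end

fits_in_mysql_utf8?("I \u2764 Ruby")          # => true  (U+2764 is 3 bytes)
fits_in_mysql_utf8?("Nice job \u{1F600}")     # => false (U+1F600 is 4 bytes)
```

Note that not every pictograph is four bytes wide: characters in the Basic Multilingual Plane, such as U+2764, already fit in utf8.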

Initially, we decided to ignore all emoji characters, literally stripping them out of strings with our demoji gem (Thanks Pablo!). However, with our new product launch in the UK, we thought it was time to actually address the problem. Here is what we learned:
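For reference, the stripping approach can be sketched in a few lines of Ruby. This is not the gem's actual implementation, just an illustration of the idea, and `strip_four_byte_chars` is a hypothetical name.

```ruby
# Hypothetical helper: drop every character that needs more than
# 3 bytes in UTF-8, since MySQL's utf8 cannot store those.
def strip_four_byte_chars(str)
  str.each_char.reject { |char| char.bytesize > 3 }.join
end

strip_four_byte_chars("Great job \u{1F600}!") # => "Great job !"
```

The obvious downside, and the reason to migrate instead, is that the user's text is silently altered.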

Migrating MySQL from utf8 to utf8mb4

The good news is that the upgrade path from utf8 to utf8mb4 is easy. Because utf8mb4 is a superset of utf8, the migration is really just a definition change at the table level. Nothing has to change with your existing data. This is a non-blocking and non-downtime migration. If you are using normal Rails migrations, your VARCHAR columns will be based on the table’s encoding, so changing the table will change the column type. It is worth confirming this with SHOW CREATE TABLE afterwards: MySQL documents ALTER TABLE ... CHARACTER SET as changing the table default, and ALTER TABLE ... CONVERT TO CHARACTER SET as the statement that converts existing columns. The bad news is that any text-type (or blob-type) columns will need to be explicitly changed.

Check out the migration steps:

  1. change the DB’s encoding entirely, so new tables will be created in utf8mb4
  2. alter all existing tables
  3. explicitly update text-type columns

```ruby
class Utf8mb4 < ActiveRecord::Migration

  # table name => text-type column that must be converted explicitly
  UTF8_PAIRS = {
    'users'    => 'notes',
    'comments' => 'message'
    # ...
  }

  def self.up
    execute "ALTER DATABASE `#{ActiveRecord::Base.connection.current_database}` CHARACTER SET utf8mb4;"

    ActiveRecord::Base.connection.tables.each do |table|
      execute "ALTER TABLE `#{table}` CHARACTER SET = utf8mb4;"
    end

    UTF8_PAIRS.each do |table, col|
      execute "ALTER TABLE `#{table}` CHANGE `#{col}` `#{col}` TEXT CHARACTER SET utf8mb4 NULL;"
    end
  end

  def self.down
    execute "ALTER DATABASE `#{ActiveRecord::Base.connection.current_database}` CHARACTER SET utf8;"

    ActiveRecord::Base.connection.tables.each do |table|
      execute "ALTER TABLE `#{table}` CHARACTER SET = utf8;"
    end

    UTF8_PAIRS.each do |table, col|
      execute "ALTER TABLE `#{table}` CHANGE `#{col}` `#{col}` TEXT CHARACTER SET utf8 NULL;"
    end
  end
end
```
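The migration rewrites every listed column as TEXT NULL, so a MEDIUMTEXT or NOT NULL column would need its own statement. As a hedged alternative to maintaining the table/column list by hand, the sketch below builds a query against MySQL's information_schema.COLUMNS view and an ALTER statement that keeps each column's existing type and nullability. Both helper names are hypothetical, and the output should be reviewed before being run against a real database.

```ruby
# Hypothetical helpers; they only build SQL strings and run nothing.

# Query listing text-type columns in +schema+ that are not yet utf8mb4.
# +schema+ is interpolated directly, so pass only a trusted value such as
# ActiveRecord::Base.connection.current_database.
def utf8_text_columns_sql(schema)
  <<-SQL
    SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE
    FROM information_schema.COLUMNS
    WHERE TABLE_SCHEMA = '#{schema}'
      AND DATA_TYPE IN ('tinytext', 'text', 'mediumtext', 'longtext')
      AND CHARACTER_SET_NAME <> 'utf8mb4'
  SQL
end

# ALTER statement that preserves the column's type and NULL/NOT NULL setting.
# +nullable+ is a boolean, e.g. row['IS_NULLABLE'] == 'YES'.
def alter_text_column_sql(table, column, column_type, nullable)
  null_clause = nullable ? "NULL" : "NOT NULL"
  "ALTER TABLE `#{table}` CHANGE `#{column}` `#{column}` " \
    "#{column_type.upcase} CHARACTER SET utf8mb4 #{null_clause};"
end
```

Inside a migration, each row returned by the first query could be fed to the second helper and passed to `execute`.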

database.yml

The only change here is to change the encoding:

```yaml
development:
  adapter: mysql2
  encoding: utf8mb4 # <--- HERE
  database: my_db_name
  username: root
  password: my_password
  host: 127.0.0.1
  port: 3306
```
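Since the setting has to be repeated for every environment, a small check can catch one that was missed. The sketch below is only an illustration: `environments_missing_utf8mb4` is a hypothetical helper, and it assumes a plain YAML file without ERB tags or YAML aliases, which a real database.yml may contain.

```ruby
require 'yaml'

# Hypothetical helper: returns the names of the environments whose
# encoding is anything other than utf8mb4 (including a missing encoding).
def environments_missing_utf8mb4(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  config.select do |_env, settings|
    settings.is_a?(Hash) && settings['encoding'] != 'utf8mb4'
  end.keys
end

sample = <<-YAML
development:
  adapter: mysql2
  encoding: utf8mb4
test:
  adapter: mysql2
  encoding: utf8
YAML

environments_missing_utf8mb4(sample) # => ["test"]
```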

Index Lengths

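The last step here is to worry about index lengths. In MySQL 5.5 and 5.6, InnoDB limits an index key to 767 bytes by default, and at four bytes per character an indexed VARCHAR(255) column no longer fits under that limit. If you are on Rails 4.1, you have nothing to worry about! The rest of us have a few options: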

  1. monkeypatch activerecord
  2. change the index length within MySQL
  3. set the length to 191 within all index migrations
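The 191 in the last option is plain arithmetic, assuming InnoDB's default 767-byte index key limit (the constant and helper names below are made up for this sketch):

```ruby
# Assumption: the default InnoDB index key limit, without large-prefix support.
INNODB_MAX_KEY_BYTES = 767

# Longest string (in characters) that still fits in one index key.
def max_indexable_chars(bytes_per_char, max_key_bytes = INNODB_MAX_KEY_BYTES)
  max_key_bytes / bytes_per_char # integer division rounds down
end

max_indexable_chars(3) # => 255 (utf8)
max_indexable_chars(4) # => 191 (utf8mb4)
```

This is why a default VARCHAR(255) column could be indexed under utf8 but not under utf8mb4.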

We chose #1 due to the simplicity of the solution: a small patch that lowers the default length of string columns to 191. Check the links above for a detailed discussion of the problem.

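```ruby
# Load the adapter first so that we reopen the real class
# instead of accidentally defining a new, empty one.
require 'active_record/connection_adapters/abstract_mysql_adapter'

module ActiveRecord
  module ConnectionAdapters
    class AbstractMysqlAdapter
      # Default string columns to 191 characters so they stay indexable under utf8mb4.
      NATIVE_DATABASE_TYPES[:string] = { :name => "varchar", :limit => 191 }
    end
  end
end
```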

And now you can emoji to your ❤’s content!

Hi, I'm Evan

I write about Technology, Software, and Startups. I use my Product Management, Software Engineering, and Leadership skills to build teams that create world-class digital products.
