MySQL and UTF-8
When working with UTF-8 on MySQL, it is not enough to define the CHARACTER SET
and the COLLATE
parameters to utf-8 when creating the database. You also have to tell MySQL that the queries you’ll be calling are utf-8. Indeed, by default the character set used by the connection and the result sets is latin-1:
mysql> SHOW VARIABLES LIKE 'character_set%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.00 sec)
When doing your queries yourself with mysql_query, this can be a source of confusion, as your data is stored properly in UTF-8, but still comes back funny. That’s something that recently bit me as I was fiddling with an old version of ezSQL which didn’t allow the user to change the encoding1.
You can force utf-8 by executing the following:
SET NAMES 'utf8';
Which is equivalent to:
SET character_set_client = utf8; SET character_set_results = utf8; SET character_set_connection = utf8;
In recent PHP (>= 5.2), you can also execute:
mysql_set_charset('utf8',$conn);
Libraries like Propel usually handle that quite well by specifying a configuration option, and relieving the developer from these worries. Typically, the runtime configuration settings for Propel would be:
<config> <propel> <datasources> <datasource> <connection> <!-- ... --> <settings> <setting id="charset">utf8</setting> </settings>
For Rails, it is also very similar. When defining your database instance in config/database.yml
, you can also give the encoding
parameter:
development: adapter: mysql encoding: utf8 reconnect: false database: pouet_dev pool: 5 username: root password: pouet host: localhost socket: /var/run/mysqld/mysqld.sock
For Hibernate, arbitrary connection properties can be passed by using the property name, with hibernate.connection
preprended to the name.
<property name="hibernate.connection.characterEncoding">UTF-8</property>
This parameter is the MySQL Connector/J parameters used by the driver to indicate the encoding (note that the documentation indicates that SET NAMES 'utf8'
would not work with Connector/J). Examples will probably follow…
1 Not sure recent versions do either?