Making use of Emojis in 2020 seems like the most normal thing right?
It turns out neither mysqljs
nor mysql
nor docker mysql
image are setup to support them by default.
While trying to set them up, I encountered a few errors (which by the end of this post we will solve):
ER_CANT_AGGREGATE_2COLLATIONS: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
ER_CANT_AGGREGATE_2COLLATIONS: Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
- Emojis appearing as question marks
????
in the database.
Moving from latin1 to utf8
In the first error Illegal mix of collations (utf8_general_ci,IMPLICIT) and (latin1_general_ci,COERCIBLE) near '='
, MySQL is telling us that we are passing a utf-8
and comparing it with a latin1
column in the database. There is a key difference between collation and character set: the former is for sorting sets of characters, and the other is for determining what character should be assumed from the current binary value.
Solution
You want to convert your database to utf-8
.
IMPORTANT: although you may be tempted to use a CHARSET utf8
, utf8
is a bad implementation of UTF-8
, instead you should be using utf8mb4
(which allows for 4 bytes to store characters instead of 3, and you need 4 for emojis).
Dump the database
If you are using Docker, you will have to export the database:
docker exec -it my-mysql-container sh
# within the container sh
mysqldump -u my-user -p my-db-name > /var/lib/mysql/my-dump.sql
Then you will want to vim into it and do a search and replace, by typing the following sequence gg:.,$s/latin1 /utf8mb4 /g
then enter, then :wq
then enter again.
Kill the docker mysql container and volumes
docker rm -f my-mysql-container
docker volume rm my-mysql-container-volume
Now you have no garbage defaults from the previous container. Let's recreate:
Override mysql docker defaults
You will want to override the default configuration for the mysql docker image. To do so, you need to make a my.cnf
accessible to mysql within the container. This is achieved by mounting the file as a volume (everywhere, mysql will look for it). So create our my.cnf
in the same directory as docker-compose.yml
.
my.cnf
:
[client]
default-character-set = utf8mb4$
[mysql]
default-character-set = utf8mb4
[mysqld]
init_connect='SET collation_connection = utf8mb4_unicode_ci'
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
Now we need to tell docker-compose where to mount it, so in docker-compose.yml
:
# ...
volumes:
- ./mymysql.cnf:/etc/mysql/conf.d/my.cnf
- ./mymysql.cnf:/etc/mysql/my.cnf
- ./mymysql.cnf:/etc/my.cnf
# ...
Notice above that we are mounting it in many places (that's because we do not have time to figure out where mysql will look for it, so we spread the good word).
Telling mysqljs collation connection
This above, may be enough. But if it is not (aka. you are getting question marks instead of emojis), try the following.
Now, you have been able to insert, but you are seeing ????
instead of the emoji. This is a sign that there is a discrepancy between the database charset or collation and the actual connection / client collation.
What we need to do is tell mysqljs to use the proper charset, in your .env
file or in the docker-compose.yml
:
environment:
DB_CHARSET: utf8mb4$
DB_COLLATION: utf8mb4_general_ci$
This assumes that you are using process.env
to pass those values to mysqljs
(this is done automagically by mysql-oh-wait
package).
Another approach
You can try a different way: by telling docker-compose
to start the mysql with a different command
. Add this to your docker-compose.yml
:
my-mysq-container-name:
restart: always
command: --default-authentication-plugin=mysql_native_password --character-set-server=utf8mb4 --collation-server=utf8mb4_general_ci
But again, this is just trying to achieve the same from a different angle than the initial my.cnf
file.
Troubelshooting
You can issue a few commands (which may throw if you have FOREIGN KEY
Constraints) to try to alter the database CHARSET
before using mysqldump:
ALTER DATABASE my_db_name CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
ALTER TABLE My_table_name,
CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
But if you have foreign key constraints errors, you can do the transformation altering the mysqldump
output file as in previous section.
NOTE: To check what the current character set / collations are, you can issue (from within the container):
mysql -p -e "show variables like '%collation_connection';"
Or within mysql client (alsow thin the container...):
SELECT @@character_set_database, @@collation_database;
SELECT * FROM INFORMATION_SCHEMA.SCHEMATA;
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME LIKE '%';
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
References
- Charset v. Collation
- Most effective
- Standard
UTF-8
is namedutf8mb4
in mysql - Which
utf8
? - Partial solution
- Show collations
- About emojis
- Regex for validating emojis
- where mysql looks for .cnf