Metadata is “the data about the
data.” Anything that describes the
database — as opposed to being the
contents of the database — is metadata.
Thus column names, database names, usernames, version names, and
most of the string results from
metadata. This is also true of the contents of tables in
INFORMATION_SCHEMA, because those tables by
definition contain information about database objects.
Representation of metadata must satisfy these requirements:
All metadata must be in the same character set. Otherwise,
SHOW commands nor
SELECT statements for tables in
INFORMATION_SCHEMA would work properly
because different rows in the same column of the results of
these operations would be in different character sets.
Metadata must include all characters in all languages. Otherwise, users would not be able to name columns and tables using their own languages.
To satisfy both requirements, MySQL stores metadata in a Unicode character set, namely UTF-8. This does not cause any disruption if you never use accented or non-Latin characters. But if you do, you should be aware that metadata is in UTF-8.
The metadata requirements mean that the return values of the
VERSION() functions have the UTF-8
character set by default.
The server sets the
variable to the name of the metadata character set:
SHOW VARIABLES LIKE 'character_set_system';+----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | character_set_system | utf8 | +----------------------+-------+
Storage of metadata using Unicode does not
mean that the server returns headers of columns and the results of
DESCRIBE functions in the
character_set_system character set by default.
When you use
SELECT column1 FROM t, the name
column1 itself is returned from the server to
the client in the character set determined by the value of the
character_set_results system variable, which
has a default value of
latin1. If you want the
server to pass metadata results back in a different character set,
SET NAMES statement to force the server
to perform character set conversion.
character_set_results and other
related system variables. (See
Section 10.4, “Connection Character Sets and Collations”.) Alternatively, a client
program can perform the conversion after receiving the result from
the server. It is more efficient for the client perform the
conversion, but this option is not always available for all
character_set_results is set to
NULL, no conversion is performed and the server
returns metadata using its original character set (the set
Error messages returned from the server to the client are converted to the client character set automatically, as with metadata.
If you are using (for example) the
function for comparison or assignment within a single statement,
don't worry. MySQL performs some automatic conversion for you.
SELECT * FROM Table1 WHERE USER() = latin1_column;
This works because the contents of
latin1_column are automatically converted to
UTF-8 before the comparison.
INSERT INTO Table1 (latin1_column) SELECT USER();
This works because the contents of
automatically converted to
latin1 before the
assignment. Automatic conversion is not fully implemented yet, but
should work correctly in a later version.
Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a “subset” of Unicode. Because it is a well-known principle that “what applies to a superset can apply to a subset,” we believe that a collation for Unicode can apply for comparisons with non-Unicode strings.