The following sections various aspects of character set collations.
COLLATE clause, you can override
whatever the default collation is for a comparison.
COLLATE may be used in various parts of SQL
statements. Here are some examples:
SELECT k FROM t1 ORDER BY k COLLATE latin1_german2_ci;
SELECT k COLLATE latin1_german2_ci AS k1 FROM t1 ORDER BY k1;
SELECT k FROM t1 GROUP BY k COLLATE latin1_german2_ci;
With aggregate functions:
SELECT MAX(k COLLATE latin1_german2_ci) FROM t1;
SELECT DISTINCT k COLLATE latin1_german2_ci FROM t1;
SELECT * FROM t1 WHERE _latin1 'Müller' COLLATE latin1_german2_ci = k;
SELECT * FROM t1 WHERE k LIKE _latin1 'Müller' COLLATE latin1_german2_ci;
SELECT k FROM t1 GROUP BY k HAVING k = _latin1 'Müller' COLLATE latin1_german2_ci;
COLLATE clause has high precedence
||), so the following two
expressions are equivalent:
x || y COLLATE z x || (y COLLATE z)
BINARY operator casts the string
following it to a binary string. This is an easy way to force a
comparison to be done byte by byte rather than character by
BINARY also causes trailing spaces
to be significant.
SELECT 'a' = 'A';-> 1 mysql>
SELECT BINARY 'a' = 'A';-> 0 mysql>
SELECT 'a' = 'a ';-> 1 mysql>
SELECT BINARY 'a' = 'a ';-> 0
BINARY attribute in character column
definitions has a different effect. A character column defined
BINARY attribute is assigned the
binary collation of the column's character set. Every character
set has a binary collation. For example, the binary collation
latin1 character set is
latin1_bin, so if the table default character
latin1, these two column definitions
CHAR(10) BINARY CHAR(10) CHARACTER SET latin1 COLLATE latin1_bin
The effect of
BINARY as a column attribute
differs from its effect prior to MySQL 4.1. Formerly,
BINARY resulted in a column that was treated
as a binary string. A binary string is a string of bytes that
has no character set or collation, which differs from a
non-binary character string that has a binary collation. For
both types of strings, comparisons are based on the numeric
values of the string unit, but for non-binary strings the unit
is the character and some character sets allow multi-byte
characters. Section 11.4.2, “The
The use of
CHARACTER SET binary in the
definition of a
causes the column to be treated as a binary data type. For
example, the following pairs of definitions are equivalent:
CHAR(10) CHARACTER SET binary BINARY(10) VARCHAR(10) CHARACTER SET binary VARBINARY(10) TEXT CHARACTER SET binary BLOB
In the great majority of statements, it is obvious what
collation MySQL uses to resolve a comparison operation. For
example, in the following cases, it should be clear that the
collation is the collation of column
SELECT x FROM T ORDER BY x; SELECT x FROM T WHERE x = x; SELECT DISTINCT x FROM T;
However, when multiple operands are involved, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y';
Should this query use the collation of the column
x, or of the string literal
Standard SQL resolves such questions using what used to be
called “coercibility” rules. Basically, this means:
collations, so which collation takes precedence? This can be
difficult to resolve, but the following rules cover most
COLLATE clause has a
coercibility of 0. (Not coercible at all.)
The concatenation of two strings with different collations has a coercibility of 1.
The collation of a column or a stored routine parameter or local variable has a coercibility of 2.
A “system constant” (the string returned by
functions such as
VERSION()) has a coercibility of 3.
A literal's collation has a coercibility of 4.
NULL or an expression that is derived
NULL has a coercibility of 5.
The preceding coercibility values are current as of MySQL 5.0.3.
In MySQL 5.0 prior to 5.0.3, there is no system
constant or ignorable coercibility. Functions such as
USER() have a coercibility of 2 rather than
3, and literals have a coercibility of 3 rather than 4.
Those rules resolve ambiguities in the following manner:
Use the collation with the lowest coercibility value.
If both sides have the same coercibility, then it is an error if the collations aren't the same.
|Use collation of |
|Use collation of |
COERCIBILITY() function can be used to
determine the coercibility of a string expression:
SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);-> 0 mysql>
SELECT COERCIBILITY(VERSION());-> 3 mysql>
SELECT COERCIBILITY('A');-> 4
Each character set has one or more collations, but each
collation is associated with one and only one character set.
Therefore, the following statement causes an error message
latin2_bin collation is not legal
latin1 character set:
SELECT _latin1 'x' COLLATE latin2_bin;ERROR 1253 (42000): COLLATION 'latin2_bin' is not valid for CHARACTER SET 'latin1'
Suppose that column
X in table
T has these
Muffler Müller MX Systems MySQL
Suppose also that the column values are retrieved using the following statement:
SELECT X FROM T ORDER BY X COLLATE
The following table shows the resulting order of the values if
ORDER BY with different collations:
|Müller||MX Systems||MX Systems|
The character that causes the different sort orders in this
example is the U with two dots over it
ü), which the Germans call
The first column shows the result of the
SELECT using the Swedish/Finnish
collating rule, which says that U-umlaut sorts with Y.
The second column shows the result of the
SELECT using the German DIN-1 rule, which
says that U-umlaut sorts with U.
The third column shows the result of the
SELECT using the German DIN-2 rule, which
says that U-umlaut sorts with UE.