This section describes how to configure the server to use different character sets. It also discusses how to set the server's time zone and enable per-connection time zone support.
        By default, MySQL uses the latin1 (cp1252
        West European) character set and the
        latin1_swedish_ci collation that sorts
        according to Swedish/Finnish rules. These defaults are suitable
        for the United States and most of Western Europe.
      
        All MySQL binary distributions are compiled with
        --with-extra-charsets=complex. This adds code
        to all standard programs that enables them to handle
        latin1 and all multi-byte character sets
        within the binary. Other character sets are loaded from a
        character-set definition file when needed.
      
        The character set determines what characters are allowed in
        identifiers. The collation determines how strings are sorted by
        the ORDER BY and GROUP BY
        clauses of the SELECT statement.
      
        You can change the default server character set and collation
        with the --character-set-server and
        --collation-server options when you start the
        server. The collation must be a legal collation for the default
        character set. (Use the SHOW COLLATION
        statement to determine which collations are available for each
        character set.) See Section 5.2.1, “mysqld Command Options”.
      
        The character sets available depend on the
        --with-charset=
        and
        charset_name--with-extra-charsets= options to
        configure, and the character set
        configuration files listed in
        list-of-charsets
        | complex | all | noneSHAREDIR/charsets/Index
        If you change the character set when running MySQL, that may
        also change the sort order. Consequently, you must run
        myisamchk -r -q
        --set-collation=collation_name
        on all MyISAM tables, or your indexes may not
        be ordered correctly.
      
When a client connects to a MySQL server, the server indicates to the client what the server's default character set is. The client switches to this character set for this connection.
        You should use mysql_real_escape_string()
        when escaping strings for an SQL query.
        mysql_real_escape_string() is identical to
        the old mysql_escape_string() function,
        except that it takes the MYSQL connection
        handle as the first parameter so that the appropriate character
        set can be taken into account when escaping characters.
      
        If the client is compiled with paths that differ from where the
        server is installed and the user who configured MySQL didn't
        include all character sets in the MySQL binary, you must tell
        the client where it can find the additional character sets it
        needs if the server runs with a different character set from the
        client. You can do this by specifying a
        --character-sets-dir option to indicate the
        path to the directory in which the dynamic MySQL character sets
        are stored. For example, you can put the following in an option
        file:
      
[client] character-sets-dir=/usr/local/mysql/share/mysql/charsets
You can force the client to use specific character set as follows:
[client]
default-character-set=charset_name
This is normally unnecessary, however.
          In MySQL 5.0, character set and collation are
          specified separately. This means that if you want German sort
          order, you should select the latin1
          character set and either the
          latin1_german1_ci or
          latin1_german2_ci collation. For example,
          to start the server with the
          latin1_german1_ci collation, use the
          --character-set-server=latin1 and
          --collation-server=latin1_german1_ci options.
        
For information on the differences between these two collations, see Section 10.9.2, “West European Character Sets”.
By default, mysqld produces error messages in English, but they can also be displayed in any of these other languages: Czech, Danish, Dutch, Estonian, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Norwegian-ny, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, or Swedish.
        To start mysqld with a particular language
        for error messages, use the --language or
        -L option. The option value can be a language
        name or the full path to the error message file. For example:
      
shell> mysqld --language=swedish
Or:
shell> mysqld --language=/usr/local/share/swedish
The language name should be specified in lowercase.
        By default, the language files are located in the
        share/
        directory under the MySQL base directory.
      LANGUAGE
You can also change the content of the error messages produced by the server. Details can be found in the MySQL Internals manual, available at http://dev.mysql.com/doc/. If you upgrade to a newer version of MySQL after changing the error messages, remember to repeat your changes after the upgrade.
This section discusses the procedure for adding a new character set to MySQL. You must have a MySQL source distribution to use these instructions. To choose the proper procedure, determine whether the character set is simple or complex:
If the character set does not need to use special string collating routines for sorting and does not need multi-byte character support, it is simple.
If it needs either of those features, it is complex.
        For example, latin1 and
        danish are simple character sets, whereas
        big5 and czech are complex
        character sets.
      
        In the following instructions, the name of the character set is
        represented by MYSET.
      
For a simple character set, do the following:
            Add MYSET to the end of the
            sql/share/charsets/Index file. Assign a
            unique number to it.
          
            Create the file
            sql/share/charsets/.
            (You can use a copy of
            MYSET.confsql/share/charsets/latin1.conf as the
            basis for this file.)
          
The syntax for the file is very simple:
                Comments start with a ‘#’
                character and continue to the end of the line.
              
Words are separated by arbitrary amounts of whitespace.
When defining the character set, every word must be a number in hexadecimal format.
                The ctype array takes up the first
                257 words. The to_lower[],
                to_upper[] and
                sort_order[] arrays take up 256 words
                each after that.
              
            Add the character set name to the
            CHARSETS_AVAILABLE and
            COMPILED_CHARSETS lists in
            configure.in.
          
Reconfigure, recompile, and test.
For a complex character set, do the following:
            Create the file
            strings/ctype-
            in the MySQL source distribution.
          MYSET.c
            Add MYSET to the end of the
            sql/share/charsets/Index file. Assign a
            unique number to it.
          
            Look at one of the existing ctype-*.c
            files (such as strings/ctype-big5.c) to
            see what needs to be defined. Note that the arrays in your
            file must have names like
            ctype_,
            MYSETto_lower_,
            and so on. These correspond to the arrays for a simple
            character set. See Section 5.11.4, “The Character Definition Arrays”.
          MYSET
Near the top of the file, place a special comment like this:
/* * This comment is parsed by configure to create ctype.c, * so don't change it unless you know what you are doing. * * .configure. number_MYSET=MYNUMBER* .configure. strxfrm_multiply_MYSET=N* .configure. mbmaxlen_MYSET=N*/
The configure program uses this comment to include the character set into the MySQL library automatically.
            The strxfrm_multiply and
            mbmaxlen lines are explained in the
            following sections. You need include them only if you need
            the string collating functions or the multi-byte character
            set functions, respectively.
          
You should then create some of the following functions:
                my_strncoll_
              MYSET()
                my_strcoll_
              MYSET()
                my_strxfrm_
              MYSET()
                my_like_range_
              MYSET()
            Add the character set name to the
            CHARSETS_AVAILABLE and
            COMPILED_CHARSETS lists in
            configure.in.
          
Reconfigure, recompile, and test.
        The sql/share/charsets/README file includes
        additional instructions.
      
        If you want to have the character set included in the MySQL
        distribution, mail a patch to the MySQL
        internals mailing list. See
        Section 1.7.1, “MySQL Mailing Lists”.
      
        to_lower[] and to_upper[]
        are simple arrays that hold the lowercase and uppercase
        characters corresponding to each member of the character set.
        For example:
      
to_lower['A'] should contain 'a' to_upper['a'] should contain 'A'
        sort_order[] is a map indicating how
        characters should be ordered for comparison and sorting
        purposes. Quite often (but not for all character sets) this is
        the same as to_upper[], which means that
        sorting is case-insensitive. MySQL sorts characters based on the
        values of sort_order[] elements. For more
        complicated sorting rules, see the discussion of string
        collating in Section 5.11.5, “String Collating Support”.
      
        ctype[] is an array of bit values, with one
        element for one character. (Note that
        to_lower[], to_upper[],
        and sort_order[] are indexed by character
        value, but ctype[] is indexed by character
        value + 1. This is an old legacy convention for handling
        EOF.)
      
        You can find the following bitmask definitions in
        m_ctype.h:
      
#define _U 01 /* Uppercase */ #define _L 02 /* Lowercase */ #define _N 04 /* Numeral (digit) */ #define _S 010 /* Spacing character */ #define _P 020 /* Punctuation */ #define _C 040 /* Control character */ #define _B 0100 /* Blank */ #define _X 0200 /* heXadecimal digit */
        The ctype[] entry for each character should
        be the union of the applicable bitmask values that describe the
        character. For example, 'A' is an uppercase
        character (_U) as well as a hexadecimal digit
        (_X), so ctype['A'+1]
        should contain the value:
      
_U + _X = 01 + 0200 = 0201
        If the sorting rules for your language are too complex to be
        handled with the simple sort_order[] table,
        you need to use the string collating functions.
      
        The best documentation for this is the existing character sets.
        Look at the big5, czech,
        gbk, sjis, and
        tis160 character sets for examples.
      
        You must specify the
        strxfrm_multiply_
        value in the special comment at the top of the file.
        MYSET=NN should be set to the maximum ratio
        the strings may grow during
        my_strxfrm_
        (it must be a positive integer).
      MYSET
If you want to add support for a new character set that includes multi-byte characters, you need to use the multi-byte character functions.
        The best documentation for this is the existing character sets.
        Look at the euc_kr,
        gb2312, gbk,
        sjis, and ujis character
        sets for examples. These are implemented in the
        ctype-
        files in the charset_name.cstrings directory.
      
        You must specify the
        mbmaxlen_
        value in the special comment at the top of the source file.
        MYSET=NN should be set to the size in bytes
        of the largest character in the set.
      
If you try to use a character set that is not compiled into your binary, you might run into the following problems:
            Your program uses an incorrect path to determine where the
            character sets are stored. (Default
            /usr/local/mysql/share/mysql/charsets).
            This can be fixed by using the
            --character-sets-dir option when you run
            the program in question.
          
The character set is a multi-byte character set that cannot be loaded dynamically. In this case, you must recompile the program with support for the character set.
The character set is a dynamic character set, but you do not have a configure file for it. In this case, you should install the configure file for the character set from a new MySQL distribution.
            If your Index file does not contain the
            name for the character set, your program displays the
            following error message:
          
ERROR 1105: File '/usr/local/share/mysql/charsets/?.conf' not found (Errcode: 2)
            In this case, you should either get a new
            Index file or manually add the name of
            any missing character sets to the current file.
          
        For MyISAM tables, you can check the
        character set name and number for a table with
        myisamchk -dvv
        tbl_name.
      
The MySQL server maintains several time zone settings:
            The system time zone. When the server starts, it attempts to
            determine the time zone of the host machine and uses it to
            set the system_time_zone system variable.
            The value does not change thereafter.
          
            The server's current time zone. The global
            time_zone system variable indicates the
            time zone the server currently is operating in. The initial
            value for time_zone is
            'SYSTEM', which indicates that the server
            time zone is the same as the system time zone. The initial
            value can be specified explicitly with the
            --default-time-zone=
            option. If you have the timezoneSUPER privilege,
            you can set the global value at runtime with this statement:
          
mysql> SET GLOBAL time_zone = timezone;
            Per-connection time zones. Each client that connects has its
            own time zone setting, given by the session
            time_zone variable. Initially, the
            session variable takes its value from the global
            time_zone variable, but the client can
            change its own time zone with this statement:
          
mysql> SET time_zone = timezone;
The current values of the global and client-specific time zones can be retrieved like this:
mysql> SELECT @@global.time_zone, @@session.time_zone;
        timezone values can be given as
        strings indicating an offset from UTC, such as
        '+10:00' or '-6:00'. If
        the time zone information tables in the mysql
        database have been created and populated, you can also use named
        time zones, such as 'Europe/Helsinki',
        'US/Eastern', or 'MET'.
        The value 'SYSTEM' can be used to indicate
        that the time zone should be the same as the system time zone.
        Time zone names are not case sensitive.
      
        The MySQL installation procedure creates the time zone tables in
        the mysql database, but does not load them.
        You must do so manually. (If you are upgrading to MySQL 4.1.3 or
        later from an earlier version, you should create the tables by
        upgrading your mysql database. Use the
        instructions in Section 5.6.2, “mysql_upgrade — Check Tables for MySQL Upgrade”.)
      
        If your system has its own zoneinfo
        database (the set of files describing time zones), you should
        use the mysql_tzinfo_to_sql program for
        filling the time zone tables. Examples of such systems are
        Linux, FreeBSD, Sun Solaris, and Mac OS X. One likely location
        for these files is the /usr/share/zoneinfo
        directory. If your system does not have a zoneinfo database, you
        can use the downloadable package described later in this
        section.
      
The mysql_tzinfo_to_sql program is used to load the time zone tables. On the command line, pass the zoneinfo directory pathname to mysql_tzinfo_to_sql and send the output into the mysql program. For example:
shell> mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql
mysql_tzinfo_to_sql reads your system's time zone files and generates SQL statements from them. mysql processes those statements to load the time zone tables.
mysql_tzinfo_to_sql also can be used to load a single time zone file, and to generate leap second information:
            To load a single time zone file
            tz_file that corresponds to a
            time zone name tz_name, invoke
            mysql_tzinfo_to_sql like this:
          
shell> mysql_tzinfo_to_sql tz_file tz_name | mysql -u root mysql
            If your time zone needs to account for leap seconds,
            initialize the leap second information like this, where
            tz_file is the name of your time
            zone file:
          
shell> mysql_tzinfo_to_sql --leap tz_file | mysql -u root mysql
        If your system doesn't have a zoneinfo database (for example,
        Windows or HP-UX), you can use the package of pre-built time
        zone tables that is available for download at
        http://dev.mysql.com/downloads/timezones.html. This package
        contains .frm, .MYD,
        and .MYI files for the
        MyISAM time zone tables. These tables should
        be part of the mysql database, so you should
        place the files in the mysql subdirectory
        of your MySQL server's data directory. The server should be
        stopped while you do this.
      
Warning: Please don't use the downloadable package if your system has a zoneinfo database. Use the mysql_tzinfo_to_sql utility instead. Otherwise, you may cause a difference in datetime handling between MySQL and other applications on your system.
For information about time zone settings in replication setup, please see Section 6.7, “Replication Features and Known Problems”.