MySQL keeps row data and index data in separate files. Many (almost all) other database systems mix row and index data in the same file. We believe that the MySQL choice is better for a very wide range of modern systems.
Another way to store the row data is to keep the information for each column in a separate area (examples are SDBM and Focus). This causes a performance hit for every query that accesses more than one column. Because this degenerates so quickly when more than one column is accessed, we believe that this model is not good for general-purpose databases.
The more common case is that the index and data are stored together (as in Oracle/Sybase, et al). In this case, you find the row information at the leaf page of the index. The good thing with this layout is that it, in many cases, depending on how well the index is cached, saves a disk read. The bad things with this layout are:
Table scanning is much slower because you have to read through the indexes to get at the data.
You cannot use only the index table to retrieve data for a query.
You use more space because you must duplicate indexes from the nodes (you cannot store the row in the nodes).
Deletes degenerate the table over time (because indexes in nodes are usually not updated on delete).
It is more difficult to cache only the index data.
One of the most basic optimizations is to design your tables to take as little space on the disk as possible. This can result in huge improvements because disk reads are faster, and smaller tables normally require less main memory while their contents are being actively processed during query execution. Indexing also is a lesser resource burden if done on smaller columns.
MySQL supports many different storage engines (table types) and row formats. For each table, you can decide which storage and indexing method to use. Choosing the proper table format for your application may give you a big performance gain. See Chapter 14, Storage Engines and Table Types.
You can get better performance for a table and minimize storage space by using the techniques listed here:
Use the most efficient (smallest) data types possible. MySQL
has many specialized types that save disk space and memory.
For example, use the smaller integer types if possible to
get smaller tables. MEDIUMINT
is often a
better choice than INT
because a
MEDIUMINT
column uses 25% less space.
Declare columns to be NOT NULL
if
possible. It makes everything faster and you save one bit
per column. If you really need NULL
in
your application, you should definitely use it. Just avoid
having it on all columns by default.
For MyISAM
tables, if you do not have any
variable-length columns (VARCHAR
,
TEXT
, or BLOB
columns), a fixed-size row format is used. This is faster
but unfortunately may waste some space. See
Section 14.1.3, “MyISAM
Table Storage Formats”. You can hint that
you want to have fixed length rows even if you have
VARCHAR
columns with the CREATE
TABLE
option ROW_FORMAT=FIXED
.
Starting with MySQL 5.0.3, InnoDB
tables
use a more compact storage format. In earlier versions of
MySQL, InnoDB
rows contain some redundant
information, such as the number of columns and the length of
each column, even for fixed-size columns. By default, tables
are created in the compact format
(ROW_FORMAT=COMPACT
). If you wish to
downgrade to older versions of MySQL, you can request the
old format with ROW_FORMAT=REDUNDANT
.
The compact InnoDB
format also changes
how CHAR
columns containing UTF-8 data
are stored. With ROW_FORMAT=REDUNDANT
, a
UTF-8 CHAR(
occupies 3 × N
)N
bytes, given
that the maximum length of a UTF-8 encoded character is
three bytes. Many languages can be written primarily using
single-byte UTF-8 characters, so a fixed storage length
often wastes space. With
ROW_FORMAT=COMPACT
format,
InnoDB
allocates a variable amount of
storage in the range from N
to 3
× N
bytes for these columns
by stripping trailing spaces if necessary. The minimum
storage length is kept as N
bytes
to facilitate in-place updates in typical cases.
The primary index of a table should be as short as possible. This makes identification of each row easy and efficient.
Create only the indexes that you really need. Indexes are good for retrieval but bad when you need to store data quickly. If you access a table mostly by searching on a combination of columns, create an index on them. The first part of the index should be the column most used. If you always use many columns when selecting from the table, you should use the column with more duplicates first to obtain better compression of the index.
If it is very likely that a string column has a unique
prefix on the first number of characters, it's better to
index only this prefix, using MySQL's support for creating
an index on the leftmost part of the column (see
Section 13.1.4, “CREATE INDEX
Syntax”). Shorter indexes are faster,
not only because they require less disk space, but because
they give also you more hits in the index cache, and thus
fewer disk seeks. See Section 7.5.2, “Tuning Server Parameters”.
In some circumstances, it can be beneficial to split into two a table that is scanned very often. This is especially true if it is a dynamic-format table and it is possible to use a smaller static format table that can be used to find the relevant rows when scanning the table.
All MySQL data types can be indexed. Use of indexes on the
relevant columns is the best way to improve the performance of
SELECT
operations.
The maximum number of indexes per table and the maximum index length is defined per storage engine. See Chapter 14, Storage Engines and Table Types. All storage engines support at least 16 indexes per table and a total index length of at least 256 bytes. Most storage engines have higher limits.
With
syntax in an index specification, you can create an index that
uses only the first col_name
(N
)N
characters of a
string column. Indexing only a prefix of column values in this
way can make the index file much smaller. When you index a
BLOB
or TEXT
column, you
must specify a prefix length for the index.
For example:
CREATE TABLE test (blob_col BLOB, INDEX(blob_col(10)));
Prefixes can be up to 1000 bytes long (767 bytes for
InnoDB
tables). Note that prefix limits are
measured in bytes, whereas the prefix length in CREATE
TABLE
statements is interpreted as number of
characters. Be sure to take this into account when
specifying a prefix length for a column that uses a multi-byte
character set.
You can also create FULLTEXT
indexes. These
are used for full-text searches. Only the
MyISAM
storage engine supports
FULLTEXT
indexes and only for
CHAR
, VARCHAR
, and
TEXT
columns. Indexing always takes place
over the entire column and partial (column prefix) indexing is
not supported. For details, see
Section 12.7, “Full-Text Search Functions”.
You can also create indexes on spatial data types. Currently,
only MyISAM
supports R-tree indexes on
spatial types. As of MySQL 5.0.16, other storage engines use
B-trees for indexing spatial types (except for
ARCHIVE
and NDBCLUSTER
,
which do not support spatial type indexing).
The MEMORY
storage engine uses
HASH
indexes by default, but also supports
BTREE
indexes.
MySQL can create composite indexes (that is, indexes on multiple columns). An index may consist of up to 15 columns. For certain data types, you can index a prefix of the column (see Section 7.4.3, “Column Indexes”).
A multiple-column index can be considered a sorted array containing values that are created by concatenating the values of the indexed columns.
MySQL uses multiple-column indexes in such a way that queries
are fast when you specify a known quantity for the first column
of the index in a WHERE
clause, even if you
do not specify values for the other columns.
Suppose that a table has the following specification:
CREATE TABLE test ( id INT NOT NULL, last_name CHAR(30) NOT NULL, first_name CHAR(30) NOT NULL, PRIMARY KEY (id), INDEX name (last_name,first_name) );
The name
index is an index over the
last_name
and first_name
columns. The index can be used for queries that specify values
in a known range for last_name
, or for both
last_name
and first_name
.
Therefore, the name
index is used in the
following queries:
SELECT * FROM test WHERE last_name='Widenius'; SELECT * FROM test WHERE last_name='Widenius' AND first_name='Michael'; SELECT * FROM test WHERE last_name='Widenius' AND (first_name='Michael' OR first_name='Monty'); SELECT * FROM test WHERE last_name='Widenius' AND first_name >='M' AND first_name < 'N';
However, the name
index is
not used in the following queries:
SELECT * FROM test WHERE first_name='Michael'; SELECT * FROM test WHERE last_name='Widenius' OR first_name='Michael';
The manner in which MySQL uses indexes to improve query performance is discussed further in Section 7.4.5, “How MySQL Uses Indexes”.
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. If a table has 1,000 rows, this is at least 100 times faster than reading sequentially. If you need to access most of the rows, it is faster to read sequentially, because this minimizes disk seeks.
Most MySQL indexes (PRIMARY KEY
,
UNIQUE
, INDEX
, and
FULLTEXT
) are stored in B-trees. Exceptions
are that indexes on spatial data types use R-trees, and that
MEMORY
tables also support hash indexes.
Strings are automatically prefix- and end-space compressed. See
Section 13.1.4, “CREATE INDEX
Syntax”.
In general, indexes are used as described in the following
discussion. Characteristics specific to hash indexes (as used in
MEMORY
tables) are described at the end of
this section.
MySQL uses indexes for these operations:
To find the rows matching a WHERE
clause
quickly.
To eliminate rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows.
To retrieve rows from other tables when performing joins.
To find the MIN()
or
MAX()
value for a specific indexed column
key_col
. This is optimized by a
preprocessor that checks whether you are using
WHERE
on all key
parts that occur before key_part_N
=
constant
key_col
in the index. In this case, MySQL does a single key lookup
for each MIN()
or
MAX()
expression and replaces it with a
constant. If all expressions are replaced with constants,
the query returns at once. For example:
SELECT MIN(key_part2
),MAX(key_part2
) FROMtbl_name
WHEREkey_part1
=10;
To sort or group a table if the sorting or grouping is done
on a leftmost prefix of a usable key (for example,
ORDER BY
). If all key
parts are followed by key_part1
,
key_part2
DESC
, the key is
read in reverse order. See
Section 7.2.12, “ORDER BY
Optimization”.
In some cases, a query can be optimized to retrieve values without consulting the data rows. If a query uses only columns from a table that are numeric and that form a leftmost prefix for some key, the selected values may be retrieved from the index tree for greater speed:
SELECTkey_part3
FROMtbl_name
WHEREkey_part1
=1
Suppose that you issue the following SELECT
statement:
mysql> SELECT * FROM tbl_name
WHERE col1=val1
AND col2=val2
;
If a multiple-column index exists on col1
and
col2
, the appropriate rows can be fetched
directly. If separate single-column indexes exist on
col1
and col2
, the
optimizer tries to find the most restrictive index by deciding
which index finds fewer rows and using that index to fetch the
rows.
If the table has a multiple-column index, any leftmost prefix of
the index can be used by the optimizer to find rows. For
example, if you have a three-column index on (col1,
col2, col3)
, you have indexed search capabilities on
(col1)
, (col1, col2)
, and
(col1, col2, col3)
.
MySQL cannot use a partial index if the columns do not form a
leftmost prefix of the index. Suppose that you have the
SELECT
statements shown here:
SELECT * FROMtbl_name
WHERE col1=val1
; SELECT * FROMtbl_name
WHERE col1=val1
AND col2=val2
; SELECT * FROMtbl_name
WHERE col2=val2
; SELECT * FROMtbl_name
WHERE col2=val2
AND col3=val3
;
If an index exists on (col1, col2, col3)
,
only the first two queries use the index. The third and fourth
queries do involve indexed columns, but
(col2)
and (col2, col3)
are not leftmost prefixes of (col1, col2,
col3)
.
A B-tree index can be used for column comparisons in expressions
that use the =
, >
,
>=
, <
,
<=
, or BETWEEN
operators. The index also can be used for
LIKE
comparisons if the argument to
LIKE
is a constant string that does not start
with a wildcard character. For example, the following
SELECT
statements use indexes:
SELECT * FROMtbl_name
WHEREkey_col
LIKE 'Patrick%'; SELECT * FROMtbl_name
WHEREkey_col
LIKE 'Pat%_ck%';
In the first statement, only rows with 'Patrick' <=
are
considered. In the second statement, only rows with
key_col
< 'Patricl''Pat' <=
are considered.
key_col
<
'Pau'
The following SELECT
statements do not use
indexes:
SELECT * FROMtbl_name
WHEREkey_col
LIKE '%Patrick%'; SELECT * FROMtbl_name
WHEREkey_col
LIKEother_col
;
In the first statement, the LIKE
value begins
with a wildcard character. In the second statement, the
LIKE
value is not a constant.
If you use ... LIKE
'%
and
string
%'string
is longer than three
characters, MySQL uses the Turbo Boyer-Moore
algorithm to initialize the pattern for the string
and then uses this pattern to perform the search more quickly.
A search using
employs indexes if
col_name
IS
NULLcol_name
is indexed.
Any index that does not span all AND
levels
in the WHERE
clause is not used to optimize
the query. In other words, to be able to use an index, a prefix
of the index must be used in every AND
group.
The following WHERE
clauses use indexes:
... WHEREindex_part1
=1 ANDindex_part2
=2 ANDother_column
=3 /*index
= 1 ORindex
= 2 */ ... WHEREindex
=1 OR A=10 ANDindex
=2 /* optimized like "index_part1
='hello'" */ ... WHEREindex_part1
='hello' ANDindex_part3
=5 /* Can use index onindex1
but not onindex2
orindex3
*/ ... WHEREindex1
=1 ANDindex2
=2 ORindex1
=3 ANDindex3
=3;
These WHERE
clauses do
not use indexes:
/*index_part1
is not used */ ... WHEREindex_part2
=1 ANDindex_part3
=2 /* Index is not used in both parts of the WHERE clause */ ... WHEREindex
=1 OR A=10 /* No index spans all rows */ ... WHEREindex_part1
=1 ORindex_part2
=10
Sometimes MySQL does not use an index, even if one is available.
One circumstance under which this occurs is when the optimizer
estimates that using the index would require MySQL to access a
very large percentage of the rows in the table. (In this case, a
table scan is likely to be much faster because it requires fewer
seeks.) However, if such a query uses LIMIT
to retrieve only some of the rows, MySQL uses an index anyway,
because it can much more quickly find the few rows to return in
the result.
Hash indexes have somewhat different characteristics from those just discussed:
They are used only for equality comparisons that use the
=
or <=>
operators (but are very fast). They are
not used for comparison operators such as
<
that find a range of values.
The optimizer cannot use a hash index to speed up
ORDER BY
operations. (This type of index
cannot be used to search for the next entry in order.)
MySQL cannot determine approximately how many rows there are
between two values (this is used by the range optimizer to
decide which index to use). This may affect some queries if
you change a MyISAM
table to a
hash-indexed MEMORY
table.
Only whole keys can be used to search for a row. (With a B-tree index, any leftmost prefix of the key can be used to find rows.)
To minimize disk I/O, the MyISAM
storage
engine exploits a strategy that is used by many database
management systems. It employs a cache mechanism to keep the
most frequently accessed table blocks in memory:
For index blocks, a special structure called the key cache (or key buffer) is maintained. The structure contains a number of block buffers where the most-used index blocks are placed.
For data blocks, MySQL uses no special cache. Instead it relies on the native operating system filesystem cache.
This section first describes the basic operation of the
MyISAM
key cache. Then it discusses features
that improve key cache performance and that enable you to better
control cache operation:
Access to the key cache no longer is serialized among threads. Multiple threads can access the cache concurrently.
You can set up multiple key caches and assign table indexes to specific caches.
To control the size of the key cache, use the
key_buffer_size
system variable. If this
variable is set equal to zero, no key cache is used. The key
cache also is not used if the key_buffer_size
value is too small to allocate the minimal number of block
buffers (8).
When the key cache is not operational, index files are accessed using only the native filesystem buffering provided by the operating system. (In other words, table index blocks are accessed using the same strategy as that employed for table data blocks.)
An index block is a contiguous unit of access to the
MyISAM
index files. Usually the size of an
index block is equal to the size of nodes of the index B-tree.
(Indexes are represented on disk using a B-tree data structure.
Nodes at the bottom of the tree are leaf nodes. Nodes above the
leaf nodes are non-leaf nodes.)
All block buffers in a key cache structure are the same size. This size can be equal to, greater than, or less than the size of a table index block. Usually one these two values is a multiple of the other.
When data from any table index block must be accessed, the server first checks whether it is available in some block buffer of the key cache. If it is, the server accesses data in the key cache rather than on disk. That is, it reads from the cache or writes into it rather than reading from or writing to disk. Otherwise, the server chooses a cache block buffer containing a different table index block (or blocks) and replaces the data there by a copy of required table index block. As soon as the new index block is in the cache, the index data can be accessed.
If it happens that a block selected for replacement has been modified, the block is considered “dirty.” In this case, prior to being replaced, its contents are flushed to the table index from which it came.
Usually the server follows an LRU (Least Recently Used) strategy: When choosing a block for replacement, it selects the least recently used index block. To make this choice easier, the key cache module maintains a special queue (LRU chain) of all used blocks. When a block is accessed, it is placed at the end of the queue. When blocks need to be replaced, blocks at the beginning of the queue are the least recently used and become the first candidates for eviction.
Threads can access key cache buffers simultaneously, subject to the following conditions:
A buffer that is not being updated can be accessed by multiple threads.
A buffer that is being updated causes threads that need to use it to wait until the update is complete.
Multiple threads can initiate requests that result in cache block replacements, as long as they do not interfere with each other (that is, as long as they need different index blocks, and thus cause different cache blocks to be replaced).
Shared access to the key cache enables the server to improve throughput significantly.
Shared access to the key cache improves performance but does not eliminate contention among threads entirely. They still compete for control structures that manage access to the key cache buffers. To reduce key cache access contention further, MySQL also provides multiple key caches. This feature enables you to assign different table indexes to different key caches.
Where there are multiple key caches, the server must know
which cache to use when processing queries for a given
MyISAM
table. By default, all
MyISAM
table indexes are cached in the
default key cache. To assign table indexes to a specific key
cache, use the CACHE INDEX
statement (see
Section 13.5.5.1, “CACHE INDEX
Syntax”). For example, the following
statement assigns indexes from the tables
t1
, t2
, and
t3
to the key cache named
hot_cache
:
mysql> CACHE INDEX t1, t2, t3 IN hot_cache;
+---------+--------------------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+--------------------+----------+----------+
| test.t1 | assign_to_keycache | status | OK |
| test.t2 | assign_to_keycache | status | OK |
| test.t3 | assign_to_keycache | status | OK |
+---------+--------------------+----------+----------+
The key cache referred to in a CACHE INDEX
statement can be created by setting its size with a
SET GLOBAL
parameter setting statement or
by using server startup options. For example:
mysql> SET GLOBAL keycache1.key_buffer_size=128*1024;
To destroy a key cache, set its size to zero:
mysql> SET GLOBAL keycache1.key_buffer_size=0;
Note that you cannot destroy the default key cache. Any attempt to do this will be ignored:
mysql>SET GLOBAL key_buffer_size = 0;
mysql>SHOW VARIABLES LIKE 'key_buffer_size';
+-----------------+---------+ | Variable_name | Value | +-----------------+---------+ | key_buffer_size | 8384512 | +-----------------+---------+
Key cache variables are structured system variables that have
a name and components. For
keycache1.key_buffer_size
,
keycache1
is the cache variable name and
key_buffer_size
is the cache component. See
Section 5.2.3.1, “Structured System Variables”, for a
description of the syntax used for referring to structured key
cache system variables.
By default, table indexes are assigned to the main (default) key cache created at the server startup. When a key cache is destroyed, all indexes assigned to it are reassigned to the default key cache.
For a busy server, we recommend a strategy that uses three key caches:
A “hot” key cache that takes up 20% of the space allocated for all key caches. Use this for tables that are heavily used for searches but that are not updated.
A “cold” key cache that takes up 20% of the space allocated for all key caches. Use this cache for medium-sized, intensively modified tables, such as temporary tables.
A “warm” key cache that takes up 60% of the key cache space. Employ this as the default key cache, to be used by default for all other tables.
One reason the use of three key caches is beneficial is that access to one key cache structure does not block access to the others. Statements that access tables assigned to one cache do not compete with statements that access tables assigned to another cache. Performance gains occur for other reasons as well:
The hot cache is used only for retrieval queries, so its contents are never modified. Consequently, whenever an index block needs to be pulled in from disk, the contents of the cache block chosen for replacement need not be flushed first.
For an index assigned to the hot cache, if there are no queries requiring an index scan, there is a high probability that the index blocks corresponding to non-leaf nodes of the index B-tree remain in the cache.
An update operation most frequently executed for temporary tables is performed much faster when the updated node is in the cache and need not be read in from disk first. If the size of the indexes of the temporary tables are comparable with the size of cold key cache, the probability is very high that the updated node is in the cache.
CACHE INDEX
sets up an association between
a table and a key cache, but the association is lost each time
the server restarts. If you want the association to take
effect each time the server starts, one way to accomplish this
is to use an option file: Include variable settings that
configure your key caches, and an init-file
option that names a file containing CACHE
INDEX
statements to be executed. For example:
key_buffer_size = 4G hot_cache.key_buffer_size = 2G cold_cache.key_buffer_size = 2G init_file=/path
/to
/data-directory
/mysqld_init.sql
The statements in mysqld_init.sql
are
executed each time the server starts. The file should contain
one SQL statement per line. The following example assigns
several tables each to hot_cache
and
cold_cache
:
CACHE INDEX db1.t1, db1.t2, db2.t3 IN hot_cache CACHE INDEX db1.t4, db2.t5, db2.t6 IN cold_cache
By default, the key cache management system uses the LRU strategy for choosing key cache blocks to be evicted, but it also supports a more sophisticated method called the midpoint insertion strategy.
When using the midpoint insertion strategy, the LRU chain is
divided into two parts: a hot sub-chain and a warm sub-chain.
The division point between two parts is not fixed, but the key
cache management system takes care that the warm part is not
“too short,” always containing at least
key_cache_division_limit
percent of the key
cache blocks. key_cache_division_limit
is a
component of structured key cache variables, so its value is a
parameter that can be set per cache.
When an index block is read from a table into the key cache, it is placed at the end of the warm sub-chain. After a certain number of hits (accesses of the block), it is promoted to the hot sub-chain. At present, the number of hits required to promote a block (3) is the same for all index blocks.
A block promoted into the hot sub-chain is placed at the end
of the chain. The block then circulates within this sub-chain.
If the block stays at the beginning of the sub-chain for a
long enough time, it is demoted to the warm chain. This time
is determined by the value of the
key_cache_age_threshold
component of the
key cache.
The threshold value prescribes that, for a key cache
containing N
blocks, the block at
the beginning of the hot sub-chain not accessed within the
last
hits is to be moved to
the beginning of the warm sub-chain. It then becomes the first
candidate for eviction, because blocks for replacement always
are taken from the beginning of the warm sub-chain.
N
×
key_cache_age_threshold / 100
The midpoint insertion strategy allows you to keep more-valued
blocks always in the cache. If you prefer to use the plain LRU
strategy, leave the
key_cache_division_limit
value set to its
default of 100.
The midpoint insertion strategy helps to improve performance
when execution of a query that requires an index scan
effectively pushes out of the cache all the index blocks
corresponding to valuable high-level B-tree nodes. To avoid
this, you must use a midpoint insertion strategy with the
key_cache_division_limit
set to much less
than 100. Then valuable frequently hit nodes are preserved in
the hot sub-chain during an index scan operation as well.
If there are enough blocks in a key cache to hold blocks of an entire index, or at least the blocks corresponding to its non-leaf nodes, it makes sense to preload the key cache with index blocks before starting to use it. Preloading allows you to put the table index blocks into a key cache buffer in the most efficient way: by reading the index blocks from disk sequentially.
Without preloading, the blocks are still placed into the key cache as needed by queries. Although the blocks will stay in the cache, because there are enough buffers for all of them, they are fetched from disk in random order, and not sequentially.
To preload an index into a cache, use the LOAD INDEX
INTO CACHE
statement. For example, the following
statement preloads nodes (index blocks) of indexes of the
tables t1
and t2
:
mysql> LOAD INDEX INTO CACHE t1, t2 IGNORE LEAVES;
+---------+--------------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+--------------+----------+----------+
| test.t1 | preload_keys | status | OK |
| test.t2 | preload_keys | status | OK |
+---------+--------------+----------+----------+
The IGNORE LEAVES
modifier causes only
blocks for the non-leaf nodes of the index to be preloaded.
Thus, the statement shown preloads all index blocks from
t1
, but only blocks for the non-leaf nodes
from t2
.
If an index has been assigned to a key cache using a
CACHE INDEX
statement, preloading places
index blocks into that cache. Otherwise, the index is loaded
into the default key cache.
It is possible to specify the size of the block buffers for an
individual key cache using the
key_cache_block_size
variable. This permits
tuning of the performance of I/O operations for index files.
The best performance for I/O operations is achieved when the size of read buffers is equal to the size of the native operating system I/O buffers. But setting the size of key nodes equal to the size of the I/O buffer does not always ensure the best overall performance. When reading the big leaf nodes, the server pulls in a lot of unnecessary data, effectively preventing reading other leaf nodes.
Currently, you cannot control the size of the index blocks in
a table. This size is set by the server when the
.MYI
index file is created, depending on
the size of the keys in the indexes present in the table
definition. In most cases, it is set equal to the I/O buffer
size.
A key cache can be restructured at any time by updating its parameter values. For example:
mysql> SET GLOBAL cold_cache.key_buffer_size=4*1024*1024;
If you assign to either the key_buffer_size
or key_cache_block_size
key cache component
a value that differs from the component's current value, the
server destroys the cache's old structure and creates a new
one based on the new values. If the cache contains any dirty
blocks, the server saves them to disk before destroying and
re-creating the cache. Restructuring does not occur if you
change other key cache parameters.
When restructuring a key cache, the server first flushes the contents of any dirty buffers to disk. After that, the cache contents become unavailable. However, restructuring does not block queries that need to use indexes assigned to the cache. Instead, the server directly accesses the table indexes using native filesystem caching. Filesystem caching is not as efficient as using a key cache, so although queries execute, a slowdown can be anticipated. After the cache has been restructured, it becomes available again for caching indexes assigned to it, and the use of filesystem caching for the indexes ceases.
Storage engines collect statistics about tables for use by the optimizer. Table statistics are based on value groups, where a value group is a set of rows with the same key prefix value. For optimizer purposes, an important statistic is the average value group size.
MySQL uses the average value group size in the following ways:
To estimate how may rows must be read for each
ref
access
To estimate how many row a partial join will produce; that is, the number of rows that an operation of this form will produce:
(...) JOINtbl_name
ONtbl_name
.key
=expr
As the average value group size for an index increases, the index is less useful for those two purposes because the average number of rows per lookup increases: For the index to be good for optimization purposes, it is best that each index value target a small number of rows in the table. When a given index value yields a large number of rows, the index is less useful and MySQL is less likely to use it.
The average value group size is related to table cardinality,
which is the number of value groups. The SHOW
INDEX
statement displays a cardinality value based on
N
/S
, where
N
is the number of rows in the table
and S
is the average value group
size. That ratio yields an approximate number of value groups in
the table.
For a join based on the <=>
comparison
operator, NULL
is not treated differently
from any other value: NULL <=> NULL
,
just as
for any other
N
<=>
N
N
.
However, for a join based on the =
operator,
NULL
is different from
non-NULL
values:
is not true when
expr1
=
expr2
expr1
or
expr2
(or both) are
NULL
. This affects ref
accesses for comparisons of the form
: MySQL will not access
the table if the current value of
tbl_name.key
=
expr
expr
is NULL
,
because the comparison cannot be true.
For =
comparisons, it does not matter how
many NULL
values are in the table. For
optimization purposes, the relevant value is the average size of
the non-NULL
value groups. However, MySQL
does not currently allow that average size to be collected or
used.
For MyISAM
tables, you have some control over
collection of table statistics by means of the
myisam_stats_method
system variable. This
variable has two possible values, which differ as follows:
When myisam_stats_method
is
nulls_equal
, all NULL
values are treated as identical (that is, they all form a
single value group).
If the NULL
value group size is much
higher than the average non-NULL
value
group size, this method skews the average value group size
upward. This makes index appear to the optimizer to be less
useful than it really is for joins that look for
non-NULL
values. Consequently, the
nulls_equal
method may cause the
optimizer not to use the index for ref
accesses when it should.
When myisam_stats_method
is
nulls_unequal
, NULL
values are not considered the same. Instead, each
NULL
value forms a separate value group
of size 1.
If you have many NULL
values, this method
skews the average value group size downward. If the average
non-NULL
value group size is large,
counting NULL
values each as a group of
size 1 causes the optimizer to overestimate the value of the
index for joins that look for non-NULL
values. Consequently, the nulls_unequal
method may cause the optimizer to use this index for
ref
lookups when other methods may be
better.
If you tend to use many joins that use
<=>
rather than =
,
NULL
values are not special in comparisons
and one NULL
is equal to another. In this
case, nulls_equal
is the appropriate
statistics method.
The myisam_stats_method
system variable has
global and session values. Setting the global value affects
MyISAM
statistics collection for all
MyISAM
tables. Setting the session value
affects statistics collection only for the current client
connection. This means that you can force a table's statistics
to be regenerated with a given method without affecting other
clients by setting the session value of
myisam_stats_method
.
To regenerate table statistics, you can use any of the following methods:
Set myisam_stats_method
, and then issue a
CHECK TABLE
statement
Execute myisamchk
--stats_method=method_name
--analyze
Change the table to cause its statistics to go out of date
(for example, insert a row and then delete it), and then set
myisam_stats_method
and issue an
ANALYZE TABLE
statement
Some caveats regarding the use of
myisam_stats_method
:
You can force table statistics to be collected explicitly,
as just described. However, MySQL may also collect
statistics automatically. For example, if during the course
of executing statements for a table, some of those
statements modify the table, MySQL may collect statistics.
(This may occur for bulk inserts or deletes, or some
ALTER TABLE
statements, for example.) If
this happens, the statistics are collected using whatever
value myisam_stats_method
has at the
time. Thus, if you collect statistics using one method, but
myisam_stats_method
is set to the other
method when a table's statistics are collected automatically
later, the other method will be used.
There is no way to tell which method was used to generate
statistics for a given MyISAM
table.
myisam_stats_method
applies only to
MyISAM
tables. Other storage engines have
only one method for collecting table statistics. Usually it
is closer to the nulls_equal
method.
When you execute a mysqladmin status command, you should see something like this:
Uptime: 426 Running threads: 1 Questions: 11082 Reloads: 1 Open tables: 12
The Open tables
value of 12 can be somewhat
puzzling if you have only six tables.
MySQL is multi-threaded, so there may be many clients issuing
queries for a given table simultaneously. To minimize the
problem with multiple client threads having different states on
the same table, the table is opened independently by each
concurrent thread. This uses additional memory but normally
increases performance. With MyISAM
tables,
one extra file descriptor is required for the data file for each
client that has the table open. (By contrast, the index file
descriptor is shared between all threads.)
The table_cache
,
max_connections
, and
max_tmp_tables
system variables affect the
maximum number of files the server keeps open. If you increase
one or more of these values, you may run up against a limit
imposed by your operating system on the per-process number of
open file descriptors. Many operating systems allow you to
increase the open-files limit, although the method varies widely
from system to system. Consult your operating system
documentation to determine whether it is possible to increase
the limit and how to do so.
table_cache
is related to
max_connections
. For example, for 200
concurrent running connections, you should have a table cache
size of at least 200 ×
, where
N
N
is the maximum number of tables per
join in any of the queries which you execute. You must also
reserve some extra file descriptors for temporary tables and
files.
Make sure that your operating system can handle the number of
open file descriptors implied by the
table_cache
setting. If
table_cache
is set too high, MySQL may run
out of file descriptors and refuse connections, fail to perform
queries, and be very unreliable. You also have to take into
account that the MyISAM
storage engine needs
two file descriptors for each unique open table. You can
increase the number of file descriptors available to MySQL using
the --open-files-limit
startup option to
mysqld. See
Section A.2.17, “File Not Found”.
The cache of open tables is kept at a level of
table_cache
entries. The default value is 64;
this can be changed with the --table_cache
option to mysqld. Note that MySQL may
temporarily open more tables than this to execute queries.
MySQL closes an unused table and removes it from the table cache under the following circumstances:
When the cache is full and a thread tries to open a table that is not in the cache.
When the cache contains more than
table_cache
entries and a table in the
cache is no longer being used by any threads.
When a table flushing operation occurs. This happens when
someone issues a FLUSH TABLES
statement
or executes a mysqladmin flush-tables or
mysqladmin refresh command.
When the table cache fills up, the server uses the following procedure to locate a cache entry to use:
Tables that are not currently in use are released, beginning with the table least recently used.
If a new table needs to be opened, but the cache is full and no tables can be released, the cache is temporarily extended as necessary.
When the cache is in a temporarily extended state and a table goes from a used to unused state, the table is closed and released from the cache.
A table is opened for each concurrent access. This means the
table needs to be opened twice if two threads access the same
table or if a thread accesses the table twice in the same query
(for example, by joining the table to itself). Each concurrent
open requires an entry in the table cache. The first open of any
MyISAM
table takes two file descriptors: one
for the data file and one for the index file. Each additional
use of the table takes only one file descriptor for the data
file. The index file descriptor is shared among all threads.
If you are opening a table with the HANDLER
statement, a
dedicated table object is allocated for the thread. This table
object is not shared by other threads and is not closed until
the thread calls tbl_name
OPENHANDLER
or the
thread terminates. When this happens, the table is put back in
the table cache (if the cache is not full). See
Section 13.2.3, “tbl_name
CLOSEHANDLER
Syntax”.
You can determine whether your table cache is too small by
checking the mysqld status variable
Opened_tables
:
mysql> SHOW STATUS LIKE 'Opened_tables';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Opened_tables | 2741 |
+---------------+-------+
If the value is very large, even when you have not issued many
FLUSH TABLES
statements, you should increase
the table cache size. See
Section 5.2.2, “Server System Variables”, and
Section 5.2.4, “Server Status Variables”.
If you have many MyISAM
tables in the same
database directory, open, close, and create operations are slow.
If you execute SELECT
statements on many
different tables, there is a little overhead when the table
cache is full, because for every table that has to be opened,
another must be closed. You can reduce this overhead by making
the table cache larger.