This section describes group (aggregate) functions that operate
on sets of values. Unless otherwise stated, group functions
ignore NULL values.
If you use a group function in a statement containing no
GROUP BY clause, it is equivalent to grouping
on all rows.
The SUM() and AVG()
aggregate functions do not work with temporal values. (They
convert the values to numbers, which loses the part after the
first non-numeric character.) To work around this problem, you
can convert to numeric units, perform the aggregate operation,
and convert back to a temporal value. Examples:
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(time_col))) FROMtbl_name; SELECT FROM_DAYS(SUM(TO_DAYS(date_col))) FROMtbl_name;
Returns the average value of
. The
exprDISTINCT option can be used as of MySQL
5.0.3 to return the average of the distinct values of
expr.
AVG() returns NULL if
there were no matching rows.
mysql>SELECT student_name, AVG(test_score)->FROM student->GROUP BY student_name;
Returns the bitwise AND of all bits in
expr. The calculation is
performed with 64-bit (BIGINT) precision.
This function returns
18446744073709551615 if there were no
matching rows. (This is the value of an unsigned
BIGINT value with all bits set to 1.)
Returns the bitwise OR of all bits in
expr. The calculation is
performed with 64-bit (BIGINT) precision.
This function returns 0 if there were no
matching rows.
Returns the bitwise XOR of all bits in
expr. The calculation is
performed with 64-bit (BIGINT) precision.
This function returns 0 if there were no
matching rows.
Returns a count of the number of non-NULL
values in the rows retrieved by a SELECT
statement. The result is a BIGINT value.
COUNT() returns 0 if
there were no matching rows.
mysql>SELECT student.student_name,COUNT(*)->FROM student,course->WHERE student.student_id=course.student_id->GROUP BY student_name;
COUNT(*) is somewhat different in that it
returns a count of the number of rows retrieved, whether or
not they contain NULL values.
COUNT(*) is optimized to return very
quickly if the SELECT retrieves from one
table, no other columns are retrieved, and there is no
WHERE clause. For example:
mysql> SELECT COUNT(*) FROM student;
This optimization applies only to MyISAM
tables only, because an exact row count is stored for this
storage engine and can be accessed very quickly. For
transactional storage engines such as
InnoDB and BDB,
storing an exact row count is more problematic because
multiple transactions may be occurring, each of which may
affect the count.
COUNT(DISTINCT
expr,[expr...])
Returns a count of the number of different
non-NULL values.
COUNT(DISTINCT) returns
0 if there were no matching rows.
mysql> SELECT COUNT(DISTINCT results) FROM student;
In MySQL, you can obtain the number of distinct expression
combinations that do not contain NULL by
giving a list of expressions. In standard SQL, you would
have to do a concatenation of all expressions inside
COUNT(DISTINCT ...).
This function returns a string result with the concatenated
non-NULL values from a group. It returns
NULL if there are no
non-NULL values. The full syntax is as
follows:
GROUP_CONCAT([DISTINCT]expr[,expr...] [ORDER BY {unsigned_integer|col_name|expr} [ASC | DESC] [,col_name...]] [SEPARATORstr_val])
mysql>SELECT student_name,->GROUP_CONCAT(test_score)->FROM student->GROUP BY student_name;
Or:
mysql>SELECT student_name,->GROUP_CONCAT(DISTINCT test_score->ORDER BY test_score DESC SEPARATOR ' ')->FROM student->GROUP BY student_name;
In MySQL, you can get the concatenated values of expression
combinations. You can eliminate duplicate values by using
DISTINCT. If you want to sort values in
the result, you should use ORDER BY
clause. To sort in reverse order, add the
DESC (descending) keyword to the name of
the column you are sorting by in the ORDER
BY clause. The default is ascending order; this
may be specified explicitly using the ASC
keyword. SEPARATOR is followed by the
string value that should be inserted between values of
result. The default is a comma
(‘,’). You can eliminate the
separator altogether by specifying SEPARATOR
''.
You can set a maximum allowed length with the
group_concat_max_len system variable.
(The default value is 1024.) The syntax to do this at
runtime is as follows, where val
is an unsigned integer:
SET [SESSION | GLOBAL] group_concat_max_len = val;
If a maximum length has been set, the result is truncated to this maximum length.
Beginning with MySQL 5.0.19, the type returned by
GROUP_CONCAT() is always
VARCHAR unless
group_concat_max_len is greater than 512,
in which case, it returns a BLOB.
(Previously, it returned a BLOB with
group_concat_max_len greater than 512
only if the query included an ORDER BY
clause.)
See also CONCAT() and
CONCAT_WS():
Section 12.3, “String Functions”.
MIN([DISTINCT]
,
expr)MAX([DISTINCT]
expr)
Returns the minimum or maximum value of
expr. MIN()
and MAX() may take a string argument; in
such cases they return the minimum or maximum string value.
See Section 7.4.5, “How MySQL Uses Indexes”. The
DISTINCT keyword can be used to find the
minimum or maximum of the distinct values of
expr, however, this produces the
same result as omitting DISTINCT.
MIN() and MAX() return
NULL if there were no matching rows.
mysql>SELECT student_name, MIN(test_score), MAX(test_score)->FROM student->GROUP BY student_name;
For MIN(), MAX(), and
other aggregate functions, MySQL currently compares
ENUM and SET columns
by their string value rather than by the string's relative
position in the set. This differs from how ORDER
BY compares them. This is expected to be rectified
in a future MySQL release.
Returns the population standard deviation of
expr. This is an extension to
standard SQL. The STDDEV() form of this
function is provided for compatibility with Oracle. As of
MySQL 5.0.3, the standard SQL function
STDDEV_POP() can be used instead.
These functions return NULL if there were
no matching rows.
Returns the population standard deviation of
expr (the square root of
VAR_POP()). This function was added in
MySQL 5.0.3. Before 5.0.3, you can use
STD() or STDDEV(),
which are equivalent but not standard SQL.
STDDEV_POP() returns
NULL if there were no matching rows.
Returns the sample standard deviation of
expr (the square root of
VAR_SAMP(). This function was added in
MySQL 5.0.3.
STDDEV_SAMP() returns
NULL if there were no matching rows.
Returns the sum of expr. If the
return set has no rows, SUM() returns
NULL. The DISTINCT
keyword can be used in MySQL 5.0 to sum only
the distinct values of expr.
SUM() returns NULL if
there were no matching rows.
Returns the population standard variance of
expr. It considers rows as the
whole population, not as a sample, so it has the number of
rows as the denominator. This function was added in MySQL
5.0.3. Before 5.0.3, you can use
VARIANCE(), which is equivalent but is
not standard SQL.
VAR_POP() returns NULL
if there were no matching rows.
Returns the sample variance of
expr. That is, the denominator is
the number of rows minus one. This function was added in
MySQL 5.0.3.
VAR_SAMP() returns
NULL if there were no matching rows.
Returns the population standard variance of
expr. This is an extension to
standard SQL. As of MySQL 5.0.3, the standard SQL function
VAR_POP() can be used instead.
VARIANCE() returns
NULL if there were no matching rows.
The GROUP BY clause allows a WITH
ROLLUP modifier that causes extra rows to be added to
the summary output. These rows represent higher-level (or
super-aggregate) summary operations. ROLLUP
thus allows you to answer questions at multiple levels of
analysis with a single query. It can be used, for example, to
provide support for OLAP (Online Analytical Processing)
operations.
Suppose that a table named sales has
year, country,
product, and profit
columns for recording sales profitability:
CREATE TABLE sales
(
year INT NOT NULL,
country VARCHAR(20) NOT NULL,
product VARCHAR(32) NOT NULL,
profit INT
);
The table's contents can be summarized per year with a simple
GROUP BY like this:
mysql> SELECT year, SUM(profit) FROM sales GROUP BY year;
+------+-------------+
| year | SUM(profit) |
+------+-------------+
| 2000 | 4525 |
| 2001 | 3010 |
+------+-------------+
This output shows the total profit for each year, but if you also want to determine the total profit summed over all years, you must add up the individual values yourself or run an additional query.
Or you can use ROLLUP, which provides both
levels of analysis with a single query. Adding a WITH
ROLLUP modifier to the GROUP BY
clause causes the query to produce another row that shows the
grand total over all year values:
mysql> SELECT year, SUM(profit) FROM sales GROUP BY year WITH ROLLUP;
+------+-------------+
| year | SUM(profit) |
+------+-------------+
| 2000 | 4525 |
| 2001 | 3010 |
| NULL | 7535 |
+------+-------------+
The grand total super-aggregate line is identified by the value
NULL in the year column.
ROLLUP has a more complex effect when there
are multiple GROUP BY columns. In this case,
each time there is a “break” (change in value) in
any but the last grouping column, the query produces an extra
super-aggregate summary row.
For example, without ROLLUP, a summary on the
sales table based on year,
country, and product might
look like this:
mysql>SELECT year, country, product, SUM(profit)->FROM sales->GROUP BY year, country, product;+------+---------+------------+-------------+ | year | country | product | SUM(profit) | +------+---------+------------+-------------+ | 2000 | Finland | Computer | 1500 | | 2000 | Finland | Phone | 100 | | 2000 | India | Calculator | 150 | | 2000 | India | Computer | 1200 | | 2000 | USA | Calculator | 75 | | 2000 | USA | Computer | 1500 | | 2001 | Finland | Phone | 10 | | 2001 | USA | Calculator | 50 | | 2001 | USA | Computer | 2700 | | 2001 | USA | TV | 250 | +------+---------+------------+-------------+
The output indicates summary values only at the
year/country/product level of analysis. When
ROLLUP is added, the query produces several
extra rows:
mysql>SELECT year, country, product, SUM(profit)->FROM sales->GROUP BY year, country, product WITH ROLLUP;+------+---------+------------+-------------+ | year | country | product | SUM(profit) | +------+---------+------------+-------------+ | 2000 | Finland | Computer | 1500 | | 2000 | Finland | Phone | 100 | | 2000 | Finland | NULL | 1600 | | 2000 | India | Calculator | 150 | | 2000 | India | Computer | 1200 | | 2000 | India | NULL | 1350 | | 2000 | USA | Calculator | 75 | | 2000 | USA | Computer | 1500 | | 2000 | USA | NULL | 1575 | | 2000 | NULL | NULL | 4525 | | 2001 | Finland | Phone | 10 | | 2001 | Finland | NULL | 10 | | 2001 | USA | Calculator | 50 | | 2001 | USA | Computer | 2700 | | 2001 | USA | TV | 250 | | 2001 | USA | NULL | 3000 | | 2001 | NULL | NULL | 3010 | | NULL | NULL | NULL | 7535 | +------+---------+------------+-------------+
For this query, adding ROLLUP causes the
output to include summary information at four levels of
analysis, not just one. Here's how to interpret the
ROLLUP output:
Following each set of product rows for a given year and
country, an extra summary row is produced showing the total
for all products. These rows have the
product column set to
NULL.
Following each set of rows for a given year, an extra
summary row is produced showing the total for all countries
and products. These rows have the country
and products columns set to
NULL.
Finally, following all other rows, an extra summary row is
produced showing the grand total for all years, countries,
and products. This row has the year,
country, and products
columns set to NULL.
Other Considerations When using
ROLLUP
The following items list some behaviors specific to the MySQL
implementation of ROLLUP:
When you use ROLLUP, you cannot also use an
ORDER BY clause to sort the results. In other
words, ROLLUP and ORDER BY
are mutually exclusive. However, you still have some control
over sort order. GROUP BY in MySQL sorts
results, and you can use explicit ASC and
DESC keywords with columns named in the
GROUP BY list to specify sort order for
individual columns. (The higher-level summary rows added by
ROLLUP still appear after the rows from which
they are calculated, regardless of the sort order.)
LIMIT can be used to restrict the number of
rows returned to the client. LIMIT is applied
after ROLLUP, so the limit applies against
the extra rows added by ROLLUP. For example:
mysql>SELECT year, country, product, SUM(profit)->FROM sales->GROUP BY year, country, product WITH ROLLUP->LIMIT 5;+------+---------+------------+-------------+ | year | country | product | SUM(profit) | +------+---------+------------+-------------+ | 2000 | Finland | Computer | 1500 | | 2000 | Finland | Phone | 100 | | 2000 | Finland | NULL | 1600 | | 2000 | India | Calculator | 150 | | 2000 | India | Computer | 1200 | +------+---------+------------+-------------+
Using LIMIT with ROLLUP
may produce results that are more difficult to interpret,
because you have less context for understanding the
super-aggregate rows.
The NULL indicators in each super-aggregate
row are produced when the row is sent to the client. The server
looks at the columns named in the GROUP BY
clause following the leftmost one that has changed value. For
any column in the result set with a name that is a lexical match
to any of those names, its value is set to
NULL. (If you specify grouping columns by
column number, the server identifies which columns to set to
NULL by number.)
Because the NULL values in the
super-aggregate rows are placed into the result set at such a
late stage in query processing, you cannot test them as
NULL values within the query itself. For
example, you cannot add HAVING product IS
NULL to the query to eliminate from the output all but
the super-aggregate rows.
On the other hand, the NULL values do appear
as NULL on the client side and can be tested
as such using any MySQL client programming interface.
MySQL extends the use of GROUP BY so that you
can use non-aggregated columns or calculations in the
SELECT list that do not appear in the
GROUP BY clause. You can use this feature to
get better performance by avoiding unnecessary column sorting
and grouping. For example, you do not need to group on
customer.name in the following query:
SELECT order.custid, customer.name, MAX(payments) FROM order,customer WHERE order.custid = customer.custid GROUP BY order.custid;
In standard SQL, you would have to add
customer.name to the GROUP
BY clause. In MySQL, the name is redundant.
Do not use this feature if the columns you
omit from the GROUP BY part are not constant
in the group. The server is free to return any value from the
group, so the results are indeterminate unless all values are
the same.
A similar MySQL extension applies to the
HAVING clause. The SQL standard does not
allow the HAVING clause to name any column
that is not found in the GROUP BY clause if
it is not enclosed in an aggregate function. MySQL allows the
use of such columns to simplify calculations. This extension
assumes that the non-grouped columns will have the same
group-wise values. Otherwise, the result is indeterminate.
If the ONLY_FULL_GROUP_BY SQL mode is
enabled, the MySQL extension to GROUP BY does
not apply. That is, columns not named in the GROUP
BY clause cannot be used in the
SELECT list or HAVING
clause if not used in an aggregate function.
In some cases, you can use MIN() and
MAX() to obtain a specific column value even
if it isn't unique. The following gives the value of
column from the row containing the smallest
value in the sort column:
SUBSTR(MIN(CONCAT(RPAD(sort,6,' '),column)),7)
See Section 3.6.4, “The Rows Holding the Group-wise Maximum of a Certain Field”.
Note that if you are trying to follow standard SQL, you can't
use expressions in GROUP BY clauses. You can
work around this limitation by using an alias for the
expression:
SELECT id,FLOOR(value/100) AS val
FROM tbl_name
GROUP BY id, val;
MySQL does allow expressions in GROUP BY
clauses. For example:
SELECT id,FLOOR(value/100)
FROM tbl_name
GROUP BY id, FLOOR(value/100);