SQL supports data summarization through **column functions** and the GROUP BY and HAVING clauses.

### Column Functions

A SQL column function takes an entire column of data and produces a single data item that summarizes the column.

The most important column functions are:

SUM() computes the total of a column.

AVG() computes the average value in a column.

MIN() finds the smallest value in a column.

MAX() finds the largest value in a column.

COUNT() counts the number of values in a column.

COUNT(*) counts rows of query results.

The argument to a column function can be a *simple column name* or an *SQL expression*. We will see an example later.

#### Computing a Column Total (SUM)

The SUM() column function computes the sum of a column of data values.

The data in the column must have a numeric type (integer, decimal, floating point, or money).

For instance:

*What is the total amount for the order 10100? *

1 2 3 |
SELECT orderNumber, SUM(quantityOrdered*priceEach) FROM OrderDetails WHERE orderNumber = 10100; |

The result of the SUM() function should have the same basic data type as the data in the column, but sometimes in order to “fit” a result, SQL can give us a number with higher precision. For instance, if we apply the SUM() function to a column of 16-bit integers, we can obtain a 32-bit integer as a result.

#### Computing a Column Average (AVG)

The AVG() column function computes the average of a column of data values.

The result of the AVG() function could have a different data type.

*How many products *S18_174*, in average, have been ordered?*

1 2 3 |
SELECT AVG(quantityOrdered) FROM OrderDetails WHERE productCode = "S18_1749"; |

#### Finding Extreme Values (MIN and MAX)

The MIN() and MAX() column functions find the smallest and largest values in a column, respectively.

*What is the most expensive product? *

1 2 |
SELECT productCode, MAX(buyPrice) FROM Products; |

The data in the column can contain numeric, string, or date/time information.

#### Counting Data Values (COUNT)

The COUNT() column function counts the number of data values in a column.

*How many orders (in total) have been placed? *

1 2 |
SELECT COUNT(orderNumber) FROM Orders; |

* **The **COUNT() **function always returns an integer.*

Because, it doesn’t really matter which column you specify as the argument of the COUNT() function, SQL supports a special COUNT(*) column function.

COUNT(*) counts rows rather than data values.

Here is the same query, rewritten once again to use the COUNT(*) function:

1 2 |
SELECT COUNT(*) FROM Orders; |

##### NULL Values and Column Functions

The column functions listed above, each take a column of data values and produce a single data value as a result.

*If one or more of the data values in the column is a **NULL **value, COUNT() will ignore them.*

Normally we don’t want this effect. Then, we should use the COUNT(*) function.

##### When (SUM(A)-SUM(B)) is not equal to SUM(A-B)

We know, or at least we expect, that (SUM(A)–SUM(B)) and SUM(A-B) produce identical results.

But in SQL it could not be true. Let’s create a simple example:

1 2 3 4 5 6 |
USE test; CREATE TABLE T_SUM (A int, B int); INSERT INTO T_SUM (A,B) VALUES (2,5); INSERT INTO T_SUM (A,B) VALUES (3,2); INSERT INTO T_SUM (A,B) VALUES (5,1); INSERT INTO T_SUM (A) VALUES (8); |

We should have a table like this:

A |
B |

2 | 5 |

3 | 2 |

5 | 1 |

8 | Null |

Now, let’s calculate:

1 |
SELECT SUM(A) FROM T_SUM; |

SUM(A) = 18

1 |
SELECT SUM(B) FROM T_SUM; |

SUM(B) = 8

1 |
SELECT (SUM(A)-SUM(B)) FROM T_SUM; |

(SUM(A)-SUM(B)) = 10

Now, if we calculate

1 |
SELECT SUM(A-B) FROM T_SUM; |

SUM(A-B) = 2.

Why?

Because in the last row in B (with a NULL value), the subtraction produces a NULL, which is ignored by the SUM() function.

A |
B |

2 | 5 |

3 | 2 |

5 | 1 |

8 | Null |

(8-Null) is not considered in the final calculus.

#### Duplicate Row Elimination (DISTINCT)

DISTINCT can be used to eliminate duplicate values from a column **before applying** a column function to it.

For instance:

*In how many countries do we have sales rep? *

1 2 |
SELECT COUNT(DISTINCT country) FROM Customers; |

*Tip: we can’t use DISTINCT with the MIN() and MAX() column functions because it has no impact on their results; or with COUNT(*) function.*