SQL: Summary Queries


SQL supports data summarization through column functions and the GROUP BY and HAVING clauses.

Column Functions

A SQL column function takes an entire column of data and produces a single data item that summarizes the column.

The most important column functions are:

SUM() computes the total of a column.
AVG() computes the average value in a column.
MIN() finds the smallest value in a column.
MAX() finds the largest value in a column.
COUNT() counts the number of values in a column.
COUNT(*) counts rows of query results.

The argument to a column function can be a simple column name or an SQL expression. We will see an example later.

Computing a Column Total (SUM)

The SUM() column function computes the sum of a column of data values.

The data in the column must have a numeric type (integer, decimal, floating point, or money).

For instance:

What is the total amount for the order 10100?

The result of the SUM() function should have the same basic data type as the data in the column, but sometimes in order to “fit” a result, SQL can give us a number with higher precision. For instance, if we apply the SUM() function to a column of 16-bit integers, we can obtain a 32-bit integer as a result.

Computing a Column Average (AVG)

The AVG() column function computes the average of a column of data values.

The result of the AVG() function could have a different data type.

How many products S18_174, in average, have been ordered?

Finding Extreme Values (MIN and MAX)

The MIN() and MAX() column functions find the smallest and largest values in a column, respectively.

What is the most expensive product?

The data in the column can contain numeric, string, or date/time information.

Counting Data Values (COUNT)

The COUNT() column function counts the number of data values in a column.

How many orders (in total) have been placed?

 The COUNT() function always returns an integer.

Because, it doesn’t really matter which column you specify as the argument of the COUNT() function, SQL supports a special COUNT(*) column function.

COUNT(*) counts rows rather than data values.

Here is the same query, rewritten once again to use the COUNT(*) function:

NULL Values and Column Functions

The column functions listed above, each take a column of data values and produce a single data value as a result.

If one or more of the data values in the column is a NULL value, COUNT() will ignore them.

Normally we don’t want this effect. Then, we should use the COUNT(*) function.

When (SUM(A)-SUM(B)) is not equal to SUM(A-B)

We know, or at least we expect, that (SUM(A)–SUM(B)) and SUM(A-B) produce identical results.

But in SQL it could not be true. Let’s create a simple example:

We should have a table like this:

A B
2 5
3 2
5 1
8 Null

 

Now, let’s calculate:

SUM(A) = 18

SUM(B) = 8

(SUM(A)-SUM(B)) = 10

Now, if we calculate

SUM(A-B) = 2.

Why?

Because in the last row in B (with a NULL value), the subtraction produces a NULL, which is ignored by the SUM() function.

A B
2 5
3 2
5 1
8 Null

 

(8-Null) is not considered in the final calculus.

Duplicate Row Elimination (DISTINCT)

DISTINCT can be used to eliminate duplicate values from a column before applying a column function to it.

For instance:

In how many countries do we have sales rep?

Tip: we can’t use DISTINCT with the MIN() and MAX() column functions because it has no impact on their results; or with COUNT(*) function.

Share and Enjoy

  • Facebook
  • Twitter
  • Delicious
  • LinkedIn
  • StumbleUpon
  • Add to favorites
  • Email
  • RSS