SAND CDBMS SQL Reference - SQL Commands (SELECT: Optional Clauses)

SAND CDBMS SQL Reference Guide
SELECT

Previous Topic:
Description of SELECT Clauses

Chapter Index

Next Topic:
The UNION Operator

Optional Clauses

Note: When used together in a single statement, these clauses must be specified in the following order: WHERE, then GROUP BY, then HAVING, then WITH SAMPLE OF, then ORDER BY, then FETCH FIRST...ONLY.

WHERE
The optional WHERE clause sets conditions that each record must meet before being retrieved by a query. If the SELECT statement does not include a WHERE clause, every row in the table(s) being queried will appear in the result.

In the WHERE clause, a Boolean value expression defines a selection criterion (or criteria) applied to each row of data values in the table being queried by the SELECT statement. If the data in a particular record matches the selection criteria, then that record is retrieved by the query; otherwise the record is passed over. Refer to Selection Criteria for WHERE and HAVING Clauses for more information.

WHERE clause selection criteria may include nested subqueries, or may connect two or more tables together in a join condition.

GROUP BY
The GROUP BY clause allows query results from aggregate functions on columns to be grouped in terms of the distinct values appearing in specified columns. The GROUP BY clause collects the rows that meet the specified search criteria into groups containing common values in the specified columns. There must be at least one expression in the SELECT column list that represents a group sharing a single value.

The grouping-column argument(s) must consist of a column name (or names) from among those specified in the column list following the SELECT command keyword. Multiple grouping-column arguments may be specified, separated by commas. A direct column reference need not correspond to a column in the projection list; the query output can be grouped by any column from a table in the FROM list.

An alternative to specifying a column by name is to use a positive integer to identify the column by its position in the projection list of the SELECT statement. That is, the first item in the projection list is 1, the second is 2, and so on. One restriction, however, is that a position number cannot refer to a constant, or an expression composed entirely of constants, in the projection list.

Additionally, grouping can be done by specifying a correlation name from the projection list, which is useful for grouping by a named value expression. For instance, if the SELECT list contains the item "col1*col2 AS derived_col", the GROUP BY clause would reference derived_col to group by that derived column.

Even if a value expression in the projection list does not have an alias, grouping by this value expression can still be accomplished by specifying the same expression in the GROUP BY clause. The expression in the GROUP BY clause must correspond exactly to the one in the projection list, excluding whitespace differences and extraneous parentheses, otherwise an error will be returned.

For example:

SELECT RIGHT(dept, 2) AS Subdept, e_no, AVG(commission) AS Avg_Commission FROM employee GROUP BY Subdept, e_no;

The results of the above query will be grouped first by the last two characters of the department code (the "subdepartment"), then by the employee number; and for each employee, the average sale amount will be calculated. Note that the first grouping (RIGHT(dept, 2)) is identified by alias ("Subdept"), while the second one (e_no) is identified directly by the column name.

An example of the same query, but using a value expression and position number in the GROUP BY clause, is the following:

SELECT RIGHT(dept, 2), e_no, AVG(commission) AS Avg_Commission FROM employee GROUP BY RIGHT(dept, 2), 2;

In this query, the results are grouped by an expression from the SELECT list, RIGHT(dept, 2), and by the e_no field, which is referenced by its ordinal position in the projection list (2).

A HAVING clause can be applied to qualify further the results of a GROUP BY clause.

HAVING
The HAVING clause further qualifies groups in the same way that the WHERE clause qualifies rows. Groups of rows from the tables referenced in the FROM clause that satisfy all the selection-criteria arguments are used in composing the query result. If no GROUP BY clause is included in the statement, the entire table is treated as a single group.

A HAVING clause may contain Boolean value expressions, simply or compounded together using AND or OR operators. Each Boolean predicate value expression must consist of a grouped column (that is, a column referenced in the GROUP BY clause), an aggregate function, a constant, or some arithmetic combination of grouped columns, aggregate functions, and constants. Subqueries may not appear in the HAVING clause.

For instance, we might wish to refine the GROUP BY example above, using the HAVING clause:

SELECT dept, COUNT(*) AS Number, AVG(commission) AS Avg_Commission FROM employee GROUP BY dept HAVING COUNT(*) >= 50;

The HAVING clause in this query limits the result set of the original GROUP BY example to those departments with 50 or more employees. The conditional statement in the HAVING clause is tested against each group defined in the GROUP BY clause. In this case, the COUNT(*) function counts the total number of rows per department.

Note that unlike the HAVING clause, the WHERE clause cannot use aggregate functions in its conditional expressions.

WITH SAMPLE OF
The WITH SAMPLE OF clause returns a random sampling of the result set produced by the SELECT statement. That is, if the same SELECT...WITH SAMPLE OF statement is issued multiple times, a different subset of the total result set is returned to the caller each time.

The number of rows in the returned sample can either be a fixed number, or proportional to the total number of rows in the result set. When the "IN factor"clause is used, the number of rows returned from the result set is in the same proportion as number:factor. The following SELECT statement, for example, returns a 10% subset of the entire result set:

SELECT p_no FROM products WITH SAMPLE OF 1 IN 10;

If 100 rows were contained in the result set, then 10 of those rows, determined randomly, would be returned to the caller. Note that if the designated proportion of the result set is a fractional number, the number will be rounded down to the nearest integer. For example, if the above SELECT statement produced a result set of 85 rows, the random sample of 10% would return 8 rows.

When the "IN factor"clause is omitted, a sample of exactly number rows will be returned. For instance, the following statement returns a 10-row subset of the result set produced by the query:

SELECT p_no FROM products WITH SAMPLE OF 10;

If fewer than number rows are contained in the result set, then all rows are returned.

The syntax rules for number and factor are as follows:

number and factor must be integers greater than zero.

number must be less than or equal to factor.

If number is equal to factor, then no sampling is performed: the entire result set is returned. If more than zero rows are selected but the sample size is calculated as zero, then one row will be randomly chosen. The sampling logic will never return zero rows, unless the result set itself contains zero rows.

The WITH SAMPLE OF clause is also supported in INSERT...SELECT operations. This provides a way to select a random subset from large tables for subsequent aggregation operations.

ORDER BY
The ORDER BY clause sorts the rows returned by the query according to the values in the specified columns, in the order in which they are specified. That is, results are sorted according to the values in the first column argument first; the values in the second column argument next, and so on. The column argument can be a column from among those returned by the query (that is, one of the select list expressions), or it can be a column from any table referenced in the FROM clause (regardless of whether the column is included in the select list). Multiple column arguments may be specified, separated by commas. Results can be sorted in ascending order (the default) or descending order by specifying the ASC or DESC keywords, respectively.

The results of the ORDER BY can also have null values sorted before non-null values or after (the default) by specifying the NULLS FIRST or NULLS LAST option, respectively. Note that the sorting of nulls is independent of the ASC/DESC option, so NULLS LAST will put the null values at the bottom, and NULLS FIRST will put the null values at the beginning, regardless of whether the non-null values are in ascending or descending order.

As an alternative to specifying a column by name, a positive integer may be used to identify the column by its position in the projection list of the SELECT statement. That is, the first item in the projection list is 1, the second is 2, and so on. In the case of SELECT *, the positive integer refers to the position of a column in the actual tables referenced in the FROM list. For multiple tables, the columns are counted sequentially in the order that the tables appear in the FROM list.

Ordering can also be specified using a correlation name from the projection list, which is useful for ordering results by derived column. For example, if the SELECT list contains the item "col1*col2 AS derived_col", the ORDER BY clause would reference derived_col to order the results by that derived column.

Another way to sort by a value expression from the projection list is to include the same expression in the ORDER BY clause. The expression in the ORDER BY clause must correspond exactly to the one in the projection list, excluding whitespace differences and extraneous parentheses, otherwise an error will be returned.

Note that an ORDER BY clause can be used in the following SQL if a FETCH FIRST...ONLY clause is included as well:

the query part of a CREATE VIEW statement

subqueries

nested table expressions

INSERT...SELECT statements

CREATE TABLE...SELECT statements.

If a FETCH FIRST...ONLY clause is missing from the above SQL, the ORDER BY clause does nothing: it is simply ignored.

The following query demonstrates the use of the ORDER BY clause:

SELECT DISTINCT P.p_name, P.price FROM products P, orders O WHERE P.p_no = O.p_no ORDER BY 2 DESC NULLS FIRST;

In this example there are two tables, products and orders. The query returns the product name (p_name) and price of each product in the product table that has a corresponding entry in the orders table. The DISTINCT keyword ensures that duplicate rows will not be included in the results. The ORDER BY clause dictates that the results be ordered by the second argument in the projection list (P.price), in descending order. Here, the numeric position of P.price in the SELECT list is specified, but a direct column reference could also have been used. Because of the NULLS FIRST option, if there are any null values in the P.price column, those rows will appear first in the ordering.

A slightly different query, one that sorts by a value expression in the projection list, is the following:

SELECT DISTINCT P.p_name, P.price*0.07 AS price_tax FROM products P, orders O WHERE P.p_no = O.p_no ORDER BY price_tax;

In the above example, the ORDER BY clause references the projection list value expression (P.price*0.07) using its alias, price_tax. The same result can also be achieved by specifying the full value expression, P.price*0.07, in the ORDER BY clause (and note that the ORDER BY expression must include the table qualifier, since it appears in the projection list expression).

FETCH FIRST...ONLY
The FETCH FIRST...ONLY clause limits the results of a query to either a specified number of rows or a fraction of the total result set. In this respect, the FETCH FIRST...ONLY clause is similar to the WITH SAMPLE OF clause, except that the records returned by FETCH FIRST...ONLY are not random.

If FETCH FIRST number ONLY is included in the SELECT statement, the query will return only the first number rows of the result set. For example, the following query will return, at most, the first 10 rows from the result set:

SELECT p_no FROM products FETCH FIRST 10 ONLY;

If FETCH FIRST number IN factor ONLY is included, the query will return a fraction of the total result set calculated as number divided by factor. For example, the following SELECT statement returns the first 1/10th of the records from the result set:

SELECT p_no FROM products FETCH FIRST 1 IN 10 ONLY;

If the query without a FETCH FIRST...ONLY clause returns 100 rows, including the clause above limits the results to the first 10 rows (1/10th, or 10%, of the total results). As with WITH SAMPLE OF, the number of rows is rounded down to the nearest integer when calculating the fraction of the result set to return.

The syntax rules for number and factor are as follows:

number and factor must be integers greater than zero

number must be less than or equal to factor (that is, at most 100% of a result set can be returned).

If fewer than number rows are contained in the result set, all rows are returned. If number is equal to factor, then there is no filtering: the entire result set is returned. If the specified percentage of the result set is effectively zero rows, the first row of the result set is returned. The FETCH FIRST...ONLY clause will never return zero rows, unless the result set itself contains zero rows.

If included, the FETCH FIRST...ONLY clause must appear at the end of a SELECT statement.

Previous Topic:
Description of SELECT Clauses

Chapter Index

Next Topic:
The UNION Operator

SAND CDBMS SQL Reference Guide SELECT

SAND CDBMS SQL Reference Guide
SELECT