SQL Reports

All Things Being SQL – SQL Reports and SQL Reporting Resources

SELECTING CONDITIONALLY PART – 5 SQL Database

 SQL Database

SQL Database – GROUPING

Grouping, as its name suggests, is the process of combining two or more rows together based on value(s) that is/are common to one or more rows in a specified column. Let’s look at the below mentioned query to understand better.

  1. GROUP THE INVOICES TABLE BY SUPPLIER_ID AND SORT IT IN DESCENDING ORDER BY INVOICE AMOUNT.

SELECT * FROM INVOICES
GROUP BY SUPPLIER_ID
ORDER BY AMOUNT DESC;

INVOICE_NO

SUPPLIER_ID

INVOICE_DATE

AMOUNT

3

3

24-Feb-2011

3455

2

2

24-Feb-2011

2345

1

1

24-Feb-2011

1234

FIGURE 33: The grouped result, notice that some information is lost

The GROUP BY clause ignores rows containing duplicate entries for the column specified (in our case the SUPPLIER_ID column). For the time being, it can be considered that random removal takes place. There is no set rule that dictates which column would be ignored by the GROUP BY clause when displaying the result.

The true power of the GROUP BY clause is realized when computed columns are involved in the SQL Database query.

  1. WHAT IS THE TOTAL INVOICE AMOUNT ISSUED TO US BY EACH SUPPLIER? ORDER THE RESULT IN DESCENDING ORDER.

SELECT SUPPLIER_ID, SUM(AMOUNT) AS TOTAL FROM INVOICES
GROUP BY SUPPLIER_ID
ORDER BY TOTAL DESC;

SUPPLIER_IDTOTAL2122213576915734FIGURE 34: The proper use of GROUP BYAs can be seen above the GROUP BY clause can be used to create row wise summary (or as in our case – sub totals) of related rows. Calculations applied to a grouped row are actually applied to all the rows in the group. The result of each row is then combined as specified in the query (we used SUM here). THE “HAVING” CLAUSE WHICH SUPPLIERS HAVE ISSUED US INVOICES, WHOSE TOTAL AMOUNT EXCEEDS 6000$?

The query in example 2 above can be applied, but as always, it won’t be efficient. As seen in figure 34, we somehow need to filter out two rows (for SUPPLIER_ID 3 and 1), to get the result. This is where the HAVING clause is useful. A simple HAVING clause when added to the above query will fetch the required result for us

SELECT * FROM INVOICES
GROUP BY SUPPLIER_ID
HAVING TOTAL > 6000
ORDER BY AMOUNT DESC;

SUPPLIER_ID

AMOUNT

2

12221

FIGURE 35: “HAVING” clause

SQL Database – Remember the WHERE clause: It filtered rows based on criteria specified by the user. The HAVING clause is to “groups of rows” what the WHERE clause is to simple rows. The HAVING clause simply filters “groups of rows” based on user criteria.

Another thing to note is the use of HAVING clause, right after the GROUP BY clause. This is important. Had we written SELECT SUPPLIER_ID, SUM(AMOUNT) AS TOTAL FROM INVOICES GROUP BY TOTAL, ORDER BY TOTAL DESC HAVING TOTAL > 6000; The query would have failed. The HAVING clause, if specified. must always is written after the GROUP BY clause in any SQL query.

THE USE OF THE NULL KEYWORD IN SQL DATABASE

The “NULL” keyword is a special keyword is programming jargon that is used to indicate that “no data exists”. Let’s us take a look at the Suppliers table once more.

SUPPLIER_ID

NAME

CITY

1

ABC Corp

New York

2

DEF Inc

 

3

GHI Associates

San Fransisco

4

JKL Industries

Washington

Figure 36: The “Suppliers” table (with a little change)

SQL Database – As you can see the CITY column for SUPPLIER_ID – 2 contains nothing. Such situations occur in practical life as well. For example, imagine a new recruit in an organization who is yet to be assigned an employee ID or position. In a theoretical “EMPLOYEES” table, the columns “EMPLOYEE_ID” and “DEPARTMENT” might contain blank spaces temporarily, until the employee is assigned these, at a later date. Technically such blank spaces in a table are called “nulls” – they contain nothing.

  1. WHICH SUPPLIER HAS NOT BEEN ASSIGNED A CITY YET! (PUN INTENDED)?

SELECT * FROM SUPPLIERSWHERE CITYIS NULL;

SUPPLIER_ID

NAME

CITY

2

DEF Inc

 

FIGURE 37: The use of NULL

Had the CITY column contained something like “NA”, “N/A”, “na”, “-”, “Unavailable” etc, this query would have failed. NULL means a blank space and nothing more. Anything else is simply “NOT NULL“. Also, as NULL denotes nothing, it is not a text and/or a numerical value. The query SELECT * FROM SUPPLIERS WHERE CITY = NULL; is therefore, incorrect as per the definition of NULL.”IS NULL” clause is the proper way to reference a NULL value.

A SPECIAL WARNING ON THE USE OF NULL WITH AGGREGATE FUNCTIONS

 SQL Database – Let’s go back, and look at the AVG(AMOUNT) example – FIGURE 28 , what would have happened if one (or more) row of the invoice table contained a blank space in the AMOUNT column? How would it have affected this function?

INV_NO

SUP_ID

INV_DATE

INV_AMOUNT

1

1

24-Feb-2011

1234

2

2

24-Feb-2011

 

3

3

24-Feb-2011

3455

4

3

25-Feb-2011

 

5

2

26-Feb-2011

9876

6

1

27-Feb-2011

4500

FIGURE 38: The modified Invoices table

SELECT

AVG(AMOUNT)

FROM INVOICES

 

AVG(AMOUNT)

3177.5

FIGURE 39: The average of all invoice amounts is incorrect.

Why did this occur? The actual answer should be:

(1234 + 3455 + 9876 + 4500) / 4 = 19065 / 4 = 4766.25

The result returned was actually calculated by SQL in the following manner:

(1234 + 3455 + 9876 + 4500) / 6 = 19065 / 6 = 3177.5

As you can see, even the rows with NULL values are counted while averaging the total. Such a situation too might occur under special circumstances in real life. NULL values should always be considered when statistical functions are applied.

In the next section we will learn, one of the most important features that SQL database provides – the joining of two tables.

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>