Page 1 of 1

Simple SQL

PostPosted: 22 Feb 2013, 08:17
by Ursego
Divide one complicated SQL query into a few simpler sub-queries.

The "divide and conquer" method works well in SQL queries. If you can take a part of a query into a sub-query - do that (unless it can impair performance)! From the readability viewpoint, it's always better to write a number of simple SELECT statements than mix everything into one unreadable heap. Let's compare two the approaches.

The situation:
Country names are stored in country table (PK country_id).
Departments are in different countries (i.e. dept table has country_id FK).
Each employee is assigned to a department (i.e. emp table has dept_id FK).

The question: what is the country name of employe #12345?

*** BAD code (the "all in one heap" approach), the regular tables join syntax: ***

Code: Select all
 SELECT country.country_name
   INTO v_country_name
   FROM country,
  WHERE country.country_id = dept.country_id
    AND dept.dept_id = emp.dept_id
    AND emp.emp_id = 12345;

*** Another BAD code (the "all in one heap" approach), the ANSI tables join syntax: ***

Code: Select all
 SELECT country.country_name
   INTO v_country_name
   FROM country
   JOIN dept
     ON dept.country_id = country.country_id
   JOIN emp
     ON emp.dept_id = dept.dept_id
    AND emp.emp_id = 12345;

*** GOOD code (output of each sub-query serves as the input for its enclosing query): ***

Code: Select all
 SELECT country_name
   INTO v_country_name
   FROM country
  WHERE country_id =

        (SELECT country_id
           FROM dept
          WHERE dept_id =

                (SELECT dept_id
                   FROM emp
                  WHERE emp_id = 12345));

In the second method, the deepest sub-query returns a single dept_id (for example, 7), and that converts the next-level query to something like "SELECT country_id FROM dept WHERE dept_id = 7". And so on, and so on. If the overall query has not returned the expected result (the country name), you definitely know where to start the investigation. Maybe, you will find the problem simply looking at the code since each WHERE clause is very simple and you can concentrate on each table rather than see the forest but not to see the trees.
This example is very primitive (only to illustrate the idea), so breaking it down into sub-queries doesn't really make the life much easier, but SQLs in real applications can be much more complicated.

There is also third method - SELECT each time INTO a variable and pass that variable to the subsequent query as retrieval argument. It is the best practice from the viewpoint of readability:

*** VERY GOOD code (using interim variables): ***

Code: Select all
 SELECT dept_id
   INTO v_dept_id -- this variable will be used as the parameter of the next query
   FROM emp
  WHERE emp_id = 12345;

 SELECT country_id
   INTO v_country_id -- this variable will be used as the parameter of the next query
   FROM dept
  WHERE dept_id = v_dept_id;

 SELECT country_name
   FROM country
  WHERE country_id = v_country_id;

The advantages of this approach:

1. Clarity of presentation - it's easy to see what happens.

2. You can check each variable for NULL and raise an error if any of them failed to be obtained - in accordance with the best practices of defensive programming. In the previous two methods, you can validate only the final result (v_country_name).

3. If the final result is wrong (even though NOT NULL), you can see in the debugger which step exactly was incorrect. In the real life, you can investigate issues with very complicated SELECTs involved, so you want the bug search area to be as narrowed down as possible.

The VERY GOOD method (with interim variables) should be used only in stored procedures. Using it in SQL, embedded in front end, will worsen performance because the application travels twice (to the database server and back) for each SELECT (it means 6 trips instead of 2 in the given example).

Embed a sub-query in the SELECT statement.

Another useful life-hack is to embed a sub-query in the SELECT statement (so, that sub-query will supply the value for exactly one field of the main SELECT). Since such a sub-query will be invoked once per each record of the recordset, it's named a correlated sub-query (in contrast to regular one in the previous example, which is invoked only one time regardless the number of rows, retrieved by the main SELECT). That approach is very handy when you need to grab the description of an application code, stored in a catalog table (like order status), for example:

*** BAD code (the "all in one heap" approach): ***

Code: Select all
SELECT orders.order_id,
  FROM orders,
 WHERE order_status.order_status_id = orders.order_status_id
   AND orders.order_id = 12345;

*** GOOD code (the main query stays very primitive): ***

Code: Select all
SELECT order_id,
       (SELECT order_status_desc FROM order_status WHERE order_status.order_status_id = orders.order_status_id) AS order_status_desc
  FROM orders
 WHERE order_id = 12345;

Here is a real-life example. Imagine which mess would you see in the WHERE clause if it will be one complicated query rather than five simple!

Code: Select all
SELECT co_cd,
       (SELECT prod_desc FROM cd_prod WHERE cd_prod.prod_cd = br_item_cov_par.prod_cd) AS prod_desc,
       (SELECT plan_desc FROM cd_plan WHERE cd_plan.plan_cd = br_item_cov_par.plan_cd) AS plan_desc,
       (SELECT item_ins_desc FROM cd_item_ins WHERE cd_item_ins.item_ins_typ = br_item_cov_par.item_ins_typ) AS item_ins_desc,
       (SELECT cov_desc FROM cd_cov WHERE cd_cov.cov_cd = br_item_cov_par.cov_cd) AS cov_desc,
       Count(1) AS row_count
  FROM br_item_cov_par
 WHERE row_sta = 'A'
   AND tbl_exp_dt IS NULL
 GROUP BY co_cd, prov_cd, prod_cd, plan_cd, rider_no, item_ins_typ, cov_cd
HAVING Count(1) > 1

:arrow: :arrow: :arrow: :arrow: :arrow: :arrow: :arrow:

Caution! The method of breaking of one complicated query into simpler sub-queries must be used carefully when you work with tables having a huge amount of records. The method really makes the code more elegant and easy, but you should make sure it doesn't affect the execution speed, especially when you deal with correlated sub-queries (just to remind - they are executed for each row, returned by the main SELECT). If you have any concern regarding performance, check the execution plan. There is a good chance that the smart database optimizer will build the same plan for both the methods (the "braking into sub-queries" and the "everything in one unreadable heap").