Simple SQL

Link to this posting

Postby Ursego » 22 Feb 2013, 08:17

Divide one complicated SQL query into a few simpler sub-queries.

The "divide and conquer" method works well in SQL queries. If you can take a part of a query into a sub-query - do that (unless it can impair performance)! From the readability viewpoint, it's always better to write a number of simple SELECT statements than mix everything into one unreadable heap. Let's compare two the approaches in a simple query, reporting the country name by a city appearing in an address:

*** BAD code (the "all in one heap" approach): ***

Code: Select all
 SELECT @country_name = country.country_name
   FROM country,
  WHERE country.country_id = city.country_id
    AND city.city_id = address.city_id
    AND address.address_id = @address_id;

*** GOOD code (output of each sub-query serves as the input for its enclosing query): ***

Code: Select all
 SELECT @country_name = country_name
   FROM country
  WHERE country_id =

        (SELECT country_id
           FROM city
          WHERE city_id =

                (SELECT  city_id
                   FROM address
                  WHERE address_id = @address_id))

In the second method, the deepest sub-query returns a single city_id, and that converts the next-level query to something like "SELECT country_id FROM city WHERE city_id = 12345". And so on, and so on. If the overall query has not returned the expected result (the country name), you definitely know where to start the investigation. Maybe, you will find the problem simply looking at the code since each WHERE clause is very simple and you can concentrate on each table rather than see the forest but not to see the trees.

This example is very primitive (only to illustrate the idea), so breaking it down into sub-queries doesn't really make the life much easier, but SQLs in real applications can be much more complicated.

There is also third method - SELECT each time INTO a variable and pass that variable to the subsequent query as retrieval argument. It is the best practice from the viewpoint of readability:

*** VERY GOOD code (using interim variables): ***

Code: Select all
 SELECT @city_id = city_id
   FROM address
  WHERE address_id = @address_id

 SELECT @country_id = country_id
   FROM city
  WHERE city_id = @city_id

 SELECT @country_name = country_name
   FROM country
  WHERE country_id = @country_id

The advantages of this approach:

1. Clarity of presentation - it's easy to see what happens.

2. You can check each variable for NULL and raise an error if any of them failed to be obtained - in accordance with the best practices of defensive programming. In the previous two methods, you can validate only the final result (@country_name).

3. If the final result is wrong (even though NOT NULL), you can see in the debugger which step exactly was incorrect. In the real life, you can investigate issues with very complicated SELECTs involved, so you want the bug search area to be as narrowed down as possible.

The VERY GOOD method (with interim variables) should be used only in stored procedures. Using it in SQL, embedded in front end, will worsen performance because the application travels twice (to the database server and back) for each SELECT (it means 6 trips instead of 2 in the given example).

Caution! This tip must be used carefully when you work with tables having a huge amount of records. It really makes the code more elegant, but you should make sure it doesn't decrease the query execution speed. Be especially careful with correlated sub-queries - ones where a value of the sub-query field is compared to a value from the outer query field (in this situation the sub-query is executed once per each row of the main query). If you have any concern regarding performance, check the execution plan.
User avatar
Site Admin
Posts: 128
Joined: 19 Feb 2013, 20:33

IF you want to ((lose weight) OR (have unbelievable (brain function AND mental clarity))) THEN click:

free counters

eXTReMe Tracker