Page 1 of 1

Only surrogate primary keys

PostPosted: 22 Feb 2013, 08:10
by Ursego
Field(s), used as primary keys in tables of major entities, should be simple numerators (1, 2, 3 etc.), with no other real-life context (like Person ID card number or SIM card number - even if you are sure that the values are unique and required). Keys, having no other context besides being records' unique identifiers, are named "surrogate (synthetic) keys".

For catalog tables (like order status or customer types), mnemonic codes can be used (like "A" or "ACT" for Active, "D" or "DEL" for Deleted etc.) to make viewing data in tables and variables values in debugger more enjoyable.

For example, you are creating an information system for an existing company where each worker has an employee number. Don't use that number as the employees table's PK! Instead, create a field like emp_id. In addition, you can create a field to store the existing employee number (with NOT NULL and UNIQUE constraints if needed) - as well as for other "real-life" fields like Social Insurance Number. The database theory calls such fields "candidate keys" - they are like candidates to be elected as primary keys, but don't allow them to win the elections!

Many data architects don't realize which problems are waiting for developers if natural keys are used.

PROBLEM #1: combination of many fields (sometimes 5-8!!!) in a primary key. As a result, the developers are forced to:

1. Declare and pass between objects multiple variables or even whole structures/classes while ONE PRIMITIVE VARIABLE would be enough to carry out the work - to identify the record.
2. Write SQL queries with extra-inflated, haircurling WHERE clauses having massifs of additional lines written only for the sake of joining tables. Important business conditions, restricting the retrieved/updated population, can simply be lost in that mess - bugs are welcome!

PROBLEM #2: the information system needs to be changed to support a change in the business. Examples of such a situation:

1. The company changes employees numbering after a merge with another company.
2. Duplicate Person ID cards numbers found (that happened in one of the countries I have worked in).
3. An entity was defined badly during the initial system analysis. Later, when a part of the system has already been developed, the correct solution is found, and it is related to the real-life fields, used as PKs.

In any of those cases, you will change the functionality (relationships between tables as well as database and GUI application objects) spending time to rebuild and retest the system. But if you have used surrogate PKs then you don't need to perform that time-consuming and bugs-prone work. You are OK because the primary key's meaning has not been changed: it was a primitive numerator yesterday, and it will stay it after years through any business changes!

I wonder why in developers' forums there are discussions which primary keys - surrogate or natural - should be used. We have always to think what will happen in the worst case, and I have described above what will happen if you use a natural key - a lot of headache and more chances of bugs. But what will happen in the worst case if surrogate keys are used? In that case the table will have one extra field. Not a big trouble.