Unless you are a seasoned database pro,
"relational integrity" might be justanother fancy marketing phrase that you don't really understandbut are vaguely suspicious of.If the following explanation helps you, I will be very, very happy -- you don't even have to tell me; I'll know.And if you'd like more information, please contact me; I do database work for a living,answer email for free, and would love to arrange some onsite training or consulting at your company.To illustrate relational integrity concepts,
I use the following sample database:This database could be for an ezine (an electronic magazine web site),
a famous quotes search engine, an online library,or any application which has to keep track of content (articles)by owner (author).If you already know what RI is, please feel free to
skip over the following basic explanations.A relational database
contains tables of data which are related to each other.For example, articles are related to the author who wrote them,and conversely, authors are related to the articles they wrote. (Sounds obvious, and it is.)That's the relational part.When we work with a relational database,
we naturally expect that data in related tables stay related. For example, articles written by MartinB should alwaysbe related to MartinB and never be confused witharticles written by any other author.That's the integrity part.Relational Integrity is also called
Referential Integrity, perhaps because the mechanism for ensuring integrity is implementedby the way the tables reference each other.Both terms are okay with me. I usuallyjust say RI anyway.So let's say MartinB is an author in the
database and has author number 32. Further, let's say that for some reason (which we won't go into here, as it's not important) you need to change MartinB's author number from 32 to 45678. The way it works is that you change his author number on his author record in the author table.And then you had better also changeany references to author 32 in all related tables --like the articles table --because if you don't, they'll stillrefer to author number 32, which no longer exists.Mercifully, you don't have to do this yourself.
This is where RI comes in. RI can do it for you.The underlying structure and enforcement of RIis usually handled by the relational database system.I say usually because in all databasesystems I've ever worked with, RI is optional,absurd as that seems.Database developers have to turn RI on,or else ask the developers to provide external enforcementprocedures outside the database system.So if you don't have RI turned on for your particular
database, then when the day comes -- and you know it will --when you have to make a change and you forget the external procedure(s) to change all thenecessary related data values, the database will be said to have lost integrity. You will have "broken links" in your tables.You will be up the creek. (Next April 1, mention to a DBA that there are broken links in his database, and watch his face...)You will, I am certain, by this point have guessed
that RI can be a very useful feature to employ in your database.Let us take one more brief
moment to cover some basic definitionsbefore getting down to business.Again, please feel free toskip over these definitions.Every database relation is a one-to-many
relation involving just two tables, a parent table and a child table.Yes, you may hear other terms like one-to-one, many-to-many, and so on, butthese are all just special cases of the one-to-many relation.Don't let them fool you. Understanding databases is easy if you keep it simple.So each relation is one-to-many, involving a parent table
and a child table. This terminology is not standard, merely convenient, if somewhat unfortunate.Some people say master and detail; sometimes, I'llsay owner and member, a flashbackfrom pre-relational CODASYL databasesas out of date as bell-bottom pants (except that unlike pre-relational databases, bell-bottoms may one day be back in style).As you might guess, the parent is the "one" and
the child is the "many."Each parent can have many children.The reverse, I am afraid, is not true, not in a relational database.A child can have only one parent.Please don'tdwell too long on the obvious fact that in real life a childhas two parents -- you will only get confused. In a relational database,in a one-to-many relation, each child has at mostone parent. This is important. This is also why theparent-child terminology is unfortunate.Let's look at our sample database.
Each author is related one or more articles, the articles written by that author.The author table is the parent table, andthe article table is the child table.Note that an author can write many articles, but an article has only one author.
This is a natural consequence of it being a one-to-many relation.The scenario where an articleis co-authored by more than one authorinvolves a many-to-many relation,the contemplation of which, like the reality thatin real life a child has more than one parent,is beyond the scope of this discussion; it will only confuse younow.Relational databases implement RI
using primary keys and foreign keys.Other than their special role in RI,these "keys" are like any other database fields -- theycontain data values.The parent table contains the primary key and
the child table contains the foreign key which references (or "points" to, if you prefer) the parent table.Because each child relates to only one parent,
the child table needs only one foreign key.So where are the "many" child records in aone-to-many relation? They'rein the multiple rows of the child tablethat all have the same data value intheir foreign key, all referencing the same parent.For example, consider the following articles:
Author | Article |
---|---|
32 | Your clients need a CMS The Tao of Testing |
88 | 640 x 480 Isn't Dead Yet Inside the evolt.org Rebuild |
20 | Liquid Tables |
Above, you can see the one-to-many relation clearly.
Now look at it the way you normally seeit actually stored in the article table:Article | Author |
---|---|
Your clients need a CMS | 32 |
640 x 480 Isn't Dead Yet | 88 |
The Tao of Testing | 32 |
Inside the evolt.org Rebuild | 88 |
Liquid Tables | 20 |
Now you can see more clearly that each article is
related to only one author. This is the "many-to-one"reversal of the one-to-many relationship.It amounts to the same thing. You can always say somethingabout a one-to-many relation in two different ways.Finally, we get down to business.
In a relational database,
there are four basic actions you can perform on a table:These actions, known as SIUD, are carried out
by the -- wait for it -- Select, Insert, Update, and Delete SQL statements. (These actions are sometimes, as in business process modelling, alsocalled CRUD, using the initials for Create, Reference, Update, and Delete.CRUD is easier to remember, but not as helpful as SIUD, becauseyou may end up getting confused. For example, there's no Reference SQL statement, but there is a Create SQL statement, although it doesn't create any data. I'm sorry I evenmentioned CRUD. I just wanted to set the record straight, point out again that simplicity rules,and make sure you were awake.)Three of these actions -- Insert, Update, and Delete --
change the contents of the database table and therefore have RI implications.Remember, you want to stay on top of changes, preferably lettingthe database itself handle them, to ensure younever have "broken links" in your data.Now it gets a wee bit complicated.
For every relation, the Insert, Update, and Delete actions
can be performed on either the primary key (in the parent table), or the foreignkey (in the child table). There are thus a total of six possible RI actions forevery one-to-many relation:Parent Insert | Child Insert | |
---|---|---|
Parent Update | Child Update | |
Parent Delete | Child Delete |
Each of these six possible RI actions
needs to be examined when youdesign your database, because what you want to dois turn over as much of the enforcementof RI as possible to the database itself.Now it gets a little more complicated.
For each of the six RI actions mentioned above,
one of five RI rules can be implemented:Cascade | perform same action on related keys | |
---|---|---|
Restrict | disallow action | |
Set Null | set related keys to null | |
Set Default | set related keys to their default value | |
None | allow action, do nothing to related keys |
Does this mean that implementing RI
requires choosing one out of a possible thirty differentcombinations of actions for each one-to-many relation??The answer is yes -- but I hasten to add that it isn't really as bad or as muchwork as it may seem!!In fact, most database systems do not let you
define all of the above rules. Where you cannotdefine the rule, it typically defaults either to Restrict or None.Let's say you want to insert a new article, i.e. a child insert action.
In most databases, the Restrict rule would bein effect (assuming you have defined the foreign key properly,i.e. RI has been turned on).What this means is that you cannot insert an article for an author that doesn't exist; you are restricted from doing so.This is typical behaviour
for most one-to-many relations. You have to add the parent tothe database first, and then you can add child recordsthat reference that parent.In fact, if you wanted some other rule for Child Insert, you would find it extremely tricky to implement. Trustme, it's messy. Luckily, this requirement hardly evercomes up. I shouldn't even have mentioned it.Of the five possible actions for Child Insert, only Restrict really makes sense.Okay, I lied. Actually, there is one other action that
makes sense, in a macabre sort of way, only because you see it all the time,not that you would really want to define itthis way on purpose (assuming you have been paying attention).If you do not turn on RI for the author-article
one-to-many relation, the Child Insert action defaults to None.In other words, you can add an article that references any author numberyou like, the database will not check it for you,and who knows, it might result in a broken link.So leaving RI undefined for a given relationresults in Child Insert: None.Maintaining the author-article relation involves six different actions.
Below, I have chosen the RI rules that "make the most sense" to me.See if you agree or disagree with my comments:Author Insert: None
Author Update: Cascade
Author Delete: Restrict
Article Insert: Restrict
Article Update: Restrict
Article Delete: None
If you made it this far, congratulations --
things are about to get a whole lot simpler.Standard SQL-92 syntax provides a mechanism
for setting RI rules only for the parent delete and parent update actions.All others take a default. The rules are defined whendeclaring the foreign key on the child table:As mentioned, all other rules take defaults, and
if you do not define the parent delete and parent update rules,they take defaults too:Action | Default Rule |
---|---|
Parent Insert | None |
Parent Update | Restrict |
Parent Delete | Restrict |
Child Insert | Restrict |
Child Update | Restrict |
Child Delete | None |
None of these defaults is very surprising, when you think about it.
In fact, the only action for the author-article relationthat we thought made sense to be different from the defaultis Author Update, which we decided we wanted as Cascade.Luckily, that is one of the rules that can be easily set by the SQL syntax.So what's the big deal, then? If all the defaults are good,
and we can change the ones we want, how will RIsimplify your database project,shorten development time,and improve the quality of your life?The answer comes when you consider how some
applications go about maintaining relational integritywhen RI is not turned on in the database.Suppose the application wanted to insert an
article, and also wanted to make sure the author already exists in the database. It isnot difficult to imagine a developercoding up something like this --There is nothing inherently wrong with this approach,
other than thatWith RI turned on and Article Insert set to Restrict (the default),
the developer simply codesIt is not even always necessary to
test the return code and issue a friendly error message.Assuming the foreign key constraint is named properly,the database error message will basically say somethinglike "ERROR: integrity constraint ARTICLEAUTHOR violated - parent key notfound" and this is often enough to indicate what's going on.When designing a database, after all the
entities, tables, columns and data types have been identified,you will have a number of one-to-many relations.Examine each relationin light of the six possible actions,and the five possible rules for each.Start by considering all the default rules.Pay special attention to the parent updateand parent delete rules, because those canbe changed easily, and often are.Then prepare a rule table for eachrelation, similar to the samples we described earlierfor the author-article relation:Action | Rule |
---|---|
Author Insert | None |
Author Update | Cascade |
Author Delete | Restrict |
Article Insert | Restrict |
Article Update | Restrict |
Article Delete | None |
Highlight the rules that are different
from the default, as above.Then pass these rule tables as specifications to your developer.You'll save a lot of hassles, development time,
and unnecessary coding. And you might just seean increase in the quality of your life.I am indebted to ERwin, a popular data modelling tool,
for first introducing me to the concept of RI actions outside the scope of what isnormally defined by SQL syntax. You wouldnot believe what you have to do, and what ERwincan generate in the way of triggers and stored procedures,to achieve non-default RI rules for actions other than parent update and parent delete.If your project requires these, give me a shout."Improving the quality of life" is a
requirement that should be part of every databasespecification. I was taught thisin my very first database design coursefor IDMS in 1977 by William Casey of Cullinane Corporation,and it's as true today as it was then.