Ri To The Rescue
Posted on 11 Aug 2001
by rudy limeback (r937)
Rated 4.22 (Ratings: 15)
- More articles in Backend
How Relational Integritycan simplify your database project,shorten development time,and improve the quality of your life.
Unless you are a seasoned database pro,"relational integrity" might be justanother fancy marketing phrase that you don't really understandbut are vaguely suspicious of.If the following explanation helps you, I will be very, very happy -- you don't even have to tell me; I'll know.And if you'd like more information, please contact me; I do database work for a living,answer email for free, and would love to arrange some onsite training or consulting at your company.
To illustrate relational integrity concepts,I use the following sample database:
- articles such as stories, columns, opinions, recipes, quotes, ideas, or similar items of written content
- authors who write articles
This database could be for an ezine (an electronic magazine web site),a famous quotes search engine, an online library,or any application which has to keep track of content (articles)by owner (author).
If you already know what RI is, please feel free toskip over the following basic explanations.
What is a Relational Integrity?
A relational databasecontains tables of data which are related to each other.For example, articles are related to the author who wrote them,and conversely, authors are related to the articles they wrote. (Sounds obvious, and it is.)That's the relational part.
When we work with a relational database,we naturally expect that data in related tables stay related. For example, articles written by MartinB should alwaysbe related to MartinB and never be confused witharticles written by any other author.That's the integrity part.
Relational Integrity is also calledReferential Integrity, perhaps because the mechanism for ensuring integrity is implementedby the way the tables reference each other.Both terms are okay with me. I usuallyjust say RI anyway.
How RI WorksTables are related using data values.Usually, these are the values of numeric keys insteadof text fields. (Database designers love assigningnumeric keys, mainly because it's moreefficient than relating tables through text fields, but also because most databasescan generate autonumbers or sequence numbersfor this purpose.)
So let's say MartinB is an author in thedatabase and has author number 32. Further, let's say that for some reason (which we won't go into here, as it's not important) you need to change MartinB's author number from 32 to 45678. The way it works is that you change his author number on his author record in the author table.And then you had better also changeany references to author 32 in all related tables --like the articles table --because if you don't, they'll stillrefer to author number 32, which no longer exists.
Mercifully, you don't have to do this yourself.This is where RI comes in. RI can do it for you.The underlying structure and enforcement of RIis usually handled by the relational database system.I say usually because in all databasesystems I've ever worked with, RI is optional,absurd as that seems.Database developers have to turn RI on,or else ask the developers to provide external enforcementprocedures outside the database system.
So if you don't have RI turned on for your particulardatabase, then when the day comes -- and you know it will --when you have to make a change and you forget the external procedure(s) to change all thenecessary related data values, the database will be said to have lost integrity. You will have "broken links" in your tables.You will be up the creek. (Next April 1, mention to a DBA that there are broken links in his database, and watch his face...)
You will, I am certain, by this point have guessedthat RI can be a very useful feature to employ in your database.
Let us take one more briefmoment to cover some basic definitionsbefore getting down to business.Again, please feel free toskip over these definitions.
Parent and Child
Every database relation is a one-to-manyrelation involving just two tables, a parent table and a child table.Yes, you may hear other terms like one-to-one, many-to-many, and so on, butthese are all just special cases of the one-to-many relation.Don't let them fool you. Understanding databases is easy if you keep it simple.
So each relation is one-to-many, involving a parent tableand a child table. This terminology is not standard, merely convenient, if somewhat unfortunate.Some people say master and detail; sometimes, I'llsay owner and member, a flashbackfrom pre-relational CODASYL databasesas out of date as bell-bottom pants (except that unlike pre-relational databases, bell-bottoms may one day be back in style).
As you might guess, the parent is the "one" andthe child is the "many."Each parent can have many children.The reverse, I am afraid, is not true, not in a relational database.A child can have only one parent.Please don'tdwell too long on the obvious fact that in real life a childhas two parents -- you will only get confused. In a relational database,in a one-to-many relation, each child has at mostone parent. This is important. This is also why theparent-child terminology is unfortunate.
Let's look at our sample database.Each author is related one or more articles, the articles written by that author.The author table is the parent table, andthe article table is the child table.
Note that an author can write many articles, but an article has only one author.This is a natural consequence of it being a one-to-many relation.The scenario where an articleis co-authored by more than one authorinvolves a many-to-many relation,the contemplation of which, like the reality thatin real life a child has more than one parent,is beyond the scope of this discussion; it will only confuse younow.
Primary and Foreign Keys
Relational databases implement RIusing primary keys and foreign keys.Other than their special role in RI,these "keys" are like any other database fields -- theycontain data values.
The parent table contains the primary key andthe child table contains the foreign key which references (or "points" to, if you prefer) the parent table.
Because each child relates to only one parent,the child table needs only one foreign key.So where are the "many" child records in aone-to-many relation? They'rein the multiple rows of the child tablethat all have the same data value intheir foreign key, all referencing the same parent.
For example, consider the following articles:
|32||Your clients need a CMS |
The Tao of Testing
|88||640 x 480 Isn't Dead Yet |
Inside the evolt.org Rebuild
Above, you can see the one-to-many relation clearly.Now look at it the way you normally seeit actually stored in the article table:
|Your clients need a CMS||32|
|640 x 480 Isn't Dead Yet||88|
|The Tao of Testing||32|
|Inside the evolt.org Rebuild||88|
Now you can see more clearly that each article isrelated to only one author. This is the "many-to-one"reversal of the one-to-many relationship.It amounts to the same thing. You can always say somethingabout a one-to-many relation in two different ways.
Finally, we get down to business.
In a relational database,there are four basic actions you can perform on a table:
- Select one or more existing rows from the table
- Insert one or more new rows into the table
- Update one or more rows in the table
- Delete one or more rows from the table
These actions, known as SIUD, are carried outby the -- wait for it -- Select, Insert, Update, and Delete SQL statements. (These actions are sometimes, as in business process modelling, alsocalled CRUD, using the initials for Create, Reference, Update, and Delete.CRUD is easier to remember, but not as helpful as SIUD, becauseyou may end up getting confused. For example, there's no Reference SQL statement, but there is a Create SQL statement, although it doesn't create any data. I'm sorry I evenmentioned CRUD. I just wanted to set the record straight, point out again that simplicity rules,and make sure you were awake.)
Three of these actions -- Insert, Update, and Delete --change the contents of the database table and therefore have RI implications.Remember, you want to stay on top of changes, preferably lettingthe database itself handle them, to ensure younever have "broken links" in your data.
Now it gets a wee bit complicated.
For every relation, the Insert, Update, and Delete actionscan be performed on either the primary key (in the parent table), or the foreignkey (in the child table). There are thus a total of six possible RI actions forevery one-to-many relation:
|Parent Insert||Child Insert|
|Parent Update||Child Update|
|Parent Delete||Child Delete|
Each of these six possible RI actionsneeds to be examined when youdesign your database, because what you want to dois turn over as much of the enforcementof RI as possible to the database itself.
Now it gets a little more complicated.
For each of the six RI actions mentioned above,one of five RI rules can be implemented:
|Cascade||perform same action on related keys|
|Set Null||set related keys to null|
|Set Default||set related keys to their default value|
|None||allow action, do nothing to related keys|
Does this mean that implementing RIrequires choosing one out of a possible thirty differentcombinations of actions for each one-to-many relation??The answer is yes -- but I hasten to add that it isn't really as bad or as muchwork as it may seem!!
In fact, most database systems do not let youdefine all of the above rules. Where you cannotdefine the rule, it typically defaults either to Restrict or None.
Sample RI Action Rules
Let's say you want to insert a new article, i.e. a child insert action.In most databases, the Restrict rule would bein effect (assuming you have defined the foreign key properly,i.e. RI has been turned on).What this means is that you cannot insert an article for an author that doesn't exist; you are restricted from doing so.
This is typical behaviourfor most one-to-many relations. You have to add the parent tothe database first, and then you can add child recordsthat reference that parent.In fact, if you wanted some other rule for Child Insert, you would find it extremely tricky to implement. Trustme, it's messy. Luckily, this requirement hardly evercomes up. I shouldn't even have mentioned it.Of the five possible actions for Child Insert, only Restrict really makes sense.
Okay, I lied. Actually, there is one other action thatmakes sense, in a macabre sort of way, only because you see it all the time,not that you would really want to define itthis way on purpose (assuming you have been paying attention).
If you do not turn on RI for the author-articleone-to-many relation, the Child Insert action defaults to None.In other words, you can add an article that references any author numberyou like, the database will not check it for you,and who knows, it might result in a broken link.So leaving RI undefined for a given relationresults in Child Insert: None.
Maintaining the author-article relation involves six different actions.Below, I have chosen the RI rules that "make the most sense" to me.See if you agree or disagree with my comments:
Author Insert: None
insert new author, do nothing to related articles
This is fairly trivial, since new authors wouldn't have any articles yet (assuming other sensible RI rules are in effect).
Author Update: Cascade
update the author's primary key and also all matching article foreign keys
This is the crucial action we've been talking about that you want the database to do for you, and while you might not needto change a record's primary key often, when you need to do it,you want Cascade.
Author Delete: Restrict
delete author only if no articles have been written by this author
If you allowed the author delete to happen and left thearticles in there, they would all have broken links; Restrict prevents this.Alternatively, if you specified the Author Delete rule as Set Null, the author would be gone and the articles would be left behind but you wouldn't know who wrote them; this is the only other choice thatmakes sense to me, but maybe not as much as Restrict.(And I'm sorry, I really cannot get intoa discussion of nulls at this point.)
Article Insert: Restrict
insert new article only if the author already exists
This is the example discussed in detail earlier.
Article Update: Restrict
update article to a different author but only if the new author also already exists
Sometimes this is required to correct an input error.
Article Delete: None
delete article and do nothing to author
This makes sense, right? Note that if the article being deleted is the author's last or only one, this may result in an author with no articles -- but that's not thesame thing as a broken link.
Options for Implementing RI
If you made it this far, congratulations --things are about to get a whole lot simpler.
Standard SQL-92 syntax provides a mechanismfor setting RI rules only for the parent delete and parent update actions.All others take a default. The rules are defined whendeclaring the foreign key on the child table:
CREATE TABLE child ( columndefinitions,
[constraintname] FOREIGN KEY (fkcolumns)
REFERENCES parent [(pkcolumns)]
CASCADE SET NULL SET DEFAULT NO ACTION
CASCADE SET NULL SET DEFAULT NO ACTION
As mentioned, all other rules take defaults, andif you do not define the parent delete and parent update rules,they take defaults too:
None of these defaults is very surprising, when you think about it.In fact, the only action for the author-article relationthat we thought made sense to be different from the defaultis Author Update, which we decided we wanted as Cascade.Luckily, that is one of the rules that can be easily set by the SQL syntax.
So what's the big deal, then? If all the defaults are good,and we can change the ones we want, how will RIsimplify your database project,shorten development time,and improve the quality of your life?
The answer comes when you consider how someapplications go about maintaining relational integritywhen RI is not turned on in the database.
Suppose the application wanted to insert anarticle, and also wanted to make sure the author already exists in the database. It isnot difficult to imagine a developercoding up something like this --
- do a SELECT on the author table using the author numberfor the proposed article insert as the search key
- if the select fails, issue an error messagesuch as "author does not exist; article insert rejected"
- otherwise, proceed to INSERT the article
There is nothing inherently wrong with this approach,other than that
- it requires an additional database call, which is often much less efficient, and
- it takes extra time to do this programming
With RI turned on and Article Insert set to Restrict (the default),the developer simply codes
- INSERT the article
- test the database return code, and if it failedon an RI exception, issue the error message
It is not even always necessary totest the return code and issue a friendly error message.Assuming the foreign key constraint is named properly,the database error message will basically say somethinglike "ERROR: integrity constraint ARTICLEAUTHOR violated - parent key notfound" and this is often enough to indicate what's going on.
When designing a database, after all theentities, tables, columns and data types have been identified,you will have a number of one-to-many relations.Examine each relationin light of the six possible actions,and the five possible rules for each.Start by considering all the default rules.Pay special attention to the parent updateand parent delete rules, because those canbe changed easily, and often are.Then prepare a rule table for eachrelation, similar to the samples we described earlierfor the author-article relation:
Highlight the rules that are differentfrom the default, as above.Then pass these rule tables as specifications to your developer.
You'll save a lot of hassles, development time,and unnecessary coding. And you might just seean increase in the quality of your life.
I am indebted to ERwin, a popular data modelling tool,for first introducing me to the concept of RI actions outside the scope of what isnormally defined by SQL syntax. You wouldnot believe what you have to do, and what ERwincan generate in the way of triggers and stored procedures,to achieve non-default RI rules for actions other than parent update and parent delete.If your project requires these, give me a shout.
"Improving the quality of life" is arequirement that should be part of every databasespecification. I was taught thisin my very first database design coursefor IDMS in 1977 by William Casey of Cullinane Corporation,and it's as true today as it was then.