Beginning Database Design Part I

Posted on 16 May 2002

in Backend

by Joel D Canfield (spinhead)

Rated 3.9 (Ratings: 10)

Want more?

More articles in Backend

Joel D Canfield

Member info

User since: 18 Apr 2001

Articles written: 8

Creating and Naming Database Tables and Fields

This series of articles is based on an outline I made while reading "Database Design for Mere Mortals" by Michael J. Hernandez. Mike graciously gave permission for its use in this derived work. I highly recommend the book to anyone who wants a good grasp of database fundamentals. The series is intended to familiarize the uninitiated with the concepts of database design. I'll include an extensive bibliography for those interested in more technical or in-depth coverage.

Why Use A Database?

Data overload is a common problem in business today. Corporations and individuals have plenty of raw data, but can't always find it or aren't aware that they even have it. Raw data must be filtered and organized to become useful information. Databases are a primary tool for the task; a tool which takes advantage of the speed and power of modern computers.

Goals

Creating a database can be a daunting task to a beginner. As with most complex projects, good database design can be broken down into discrete steps which are easier to manage.

In a future article, we'll discuss the information-gathering process necessary before database design even begins. For now, let's talk about something less abstract — tables and fields. For the examples in this article I'll try to provide samples which reflect real-world situations.

Terminology

First, some terminology. We won't be using the highly technical terms "relations" and "tuples"; even in the manuals they introduce the terms and then go right back to "tables" and "rows" so we'll skip the interim step. However, the technical term "attributes" works well to describe the concept embodied in "columns" so we'll use whichever term seems more understandable in context. And each "record" makes up one "row," another familiar term. Again, we'll use whichever fits best conceptually. Finally, for each row there will be an "attribute" in each row. This is a "field."

A database is made up of one or more tables. Each table is made up of one or more rows of information. Each row contains various attributes, one in each column.

From Flat Spreadsheet to Relational Database

If you're familiar with spreadsheet software, you're familiar with tables of rows and columns. For instance, you might have columns called "Name," "Company," "Address," "Specialty," and "Phone." This works fine for a while, but you soon realize that if you have multiple entries from the same company, the same specialty, or perhaps even the same name, you're duplicating entries. One of the geeky things database administrators get excited about is efficiency. It's not efficient to store the company name "Spinhead Web Design" twelve times; it's the same every time, right? And if you have two contacts at Spinhead who answer the main phone number, you've entered that twice, haven't you? And look; you have "Database Administrator" listed for three different contacts. This obviously isn't a big issue with small databases, but in your plans for world domination, small databases will rapidly be replaced by huge databases. Plan ahead. Build efficiently. [1]

How can we make the spreadsheet more efficient?

If there were a way to use multiple spreadsheets, one for each type of data, and have the spreadsheets share information, you could have the "People" spreadsheet linked to the "Skills" and "Companies" spreadsheets. Each spreadsheet would only contain unique records, not duplicates [2]. If two people had the same skill or worked for the same company, they could both refer to the same record the appropriate spreadsheet.

Congratulations. You now understand the basic concept behind a relational database.

Tables and Fields

Each of our spreadsheets compares to a table in our "Contacts" database. But how exactly do we decide what each field should contain? And which fields go in which table?

In the real world, a certain amount of research precedes any real database design work. We'll discuss it further in a future article, but essentially, through interviews with the intended users and analysis of the existing processes, we have collected a long and probably disorganized list of possible candidates. Through two processes, we'll decide which will become fields, which will become tables, and which fields go in which tables.

First, look for nouns which identify an object or event but do not modify some other object or event. This list is our preliminary list of Tables. At the same time, we're doing the opposite; looking for nouns which do modify another object or event. This list becomes our preliminary list of Fields. For example, "vendor" would go on the "Tables" list, but "phone number" is likely a characteristic of "vendor" and belongs on the "Fields" list. "Tables" usually refer to collections of things; something that can be broken down into smaller components. "Fields" would be those components. Don't worry about precise names yet. We'll stick to simple descriptive words or phrases for now.

So far we haven't done anything to organize our fields into tables.

Assigning Fields to Tables

Once we're certain the our "Tables" and "Fields" lists are complete, it's time to assign each field to the appropriate table. For each entry on the preliminary field list, choose the table which it describes or to which it applies. If some fields don't seem to fit any table, set them aside for the moment. We'll be altering our lists as we proceed, so don't worry about orphaned tables or fields.

Now, let's take a closer look to ensure that our tables and fields are as efficient as possible at this stage. A short checklist for an ideal table:

Represents only a single subject

Has a field (or pair of fields) which uniquely identify each row (the Primary Key; more on that later)

Does not contain unnecessary duplicate fields

No repetition of the same type of value

No fields which belong in other tables (we'll address links between tables later)

And for an ideal field:

Represents a characteristic of a table subject

Contains a single value

Is atomic; that is, not multi-part [3]

Is not calculated or concatenated

Is unique throughout the database structure

Naming Tables and Fields

Once we've reached this point, let's establish our table and field names. Names for all database elements should be:

Unique

Clear, meaningful, unambiguous

Short

Some restrictions for naming tables:

No acronyms or abbreviations

No proper names or words which unduly limit the data which can be entered

Should not imply more than one subject

Should be plural

And for naming fields:

No acronyms

Use abbreviations only if clear and meaningful

Should not imply more than one subject

Should be singular

You'll probably see some duplicate field names, such as "Name" in both the "Companies" and "People" tables. Let's make them unique across the database. You might choose "ContactName" and "CompanyName" but whatever you use, stick to the guidelines above, and be consistent.

Review

Database tables and fields take the simple spreadsheet concept to another dimension. Establishing tables which refer to a single subject, populated with fields which refer specifically to that subject, will provide a sound foundation for future growth.

You'll be using table and field names frequently in your database work. Make sure they're succinct, but not arcane.

Give it a try. On paper, make up some sample lists of subjects and characteristics. Group the characteristics by subject, and try your hand at meaningful names.

Next Steps

We still don't have our tables talking to each other. Where's that efficient information sharing we talked about? And what prevents us from entering "Database Administrator" in the "Skills" table twice, detracting from our neat and tidy structure?

Up Next: The Keys to Data Validation, Integrity, and Efficiency

Notes

[1] From a practical perspective, effiency of this depth adds little value to small, lightly queried databases. It is presented with the hope that if we strive for perfection our results will surpass those achieved by striving for mediocrity. (return)

[2] Well, if you want to split hairs, of course there's duplication every time you create a reference to the same data. But the repetition of the simple reference is more efficient than the repetition of the potentially complex data itself. (return)

[3] This is, to a degree, subjective. Is a phone number atomic, or is it composed of area code, exchange, and suffix? General usage may suggest an answer, but how the data will be used should be the determining factor; not our preconceived ideas about what a phone number is. (return)

I'm a writer. Non-fiction, fiction, songs. Sometimes code, too.

Start of page header

Other Fine Evolt.org Sites

Navigation Starts

Submit

Article Categories

Highest rated articles

Help Support evolt.org

Main Page Content