First Things First – What Is Database Normalization?
Database normalization is a process by which you ensure the integrity of a relational database by preventing problems such as redundancies. Simply put, if you don’t efficiently organize your data (data modeling), you will be in for very unpleasant surprises when it comes time to retrieve, update or delete that data. I’m going to omit the cryptic definitions of the "low level" forms I’ll cover: first normal form, second normal form, and third normal form. Simple explanations that lead to sound understanding are better than technical definitions that only add to confusion.
First, you’ll want to consider what information (data) you’ll need to store and what questions (queries) you’ll need to ask the database. You should also know what a primary and foreign key are and how to assign them whether you’re setting up tables in a graphical user interface (GUI) such as phpMyAdmin or on the command line. A primary key is a unique identifier for a record; it must always have a static value. A foreign key in one table represents the primary key in another.
Remember, we’re talking about relational databases; there are relationships between tables in this kind of structure. I strongly suggest getting an understanding of the three types of relationships that can exist between any two tables: one-to-one, one-to-many, and any-to-many. If only one item in table X applies to only one item in table Y, you have a one-to-one relationship. For example, in the United States, one social security number should apply to only one unique individual. If one item in table X can apply to multiple items in table Y, you have a one-to-many relationship. For example, one birth date can apply to several people who were all born on the same day. Finally, if more than one item in table X can apply to more than one item in table Y, you have a many-to-many relationship; try to totally eliminate this third type of relationship because of the problem of redundancy.
First Normal Form
Database normalization begins with adjusting your database schema (structure) comply with the first normal form which dictates that every column must be atomic or indivisible (storing only one value) and that no table can have groups of related data that are repeated. You can see that the fictitious customers table below (click on image) is not in first normal form; there are two values (first and last name) in a single column (name) and there are multiple values in the address column.
You can see the redundancy problem as well; what if you had two customers with the same name or two customers with the same address? This problem can easily be fixed by creating columns to hold only one value. For example, instead of one column for the name, have two columns, one for the first name and one for the last. The same solution of creating more columns, each to hold only one value, works for the multiple-value address column as well to comply with the first normal form in your database normalization.
Second Normal Form
First, make sure your database is at least in first normal form. Next, make sure every column in your table that isn’t a key itself depends on the table’s primary key. Basically, you’re creating as many tables as are needed to avoid potential redundant data. The tendency to lump everything together in as few tables as possible is probably one of the worst obstacles to achieving database normalization.
The books table below is an example of a table that is not in the second normal form. Why? Because the author column is not a key; yet, it does not depend on the table’s primary key (bookID). You can see the redundancy problem; the author John Doe, if he has written several books, could be repeated multiple times. A solution to this problem would simply be to create a separate authors table; "authorID" would make a good primary key column. Remember to keep author last name and author first name in two separate columns to remain in first normal form.
bookID title year published author
850 Best Seller1 2000 John Doe
851 Best Seller2 1988 Peter Pan
852 Best Seller3 2004 John Doe
Third Normal Form
Database normalization to the third normal form might or might not be necessary in all situations. However, you should at least be familiar with what it means to have 3NF (third normal form) compliant tables. Also, keep in mind that intense database normalization does not end at the third normal form. Going beyond this point can become very complicated and require the services of a database specialist. It’s very important to know when you have the knowledge to achieve database normalization and when you need to leave the process to a specialist.
You have achieved third normal form if you’ve achieved second normal form and every column that is not a key is independent. What does this mean? Let’s say you have one table that holds data for customers: first and last name, address, city, state, province, postal code, and maybe even country. If the name of a country changes (it does happen from time to time), all records for customers living in that nation would have to be updated.
It’s very possible to "automatically" achieve third normal form simply by having achieved first normal form and second normal form. If you don’t see issues, don’t create them. Also, keep in mind that sometimes you might choose to cease the database normalization process after tables are in the second normal form because 3NF compliancy calls for the creation of more tables and slows down the running of queries. The decision is yours or that of the specialist who must decide if stopping at the second normal form could come back to haunt you.