18

Learn about the First Normal Form and Database Design

First Normal Form

This is the second in a series of posts teaching normalization.

The first post introduced database normalization, its importance, and the types of issues it solves.

In this article we’ll explore the first normal form. For our examples, we’ll use the Sales Staff Information shown below as our starting point. As we described in the last article, there are several problems preserving the information in this form. Normalizing the data enables us to eliminate any duplicate data as well as modification anomalies. Unormalized Data

1NF – First Normal Form Definition

The first step to constructing the right SQL table is to ensure that the information is in its first normal form. When a table is in its first normal form, searching, filtering and sorting information is easier. The rules to satisfy the 1st normal form are:

  • When the data is in a database table.  The table stores information in rows and columns where one or more columns, called the primary key, uniquely identify each row.
  • Each column contains atomic values, and should be not repeating groups of columns.

Tables cannot contain sub-columns in the first normal form. That is, you cannot list multiple cities in one column and separate them with a semi-colon.

When a value is atomic, the values cannot be further subdivided.  For example, the value “Chicago” is atomic; whereas “Chicago; Los Angeles; New York” is not.

Related to this requirement is the concept that a table should not contain repeating groups of columns such as Customer1Name, Customer2Name, and Customer3Name.

Unormalized Data - Repeating Groups

Once recurring customer columns are put into your own table, our example table is converted into its first normal form. The following is shown:

Data Model in First Normal Form

The repeated column groups in the Customer Table are now linked to the EmployeeID Foreign Key.   As described in the Data Modeling lesson, a foreign key is a value that matches the primary key of another table.

In this case, the customer table contains the corresponding EmployeeID for the SalesStaffInformation row. Here is our data in the first normal form.

First Normal Form Example Data

This design is superior to our original table in several ways:

  1. The original design limited each SalesStaffInformation entry to three customers.  In the new design, the number of customers associated to each design is practically unlimited.
  2. The Customer, which is our original data, is nearly impossible to sort. You could, if you used the UNION statement, but it would be cumbersome.  Now, it is simple to sort customers.
  3. The same holds true for filtering on the customer table.  It is much easier to filter on one customer name related column than three.
  4. The insert and deletion anomalies for Customer have been eliminated.  You can delete all the customer for a SalesPerson without having to delete the entire SalesStaffInformaiton row.

Modification anomalies remain in both tables, but these are fixed once we reorganize them as 2nd normal form.

More tutorials are to follow! Remember, I want to remind you all that if you have other questions you want answered, then post a comment or tweet me.  I’m here to help you. What other subjects would you like to read more about?

Kris Wenzel
 

Kris Wenzel has been working with databases over the past 28 years as a developer, analyst, and DBA. He has a BSE in Computer Engineering from the University of Michigan and a MBA from the University of Notre Dame. Kris has written hundreds of blog articles and many online courses. He loves helping others learn SQL.

  • martin says:

    transitively … is all you have said about 3rd normal form and google doesnt even clearly define that word.

  • may says:

    that is very useful for beginner

  • Radek O. says:

    This is one of the best explanations publicly available. Great stuff, thanks heaps!

  • Stephan says:

    I don’t believe you should have the first and last name in your column SalesPerson. I believe 1NF is supposed to have atomic values

    • Hi,

      There is confusion on atomicity. It is now accepted that being atomic, at least when applied to databases, doesn’t mean indivisible. If that where the case then we would be compelled to break out date and time stamps into their component fields seconds, minutes, hours, days, months, and years.

      Check out this article for more information: http://en.wikipedia.org/wiki/First_normal_form (See the section on atomicity)

  • Kimete says:

    Hi Chris,

    I want to ask you, how you add to table Customers: columns CustormerCity and PostalCode because they are not in first tabel SalesStaff. Or maybe they ar part of database?
    Where you are based for this?
    If you cane explan this?

    Thankyou!

    All the best,

    Kimete

  • Christian Meyer says:

    This was really helpful, thanks for writing. I’m wondering what happens when multiple employees are associated with multiple customers. For example,
    Employee Customer
    123 Ford
    124 Ford
    123 Toyota
    124 Jeep
    How do you deal with the primary and foreign keys in cases like this? Sorry if this question is out of the scope of this article. Although, I do feel that this scenario could arise pretty commonly.
    Also should note that I haven’t read up on 2NF/3NF yet, looking for forward to it though!

    • I think you’ll see that the 2nd and 3rd normalization rules take care of that situation.

      Long story short, you’ll find yourself with three table at some point:
      employee
      employee-customer
      customer

  • black hawk says:

    hey,
    can a table be in 1NF if it has missing values?

  • Khalid says:

    Excellent site for database

  • Neil says:

    Hey Kris thank you for the article it is very useful. I have a question about what exactly ‘No repeating groups of columns’ means. What if, for example, every employee has exactly one ‘key client’ and one ‘secondary client’. Could I then have one column under employees called ‘keyClient’ and one called ‘secondaryClient’? My actual example is modelling court judgments where there is one party and one counter-party. I.e., how different do two attributes have to be before I can include them both and still be in FNF?

    • Technically speaking those clients should be placed in a separate table.

      Here is a quick example… Prob. need case somewhere in there, but hopefully you get the drift…

      Judgement
      ———————
      JudgementID (PK)
      Amount
      JudgementDate
      etc…

      JudgementClient
      ————
      ClientID
      JudgementID
      ClientPrecidence <== name to be whatever makes send for first, second, or third client... Client -------------------- ClientID Name Address

  • Half A Physicist says:

    Great introduction. Thank you!

  • AK says:

    I am not able to see any images in this tutorial page. Please help!

  • report this ad
    >