Thursday, July 21, 2016

Normalization with Example

Why do we need to do normalization?

To eliminate redundancy of data i.e. having same information stored at multiple places, which eventually
 be difficult to maintain and will also increase the size of our database.
With normalization we will have tables with fewer columns which will make data retrieval and insert, 
update and delete operations more efficient.


What do we mean when we say a table is not in normalized form?

Let’s take an example to understand this,
Say I want to create a database which stores my friends name and their top three favorite artists.
This database would be quite a simple so initially I’ll be having only one table in it say friends table. 
Here FID is the primary key.


FIDFNAMEFavoriteArtist
1SrihariAkon, The Corrs, Robbie Williams.
2ArvindEnigma, Chicane, Shania Twain

This table is not in normal form why?

FavoriteArtist column is not atomic or doesn’t have scalar value i.e. it has having more that one value.
Let’s modify this table

FIDFNAMEFavoriteArtist1FavoriteArtist2FavoriteArtist3
1SrihariAkon.The CorrsRobbie Williams.
2ArvindEnigmaChicaneShania Twain

This table is also not in normal form why?

We have now changed our table and now each column has only one value!! (So what’s left?)
Because here we are having multiple columns with same kind of value.

I.e. repeating group of data or repeating columns.

So what we need to do to make it normal or at least bring it in First Normal Form?
  1. We’ll first break our single table into two.
  2. Each table should have information about only one entity so it would be nice if we store our friend’s 
information in one table and his favorite artists’ information in another
(For simplicity we are working with few columns but in real world scenario there could be column
 like friend’s phone no, email , address and favorites artists albums, awards received by them, 
country etc. So in that case having two different tables would make complete sense)


FIDFNAME
1Srihari
2Arvind
FIDFavorite Artist
1Akon.
1The Corrs
1Robbie Williams
2Enigma
2Chicane
2Shania Twain

FID foreign key in FavoriteArtist table which refers to FID in our Friends Table.

Now we can say that our table is in first normal form.

Remember For First Normal Form

1...Column values should be atomic, scalar or should be holding single value
2...No repetition of information or values in multiple columns.

3...So what does Second Normal Form means?


 Second normal form our database should already be in first normal form and every non-key column must
 depend on entire primary key.

Here we can say that our Friend database was already in second normal form l.
Why?

Because we don’t have composite primary key in our friends and favorite artists table.

Composite primary keys are- primary keys made up of more than one column. But there is no such thing
 in our database.
But still let’s try to understand second normal form with another example
This is our new table
GadgetsSupplierCostSupplier Address
HeadphoneAbaci123$New York
Mp3 PlayerSagas250$California
HeadphoneMayas100$London

In about table ITEM+SUPPLIER together form a composite primary key.

Let’s check for dependency

If I know gadget can I know the cost?

No same gadget is provided my different supplier at different rate.

If I know supplier can I know about the cost?

No because same supplier can provide me with different gadgets.

If I know both gadget and supplier can I know cost?

Yes than we can.

So cost is fully dependent (functionally dependent) on our composite primary key (Gadgets+Supplier)

Let’s start with another non-key column Supplier Address.

If I know gadget will I come to know about supplier address?

Obviously no.

If I know who the supplier is can I have it address?

Yes.

So here supplier is not completely dependent on (partial dependent) on our composite primary key
 (Gadgets+Supplier).

This table is surely not in Second Normal Form.

So what do we need to do to bring it in second normal form?

Here again we’ll break the table in two.
GadgetsSupplierCost
HeadphoneAbaci123$
Mp3 PlayerSagas250$
HeadphoneMayas100$
SupplierSupplier Address
AbaciNew York
SagasCalifornia
MayasLondon

We now how to normalize till second normal form.

But let’s take a break over here and learn some definitions and terms.

Composite Key: -Composite key is a primary key composed of multiple columns.
Functional Dependency – When value of one column is dependent on another column.

So that if value of one column changes the value of other column changes as well.

e.g. Supplier Address is functionally dependent on supplier name. If supplier’s name is changed in a record
 we need to change the supplier address as well.

S.Supplier–àS.SupplierAddress

“In our s table supplier address column is functionally dependent on the supplier column”

Partial Functional Dependency – A non-key column is dependent on some, but not all the columns in a composite primary key.

In our above example Supplier Address was partially dependent on our composite key columns (Gadgets+Supplier).

Transitive Dependencytransitive dependency is a type of functional dependency in which the value in a non-key column is determined by the value in another non-key column.

With these definitions in mind let’s move to Third Normal Form.
For a table in third normal form
  • It should already be in Second Normal Form.
  • There should be no transitive dependency, i.e. we shouldn’t have any non-key column depending on any other non-key column.
Again we need to make sure that the non-key columns depend upon the primary key and not on any other non-key column.

AlbumArtistNo. of tracksCountry
Come on overShania Twain11Canada
HistoryMichael Jackson15USA
UpShania Twain11Canada
MCMXC A.D.Enigma8Spain
The cross of changesEnigma10Spain

Although the above table looks fine but still there is something in it because of which we will normalize it further.

Album is the primary key of the above table.

Artist and No. of tracks are functionally dependent on the Album(primary key).

But can we say the same of Country as well?

In the above table Country value is getting repeated because of artist.

So in our above table Country column is depended on Artist column which is a non-key column.

So we will move that information in another table and could save table from redundancy i.e. repeating values of Country column.

AlbumArtistNo. of tracks
Come on overShania Twain11
HistoryMichael Jackson15
UpShania Twain11
MCMXC A.D.Enigma8
The cross of changesEnigma10
ArtistCountry
Shania TwainCanada
Michael JacksonUSA
EnigmaSpain


Normally this is considered enough and we don’t really go on applying the other normal forms. 

Most of real-world application has databases which are in third normal forms.
=========================================================================
Normalization In other words: 

NameDescription
First Normal FormAn entity is in First Normal Form (1NF) when all tables are two-dimensional with no repeating groups.

A row is in first normal form (1NF) if all underlying domains contain atomic values only. 1NF eliminates repeating groups by putting each into a separate table and connecting them with a one-to-many relationship. Make a separate table for each set of related attributes and uniquely identify each record with a primary key.
  • Eliminate duplicative columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
Second Normal FormAn entity is in Second Normal Form (2NF) when it meets the requirement of being in First Normal Form (1NF) and additionally:
  • Does not have a composite primary key. Meaning that the primary key can not be subdivided into separate logical entities.
  • All the non-key columns are functionally dependent on the entire primary key.
  • A row is in second normal form if, and only if, it is in first normal form and every non-key attribute is fully dependent on the key.
  • 2NF eliminates functional dependencies on a partial key by putting the fields in a separate table from those that are dependent on the whole key. An example is resolving many:many relationships using an intersecting entity.
Third Normal FormAn entity is in Third Normal Form (3NF) when it meets the requirement of being in Second Normal Form (2NF) and additionally:
  • Functional dependencies on non-key fields are eliminated by putting them in a separate table. At this level, all non-key fields are dependent on the primary key.
  • A row is in third normal form if and only if it is in second normal form and if attributes that do not contribute to a description of the primary key are move into a separate table. An example is creating look-up tables.
Boyce-Codd Normal FormBoyce Codd Normal Form (BCNF) is a further refinement of 3NF. In his later writings Codd refers to BCNF as 3NF. A row is in Boyce Codd normal form if, and only if, every determinant is a candidate key. Most entities in 3NF are already in BCNF.

BCNF covers very specific situations where 3NF misses inter-dependencies between non-key (but candidate key) attributes. Typically, any relation that is in 3NF is also in BCNF. However, a 3NF relation won't be in BCNF if (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes, and (c) there are common attributes between the keys.
Fourth Normal FormAn entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal Form (3NF) and additionally:
  • Has no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship within an entity if the one-to-many attributes are independent of each other.
  • Many:many relationships are resolved independently.
Fifth Normal FormAn entity is in Fifth Normal Form (5NF) if, and only if, it is in 4NF and every join dependency for the entity is a consequence of its candidate keys.

No comments:

Post a Comment