This paper discusses the importance of understanding the fundamental concept of functional dependency and its role in database design; how functional dependency relates to normalization in terms of the various levels of normal forms; and how normalization aids in creating a database design free of data anomalies and data redundancy. And lastly, an overview of Armstrong's Axioms is explored.

By definition, functional dependency occurs when one attribute in a relation uniquely determines another attribute. That is, for a given relation, attribute B is functionally dependent on attribute A if, for every valid value of A, that value of A uniquely determines the value of B. The functional dependence of B on A is represented by A'B. For example: In a table listing Student characteristics including Student Identification Numbers (ID) and name, it can be said that name is functionally dependent upon ID because a student's name can be uniquely determined from their ID. However, the reverse statement (name -> ID) is not true because more than one student can have the same name but different student IDs.

Functional dependencies are determined by the semantics of the relation; in general, they cannot be determined by inspection of an instance of the relation. That is, a functional dependency (FD) is a constraint and not a property derived from a relation. A functional dependency is a property of the semantics or meaning of the attributes. There are two agendas for the study of functional dependency:
  • Attempting to automate database design by finding computationally feasible ways to manipulate FDs.
  • Investigating the principles that bind the particular pattern of FDs in a relational scheme to the real-world observables
In database design, a functional dependency helps us to detect poor ER design. If the generated relations are not in desired normal form, the problem can be fixed in the E-R diagram. That is, normalization can be done as a part of data modeling. Alternately normalization can be done formally on the relations generated fro the ER model. Thus functional dependency helps in maintaining a relationship among the data’s present in the table. Any columns that are not dependent on the primary of the parent table should be segregated. Every column should be dependent on the primary key of the parent table.

Normalization theory is based on the functional notion of functional dependency and moreover it helps in simplifying the structure of tables. The value of an understanding of functional dependency will become apparent as we work through the normalization process. Normalization as stated earlier addresses how data should be organized within a database in order to make the database as compact and easy to manage as possible and to ensure that it produces consistent results. Normalization rules provide guidelines for defining the design of a relational database. These rules specify how a database should be divided into tables and how the tables should be linked together. The prime objectives of normalization process are to eliminate data redundancy, organize data efficiently, reduce the potential for anomalies during data operations and improve data consistency.

Sometimes normalized logical design databases are de-normalized at the physical design stage to facilitate real world experiences (i.e. performance and practicality). For example, if the relational database does not support non-atomic attributes, and chooses to de-normalize for performance, then the user must create multiple columns (e.g. Student 1, Student 2, etc.) to represent the data structure. Therefore, normalization is the formalization of the design process to make a database compliant with the concept of a Normal Form. There are several levels of the Normal Form, and each level requires that the previous level be satisfied as discussed below:

The current normal forms that were defined by E. F. Codd, inlcude: first normal form (1nf), normal form (1NF), second normal form (2NF), third normal form (3NF), fourth normal form (4NF), and Boyce/Codd Normal Form (BCNF).

A table is in 1NF when all the key attributes are defined with no repeating groups in the table (i.e. each row/column intersection can contain only one value – not a set of values), and all attributes are dependent on the primary key.  That is, each field in a table contains different information. For example, in an employee list, each table would contain only one SSN field. Nevertheless, a table in 1NF could perhaps include partial dependencies that are based on only part of a composite primary key. Transitive dependency, which is dependency of nonprime attribute on another nonprime attribute, may also exist. Therefore, 1NF removes repeating groups.

A table is in 2NF when it is in 1NF and it includes no partial dependencies; that is, no attribute is dependent on only a portion of the primary key.  Nonetheless, there is a possibility that a table in 2NF might still exhibit transitive dependency; that is, one or more attributes may be functionally dependent on nonkey attributes (not part of the primary key). Therefore, each field in a table that is not a determiner of the contents of another field must itself be a function of the other fields in the table. Thus, in this stage, partial dependencies are removed.

A table is in 3NF when it is in 2NF and it contains no transitive dependencies. For example, if two tables both require a SSN field, the SSN information would be separated into a separate table, and the two other tables would then access the SSN information via an index field in the SSN table. Any change to a SSN would automatically reflect a change in all tables that link to the SSN table. At this stage transitive dependencies are removed.

The Boyce/Codd Normal Form (BCNF) is a table that addresses dependencies between columns that are part of a Candidate Key. At this stage, all attributes are dependent on a key, a whole key and nothing but a key (excluding trivial dependencies, like A->A). If the normalization rules were applied to a Candidate Key other than the one chosen as the Primary Key would give a different result, so BCNF addresses such problems. Therefore, BCNF can be violated only if the table contains more than one candidate key. BCNF aids in removing the remaining functional dependency anomalies from the prior NFs.

Fourth normal form (4NF) requires that there be no non-trivial multi-valued dependencies of attribute sets on something other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in the BCNF and multi-valued dependencies are functional dependencies. The 4NF removes unwanted data structures: multi-valued dependencies. While the fifth normal form (5NF), removes the remaining anomalies.

Armstrong’s Axioms were developed and published in 1974 by William W. Armstrong as rules of inference for functional dependencies, i.e. used infer all the functional dependencies (FD) on a relational database. Armstrong’s axioms are a sound and complete list of properties of functional dependencies. The axioms are sound in that they generate only functional dependencies in the closure of a set of functional dependencies (denoted as F+) when applied to that set (denoted as F). They are also complete in that repeated application of these rules will generate all functional dependencies in the closure F+.

That is, if you apply them repeatedly to a given set of (FDs), then every new FD generated belongs to the closure (sound) every FD in the closure will eventually be generated (complete). They are used in relational database maintenance when analyzing the functional dependencies and normal form of a database. Changing normal forms or doing schema refinement requires manipulation of these axioms.


Reference:

Rob, P. & Coronel, C. M. (2004). "Database Systems: Design, Implementation, &                  Management 6th Ed." Massachusetts: Thomson Technology

Stair R., Reynolds, G. (2006). “Principles of Information Systems, 7th Ed.” Massachusetts:
                  Thomson Technology

Greenwald, R., Stackowiak, R., & Stern, J. (2004). Oracle essentials 3rd. Oracle 10
Sebastropol, CA: O'Reilly.

Normalization
  • http://www.webopedia.com/TERM/N/normalization.html
  • http://en.wikipedia.org/wiki/Database_normalization
  • http://www.management-hub.com/database-design-normalization.html

Armstrong’s Axioms
  •  http://www.answers.com/topic/armstrong-s-axioms
  • http://en.wikipedia.org/wiki/Armstrong's_axioms#Axiom_of_reflexivity