Analyzing E.F. Codd's paper - "A Relational Model of Data for Large Shared Data Banks".

Summary
In 1970, E.F. Codd of IBM Research published a paper that introduced a new way for computers to manage information. His paper, "A Relational Model of Data for Large Shared Data Banks," proposed a new architecture for storing, managing and interacting with digital data. The relational model freed application developers from having to know how the data they worked with was physically stored. This milestone paper sketched out how algebraic operations on relations, together with a calculus-based data language, could be used to store and retrieve large amounts of information, and it laid the foundation for relational database systems.

The paper rested on two key points. First, the relational model provides a means of describing data with its natural structure only, that is, without superimposing any additional structure for machine representation purposes. Second, it accordingly provides a basis for a high-level data language that yields maximal independence between programs on the one hand and machine representation on the other (Codd, 1970). In other words, the relational model served the following purposes:

It abstracted the representation of data from its physical storage, so that data could be manipulated through this abstract model.
It made programs independent of the physical representation of the data, of the relationships among the data, and of implementation considerations such as efficiency.
It provided a mathematical basis for reasoning about derivability, redundancy, and consistency of relations, by breaking data into distinct, non-duplicating sets that could then be related in arbitrarily many ways to produce many different representations.
It increased the consistency of data. For example, if you change a customer's name, the change appears in every report about that customer, because the name is stored in only one place even though it feeds many views of the data.
It provided a high-level nonprocedural language for querying data, moving the burden of searching and indexing potentially large volumes of data from the user to the database management system itself.
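
The customer-name example and the point about nonprocedural queries can be illustrated with a minimal sketch using Python's standard sqlite3 module. The tables and columns (customers, orders, name, amount) are hypothetical, and SQL here stands in for the kind of high-level nonprocedural language Codd called for; it is not the language proposed in his paper. The customer's name is stored in exactly one row and every report reaches it through a join, so a single UPDATE changes all reports.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical customers/orders tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount REAL NOT NULL
        );
        INSERT INTO customers VALUES (1, 'Acme Ltd');
        INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 75.5);
        """
    )
    return conn


def order_report(conn: sqlite3.Connection):
    """Nonprocedural query: state which rows are wanted, let the DBMS decide how to find them."""
    return conn.execute(
        """
        SELECT o.id, c.name, o.amount
        FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
        ORDER BY o.id
        """
    ).fetchall()


conn = build_db()
print(order_report(conn))  # [(10, 'Acme Ltd', 250.0), (11, 'Acme Ltd', 75.5)]

# The name lives in exactly one row, so one UPDATE changes every report that uses it.
conn.execute("UPDATE customers SET name = 'Acme Holdings' WHERE id = 1")
print(order_report(conn))  # [(10, 'Acme Holdings', 250.0), (11, 'Acme Holdings', 75.5)]
```

The query in order_report never says which index to use or in what order to visit rows; SQLite's query planner decides that, which is the shift of burden described in the last point above.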

DBMS vendors that first implemented the relational model

Today's database market is dominated by relational databases built on the model Codd proposed in the late 1960s and early 1970s. Although the relational model was originally proposed and developed at IBM, it was a government-funded effort at the University of California at Berkeley (UC-Berkeley) that disseminated the idea widely and gave it the intellectual legitimacy required for broad acceptance and commercialization. The first commercially available database based on the relational model was released in 1976 by Honeywell Information Systems, Incorporated, and the first commercial database to use SQL, a query language also developed at IBM, was released in 1979 by Relational Software, Inc., the company that later became Oracle.

The early history of the Oracle Corporation

Larry Ellison, a co-founder and longtime CEO of Oracle, was inspired by Codd's paper and wanted Oracle's database to be compatible with IBM's System R, but IBM prevented this by keeping the error codes for its DBMS secret. Ellison founded the company in 1977, together with Robert Miner, who served as its senior programmer, under the name Software Development Laboratories (SDL). In 1979 SDL changed its name to Relational Software, Inc. (RSI), and in 1983 RSI adopted the Oracle name to align itself more closely with its flagship product, the Oracle database. At the time of writing, the database industry generates about $8 billion in annual revenue, and U.S. companies, including IBM Corporation, Oracle Corporation, Informix Corporation, Sybase Incorporated, Teradata Corporation (then part of NCR Corporation), and Microsoft Corporation, dominate the world market.

Analysis

E.F. Codd's "A Relational Model of Data for Large Shared Data Banks" is considered one of the most important publications in the evolution of databases. Relational database systems proved that physical data independence is achievable. Moreover, relational views offer far greater logical data independence than the CODASYL network model did, and set-at-a-time languages offer substantial programmer productivity improvements over record-at-a-time languages. The real power of the relational model comes from its formal mathematical representation of data: relationships are expressed through the data values themselves, so the data is not tied to any particular access path or view. That is, queries do not have to rely on built-in navigational links stored alongside the data. The data can be freely indexed and queried as a whole. This gives great power to businesses that warehouse large volumes of information and often need to run ad hoc queries or data mining on that data.
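
The set-at-a-time versus record-at-a-time contrast can be sketched as follows, again assuming Python's sqlite3 module and a hypothetical products table. The record-at-a-time function only imitates the style of navigational programs, in which the program loops over records and handles each one itself; it is not actual CODASYL DML. The set-at-a-time function expresses the same change as one statement over the whole set of qualifying rows.

```python
import sqlite3


def make_products() -> sqlite3.Connection:
    """In-memory table of hypothetical products (id, category, price)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [(1, "tools", 10.0), (2, "toys", 5.0), (3, "tools", 20.0)],
    )
    return conn


def raise_record_at_a_time(conn: sqlite3.Connection, category: str, factor: float) -> None:
    """Record-at-a-time style: the program visits every record and decides what to do with it."""
    # fetchall() first so we are not updating the table while iterating a live cursor over it.
    for pid, cat, price in conn.execute("SELECT id, category, price FROM products").fetchall():
        if cat == category:
            conn.execute("UPDATE products SET price = ? WHERE id = ?", (price * factor, pid))


def raise_set_at_a_time(conn: sqlite3.Connection, category: str, factor: float) -> None:
    """Set-at-a-time style: one declarative statement describes every affected row."""
    conn.execute("UPDATE products SET price = price * ? WHERE category = ?", (factor, category))


a, b = make_products(), make_products()
raise_record_at_a_time(a, "tools", 1.5)
raise_set_at_a_time(b, "tools", 1.5)

query = "SELECT id, price FROM products ORDER BY id"
print(a.execute(query).fetchall() == b.execute(query).fetchall())  # True
```

Both functions leave the table in the same state, but the set-at-a-time version states only the intent, and the DBMS is free to choose how to locate the affected rows.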

Codd also discusses what he sees as the advantages of modeling data with mathematical relations rather than with graph structures such as trees or networks. Relations are often represented as tables of rows and columns. Trees are often visualized as nested folders and documents. The network graph, which Codd saw as overly complex and as a cause of some of the problems he was addressing, can be visualized as a web. Although a directed graph is a more complex mathematical structure than a relation, graph-based data models have been catching on as “grid computing” becomes more mainstream.

Moreover, the languages Codd proposed were built directly on his mathematical theory and were not easy to learn; the later SQL and QUEL database languages proved much more user friendly. Nevertheless, Codd's paper gave birth to numerous database prototypes, such as IBM's System R, which introduced SQL, and Berkeley's Ingres, which used QUEL. These systems, built on the characteristics of relational databases, made it possible to perform just about any manipulation needed on the underlying data. Today, virtually every relational database supports SQL.

Codd also introduced the term "normalize" to refer to removing nonsimple domains, such as lists or tables of data, often referred to as "repeating groups." He is clear in this paper that a relation could include repeating groups, but that normalizing it would make the data model simpler for some purposes. He stated that the simplicity of the array representation, which becomes feasible when all relations are cast in normal form, is an advantage not only for storage purposes but also for communicating bulk data between systems that use widely different representations of data (Codd, p. 381). It is quite true that normalizing a data model can give greater flexibility in design, ensure that attributes are placed in the proper tables, reduce data redundancy, increase programmer effectiveness, lower application maintenance costs, and maximize the stability of the data structure. However, the paper did not clearly lay out the procedures for normalization.
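
Removing a repeating group can be sketched in plain Python. The sketch assumes a hypothetical employee record whose children field is a list (a nonsimple domain), loosely modeled on the employee-and-children example in Codd's paper; the field names and data are made up. Normalizing moves the repeating group into its own relation and copies the parent's key (man_no) into it, so every value in every tuple is atomic. Relations are modeled as Python sets of tuples, since a relation has no duplicate rows.

```python
from typing import Any

# Hypothetical unnormalized records: 'children' is a nonsimple domain (a repeating group).
employees = [
    {"man_no": 1, "name": "Ada", "children": [("Bob", 1960), ("Cy", 1963)]},
    {"man_no": 2, "name": "Ed", "children": []},
]


def normalize(records: list[dict[str, Any]]):
    """Split out the repeating group, carrying the parent's key into the new relation."""
    employee: set[tuple] = set()
    children: set[tuple] = set()
    for rec in records:
        # The employee relation keeps only simple (atomic) attributes.
        employee.add((rec["man_no"], rec["name"]))
        for child_name, birth_year in rec["children"]:
            # man_no becomes part of each child tuple so the two relations can be joined again.
            children.add((rec["man_no"], child_name, birth_year))
    return employee, children


emp_rel, child_rel = normalize(employees)
print(sorted(emp_rel))    # [(1, 'Ada'), (2, 'Ed')]
print(sorted(child_rel))  # [(1, 'Bob', 1960), (1, 'Cy', 1963)]
```

The original nested records can be rebuilt by joining the two relations on man_no, which is why no information is lost by normalizing.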

In practice, there are two problems with applying normalization theory to real-world database design. First, a designer needs to know how to arrive at an initial set of tables, and normalization theory doesn't answer this important question. Second, and perhaps more serious, normalization theory was based on the concept of functional dependencies, a construct that many practicing database administrators found hard to understand. Hence, database design using normalization seemed “doomed from the start.”
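
To make the idea of a functional dependency concrete: a dependency X → Y holds when any two rows that agree on the attributes in X also agree on the attributes in Y. The function below is a minimal sketch that checks this on a given set of rows; fd_holds and the sample orders data are hypothetical.

```python
from typing import Sequence


def fd_holds(rows: Sequence[dict], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in the given rows.

    lhs -> rhs holds when any two rows that agree on every lhs attribute
    also agree on every rhs attribute.
    """
    seen: dict[tuple, tuple] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault returns the first rhs value recorded for this key;
        # a different value means two rows agree on lhs but disagree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


orders = [
    {"order_id": 1, "cust_id": 7, "cust_name": "Acme"},
    {"order_id": 2, "cust_id": 7, "cust_name": "Acme"},
    {"order_id": 3, "cust_id": 9, "cust_name": "Zenith"},
]
print(fd_holds(orders, ["cust_id"], ["cust_name"]))   # True
print(fd_holds(orders, ["cust_name"], ["order_id"]))  # False
```

A check like this on one snapshot can only disprove a dependency; whether it holds in general is a design decision about every legal state of the data. Here, cust_id → cust_name inside an orders table is the kind of redundancy normalization would remove by moving customer names into their own table.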

Abstracting the representation of data from its physical storage made possible “database views,” which allow the data to be reorganized logically without physically affecting it. A view can be as simple as a query that selects only certain columns and rows from a single table. Views can also be very complex. For example, a view might show date-sensitive job cost information by pulling data from multiple tables (and even other views) and performing complex calculations to produce the results. This would be much harder if users had to work with the physical storage structure of the data rather than only its natural, logical structure.
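
The job-cost example can be sketched as a view with Python's sqlite3 module. The jobs and costs tables, the job_cost_summary view, and the data are hypothetical. The view holds no data of its own; it is a stored query that joins and aggregates the base tables, so rows added to those tables appear in the view without any change to the view itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jobs  (job_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE costs (job_id INTEGER REFERENCES jobs(job_id), cost_date TEXT, amount REAL);
    INSERT INTO jobs  VALUES (1, 'Roof repair'), (2, 'Paving');
    INSERT INTO costs VALUES (1, '2009-01-05', 400.0), (1, '2009-02-10', 150.0),
                             (2, '2009-01-20', 900.0);

    -- The view stores no rows; it is a named query over the base tables.
    CREATE VIEW job_cost_summary AS
    SELECT j.job_id, j.title, SUM(c.amount) AS total_cost
    FROM jobs AS j JOIN costs AS c ON c.job_id = j.job_id
    GROUP BY j.job_id, j.title;
    """
)


def summary():
    """Read the view exactly as if it were a table."""
    return conn.execute("SELECT * FROM job_cost_summary ORDER BY job_id").fetchall()


print(summary())  # [(1, 'Roof repair', 550.0), (2, 'Paving', 900.0)]

# A new row in a base table shows up in the view automatically.
conn.execute("INSERT INTO costs VALUES (2, '2009-03-01', 100.0)")
print(summary())  # [(1, 'Roof repair', 550.0), (2, 'Paving', 1000.0)]
```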

The paper also speaks of "a ‘relational model’ of data for large shared data banks," suggesting that the term "relational model" refers to an abstract view of the data in a specific database rather than to an abstract view of data in general. In addition, the paper does not offer a concise definition of the term relational model, nor of the term data model. It implies that the relational model consists only of the structural aspects, excluding the manipulative and integrity aspects.

Overall, we salute E.F. Codd for his exemplary work and for being the godfather of modern databases.

References:

Oracle Corporation. Wikipedia.
http://en.wikipedia.org/wiki/Oracle_corporation

E. F. Codd. A Relational Model of Data for Large Shared Data Banks. CACM 13(6): 377-387 (1970).
http://www.acm.org/classics/nov95/toc.html

E. F. Codd. A Database Sublanguage Founded on the Relational Calculus. SIGFIDET Workshop 1971: 35-68.
http://lib.nau.edu.ua/acm/disk2/db/conf/sigmod/Codd71.html

E. F. Codd. Normalized Data Structure: A Brief Tutorial. SIGFIDET Workshop 1971: 1-17.
http://domino.research.ibm.com/library/cyberdig.nsf/papers/0FD9B681AADEFFA1852570CF005FDC06/$File/RC23819.pdf

Posted by: Mindra