Normalization (Database Management Systems)
Author: Manoj Pisharody BE(IT)

Introduction

People who start learning about databases come across the word Normalization quite early. The name suggests that it is just about normalizing things, but the story is not that simple. You need to know normalization well in order to handle databases; it is one of the most important topics in database management. (It is assumed that you already have basic knowledge of databases and know how to manage tables.) So let us start discussing Normalization in depth.

What is Normalization?

Normalization is a process of reducing redundancy of data in a database. Quite often we come across tables holding bulk data in many columns, and not all of that data is needed every time the table is used. A better option is to split the bulk table into smaller parts and use only the tables that suit the actual purpose at a given time; in this way, redundancy is reduced. To make a long story short, normalization is the process of dividing a big table into smaller ones in order to reduce redundancy.

To understand the concept, let us take up a simple example. Suppose we have to manage all the databases of a company (say, My Company). The company must keep track of all the employees, customers, product details and the salary details of all the employees. A simple and straightforward way to do this is to put all of this information into a single table and manage it all in one place. See below.
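A minimal sketch of what such a single combined table might look like, written here in Python with SQLite; the table and column names are illustrative assumptions, not taken from the original figure:

```python
import sqlite3

# Illustrative single "raw" table mixing employee, salary, customer,
# order and product details (column names are assumed for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE company_data (
        emp_id        INTEGER,
        emp_name      TEXT,
        emp_address   TEXT,
        salary        REAL,
        position      TEXT,
        position_desc TEXT,
        cust_id       INTEGER,
        cust_name     TEXT,
        cust_address  TEXT,
        order_id      INTEGER,
        order_date    TEXT,
        product_id    INTEGER,
        product_name  TEXT,
        product_price REAL
    )
""")
# Every row repeats the same employee, customer and product details,
# and any query about just the employees still has to work through all of it.
```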

Looking at the above table, you may feel that it is perfectly fine. After all, what is the problem with it? We have a big table with all the information the company needs in one place. Well and good! But now think: suppose we need to frequently retrieve or update data about just the employees. Do the customer information or the product details really matter here? Definitely not. So why work with the entire table when we only need a part of it? We need a solution to this, and the solution is normalization. The structures we create through normalization are called normal forms. Let us study the most popular and widely used normal forms.

The First Normal Form

To solve the above problem, the first and foremost step is to divide the entire raw database into smaller tables based on the actual groupings. When each table has been designed, a primary key is assigned to most or all tables. Note that the primary key must be a unique value, so try to select a data element that naturally and uniquely identifies a specific piece of data. So let us take the same example and prepare our first normal form. See the figure below: the big raw database is divided into three smaller tables, one each for employee, customer and product details. Thus, to access any one of these tables, we need not handle the other two.
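A minimal sketch of that first-normal-form split, continuing the same illustrative schema (table and column names are assumptions, not taken from Fig 1-2):

```python
import sqlite3

# First normal form (illustrative): the raw data is split into three tables,
# one each for employees, customers and products, each with a primary key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        emp_id        INTEGER PRIMARY KEY,
        emp_name      TEXT,
        emp_address   TEXT,
        salary        REAL,
        position      TEXT,
        position_desc TEXT
    );

    CREATE TABLE Customers (
        cust_id      INTEGER PRIMARY KEY,
        cust_name    TEXT,
        cust_address TEXT,
        order_id     INTEGER,
        order_date   TEXT
    );

    CREATE TABLE Products (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT,
        product_price REAL
    );
""")
# Work on employee data now touches only the Employee table;
# the Customers and Products tables are left alone.
```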

The Second Normal Form

The objective of the second normal form is to take data that is only partly dependent on the primary key and move it into another table. Let us take up the same example of Fig 1-2 and consider the Employee table. The table holds the personal details as well as the salary information, but it is well understood that, to pay salary to an employee, the company does not actually need the employee's personal details; just the emp_id is sufficient. So why not use just that? This is the second normal form. The same goes for the Customers table: we can separate the customer's information from the order details. See the figure below:
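A minimal sketch of that second-normal-form split, again with illustrative names (the original figure may differ); the salary data moves out of Employee into an Emp_Pay table keyed by emp_id, and order details move out of Customers into their own table:

```python
import sqlite3

# Second normal form (illustrative): data that depends only on emp_id
# (the salary information) moves out of Employee into Emp_Pay, and
# order details are separated from the customers' personal data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        emp_id      INTEGER PRIMARY KEY,
        emp_name    TEXT,
        emp_address TEXT
    );

    CREATE TABLE Emp_Pay (
        emp_id        INTEGER PRIMARY KEY,
        salary        REAL,
        position      TEXT,
        position_desc TEXT
    );

    CREATE TABLE Customers (
        cust_id      INTEGER PRIMARY KEY,
        cust_name    TEXT,
        cust_address TEXT
    );

    CREATE TABLE Orders (
        order_id   INTEGER PRIMARY KEY,
        cust_id    INTEGER REFERENCES Customers(cust_id),
        order_date TEXT
    );
""")
# Paying an employee now needs only Emp_Pay, identified by emp_id,
# without touching the personal details stored in Employee.
```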
The Third Normal Form

The third normal form's objective is to remove data in a table that is not dependent on the primary key. See the same example of Fig 1-3. In the table named Emp_Pay, the position and position_desc fields are not dependent on the primary key (emp_id), so the better option is to move both of these fields to another table. See below:
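A minimal sketch of the third-normal-form step: position and position_desc leave Emp_Pay and live in their own table. The Position table and its position_id linking column are assumptions added for this sketch.

```python
import sqlite3

# Third normal form (illustrative): position and position_desc do not
# depend on emp_id, so they move into a Position table; Emp_Pay keeps
# only a reference to it (position_id is an assumed linking column).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Position (
        position_id   INTEGER PRIMARY KEY,
        position      TEXT,
        position_desc TEXT
    );

    CREATE TABLE Emp_Pay (
        emp_id      INTEGER PRIMARY KEY,
        salary      REAL,
        position_id INTEGER REFERENCES Position(position_id)
    );
""")
# Each position's description is now stored once, no matter how many
# employees hold that position.
```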
Advantages

As we have already seen in the sections before, normalization has many advantages. Let us list a few of them:

- Greater overall database organization
- Reduction of redundant data
- Data consistency within the database
- A much more flexible database design
- A better handle on database security
- Enforcement of referential integrity

Disadvantages

Although normalization has many advantages, there are some disadvantages too. After all, the popular saying that every coin has two sides still holds true, so let us go through the disadvantages as well. There is one substantial drawback of a normalized database: reduced database performance. The factors compromised include CPU usage, memory usage, and input/output (I/O). In other words, a normalized database requires more CPU, memory, and I/O to process transactions and queries than a denormalized database does, because data that used to sit in one table must now be joined back together at query time.
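To make the performance point concrete, here is a small sketch using the illustrative tables from above: the same payroll question answered against the normalized design, which needs a join, and against a denormalized copy, which reads a single table at the cost of repeating the position data in every row. The denormalized variant is also the kind of structure the next section, on denormalization, describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Position (position_id INTEGER PRIMARY KEY, position TEXT, position_desc TEXT);
    CREATE TABLE Emp_Pay  (emp_id INTEGER PRIMARY KEY, salary REAL, position_id INTEGER);

    -- Denormalized alternative: position details are duplicated in every pay row.
    CREATE TABLE Emp_Pay_Denorm (emp_id INTEGER PRIMARY KEY, salary REAL,
                                 position TEXT, position_desc TEXT);
""")

# Normalized: the report needs a join, which costs extra CPU and I/O.
normalized_report = """
    SELECT p.emp_id, p.salary, pos.position
    FROM Emp_Pay AS p
    JOIN Position AS pos ON pos.position_id = p.position_id
"""

# Denormalized: the same report reads a single table, but the position
# data is stored redundantly and must be kept consistent by hand.
denormalized_report = "SELECT emp_id, salary, position FROM Emp_Pay_Denorm"

print(conn.execute(normalized_report).fetchall())
print(conn.execute(denormalized_report).fetchall())
```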

Denormalizing a Database

Denormalization is the process of taking a normalized database and modifying its table structures to allow controlled redundancy for increased database performance. A denormalized database is not the same as a database that has never been normalized; the purpose of denormalization is to get rid of the problems discussed in the previous section. Denormalization might involve recombining separate tables or creating duplicate data within tables to reduce the number of tables that must be joined to retrieve the requested data, which results in less I/O and CPU time. This is normally advantageous in larger data warehousing applications, in which aggregate calculations are made across millions of rows. There are costs to denormalization, however: data redundancy is increased, which can improve performance but requires extra effort to keep the related copies of data consistent.

So you might now be wondering whether to normalize your database or not. The practical answer is to normalize the database up to a certain extent, so that redundancy is controlled to a great degree without compromising too heavily on other factors such as CPU usage, memory usage, and input/output (I/O).

Manoj Pisharody
manoj@itportal.in
-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-