SQL Server Table Design - Best Practices



CwJ Consulting Ltd
Author: Andy Hogg
Date: 20th February 2015
Version: 1.11

Contents
1. Introduction
   What is a table?
   Why is good table design important?
   What happens when a table isn't designed well?
2. Clusters and Heaps
   Definition of Cluster and Heap
   More on clustered indices
   Using Identity
   Primary Keys, and Clustered Indices
3. Dates and Times
   How not to store a Date \ Time
   Validation
   Culture
   Date and Time functionality
   Choosing the best Date \ Time data type to use
   Using the Time data type
   Advice on recording durations
4. Strings
   Why a number should be a string
   How to decide
5. Choosing the correct Integer
   The different Integer types
   Examples of poor choice Integers
6. Approximate Data Types
7. Deprecated Data Types
   What does deprecated mean?
   What are the deprecated data types?
8. Table and Column Naming
   Spaces
   Prefixing \ Suffixing Tables and Views with tbl and vw
   Reserved Keywords
   Using P.e.r.i.o.d.s
9. Constraints
   Enforcing business logic
   Enforcing data cleanliness
   Avoiding NULLs
   Improving performance
10. One Page Summary for Better Tables

1. Introduction

What is a table?
A table is a collection of related data held in a structured format within a database. It consists of columns and rows, for example:

EmployeeNumber  FirstName  LastName
127             Fred       Smith
254             Joachim    Löw
874             Steve      Jones
423             Ralf       Little

Relational database engines (such as SQL Server) store, modify and retrieve data in this format.

Why is good table design important?
Table design has a large effect on almost every aspect of a system's behaviour: the amount of storage used, the amount of memory used, the processing power required, the likelihood of deadlocks, the integrity of the data, and so on.

What happens when a table isn't designed well?
Badly designed tables often perform well... at first. However, as time goes on and the amount of data and the level of concurrency (number of users) increase, problems begin to manifest themselves. Badly designed tables simply do not scale well, so a system which appears to work fine when first commissioned can perform badly at enterprise level, with hundreds of users and billions of rows of data.

2. Clusters and Heaps

Definition of Cluster and Heap
Tables can store rows either in an ordered format or in an unordered format (known as a heap). Ordered storage keeps the rows in the order of the values in a particular column or composite of columns. A heap imposes no order: new rows are placed wherever free space is found, which is often, but not necessarily, at the end of the table, and rows may be moved around later as data is modified.

Ordering the table is implemented by applying a clustered index to it. There are very few occasions when it is better to leave a table as a heap, so a good rule to follow is to always create a clustered index on a table.

More on clustered indices
Since a clustered index defines the order in which rows are stored, a table can only ever have one clustered index. This is because it's only possible to order something in one way at a time: you cannot order a deck of cards by value from low to high and also from high to low at the same time.

Give careful consideration to which column or columns to use for the clustered index. The ideal cluster key has the following four properties:

1. Static
Base the cluster key on a column or composite of columns which contain non-volatile values (values that will not change). The rows are ordered by the value of the cluster key, so if the value of the cluster key changes, the row is then out of order and must be moved. Avoid using volatile values in a cluster key.

2. Unique
It is possible to create a clustered index on a column or composite of columns which contain non-unique values, but it is highly undesirable. With a non-unique cluster key, SQL Server must add a hidden 4-byte "uniquifier" to every row whose key value duplicates an existing one, which costs extra storage and processing. Avoid non-unique clustered indices.

3. Narrow
The ideal data type for a cluster key column is an INT: at 4 bytes it is narrow, and narrowness is a hugely desirable property in cluster keys because the cluster key is also stored in every nonclustered index on the table. It's possible to find clustered indices defined on very wide data types, e.g. CHAR(250), or even on a composite of several very wide data types. This is a bad design decision and will affect performance. Avoid using wide keys.

4. Increasing
The ideal cluster key uses continually increasing values. It's very easy to store something in order when it already arrives in that order. For example, consider writing down the letters A, B, C, D, E, F, G, H in alphabetical order. Now consider writing down the letters J, W, Y, D, G, E, T, F in alphabetical order. Which is the easier and quicker task? In the same way, an increasing key lets SQL Server add each new row at the end of the index, whereas a random key forces rows into the middle of existing pages, causing page splits and fragmentation. Use a continually increasing value for the cluster key.
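To make the "Increasing" property concrete, here is a toy Python model (an illustration only, not SQL Server's storage engine). It keeps keys in ordered pages that each hold a fixed, hypothetical number of keys (8 here; real pages hold about 8 KB of rows). When a page overflows, the model either starts a fresh page at the end, if the new key is the largest so far (roughly mirroring how an ever-increasing key is appended), or splits the page in half, which is the costly mid-page split that random keys cause.

```python
import bisect
import random


def simulate_inserts(keys, page_capacity=8):
    """Toy model of inserting keys into ordered, fixed-size pages.

    Returns (mid_page_splits, page_count). Each page is a sorted list of
    at most `page_capacity` keys, and pages are kept in key order.
    """
    pages = [[]]
    mid_page_splits = 0
    for key in keys:
        # Find the page whose key range should hold this key.
        first_keys = [page[0] for page in pages[1:]]
        i = bisect.bisect_right(first_keys, key)
        page = pages[i]
        bisect.insort(page, key)
        if len(page) <= page_capacity:
            continue
        if i == len(pages) - 1 and page[-1] == key:
            # New maximum key: leave the full page alone and start a new one.
            page.pop()
            pages.append([key])
        else:
            # Key landed mid-range: move half of the rows to a new page.
            half = len(page) // 2
            pages.insert(i + 1, page[half:])
            del page[half:]
            mid_page_splits += 1
    return mid_page_splits, len(pages)


increasing = list(range(1, 1001))      # like values from IDENTITY(1,1)
shuffled = increasing[:]
random.Random(42).shuffle(shuffled)    # like a random key, e.g. a GUID

print(simulate_inserts(increasing))    # (0, 125)
print(simulate_inserts(shuffled))
```

The increasing keys fill every page completely without a single mid-page split, while the shuffled keys cause many splits and leave pages partly empty, so more pages are needed to hold the same 1,000 keys.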

Using Identity
SQL Server can automatically generate an ever-increasing integer value in a table's column by means of the IDENTITY property. Such a column is often a good choice of cluster key.

When using the IDENTITY property, two parameters are specified. The first (the seed) is the value at which the automatic numbering starts. The second (the increment) is the amount added to the previous value to produce the next one. For example:

IDENTITY(1,1) produces the series 1, 2, 3, 4, 5, 6, 7, ...
IDENTITY(100,1) produces the series 100, 101, 102, 103, 104, 105, ...
IDENTITY(1,2) produces the series 1, 3, 5, 7, 9, ...

A very common design mistake that developers make when using IDENTITY is to define the column as an INT starting from 1 and incrementing by 1, i.e. IDENTITY(1,1). The range of values for an INT is -2,147,483,648 to 2,147,483,647, so by starting the identity at 1, the range of usable values (and therefore the maximum number of rows in the table) is halved. Instead, start the IDENTITY at the lowest possible value, which is negative. For an INT that means a seed of -2,147,483,648; specifying IDENTITY(-2147483648,1) produces the series:

-2147483648, -2147483647, -2147483646, -2147483645, -2147483644, ...

Primary Keys, and Clustered Indices
A primary key is a single column or combination of columns that uniquely identifies a row. None of the columns that are part of the primary key can be nullable, and a table can have only one primary key.

By default, if a primary key is defined on a table in SQL Server and no clustered index already exists on that table, a clustered index is automatically created with the same column definition as the primary key (declaring the key as PRIMARY KEY NONCLUSTERED prevents this). Bear this in mind whenever you declare a primary key. As defaults go, this isn't awful behaviour; however, like most defaults, it doesn't fit every case.
Be aware that the primary key and the cluster key don't have to use the same column or columns. Separating the definition of the primary key from that of the cluster key can sometimes be the correct design decision.
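The IDENTITY arithmetic in this section can be checked with a short Python sketch. The helper names are hypothetical; they model only the numbering rule (seed, then seed plus increment, and so on) and the INT range, and they ignore the gaps a real identity column can develop, for example after a rolled-back insert.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of the SQL Server INT type


def identity_series(seed, increment, count):
    """First `count` values an IDENTITY(seed, increment) column would generate,
    assuming no gaps."""
    return [seed + n * increment for n in range(count)]


def identity_capacity(seed, increment, max_value=INT_MAX):
    """How many values a positive-increment IDENTITY can generate before
    passing max_value (again ignoring gaps)."""
    if increment <= 0:
        raise ValueError("this sketch only handles positive increments")
    if seed > max_value:
        return 0
    return (max_value - seed) // increment + 1


print(identity_series(1, 2, 5))        # [1, 3, 5, 7, 9]
print(identity_capacity(1, 1))         # 2147483647
print(identity_capacity(INT_MIN, 1))   # 4294967296
```

Seeding at INT_MIN roughly doubles the number of rows an INT identity column can number before it overflows.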

3. Dates and Times

How not to store a Date \ Time
A mistake that many developers make when designing tables is to store date \ time data in a string, i.e. a character data type such as CHAR, NCHAR, VARCHAR or NVARCHAR. The following sections discuss why this is a design anti-pattern.

Validation
When one of the date \ time data types is not used, no validation of the data takes place. For example, it's possible to store a date such as the 30th February, or a time such as 14:61:62.
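As a minimal sketch of the validation point, Python's own date and time types can stand in for SQL Server's: a character column would accept every string below, whereas a real date or time type rejects the impossible ones (in SQL Server, assigning '2014-02-30' to a DATE column raises a conversion error). The helper names are hypothetical.

```python
from datetime import date, time


def is_valid_date(text):
    """True if text is a real calendar date in YYYY-MM-DD form."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def is_valid_time(text):
    """True if text is a real time of day in HH:MM:SS form."""
    try:
        time.fromisoformat(text)
        return True
    except ValueError:
        return False


# A character column would store all of these without complaint.
for value in ["2014-02-28", "2014-02-30"]:
    print(value, is_valid_date(value))   # True, then False
for value in ["14:30:00", "14:61:62"]:
    print(value, is_valid_time(value))   # True, then False
```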

Culture
No regional culture information is stored when a date is stored as a string. For example, given a date of 03-04-2014, an English person would read this as the 3rd of April, whereas an American would read it as the 4th of March. When the value is stored in a data type designed for the purpose, the day, month and year are recorded unambiguously, so we know what date it actually represents, and it can easily be displayed in the formats used by other nationalities.
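The ambiguity can be shown with a short Python sketch: the same string parses to two different dates depending on whether the day or the month is assumed to come first, while a real date value can be rendered in either convention on demand. In T-SQL the rendering step would typically use CONVERT with a style code or FORMAT with a culture name; Python's strftime is used here only as a self-contained stand-in.

```python
from datetime import date, datetime

ambiguous = "03-04-2014"

# The same string means two different dates depending on the reader's culture.
as_uk = datetime.strptime(ambiguous, "%d-%m-%Y").date()  # day first
as_us = datetime.strptime(ambiguous, "%m-%d-%Y").date()  # month first
print(as_uk, as_us)  # 2014-04-03 2014-03-04

# A real date value, by contrast, can be displayed in any culture's format.
d = date(2014, 4, 3)
print(d.strftime("%d/%m/%Y"))  # 03/04/2014  (UK style)
print(d.strftime("%m/%d/%Y"))  # 04/03/2014  (US style)
print(d.isoformat())           # 2014-04-03  (unambiguous ISO 8601)
```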

Date and Time functionality
The T-SQL language has many built-in functions that make querying and manipulating dates and times much easier. However, these functions cannot be relied upon if the date or time is stored as a string, because every value must first be converted, and any invalid or ambiguous string makes that conversion fail or give the wrong answer.
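A small Python sketch shows what goes wrong when dates live in strings: sorting compares characters rather than dates, so a 2015 date comes out first. Converted to real date values, ordering and arithmetic work as expected; the day difference below plays the role that DATEDIFF(day, ...) would play in T-SQL. The sample values are hypothetical.

```python
from datetime import date, datetime

# Hypothetical order dates held as dd-mm-yyyy strings.
as_strings = ["01-02-2015", "15-01-2014", "03-12-2014"]

# Sorting strings compares characters, not dates: 2015 comes out first.
print(sorted(as_strings))  # ['01-02-2015', '03-12-2014', '15-01-2014']

# Converted to real dates, ordering and arithmetic behave as expected.
as_dates = sorted(datetime.strptime(s, "%d-%m-%Y").date() for s in as_strings)
print(as_dates[0], as_dates[-1])            # 2014-01-15 2015-02-01
print((as_dates[-1] - as_dates[0]).days)    # 382
```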

Choosing the best Date \ Time data type to use
SQL Server offers six different data types for storing date and time data (DATE, TIME, SMALLDATETIME, DATETIME, DATETIME2 and DATETIMEOFFSET):
https://msdn.microsoft.com/en-gb/library/ms186724.aspx#dateandtimedatatypes
Many people know only of the DATETIME data type, and so use it automatically.

Advice:
If storing dates where the time component is not important or not even recorded, use the DATE data type instead. DATE uses 3 bytes, whereas DATETIME uses 8 bytes. A common example is a table with several DATETIME columns that only ever hold a date, so every value has a time of 00:00:00.000. Storing that time in every row is wasteful: with six such columns it costs 6 x 5 = 30 bytes per row, 5 bytes being the difference between DATETIME and DATE.

If storing times where the date component is not important or not even recorded, use the TIME data type instead and specify the precision that you need. TIME uses between 3 and 5 bytes (depending on precision), whereas DATETIME uses 8 bytes.
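The storage arithmetic above scales with row count, which a short Python sketch makes easy to explore. The byte sizes for DATETIME and DATE are the ones quoted in this section; the precision-to-size mapping for TIME follows Microsoft's documentation for the time type. The helper names are hypothetical, and only column data is counted, not row or page overhead.

```python
# Storage sizes in bytes for the two types compared above.
STORAGE_BYTES = {"DATETIME": 8, "DATE": 3}


def time_storage_bytes(precision):
    """Bytes used by TIME(precision), where precision is 0-7 fractional-second digits."""
    if not 0 <= precision <= 7:
        raise ValueError("TIME precision must be between 0 and 7")
    if precision <= 2:
        return 3
    if precision <= 4:
        return 4
    return 5


def bytes_saved(columns, rows, old_type="DATETIME", new_type="DATE"):
    """Bytes saved by switching `columns` columns in `rows` rows between types.

    Counts column data only; row and page overheads are ignored.
    """
    per_value = STORAGE_BYTES[old_type] - STORAGE_BYTES[new_type]
    return columns * rows * per_value


print(bytes_saved(columns=6, rows=1))            # 30
print(bytes_saved(columns=6, rows=10_000_000))   # 300000000
print(time_storage_bytes(0), time_storage_bytes(7))  # 3 5
```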

If storing both the date and the time is a requirement, consider the SMALLDATETIME data type. SMALLDATETIME can store dates from January 1, 1900 to June 6, 2079, and it stores time to a granularity of 1 minute (the seconds are always 00). Unless dates outside this range or a precision finer than one minute are required, SMALLDATETIME is a better choice than DATETIME: SMALLDATETIME uses 4 bytes, whereas DATETIME uses 8 bytes.

Using the Time data type
The TIME data type should only ever be used to record a point in time (a time of day). It should never be used to record a duration, or the passing of time. For example, a very bad use of the TIME data type is to record hours worked: it is then difficult to calculate how many hours were worked in a week, not least because a TIME value cannot exceed 23:59:59.9999999, so a weekly total of, say, 37 hours cannot itself be stored as a TIME.
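The difference between a time of day and a duration can be sketched in Python, whose datetime.time type has the same 24-hour ceiling as TIME. Storing each shift as a start and an end point in time (full datetimes, so a shift may cross midnight) makes the weekly total a simple sum of differences, whereas trying to hold that total as a time of day fails. The shift data and helper names are hypothetical.

```python
from datetime import datetime, time, timedelta

# Hypothetical week of shifts stored as (start, end) points in time.
# Full datetimes let the last shift run past midnight.
shifts = [
    (datetime(2015, 2, 16, 9, 0), datetime(2015, 2, 16, 17, 30)),
    (datetime(2015, 2, 17, 9, 0), datetime(2015, 2, 17, 17, 30)),
    (datetime(2015, 2, 18, 9, 0), datetime(2015, 2, 18, 17, 30)),
    (datetime(2015, 2, 19, 9, 0), datetime(2015, 2, 19, 17, 30)),
    (datetime(2015, 2, 20, 20, 0), datetime(2015, 2, 21, 4, 30)),
]


def total_hours(pairs):
    """Sum the durations between each start and end point, in hours."""
    total = sum((end - start for start, end in pairs), timedelta())
    return total.total_seconds() / 3600


def as_time_of_day(hours):
    """The anti-pattern: try to hold a duration in a time-of-day value."""
    return time(int(hours), int(round((hours % 1) * 60)))


weekly = total_hours(shifts)
print(weekly)  # 42.5
try:
    as_time_of_day(weekly)
except ValueError as error:
    print("cannot store as a time of day:", error)
```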

Advice on recording durations
When faced with a requirement such as the one above, there are two design patterns which are better solutions:

1. Design the table with a StartTime column and an EndTime column. Use the TIME or SMALLDATETIME data type for these. (Using TIME assumes that nobody ever works past midnight, so SMALLDATETIME might be a better choice.) Then use the date and time functions within T-SQL to calculate the duration between them.
2. Design the table with a column named HoursWorked and a column named MinutesWorked. Use the TINYINT data type for both.

4. Strings

Why a number should be a string
A common mistake in table design is to store a number in an integer format when it would be better served as a string. A telephone number is a good example of this. If a telephone number is stored as an integer, it's not possible to keep leading zeroes (needed for international dialling) or to add helpful telephone number punctuation such as +, -, parentheses or spaces. For example:

+0044 (0) 7123 987 654

Expressed as an integer, that would be 4407123987654.

How to decide
When deciding whether to store a number in some kind of numeric format or as a string, ask the question: will there be a need to perform any mathematical calculations on this value? If the answer is no, store the value as a string. For example, it's unlikely that you would ever need to find the sum of all your friends' phone numbers, or to calculate the average phone number of all your contacts.
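The telephone-number example can be reproduced in a few lines of Python: keeping only the digits and converting them to an integer yields exactly the 4407123987654 quoted above, and the leading zeroes and punctuation cannot be recovered afterwards. The helper name is hypothetical.

```python
def as_integer(phone):
    """The anti-pattern: keep only the digits and store them as an integer."""
    return int("".join(ch for ch in phone if ch.isdigit()))


original = "+0044 (0) 7123 987 654"
stored = as_integer(original)
print(stored)                   # 4407123987654
print(str(stored) == original)  # False: zeroes and punctuation are gone

# Stored as a string, the value survives exactly as it was entered.
stored_as_text = original
print(stored_as_text)           # +0044 (0) 7123 987 654
```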

5. Choosing the correct Integer

The different Integer types
It's very common indeed to see the INT data type used exclusively whenever a table needs to store a whole number. Many people are unaware that SQL Server supports four different integer data types, which have the following properties:

Data Type  Range                                                     Storage
BIGINT     -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807   8 bytes
INT        -2,147,483,648 to 2,147,483,647                           4 bytes
SMALLINT   -32,768 to 32,767                                         2 bytes
TINYINT    0 to 255                                                  1 byte

Avoid using the INT data type automatically without considering its smaller cousins.

Examples of poor choice Integers
Here are a couple of real-world examples where more thought might have led to a better choice of data type:

- A column defined as INT (taking up 4 bytes of storage) yet only holding two possible values (33 or 36). This could have been implemented as a TINYINT (1 byte), saving 3 bytes per row.
- A column defined as INT (taking up 4 bytes of storage) yet only holding four possible values (-1, 0, 1, 2). Because TINYINT cannot store negative numbers, this could have been implemented as a SMALLINT (2 bytes), saving 2 bytes per row.
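The choice described above can be expressed as a small Python helper that walks the integer types from narrowest to widest and returns the first one whose range covers the observed values. It is a hypothetical sketch: it only looks at the values supplied, so real designs should allow headroom for values the column may need to hold later.

```python
# (type name, minimum, maximum, storage in bytes), narrowest type first.
INTEGER_TYPES = [
    ("TINYINT", 0, 255, 1),
    ("SMALLINT", -2**15, 2**15 - 1, 2),
    ("INT", -2**31, 2**31 - 1, 4),
    ("BIGINT", -2**63, 2**63 - 1, 8),
]


def smallest_integer_type(values):
    """Return (name, bytes) of the narrowest type that holds every value.

    Only the supplied values are considered; allow headroom for growth.
    """
    low, high = min(values), max(values)
    for name, type_min, type_max, size in INTEGER_TYPES:
        if type_min <= low and high <= type_max:
            return name, size
    raise ValueError("values do not fit in BIGINT")


print(smallest_integer_type([33, 36]))       # ('TINYINT', 1)
print(smallest_integer_type([-1, 0, 1, 2]))  # ('SMALLINT', 2)
```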

6. Approximate Data Types
T-SQL provides two data types, FLOAT and REAL, which can be used to store extremely small or extremely large numbers. However, these two data types store an approximation of the value rather than the exact value, so their use is best avoided in any financial system. Instead, use the DECIMAL or NUMERIC data types. They are functionally the same, and, importantly, they store an exact value rather than an approximation of one.

A typical example of this mistake is a table which stores financial amounts in FLOAT or REAL columns. Unless these are incredibly small or immensely large numbers, the designer of such a table would have been better advised to use the DECIMAL \ NUMERIC data types.

7. Deprecated Data Types

What does deprecated mean?
As newer versions of SQL Server are released, Microsoft removes outdated features. To give SQL Server users time to prepare for these removals, Microsoft first marks such features as deprecated. A deprecated feature is one that is still available in the product, but which is scheduled to be removed in a future version. For this reason, it's important not to use deprecated data types in any new development work, since they have a finite life span.

What are the deprecated data types?
Currently, the following data types are deprecated:
- TEXT (use VARCHAR(MAX) instead)
- NTEXT (use NVARCHAR(MAX) instead)
- IMAGE (use VARBINARY(MAX) instead)
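The approximation problem from section 6 is easy to demonstrate in Python. Its float is a binary floating-point type comparable to SQL Server's FLOAT(53), and its Decimal type stores exact decimal values in the way DECIMAL \ NUMERIC does (Python's Decimal is not the SQL Server type, only an analogue).

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly.
print(0.1 + 0.2 == 0.3)        # False
print(sum([0.10] * 10))        # 0.9999999999999999 (ten payments of 0.10)

# Decimal stores the exact decimal value, like DECIMAL / NUMERIC.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
print(sum([Decimal("0.10")] * 10))                        # 1.00
```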

8. Table and Column Naming

Spaces
Avoid using spaces in table or column names. Table or column names with spaces in them must always be written as delimited identifiers, i.e. enclosed in square brackets (or double quotes). For example, compare SELECT [First Name] FROM [Employee Details] with SELECT FirstName FROM EmployeeDetails.

Prefixing \ Suffixing Tables and Views with tbl and vw
Although many people make a case for type-prefixed names, in the case of tables and views experience suggests that this is a bad idea. Consider a single table named tbl_person, which contains details of both permanent staff and contractors and which is queried for reporting purposes. At some stage in the future, it's decided to split tbl_person into two separate tables, one for contractors (tbl_contractor) and one for permanent staff (tbl_permanentstaff). To make this change without modifying the existing reporting queries, a view is created to replace the original tbl_person table; the view simply unions the contents of the two new tables. The end result is a view named tbl_person. The prefix is now misleading, because the object is no longer a table but a view.

Reserved Keywords
The T-SQL language has a vocabulary of reserved keywords, which shouldn't be used as the names of tables or columns. It is possible to use a reserved keyword as a name by using delimited identifiers, but this practice is strongly discouraged. A full list of reserved keywords can be found here:
https://msdn.microsoft.com/en-us/library/ms189822.aspx
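Naming rules like these are easy to check automatically, for example in a code review script. The Python sketch below is hypothetical: it flags spaces, a type prefix or suffix of tbl or vw, and reserved keywords, but its keyword set is only a small sample of genuine T-SQL reserved words, so a real check should load the full list from Microsoft's documentation linked above.

```python
# A few genuine T-SQL reserved keywords; only a sample, not the full list.
SAMPLE_RESERVED_KEYWORDS = {"ORDER", "USER", "TABLE", "SELECT", "KEY"}


def naming_problems(name):
    """Return the naming rules from this section that `name` breaks."""
    problems = []
    if " " in name:
        problems.append("contains a space")
    if name.upper() in SAMPLE_RESERVED_KEYWORDS:
        problems.append("is a reserved keyword")
    lowered = name.lower()
    if lowered.startswith(("tbl", "vw")) or lowered.endswith(("tbl", "vw")):
        problems.append("has a tbl/vw prefix or suffix")
    return problems


for candidate in ["Employee Details", "Order", "tbl_person", "CustomerOrder"]:
    print(candidate, naming_problems(candidate))
```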

Using P.e.r.i.o.d.s
SQL Server uses the period to separate the levels of an object hierarchy, for example Server.Database.Schema.Object, or often just Schema.Object. Avoid using periods in any object or column name in SQL Server. For example, don't name a table Gas.Operational.Cost.Estimate; it is much safer to express this as either GasOperationalCostEstimate or Gas_Operational_Cost_Estimate.

9. Constraints
Constraints are a way of restricting which values are allowed to be stored in a column. They perform many useful functions.

Enforcing business logic
Constraints can prevent data being entered which contravenes business logic. For example, if an error severity level should only ever be -1, 0, 1 or 2, a CHECK constraint restricting the column to those values could enforce this. Similarly, if every order in an order table must be assigned to a customer in a customer table, a foreign key constraint could enforce this.
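Both constraints can be tried out with Python's built-in sqlite3 module. SQLite is used here only as a self-contained stand-in for SQL Server (its syntax differs in places, though CHECK and FOREIGN KEY constraints behave in essentially the same way for cases this simple), and all table and column names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
db.executescript("""
    CREATE TABLE ErrorLog (
        ErrorID       INTEGER PRIMARY KEY,
        SeverityLevel INTEGER NOT NULL CHECK (SeverityLevel IN (-1, 0, 1, 2))
    );
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    CREATE TABLE CustomerOrder (  -- not "Order", which is a reserved keyword
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID)
    );
""")


def accepted(sql, params):
    """Run one INSERT; return False if a constraint rejects the row."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


print(accepted("INSERT INTO ErrorLog (SeverityLevel) VALUES (?)", (1,)))  # True
print(accepted("INSERT INTO ErrorLog (SeverityLevel) VALUES (?)", (5,)))  # False
print(accepted("INSERT INTO Customer (CustomerID, Name) VALUES (?, ?)", (1, "Fred Smith")))  # True
print(accepted("INSERT INTO CustomerOrder (CustomerID) VALUES (?)", (1,)))   # True
print(accepted("INSERT INTO CustomerOrder (CustomerID) VALUES (?)", (99,)))  # False
```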

Enforcing data cleanliness
Constraints can help to keep data clean. For example, a CHECK constraint on a Gender column could restrict the values to Male, Female or Unknown. This would prevent values such as M, F, Man, Woman or Hombre from being entered.

Avoiding NULLs
In a database, a NULL means that the value is not known or not applicable. NULL does not mean zero, nor does it mean '' (an empty string). In some cases NULLs are reasonable: for example, a column that stores mobile phone numbers might allow NULLs, because not everyone has a mobile phone. However, sometimes it's important to ensure that every record being added or modified specifies a compulsory piece of information. When a column value must not be omitted, use NOT NULL in the column definition.
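The Gender and mobile phone examples can be sketched with sqlite3 again standing in for SQL Server, with a hypothetical Person table. One detail worth noting, true of both engines: a CHECK constraint lets NULL through (a NULL result counts as "not false"), so the Gender column also needs NOT NULL. The final query shows that NULL is not equal to an empty string; the comparison itself yields NULL.

```python
import sqlite3

nulls_db = sqlite3.connect(":memory:")
nulls_db.execute("""
    CREATE TABLE Person (
        PersonID    INTEGER PRIMARY KEY,
        LastName    TEXT NOT NULL,
        -- NOT NULL is needed as well: a CHECK constraint lets NULL through.
        Gender      TEXT NOT NULL CHECK (Gender IN ('Male', 'Female', 'Unknown')),
        MobilePhone TEXT  -- nullable: not everyone has a mobile phone
    )
""")


def person_accepted(last_name, gender, mobile):
    """Insert one Person row; return False if a constraint rejects it."""
    try:
        with nulls_db:
            nulls_db.execute(
                "INSERT INTO Person (LastName, Gender, MobilePhone) VALUES (?, ?, ?)",
                (last_name, gender, mobile),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(person_accepted("Smith", "Male", "07123 987654"))  # True
print(person_accepted("Jones", "Unknown", None))         # True: MobilePhone may be NULL
print(person_accepted("Little", "M", None))              # False: fails the CHECK
print(person_accepted(None, "Female", None))             # False: LastName is NOT NULL

# NULL is not the same as an empty string: comparing them yields NULL.
print(nulls_db.execute("SELECT NULL = ''").fetchone()[0])  # None
```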

Improving performance
Constraints can help the SQL Server query optimiser to do a better job. Consider a bag of marbles of various colours. We know that there are 20 marbles in the bag, and that 12 of them are red. How many of the marbles are black? To answer this question we must empty out the bag and count the black marbles individually (because there might be marbles of other colours in the bag too). However, if a constraint is added, "The bag can only contain red or black marbles", then it's immediately obvious that there are 8 black marbles in the bag, without the need to empty the bag and count them. In the same way, the SQL Server query optimiser can take logical short-cuts when constraints are defined on columns (provided the constraints are trusted, i.e. they were not created or re-enabled WITH NOCHECK).
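The marbles analogy can be turned into a toy Python model of one such short-cut, contradiction detection: if a query asks for a value that a constraint rules out, the answer is known to be empty without reading any rows. This illustrates the idea only and is not how SQL Server's optimiser is implemented; the names are hypothetical.

```python
ALLOWED = {"Red", "Black"}  # hypothetical constraint: only red or black marbles


def count_colour(bag, colour, allowed=None):
    """Count marbles of one colour; return (count, marbles_examined).

    If a constraint is supplied and rules the colour out, the answer is
    known to be 0 without examining a single marble.
    """
    if allowed is not None and colour not in allowed:
        return 0, 0
    return sum(1 for marble in bag if marble == colour), len(bag)


bag = ["Red"] * 12 + ["Black"] * 8

print(count_colour(bag, "Green"))           # (0, 20): every marble examined
print(count_colour(bag, "Green", ALLOWED))  # (0, 0): ruled out by the constraint
print(count_colour(bag, "Black", ALLOWED))  # (8, 20)
```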

10. One Page Summary for Better Tables
- Design tables properly from the outset to avoid future problems.
- Define a clustered index on tables.
- Always use a cluster key that is static, unique, narrow and increasing.
- If you have a composite cluster key which combines more than three columns, you should probably rethink your table design.
- Consider creating an ID column for the cluster key, and use the IDENTITY property as a mechanism to populate it.
- If using the IDENTITY property, seed it with the lowest value possible.
- Remember that a primary key and a cluster key don't have to use the same columns.
- Don't store dates and times as strings.
- Consider various different data types for suitability, rather than automatically using INT or DATETIME.
- Don't use the TIME data type to record a duration of time.
- Don't automatically store a number as an INT (e.g. a phone number is better stored as text).
- Don't use the FLOAT and REAL data types to represent financial data.
- Don't create any new tables which use deprecated data types.
- Don't use spaces or periods in table or column names.
- Don't use reserved keywords in table or column names.
- Don't prefix \ suffix tables or views with tbl or vw.
- Use constraints wherever applicable.
- If a column shouldn't contain NULL values, define it as NOT NULL.