RNDr. Michal Kopecký, Ph.D. Department of Software Engineering, Faculty of Mathematics and Physics, Charles University in Prague

Course: Database Applications (NDBI026), WS 2015/16

Student duties:
- a final DB application, focused on its DB layers: the application layer inside the database (procedures, functions, triggers, etc.), implemented either in an Oracle or an MS SQL database
Attendance is recommended (but not mandatory); the slides alone are not comprehensive.
Other sources:
- manuals: http://docs.oracle.com/cd/e11882_01/index.htm, http://www.orafaq.com/, http://technet.microsoft.com/en-us/library/bb545450.aspx
- web: http://www.ms.mff.cuni.cz/~kopecky/teaching/ndbi026/

What the course is about (knowledge of the theory from course NDBI025 is assumed): practical database development against a given database server, and what to take into account
- when creating the DB schema,
- when writing SQL queries,
- when optimizing (indexes, execution plans),
- in a multi-user environment (locking, transaction processing),
- for data security.

Other topics are covered by follow-up courses:
- Database Languages I, II
- Datalog
- Oracle and MS SQL Server administration
- Transactions
- Stochastic methods in databases
- Searching the web and multimedia databases
- Retrieval of multimedia content on the web
- XML technology
- NoSQL databases

Relational model: currently the main platform for OLTP/OLAP.
Query optimization: indexing and correct query formulation can change the execution time by many orders of magnitude.
Multi-user environment: an incorrectly implemented application can process data incorrectly and produce strange results.

Procedural extensions: triggers (extended integrity-constraint checking), procedures and functions (application logic)
Object-oriented extensions: user-defined types, nested tables
Full-text extensions
XML data processing

RDBMS Oracle 11g: object-relational database; supports server-side code execution in PL/SQL, Java, and C/C++ (any .dll/.so library); XML support, multimedia support, etc.
RDBMS MS SQL Server 2008 R2: object-relational database; supports server-side code execution in T-SQL and C#; XML support, full-text search support, etc.

Outline: SQL standards and implementations; the SELECT statement; built-in functions

Structured Query Language (SQL)
- the standard language for accessing (relational) databases
- originally intended to resemble natural language (that's why, e.g., SELECT is a single, very complex statement)
- different subsets of statements:
  - data definition language (DDL): CREATE/ALTER for creating and altering relational (table) schemas; definition of integrity constraints
  - data manipulation language (DML): querying; data insertion, deletion, and updating
  - transaction management
  - administration

Standards: ANSI/ISO SQL-86, SQL-89, SQL-92, SQL:1999, SQL:2003 (backward compatible).
Commercial systems implement SQL at different standard levels (most often SQL:1999 or SQL:2003). Unfortunately, the implementations are not strict:
- many extra, nonstandard features are supported,
- some standard ones are not supported,
- specific extensions add procedural, transactional, and other functionality: Transact-SQL (Microsoft SQL Server), PL/SQL (Oracle).

SQL-86: first attempt, the intersection of IBM SQL implementations.
SQL-89: a small revision triggered by industry; many details were left to third parties.
SQL-92: a stronger language; the specification is 6x longer than for SQL-86/89. It adds schema modification, tables with metadata, inner joins, cascading deletes/updates based on foreign keys, set operations, transactions, cursors, and exceptions. Four sub-versions: Entry, Transitional, Intermediate, Full.
SQL:1999: many new features, e.g., object-relational extensions; types STRING, BOOLEAN, REF, ARRAY; types for full text, images, and spatial data; triggers, roles, a programming language, regular expressions, recursive queries, etc.
SQL:2003: further extensions, e.g., XML management, autonumbers, standard sequences; on the other hand, the BIT type was removed.

Timeline of the SQL standards:
- SQL-86
- SQL-92: ANSI SQL2, ISO/IEC 9075:1992 (levels Entry, Intermediate, Full)
- SQL-99: ANSI/ISO/IEC 9075:1999
- SQL-2003: ISO/IEC 9075:2003

Individual database servers do not strictly follow the standards:
- usually only SQL-92 Entry level,
- many non-portable extensions, leading to strong vendor lock-in,
- not all features are implemented according to ANSI.
Newer versions have better compatibility; native and ANSI versions of a feature usually exist side by side.
[Figure: common SQL-92 compatible RDBMS]

The more features beyond SQL-92 Entry an application uses, the less likely it is that the application will run on an RDBMS from a different vendor. Many high-level features are available only in proprietary form and cannot be easily ported.
The platform therefore has to be chosen before application development starts; changing the platform during development is complicated and expensive.

What to do when the RDBMS doesn't understand my query?
- Is the statement correct or not?
- Does the RDBMS understand/support the given feature or not?
- SQL validators: http://developer.mimer.se/validator/
In the end, the SQL statement has to be rewritten.

SELECT [DISTINCT] expr_c1 [[AS] c_alias1] [, ...]
FROM source1 [[AS] t_alias1] [, ...]
[WHERE row_cond]
[GROUP BY expr_g1 [, ...] [HAVING group_cond]]
[ORDER BY expr_o1 [, ...]]

FROM: First, all data sources (tables, views, subqueries) are combined together. If the sources are delimited by commas, a Cartesian product is computed. ANSI SQL-92 introduced JOIN ... ON, NATURAL JOIN, OUTER JOIN, etc.
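A minimal sketch of the FROM step, assuming Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Oracle or MS SQL (the tables Dept and Emp are hypothetical). It shows that comma-delimited sources without a condition produce a Cartesian product, and that the SQL-92 JOIN ... ON form returns the same rows as the comma form filtered in WHERE.

```python
import sqlite3

# In-memory SQLite database used only for illustration; Dept and Emp are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (dept_id INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE Emp  (emp_id INTEGER PRIMARY KEY, ename TEXT, dept_id INTEGER);
    INSERT INTO Dept VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO Emp  VALUES (10, 'Alice', 1), (11, 'Bob', 2), (12, 'Carol', 2);
""")

# Comma-delimited sources with no condition: Cartesian product, 2 x 3 = 6 rows.
cartesian = conn.execute("SELECT * FROM Dept, Emp").fetchall()

# Comma-delimited sources joined by a WHERE condition (pre-SQL-92 style).
comma_join = conn.execute(
    "SELECT d.dname, e.ename FROM Dept d, Emp e "
    "WHERE d.dept_id = e.dept_id ORDER BY e.emp_id").fetchall()

# SQL-92 explicit join: the join condition is part of the FROM clause.
ansi_join = conn.execute(
    "SELECT d.dname, e.ename FROM Dept d JOIN Emp e ON d.dept_id = e.dept_id "
    "ORDER BY e.emp_id").fetchall()

print(len(cartesian))           # 6
print(comma_join == ansi_join)  # True
print(comma_join)               # [('Sales', 'Alice'), ('IT', 'Bob'), ('IT', 'Carol')]
```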

WHERE: Second, rows that do not satisfy the condition row_cond are eliminated.

GROUP BY: The remaining rows are grouped by equality of the grouping expressions (using SORT or HASH). Every resulting group contains atomic columns with the values of the grouping expressions and set columns with the sets of values from all rows that form the group.

HAVING: Groups that do not satisfy the condition group_cond are eliminated.

ORDER BY: Rows/groups are ordered by the values of the given expressions.

SELECT: The remaining (ordered) rows/groups are produced on the output. With SELECT DISTINCT, all duplicates are removed (before ORDER BY), which requires an additional SORT/HASH operation.

GROUP BY has to sort/hash all rows to put the rows of each group together, so it pays to group as few rows as possible. If rows can be filtered out by the WHERE clause before grouping, the query will be more efficient than if the unwanted groups are eliminated later.

SELECT Street, COUNT(*) FROM Citizen WHERE City='Prague' GROUP BY City, Street;
Only one million rows are sorted/hashed.
SELECT Street, COUNT(*) FROM Citizen GROUP BY City, Street HAVING City='Prague';
Ten million rows are sorted/hashed, and most of the groups are dropped in the next step.
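The two queries can be compared on a tiny, hypothetical Citizen table, again using SQLite through Python's sqlite3 module as an assumed stand-in. Both forms return the same result; the difference lies only in how many rows reach the grouping step, which such a small data set cannot show. Some optimizers may move a HAVING condition on a grouping column into WHERE by themselves, but it is safer not to rely on that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Citizen (name TEXT, City TEXT, Street TEXT)")
conn.executemany(
    "INSERT INTO Citizen VALUES (?, ?, ?)",
    [("A", "Prague", "Main"), ("B", "Prague", "Main"), ("C", "Prague", "Park"),
     ("D", "Brno", "Main"), ("E", "Brno", "Hill")])

# Filter rows first, then group only the remaining (Prague) rows.
where_first = conn.execute(
    "SELECT Street, COUNT(*) FROM Citizen WHERE City = 'Prague' "
    "GROUP BY City, Street ORDER BY Street").fetchall()

# Group all rows, then discard the groups that are not in Prague.
having_later = conn.execute(
    "SELECT Street, COUNT(*) FROM Citizen "
    "GROUP BY City, Street HAVING City = 'Prague' ORDER BY Street").fetchall()

print(where_first)                  # [('Main', 2), ('Park', 1)]
print(where_first == having_later)  # True
```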

The DISTINCT clause sorts (hashes) the resulting rows (even before the ORDER BY operation) to find and eliminate duplicate records. If possible, write the query without DISTINCT.
Use the ORDER BY clause only when necessary. It is not a good idea to use it in view definitions, because a view is often used as a source for further querying.
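One common way to avoid DISTINCT is to replace a join that multiplies rows with an EXISTS subquery. The sketch below (hypothetical Customer and Orders tables, SQLite via Python's sqlite3 as an assumed stand-in) checks that both forms return the same customers; whether the EXISTS form is actually faster depends on the optimizer and the execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER);
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cid');
    INSERT INTO Orders VALUES (100, 1), (101, 1), (102, 2);
""")

# The join yields one row per order, so DISTINCT has to remove the duplicates.
with_distinct = conn.execute(
    "SELECT DISTINCT c.cust_id, c.name FROM Customer c "
    "JOIN Orders o ON o.cust_id = c.cust_id ORDER BY c.cust_id").fetchall()

# EXISTS only asks whether a matching order exists; each customer appears once.
with_exists = conn.execute(
    "SELECT c.cust_id, c.name FROM Customer c WHERE EXISTS "
    "(SELECT 1 FROM Orders o WHERE o.cust_id = c.cust_id) "
    "ORDER BY c.cust_id").fetchall()

print(with_distinct)                 # [(1, 'Ann'), (2, 'Ben')]
print(with_distinct == with_exists)  # True
```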

CREATE TABLE tab_name (
  col_name type [(maxsize[,prec])] [col_constr], ...
  row_constraint, ...
);
CREATE TABLE Person (
  id numeric(11,0) CONSTRAINT Person_PK PRIMARY KEY,
  name character(50) NOT NULL
);

SQL-92 distinguishes two server-side character encodings. The reason is UTF-8 (UTF-16) support: it can store and manipulate characters from any language, but its multi-byte storage is less efficient for national alphabets.
1. Global character set: can use a single-byte encoding (CP-1250, ISO-8859-2, ...) or UTF.
2. National character set: for texts in a national language; can use UTF.

SQL-92 further distinguishes two string representations:
1. Fixed length: simpler data updates, but a less efficient representation.
2. Variable length: only the used characters are stored (plus the length); a more efficient representation, but data updates are more complicated because the number of bytes needed varies.

CHARACTER(n): text of fixed length n bytes/chars
CHARACTER VARYING(n), CHAR VARYING(n): text of variable length, max. n bytes/chars
NATIONAL CHARACTER(n): text of fixed length n bytes/chars in the national alphabet
NATIONAL CHARACTER VARYING(n), NATIONAL CHAR VARYING(n), NCHAR VARYING(n): text of variable length, max. n bytes/chars in the national alphabet

String constants are enclosed in single quotes. A single quote inside a string has to be doubled, e.g., 'O''Brien'.
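A small sketch of the doubling rule, assuming SQLite via Python's sqlite3 module; sql_string_literal is a hypothetical helper, not part of any library. In application code, bound parameters are usually preferable, since the driver then handles quoting and the data cannot break the SQL text.

```python
import sqlite3

def sql_string_literal(text: str) -> str:
    """Hypothetical helper: turn text into an SQL string constant.

    The constant is enclosed in single quotes and every single quote
    inside the text is doubled.
    """
    return "'" + text.replace("'", "''") + "'"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (name VARCHAR(50))")

literal = sql_string_literal("O'Brien")
conn.execute(f"INSERT INTO Person VALUES ({literal})")

# With a bound parameter, the value travels separately from the SQL text,
# so no quoting or doubling is needed.
conn.execute("INSERT INTO Person VALUES (?)", ("D'Artagnan",))

names = [row[0] for row in conn.execute("SELECT name FROM Person ORDER BY name")]
print(literal)  # 'O''Brien'
print(names)    # ["D'Artagnan", "O'Brien"]
```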

NUMERIC(p[,s]): general numeric type with p digits and a fixed decimal point, s of the digits after the decimal point
INTEGER, INT, SMALLINT: integer
FLOAT(b): real number with b-bit precision
REAL: real number
DOUBLE PRECISION: real number with double precision
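NUMERIC(p,s) limits both the total number of digits and the digits after the decimal point, so at most p - s digits remain for the integer part. The hypothetical function fits_numeric below sketches that rule with Python's decimal module; rounding half up is an assumption, as the exact rounding or truncation behavior depends on the database.

```python
from decimal import Decimal, ROUND_HALF_UP

def fits_numeric(value: str, p: int, s: int) -> bool:
    """Hypothetical check: can `value` be stored in a NUMERIC(p, s) column?

    The value is first rounded to s decimal places (half up, an assumption);
    afterwards at most p - s digits may precede the decimal point.
    """
    rounded = Decimal(value).quantize(Decimal(1).scaleb(-s), rounding=ROUND_HALF_UP)
    integer_part = abs(int(rounded))  # int() truncates toward zero
    integer_digits = len(str(integer_part)) if integer_part else 0
    return integer_digits <= p - s

# NUMERIC(5,2) covers -999.99 .. 999.99
print(fits_numeric("999.99", 5, 2))   # True
print(fits_numeric("12.345", 5, 2))   # True  (stored as 12.35)
print(fits_numeric("999.995", 5, 2))  # False (rounds to 1000.00)
print(fits_numeric("1000", 5, 2))     # False
print(fits_numeric("-0.5", 5, 2))     # True
```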

DATE: date (YYYY-MM-DD), precision at least days, possibly more
TIME: time (HH:MM:SS.ffff), precision at least seconds
TIMESTAMP: date plus time (YYYY-MM-DD HH:MM:SS.ffff)
TIMESTAMP(p) WITH TIME ZONE: p denotes the precision of fractional seconds; the time zone is given as +HH:MM or -HH:MM at the end

Constants are enclosed in single quotes in the format shown, e.g., DATE '2015-01-01'.

Databases do not necessarily support all of the types mentioned. Some types are supported only indirectly: the data type is translated to a similar, natively supported type.

Translation of ANSI types in Oracle (ANSI SQL type -> Oracle type):
CHARACTER(n) -> CHAR(n)
CHARACTER VARYING(n) -> VARCHAR2(n)
CHAR VARYING(n) -> VARCHAR2(n)
NATIONAL CHARACTER(n) -> NCHAR(n)
NATIONAL CHARACTER VARYING(n) -> NVARCHAR2(n)
NATIONAL CHAR VARYING(n) -> NVARCHAR2(n)
NCHAR VARYING(n) -> NVARCHAR2(n)
NUMERIC(p,s) -> NUMBER(p,s)
INTEGER, INT, SMALLINT -> NUMBER(38)
FLOAT(b) -> NUMBER
DOUBLE PRECISION -> NUMBER
REAL -> NUMBER
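The translation table can be read as a simple lookup. A sketch in Python, where oracle_type is a hypothetical helper covering only the types listed above: it copies the (n) or (p,s) parameters for the character and NUMERIC types and ignores them for the types that always map to a fixed Oracle type.

```python
import re

# ANSI type names whose (n) or (p,s) parameters are carried over to Oracle.
ANSI_TO_ORACLE = {
    "CHARACTER": "CHAR",
    "CHARACTER VARYING": "VARCHAR2",
    "CHAR VARYING": "VARCHAR2",
    "NATIONAL CHARACTER": "NCHAR",
    "NATIONAL CHARACTER VARYING": "NVARCHAR2",
    "NATIONAL CHAR VARYING": "NVARCHAR2",
    "NCHAR VARYING": "NVARCHAR2",
    "NUMERIC": "NUMBER",
}
# ANSI type names that map to a fixed Oracle type regardless of parameters.
FIXED = {"INTEGER": "NUMBER(38)", "INT": "NUMBER(38)", "SMALLINT": "NUMBER(38)",
         "FLOAT": "NUMBER", "DOUBLE PRECISION": "NUMBER", "REAL": "NUMBER"}

def oracle_type(ansi: str) -> str:
    """Hypothetical helper: translate one ANSI type declaration to Oracle."""
    m = re.fullmatch(r"\s*([A-Z ]+?)\s*(\(\s*\d+\s*(?:,\s*\d+\s*)?\))?\s*", ansi.upper())
    if m is None:
        raise ValueError(f"unrecognized type: {ansi!r}")
    name, params = m.group(1), m.group(2) or ""
    if name in FIXED:
        return FIXED[name]
    if name in ANSI_TO_ORACLE:
        return ANSI_TO_ORACLE[name] + params.replace(" ", "")
    raise ValueError(f"unsupported type: {ansi!r}")

print(oracle_type("CHARACTER VARYING(30)"))  # VARCHAR2(30)
print(oracle_type("numeric(11, 0)"))         # NUMBER(11,0)
print(oracle_type("SMALLINT"))               # NUMBER(38)
```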

Oracle specifics:
DATE: precision in seconds, i.e., it corresponds to the minimal TIMESTAMP requirements of SQL-92. The default (American) format is DD-MON-YY, for example 01-JAN-15.
VARCHAR2(size) (recommended), VARCHAR(size): string in a variable-length representation. The maximum size is 4000 bytes (recommended max. 2000 chars, since a character may need more than one byte).

[CONSTRAINT cons_name] constraint_definition [INITIALLY {DEFERRED | IMMEDIATE}] [[NOT] DEFERRABLE]
If a constraint is not explicitly named, it usually gets an artificial name (in Oracle, e.g., SYS_Cnnnnnn). It is therefore recommended to name constraints explicitly.
Column constraints are separated from each other by spaces.

NULL / NOT NULL: the column can / cannot contain the undefined value NULL.
UNIQUE: all non-null values in the column have to be different.
PRIMARY KEY: the column forms the primary key of the table; it is automatically treated as both NOT NULL and UNIQUE.
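A minimal sketch of these three constraints, assuming SQLite via Python's sqlite3 module as a stand-in; the Person table and its columns are hypothetical. Each attempted insert reports whether the constraints accepted it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person (
        id    INTEGER CONSTRAINT Person_PK PRIMARY KEY,
        login VARCHAR(20) CONSTRAINT Person_U_Login UNIQUE,
        name  VARCHAR(50) NOT NULL
    )""")
conn.execute("INSERT INTO Person VALUES (1, 'ann', 'Ann')")

def accepted(row) -> bool:
    """Try to insert one row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Person VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "duplicate primary key": accepted((1, "bob", "Bob")),
    "duplicate login":       accepted((2, "ann", "Annie")),
    "NULL name":             accepted((3, "cid", None)),
    "NULL login":            accepted((4, None, "Dan")),
    "second NULL login":     accepted((5, None, "Eve")),  # UNIQUE ignores NULLs
}
print(results)
```

Note that MS SQL Server treats NULLs as equal in a UNIQUE constraint and allows only one of them, so the last insert would be rejected there.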

CHECK (condition): the column value has to satisfy the given condition.
REFERENCES table_name(column) [ON DELETE {CASCADE | SET NULL}]: the column value references the primary key or a candidate key (UNIQUE column) of the given table. The ON DELETE clause allows the master row to be deleted; when it is deleted, the referencing rows are deleted as well (CASCADE), or their value is set to NULL (SET NULL).
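A sketch of both ON DELETE variants and of a reference without ON DELETE, assuming SQLite via Python's sqlite3 module (all tables are hypothetical). SQLite enforces REFERENCES only after PRAGMA foreign_keys = ON, an SQLite-specific switch. When a DELETE is refused, the whole statement is undone, including any cascaded changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforce REFERENCES
conn.executescript("""
    CREATE TABLE Dept (dept_id INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE Emp (emp_id INTEGER PRIMARY KEY,
                      dept_id INTEGER REFERENCES Dept(dept_id) ON DELETE CASCADE);
    CREATE TABLE Project (proj_id INTEGER PRIMARY KEY,
                          dept_id INTEGER REFERENCES Dept(dept_id) ON DELETE SET NULL);
    CREATE TABLE Office (office_id INTEGER PRIMARY KEY,
                         dept_id INTEGER REFERENCES Dept(dept_id));  -- no ON DELETE
    INSERT INTO Dept VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO Emp VALUES (10, 1), (11, 2);
    INSERT INTO Project VALUES (100, 1), (101, 2);
    INSERT INTO Office VALUES (200, 2);
""")

# Department 1 is referenced only by Emp (CASCADE) and Project (SET NULL).
conn.execute("DELETE FROM Dept WHERE dept_id = 1")

# Department 2 is also referenced by Office, which has no ON DELETE clause.
try:
    conn.execute("DELETE FROM Dept WHERE dept_id = 2")
    dept2_deleted = True
except sqlite3.IntegrityError:
    dept2_deleted = False

emps = conn.execute("SELECT emp_id, dept_id FROM Emp ORDER BY emp_id").fetchall()
projects = conn.execute("SELECT proj_id, dept_id FROM Project ORDER BY proj_id").fetchall()
print(emps)           # [(11, 2)]
print(projects)       # [(100, None), (101, 2)]
print(dept2_deleted)  # False
```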

DEFAULT value: not exactly an integrity constraint; it cannot be named or deferred. The default value is used if an INSERT does not supply a value for this column explicitly. If no default is given, the column is defined as DEFAULT NULL.
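A sketch of DEFAULT, assuming SQLite via Python's sqlite3 module and a hypothetical Task table. An omitted column gets its default value; an explicitly supplied value, including an explicit NULL, overrides it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Task (
        id     INTEGER PRIMARY KEY,
        status VARCHAR(10) DEFAULT 'new',
        note   VARCHAR(100)            -- no DEFAULT clause: behaves as DEFAULT NULL
    )""")
conn.execute("INSERT INTO Task (id) VALUES (1)")                   # status, note omitted
conn.execute("INSERT INTO Task (id, status) VALUES (2, 'done')")  # explicit value wins
conn.execute("INSERT INTO Task (id, status) VALUES (3, NULL)")    # explicit NULL wins too

rows = conn.execute("SELECT id, status, note FROM Task ORDER BY id").fetchall()
print(rows)  # [(1, 'new', None), (2, 'done', None), (3, None, None)]
```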

Example:
CREATE TABLE Person(
  RC NUMERIC(11,0) CONSTRAINT Person_PK PRIMARY KEY,
  NAME CHAR VARYING(30) CONSTRAINT Person_U_Name UNIQUE NOT NULL,
  EMAIL CHAR VARYING(30) CONSTRAINT Person_C_Email CHECK (EMAIL LIKE '_%@_%._%')
);
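The example table can be tried out in SQLite through Python's sqlite3 module (an assumed stand-in; SQLite accepts the type name CHAR VARYING). A CHECK condition that evaluates to UNKNOWN, as it does for a NULL e-mail, does not count as a violation, so NULL passes the CHECK. The LIKE pattern is only a rough plausibility test of the address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person(
        RC    NUMERIC(11,0)    CONSTRAINT Person_PK PRIMARY KEY,
        NAME  CHAR VARYING(30) CONSTRAINT Person_U_Name UNIQUE NOT NULL,
        EMAIL CHAR VARYING(30) CONSTRAINT Person_C_Email CHECK (EMAIL LIKE '_%@_%._%')
    )""")

def accepted(row) -> bool:
    """Try to insert one row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Person VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok_email   = accepted((1, "Ann", "ann@example.com"))   # True
no_at_sign = accepted((2, "Ben", "ben.example.com"))   # False: CHECK fails
dup_name   = accepted((3, "Ann", "ann2@example.com"))  # False: duplicate NAME
null_email = accepted((4, "Cid", None))                # True: CHECK is UNKNOWN, passes
print(ok_email, no_at_sign, dup_name, null_email)      # True False False True
```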

In Oracle, information about tables is available in the views USER_TABLES, USER_TAB_COLUMNS, and USER_CONSTRAINTS.
In MS SQL, it is available in the views INFORMATION_SCHEMA.TABLES, INFORMATION_SCHEMA.COLUMNS, and INFORMATION_SCHEMA.TABLE_CONSTRAINTS.

Row constraints can be applied to several columns of the same row:
CHECK (event_begin <= event_end)
They can define multi-column primary, candidate, and foreign keys:
PRIMARY KEY (event_begin, event_end)
FOREIGN KEY (event_begin, event_end) REFERENCES Parent (x, y)
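A sketch of row constraints over several columns, assuming SQLite via Python's sqlite3 module; the Room and Event tables are hypothetical, and dates are stored as ISO 'YYYY-MM-DD' strings, which compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforce foreign keys
conn.executescript("""
    CREATE TABLE Room (
        building VARCHAR(10),
        room_no  INTEGER,
        CONSTRAINT Room_PK PRIMARY KEY (building, room_no)
    );
    CREATE TABLE Event (
        event_id    INTEGER PRIMARY KEY,
        building    VARCHAR(10),
        room_no     INTEGER,
        event_begin DATE NOT NULL,
        event_end   DATE NOT NULL,
        CONSTRAINT Event_C_Dates CHECK (event_begin <= event_end),
        CONSTRAINT Event_FK_Room FOREIGN KEY (building, room_no)
            REFERENCES Room (building, room_no)
    );
    INSERT INTO Room VALUES ('S', 3), ('S', 4);
""")

def accepted(sql: str, params: tuple) -> bool:
    """Run one INSERT; return False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

ins = "INSERT INTO Event VALUES (?, ?, ?, ?, ?)"
valid     = accepted(ins, (1, 'S', 3, '2015-10-01', '2015-10-02'))
reversed_ = accepted(ins, (2, 'S', 3, '2015-10-05', '2015-10-01'))  # begin after end
no_room   = accepted(ins, (3, 'S', 9, '2015-10-01', '2015-10-01'))  # no room ('S', 9)
dup_room  = accepted("INSERT INTO Room VALUES (?, ?)", ('S', 3))    # composite PK clash
print(valid, reversed_, no_room, dup_room)  # True False False False
```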

ENABLED / DISABLED: the constraint is (is not) active, i.e., its validity is (is not) checked.
ALTER TABLE tab_name {ENABLE | DISABLE} CONSTRAINT cons_name;
DEFERRED / IMMEDIATE: the constraint check is deferred to the end of the transaction; by default, it is checked immediately after every data change.
DEFERRABLE / NOT DEFERRABLE: the constraint can / cannot be deferred.
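SQLite (used here through Python's sqlite3 module as an assumed stand-in) accepts DEFERRABLE INITIALLY DEFERRED only on foreign keys and has no ENABLE/DISABLE CONSTRAINT, so the sketch shows just deferred checking. A child row may be inserted before its parent as long as the parent exists at COMMIT; otherwise the COMMIT fails. The Parent and Child tables and the run_transaction helper are hypothetical.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT,
# so the transaction boundaries are exactly the SQL statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforce foreign keys
conn.executescript("""
    CREATE TABLE Parent (id INTEGER PRIMARY KEY);
    CREATE TABLE Child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER CONSTRAINT Child_FK
            REFERENCES Parent(id) DEFERRABLE INITIALLY DEFERRED
    );
""")

def run_transaction(statements) -> bool:
    """Run statements in one transaction; return True if COMMIT succeeded."""
    conn.execute("BEGIN")
    for sql in statements:
        conn.execute(sql)          # deferred constraint: no error raised here
    try:
        conn.execute("COMMIT")     # deferred constraints are checked now
        return True
    except sqlite3.IntegrityError:
        if conn.in_transaction:    # a failed COMMIT leaves the transaction open
            conn.execute("ROLLBACK")
        return False

# Child inserted before its parent: allowed, the check waits for COMMIT.
ok = run_transaction(["INSERT INTO Child VALUES (1, 10)",
                      "INSERT INTO Parent VALUES (10)"])
# Child whose parent never appears: the whole transaction is rejected.
bad = run_transaction(["INSERT INTO Child VALUES (2, 99)"])

children = conn.execute("SELECT id, parent_id FROM Child").fetchall()
print(ok, bad, children)  # True False [(1, 10)]
```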

If possible, check every data change at the moment it occurs and can be checked. Whatever the user can insert in the wrong place or format will eventually be inserted wrongly. Use integrity constraints or triggers: cleaning inconsistent data later is time-consuming and often not fully possible. It is better to check everything in the database than to hope that the input is validated in every application working with the data.

Check the uniqueness of data. Every table should have a primary key. Even if the primary key is artificial, individual instances (rows) usually have some natural one- or multi-column identifier, which should be declared as a candidate key of the table (UNIQUE). Sometimes more than one candidate key can be found.
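A sketch of an artificial primary key combined with a natural candidate key, assuming SQLite via Python's sqlite3 module; the Student table and its personal_no column are hypothetical. Without the UNIQUE constraint, the same person could be stored twice under two different artificial ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        student_id  INTEGER CONSTRAINT Student_PK PRIMARY KEY,   -- artificial key
        personal_no VARCHAR(11) NOT NULL
                    CONSTRAINT Student_U_PersNo UNIQUE,           -- natural candidate key
        name        VARCHAR(50) NOT NULL
    )""")
conn.execute("INSERT INTO Student VALUES (1, '900101/1234', 'Jan Novak')")

try:
    # A new artificial id, but the same person: the candidate key rejects it.
    conn.execute("INSERT INTO Student VALUES (2, '900101/1234', 'Jan Novak')")
    duplicate_stored = True
except sqlite3.IntegrityError:
    duplicate_stored = False

count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(duplicate_stored, count)  # False 1
```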