Topics in basic DBMS course




Topics in basic DBMS course
- Database design
- Transaction processing
- Relational query languages: SQL, calculus, and algebra
- DBMS APIs
- Database tuning (physical database design)
- Basic query processing (ch. 9)
- Object-relational databases and query language (Amos II)
- Multimedia database overview
http://user.it.uu.se/~udbl/dbt-ht2004/

Modern DBMS research areas
- Query processing is a central database research area: how to find the correct result fast in a large database
- Queries over new scalable storage types: text, XML, images, voice, music, ...
- Indexing of new storage types
- New DBMS architectures: wrappers/mediators, P2P databases, main memory, real time, ...
- New query optimization methods: adaptive, new heuristics, rule based, ...

Modern DBMS research areas
- Representation and search of spatial data
  - Representation of points, lines, surfaces, bitmaps, maps, etc.
  - Finding similar or close objects, regions, etc.
  - Spatial indexes, indexing of mobile data
- Representation and search of voice, video, music, and other multimedia data
  - Representation of very large objects
  - Streamed (real-time) retrieval, QoS
  - Searching for sections, scenes, patterns, similarities, etc.

Modern DBMS research areas
- Stream databases
  - Queries over an indefinite stream of data, not disk tables
  - Continuous rather than passive queries
  - Data reduction queries yield new, smaller streams
  - Combine with passive data
- Representation and search of temporal data
  - Time stamping of all data
  - Queries over time, trends, etc.
  - Temporal indexing
- Representation and search of ordered data, e.g. sequences and arrays
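
The idea of a continuous, data-reducing query can be sketched with Python generators. This is only an illustration, not how any particular stream DBMS works; the function names `sliding_average` and `above` and the sample readings are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Continuous query: emit the mean of the last `window` readings.

    The input may be unbounded; results are produced as data arrives.
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` items
    for reading in stream:
        recent.append(reading)
        if len(recent) == window:
            yield sum(recent) / window


def above(stream: Iterable[float], threshold: float) -> Iterator[float]:
    """Data reduction query: the output is a new, smaller stream."""
    for value in stream:
        if value > threshold:
            yield value


# Hypothetical sensor readings; a real stream would never end.
readings = [1.0, 2.0, 3.0, 4.0, 5.0]
averages = list(sliding_average(readings, window=2))
alerts = list(above(sliding_average(readings, window=2), threshold=3.0))
```

Because generators are lazy, the two queries compose without ever materializing the stream as a stored table.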

Modern DBMS research areas
- Representation and search of unstructured textual data
  - Free text search/indexing in the database server (e.g. Oracle)
  - Search for text similar to other text (cf. Google)
  - Mixing structured and free text search
- Representation and search of semistructured data
  - Usually XML structures
  - Tree structures, some structure known
  - Path expressions combined with queries

Modern DBMS research areas
- Data warehouses: large relational databases for enterprise data storage and analysis
  - Very large relational databases
  - OLAP, advanced queries, data cubes, statistics, trends
- Database support for statistical analysis of data from large databases: data mining
  - Data mining is large-scale statistical inference!
  - E.g. discovery of clusters, trends, correlations, etc.
  - Data mining utilizes databases and data warehouses
  - DBMS requirements: performance of statistical extractions, statistically sound query results

Modern DBMS research areas
- High-performance and high-availability DBMSs based on distributed and parallel computations
  - Very high performance, reliability, scalability
  - Use of modern hardware, e.g. large main memories, fast communication, P2P
  - New kinds of storage structures: SDDSs (Scalable Distributed Data Structures), e.g. LH, LH*

Course topics
- Object-relational query optimization (this lecture)
- Mediator/wrapper approach (heterogeneous databases) Lecture.
- Temporal databases
- Real-time databases
- Stream databases
- Scalable distributed data structures (LH*)
- Semi-structured databases (XML)
- Uncertainty in databases
- Parallel and distributed databases
- Meta-optimization
- Spatial databases

SQL
--> Parser --> Relational calculus (variant of predicate calculus)
--> Rewriter --> Relational calculus
--> Cost-based optimizer --> Extended relational algebra (functional program)
--> Interpreter

The Query Processing problem
- Transform: High-Level Declarative Query --> Low-Level Execution Plan
- Normally: Relational Calculus --> Annotated Physical Relational Algebra
- The execution plan is a (functional) program that is interpreted by the evaluation engine to produce the query result
- Problem: for every query there may be very many possible execution plans: O(2^Q), where Q is the number of operations in the query

The Query Processing problem
- The optimal plan can be millions of times faster than an unoptimized plan!
- Why? The complexity of the plan is improved automatically, e.g. an index is used instead of a linear search of the database:
    select name from person where ssn=123456
    select ssn from person where name like 'To%'
- E.g. from O(N^2) to O(1), where N is the size of the database!
- Query optimization may have a huge payoff!
- However: query optimization time may be significant!
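
To make the index-versus-scan difference concrete, here is a minimal sketch for the ssn lookup above. It assumes a toy in-memory `person` table and uses a Python dict as a stand-in for a hash index; the data and function names are hypothetical. For a single lookup the scan is O(N), while the hash index is expected O(1).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy table: (ssn, name) rows.
PERSON: List[Tuple[int, str]] = [(100000 + i, f"Name{i}") for i in range(1000)]


def scan_by_ssn(rows: List[Tuple[int, str]], ssn: int) -> Tuple[Optional[str], int]:
    """Linear search: returns the name and the number of rows inspected."""
    inspected = 0
    for row_ssn, name in rows:
        inspected += 1
        if row_ssn == ssn:
            return name, inspected
    return None, inspected


def build_index(rows: List[Tuple[int, str]]) -> Dict[int, str]:
    """Build a hash index on ssn once; it is reused by every later query."""
    return {ssn: name for ssn, name in rows}


def lookup_by_ssn(index: Dict[int, str], ssn: int) -> Optional[str]:
    """Index lookup: expected constant time, independent of table size."""
    return index.get(ssn)


SSN_INDEX = build_index(PERSON)
name_scan, rows_inspected = scan_by_ssn(PERSON, 100999)
name_index = lookup_by_ssn(SSN_INDEX, 100999)
```

Both access paths return the same row; the optimizer's job is to pick the cheaper one without the user changing the query.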

Cost-based query optimization
1. Generate all likely execution plans (heuristics to avoid some unlikely ones)
2. Estimate the cost of executing each of the generated plans
3. Choose the cheapest one
The cost depends on the amount of data processed (disk blocks accessed).
-> The DBMS maintains a statistical model of the data distribution in tables.
E.g. select ssn from person where name > 'M'
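
The three steps can be sketched for the `name > 'M'` example. Everything here is an assumption made for illustration: the statistics are simple per-first-letter row counts, the cost formulas are deliberately crude (a full scan reads every block; an unclustered index scan reads one block per matching row plus the index height), and all names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TableStats:
    rows: int
    blocks: int
    first_letter_counts: Dict[str, int]  # letter -> rows whose name starts with it


def selectivity_name_gt(stats: TableStats, bound: str) -> float:
    """Estimate the fraction of rows with name > bound (single-letter bound).

    Approximation: any name starting with the bound letter and having more
    characters sorts after the bound, so that whole bucket is counted.
    """
    matching = sum(n for letter, n in stats.first_letter_counts.items()
                   if letter >= bound)
    return matching / stats.rows


def candidate_plans(stats: TableStats, selectivity: float,
                    index_height: int = 3) -> Dict[str, float]:
    """Steps 1 and 2: generate plans and estimate disk blocks read by each."""
    matching_rows = selectivity * stats.rows
    return {
        "full_table_scan": float(stats.blocks),
        "index_range_scan": index_height + matching_rows,
    }


def choose_plan(stats: TableStats, selectivity: float) -> Tuple[str, float]:
    """Step 3: pick the plan with the lowest estimated cost."""
    plans = candidate_plans(stats, selectivity)
    return min(plans.items(), key=lambda item: item[1])


stats = TableStats(rows=10_000, blocks=500,
                   first_letter_counts={"A": 6000, "M": 3000, "Z": 1000})
sel = selectivity_name_gt(stats, "M")      # 40% of the rows qualify
unselective_choice = choose_plan(stats, sel)[0]
selective_choice = choose_plan(stats, 0.001)[0]
```

With 40% of the rows matching, the scan wins under this cost model; for a highly selective predicate the index wins. This is why the statistics matter.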

Optimization criteria:
a. Number of disk blocks read (dominates)
b. CPU usage
c. Communication time
Normally a weighted average of the different criteria is used. The cost depends on the query execution strategy, the storage methods, and the indexing used.
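
A weighted combination of the three criteria might look like the sketch below. The weights are hypothetical and chosen only so that disk blocks dominate; a real optimizer would calibrate them for its hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanEstimate:
    disk_blocks: float   # a. number of disk blocks read
    cpu_ms: float        # b. CPU usage
    network_ms: float    # c. communication time


def weighted_cost(plan: PlanEstimate,
                  block_weight: float = 10.0,
                  cpu_weight: float = 1.0,
                  network_weight: float = 2.0) -> float:
    """Weighted average of the criteria (hypothetical weights)."""
    total_weight = block_weight + cpu_weight + network_weight
    weighted_sum = (block_weight * plan.disk_blocks
                    + cpu_weight * plan.cpu_ms
                    + network_weight * plan.network_ms)
    return weighted_sum / total_weight


scan_plan = PlanEstimate(disk_blocks=100, cpu_ms=50, network_ms=0)
index_plan = PlanEstimate(disk_blocks=20, cpu_ms=200, network_ms=10)
cheapest = min([scan_plan, index_plan], key=weighted_cost)
```

Here the index plan uses more CPU and network time but far fewer disk blocks, so it has the lower weighted cost.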

Optimizing the optimizer (meta-optimization):
- Naïve approach (trying all execution orders and indexes): O(Q!)
- Dynamic programming: O(Q^2) to O(3^Q); generates the optimal plan. Normally used.
- Hill climbing: O(Q^2); may generate suboptimal plans.
- Randomized methods: O(Q^2); converge to the optimal plan.
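
The difference between the naïve approach and dynamic programming can be sketched for join ordering. The sketch makes several simplifying assumptions that are not from the slides: only left-deep join orders are considered, the cost of a plan is the sum of its intermediate result sizes, and the table cardinalities and join selectivities are hypothetical.

```python
from itertools import combinations, permutations
from math import prod
from typing import Dict, FrozenSet, Sequence, Tuple

# Hypothetical statistics: table cardinalities and pairwise join selectivities.
CARD: Dict[str, int] = {"A": 1000, "B": 100, "C": 10}
SEL: Dict[FrozenSet[str], float] = {
    frozenset("AB"): 0.01,
    frozenset("BC"): 0.1,
}  # a pair without an entry is a cross product (selectivity 1.0)


def result_size(tables: FrozenSet[str]) -> float:
    """Estimated number of rows produced by joining the given tables."""
    size = float(prod(CARD[t] for t in tables))
    for pair, selectivity in SEL.items():
        if pair <= tables:
            size *= selectivity
    return size


def plan_cost(order: Sequence[str]) -> float:
    """Cost of a left-deep join order: sum of intermediate result sizes."""
    return sum(result_size(frozenset(order[:k]))
               for k in range(2, len(order) + 1))


def naive_best(tables: Sequence[str]) -> Tuple[str, ...]:
    """Naive approach: try every one of the Q! join orders."""
    return min(permutations(tables), key=plan_cost)


def dp_best(tables: Sequence[str]) -> Tuple[float, Tuple[str, ...]]:
    """Dynamic programming: keep only the best plan for each subset."""
    best = {frozenset([t]): (0.0, (t,)) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in combinations(tables, size):
            joined = frozenset(subset)
            output = result_size(joined)
            # Extend the best plan for each smaller subset by one table.
            best[joined] = min(
                (best[joined - {last}][0] + output,
                 best[joined - {last}][1] + (last,))
                for last in subset
            )
    return best[frozenset(tables)]


dp_cost, dp_order = dp_best(["A", "B", "C"])
naive_order = naive_best(["A", "B", "C"])
```

Under this cost model both methods agree that the small tables B and C should be joined first, but the dynamic programming version only examines each subset of tables once instead of every permutation.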