Advanced Information Management




Chapter 1: Introduction
Holger Schwarz, Universität Stuttgart, Anwendersoftware
Summer Semester 2009

Overview
- Basics
  - Terms
  - Database Management Systems
  - Database Design
  - Query Languages
- Classes of Database Applications
  - Transaction Management
  - Business Intelligence
  - Geographic Information Systems
  - Engineering Applications
  - Enterprise Content Management
- Overview

Database Terms
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational model of data is the most widely used model today. Its main concept is the relation, basically a table with rows and columns. Every relation has a schema, which describes the columns, or fields.

Example: the relation name (table name) is projects, each column header is an attribute (column name), and each row is a tuple.

projects
projectno  manager  description         budget
PJ23       Miller   main bodywork team  1 000 000
PJ15       Maynard  specialized wings     100 000
PJ47       Morris   electronics           500 000
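The relation and schema terms can be made concrete in a few lines. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS: the CREATE TABLE statement declares the schema of the projects relation, each inserted row is a tuple, and the schema can be read back from the catalog. The column types are assumptions, since the slide gives none.

```python
import sqlite3

# Relation schema: relation name "projects" plus its attributes (column names).
# The column types are assumptions; the slide does not specify them.
PROJECTS_SCHEMA = """
CREATE TABLE projects (
    projectno   TEXT PRIMARY KEY,
    manager     TEXT,
    description TEXT,
    budget      INTEGER
)
"""

# The three tuples (rows) of the example relation.
PROJECT_ROWS = [
    ("PJ23", "Miller", "main bodywork team", 1_000_000),
    ("PJ15", "Maynard", "specialized wings", 100_000),
    ("PJ47", "Morris", "electronics", 500_000),
]


def build_projects() -> sqlite3.Connection:
    """Create an in-memory database holding the projects relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(PROJECTS_SCHEMA)
    conn.executemany("INSERT INTO projects VALUES (?, ?, ?, ?)", PROJECT_ROWS)
    return conn


def attribute_names(conn: sqlite3.Connection, table: str) -> list:
    """Read the schema back from the catalog: one row per column, index 1 is its name."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


if __name__ == "__main__":
    conn = build_projects()
    print(attribute_names(conn, "projects"))  # ['projectno', 'manager', 'description', 'budget']
```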

Database Management Systems
DBMS: a tool for creating and managing large amounts of data efficiently and allowing it to persist over long periods of time, safely (Garcia-Molina et al., 2002).
Levels of abstraction provide logical as well as physical data independence:
- Many external schemas (views) describe how users see the data.
- One conceptual schema defines the logical structure.
- One physical schema describes the files and indexes used.
A database system (DBS) consists of the DBMS and the database.
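Logical data independence through an external schema can be sketched with a view. A minimal sketch with Python's sqlite3, assuming a hypothetical view project_overview and a hypothetical column start_year: the application code queries only the view, so it keeps working after the base table gains a column.

```python
import sqlite3

# Base table at the conceptual level.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects (projectno TEXT PRIMARY KEY, manager TEXT, "
    "description TEXT, budget INTEGER)"
)
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?, ?)",
    [
        ("PJ23", "Miller", "main bodywork team", 1_000_000),
        ("PJ15", "Maynard", "specialized wings", 100_000),
        ("PJ47", "Morris", "electronics", 500_000),
    ],
)
# External schema (hypothetical view): this user group sees projects and managers, not budgets.
conn.execute("CREATE VIEW project_overview AS SELECT projectno, manager FROM projects")


def manager_of(projectno: str):
    """Application code that only knows the external schema (the view)."""
    row = conn.execute(
        "SELECT manager FROM project_overview WHERE projectno = ?", (projectno,)
    ).fetchone()
    return row[0] if row else None


before = manager_of("PJ23")
# Change at the conceptual level: a (hypothetical) column is added to the base table.
conn.execute("ALTER TABLE projects ADD COLUMN start_year INTEGER")
after = manager_of("PJ23")
# Column names visible through the view are unaffected by the change.
view_columns = [d[0] for d in conn.execute("SELECT * FROM project_overview").description]
```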

Advantages of a DBMS
+ Data independence
+ Efficient data access
+ Data integrity and security
+ Data administration
+ Concurrent access, crash recovery
+ Reduced application development time
- Can be expensive and complicated to set up and maintain; this cost and complexity must be offset by need

Structure of a DBMS
A typical DBMS has a layered architecture. Each layer provides some kind of data abstraction and data mapping. Concurrency control, recovery, and transaction management have to be supported within some of the layers. This is one of several possible architectures; each system has its own variations.
Layers of the DBMS, from top to bottom:
- Query Optimization and Execution
- Relational Operators
- Files and Access Methods
- Buffer Management
- Disk Space Management
Below the lowest layer lies the database (DB) itself.
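To make the layering concrete, the sketch below emulates the two lowest layers. It is a minimal, hypothetical illustration rather than how any particular DBMS implements them: the class and method names are invented, the disk is a Python dict, and least-recently-used (LRU) replacement is just one common policy, chosen here for brevity.

```python
from collections import OrderedDict


class DiskSpaceManager:
    """Hypothetical lowest layer: pages addressed by number, emulated with a dict."""

    def __init__(self):
        self.pages = {}
        self.reads = 0  # counts physical page reads

    def read_page(self, page_no):
        self.reads += 1
        return self.pages.get(page_no, b"")

    def write_page(self, page_no, data):
        self.pages[page_no] = data


class BufferManager:
    """Hypothetical buffer layer: at most `capacity` pages in memory, LRU replacement."""

    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.frames = OrderedDict()  # page_no -> [data, dirty]; order = recency of use

    def fix(self, page_no):
        """Return the page's data, reading it from disk only on a buffer miss."""
        if page_no in self.frames:
            self.frames.move_to_end(page_no)  # hit: now the most recently used page
        else:
            if len(self.frames) >= self.capacity:
                victim, (data, dirty) = self.frames.popitem(last=False)  # least recently used
                if dirty:
                    self.disk.write_page(victim, data)  # write back before eviction
            self.frames[page_no] = [self.disk.read_page(page_no), False]
        return self.frames[page_no][0]

    def update(self, page_no, data):
        """Modify a page in the buffer; it reaches the disk only when evicted."""
        self.fix(page_no)
        frame = self.frames[page_no]
        frame[0], frame[1] = data, True
```

The layer above only calls fix and update with page numbers; whether a page is in memory or must be read from disk stays hidden inside the buffer manager, which is the kind of data abstraction each layer provides.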

Process Model: Database Design
Database design proceeds in phases over time:
1. Information collection: analysis of meaning (interviews, noun analysis, brainstorming, document analysis, ...).
2. Semantic data modeling: rough modeling, then precise modeling, in notations such as ERM, UML, NIAM, EXPRESS-G, IDEF1X, STEP, ...; the result is the conceptual schema design, which is DBMS independent.
3. Logical data modeling: logical schema design for a data model (relational, object-oriented, XML, hierarchical, network), including normalization.
4. Physical schema design: DBMS dependent (DB2, INGRES, ORACLE, ONTOS, ...), covering indexes, clustering, and tuning.
5. Database installation.
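Physical schema design changes how a query is evaluated, not what it returns. A minimal sketch with Python's sqlite3, using the parts rows from the example tables: the same query is run before and after creating an index, and SQLite's EXPLAIN QUERY PLAN reports whether the index is used. The index name is hypothetical, and the exact wording of the plan text differs between SQLite versions, so only the index name is checked.

```python
import sqlite3

# parts relation from the example tables (partno, version, projectno, part_description).
PARTS = [
    ("P050", "1.0", "PJ23", "bodywork"),
    ("P050", "2.0", "PJ23", "bodywork"),
    ("P101", "1.0", "PJ23", "front body section"),
    ("P101", "1.1", "PJ23", "front body section"),
    ("P101", "2.0", "PJ23", "front body section"),
    ("P102", "1.2", "PJ23", "a column"),
    ("P103", "1.2", "PJ23", "b column"),
    ("P104", "1.2", "PJ23", "c column"),
    ("P111", "1.0", "PJ15", "rear wing"),
    ("P111", "1.2", "PJ15", "rear wing"),
    ("P112", "1.0", "PJ15", "front wing"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parts (partno TEXT, version TEXT, projectno TEXT, "
    "part_description TEXT, PRIMARY KEY (partno, version))"
)
conn.executemany("INSERT INTO parts VALUES (?, ?, ?, ?)", PARTS)

QUERY = "SELECT partno, version FROM parts WHERE projectno = ?"


def plan(sql, params):
    """Join the detail strings of EXPLAIN QUERY PLAN (last column of each output row)."""
    return " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


rows_before = sorted(conn.execute(QUERY, ("PJ15",)).fetchall())
plan_before = plan(QUERY, ("PJ15",))
# Physical design decision: a secondary index on projectno (hypothetical name).
conn.execute("CREATE INDEX idx_parts_projectno ON parts (projectno)")
rows_after = sorted(conn.execute(QUERY, ("PJ15",)).fetchall())
plan_after = plan(QUERY, ("PJ15",))
```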

Relational Algebra
Operators: restriction (selection, σ), projection (π), Cartesian product, union, set intersection, set difference, natural join (⋈), and division.
Natural join example: joining {(a1, b1), (a2, b1), (a3, b2)} with {(b1, c1), (b2, c2), (b3, c3)} on the common column yields {(a1, b1, c1), (a2, b1, c1), (a3, b2, c2)}.
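The set semantics of the operators can be sketched in plain Python. A minimal sketch, assuming a relation is represented as a pair (attribute names, set of tuples); the function names are hypothetical and division is left out. The natural join call at the end reproduces the example above.

```python
from itertools import product


def select(rel, pred):
    """Restriction: keep the tuples for which pred(attribute -> value dict) holds."""
    attrs, rows = rel
    return attrs, {r for r in rows if pred(dict(zip(attrs, r)))}


def project(rel, keep):
    """Projection onto the attributes in `keep`; duplicates vanish because rows form a set."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(r[i] for i in idx) for r in rows}


def cross(r, s):
    """Cartesian product: every tuple of r concatenated with every tuple of s."""
    (ra, rrows), (sa, srows) = r, s
    return ra + sa, {x + y for x, y in product(rrows, srows)}


def union(r, s):  # assumes union-compatible schemas (same attributes, same order)
    return r[0], r[1] | s[1]


def intersect(r, s):
    return r[0], r[1] & s[1]


def difference(r, s):
    return r[0], r[1] - s[1]


def natural_join(r, s):
    """Combine tuples that agree on all common attributes; common columns appear once."""
    (ra, rrows), (sa, srows) = r, s
    common = [a for a in ra if a in sa]
    extra = [a for a in sa if a not in ra]
    out = set()
    for x in rrows:
        xd = dict(zip(ra, x))
        for y in srows:
            yd = dict(zip(sa, y))
            if all(xd[a] == yd[a] for a in common):
                out.add(x + tuple(yd[a] for a in extra))
    return tuple(ra) + tuple(extra), out


R = (("A", "B"), {("a1", "b1"), ("a2", "b1"), ("a3", "b2")})
S = (("B", "C"), {("b1", "c1"), ("b2", "c2"), ("b3", "c3")})
joined = natural_join(R, S)
```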

Examples of Tables

parts
partno  version  projectno  part_description
P050    1.0      PJ23       bodywork
P050    2.0      PJ23       bodywork
P101    1.0      PJ23       front body section
P101    1.1      PJ23       front body section
P101    2.0      PJ23       front body section
P102    1.2      PJ23       a column
P103    1.2      PJ23       b column
P104    1.2      PJ23       c column
P111    1.0      PJ15       rear wing
P111    1.2      PJ15       rear wing
P112    1.0      PJ15       front wing

usage
partno  version  uses_partno  uses_version  quantity
P050    1.0      P101         1.0           1
P050    1.0      P102         1.2           2
P050    1.0      P103         1.2           2
P050    1.0      P104         1.2           2
P050    1.0      P111         1.0           2
P050    2.0      P101         1.1           1
P050    2.0      P102         1.2           2
P050    2.0      P103         1.2           2
P050    2.0      P104         1.2           2
P050    2.0      P111         1.2           2
P050    2.0      P112         1.0           2

projects
projectno  manager  description         budget
PJ23       Miller   main bodywork team  1 000 000
PJ15       Maynard  specialized wings     100 000
PJ47       Morris   electronics           500 000

SQL Queries
Example: List all parts of version 1.0 and the manager who is responsible for the corresponding project.
    SELECT partno, version, manager
    FROM parts, projects
    WHERE parts.projectno = projects.projectno
      AND version = '1.0'
Example: Which projects are responsible for more than two different parts?
    SELECT projectno, COUNT(DISTINCT partno)
    FROM parts
    GROUP BY projectno
    HAVING COUNT(DISTINCT partno) > 2
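Both queries can be checked against the example tables. A minimal sketch with Python's sqlite3 that loads the parts and projects rows and runs the two queries unchanged, except for an ORDER BY added to the first one so the output order is deterministic. Versions are stored as text, which matches the string literal '1.0' in the query; the other column types are assumptions.

```python
import sqlite3

PARTS = [
    ("P050", "1.0", "PJ23", "bodywork"),
    ("P050", "2.0", "PJ23", "bodywork"),
    ("P101", "1.0", "PJ23", "front body section"),
    ("P101", "1.1", "PJ23", "front body section"),
    ("P101", "2.0", "PJ23", "front body section"),
    ("P102", "1.2", "PJ23", "a column"),
    ("P103", "1.2", "PJ23", "b column"),
    ("P104", "1.2", "PJ23", "c column"),
    ("P111", "1.0", "PJ15", "rear wing"),
    ("P111", "1.2", "PJ15", "rear wing"),
    ("P112", "1.0", "PJ15", "front wing"),
]
PROJECTS = [
    ("PJ23", "Miller", "main bodywork team", 1_000_000),
    ("PJ15", "Maynard", "specialized wings", 100_000),
    ("PJ47", "Morris", "electronics", 500_000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (partno TEXT, version TEXT, projectno TEXT, part_description TEXT)")
conn.execute("CREATE TABLE projects (projectno TEXT, manager TEXT, description TEXT, budget INTEGER)")
conn.executemany("INSERT INTO parts VALUES (?, ?, ?, ?)", PARTS)
conn.executemany("INSERT INTO projects VALUES (?, ?, ?, ?)", PROJECTS)

# Query 1: join parts with projects, restricted to version 1.0.
Q1 = """
SELECT partno, version, manager
FROM parts, projects
WHERE parts.projectno = projects.projectno
  AND version = '1.0'
ORDER BY partno
"""
# Query 2: group parts by project and keep groups with more than two distinct parts.
Q2 = """
SELECT projectno, COUNT(DISTINCT partno)
FROM parts
GROUP BY projectno
HAVING COUNT(DISTINCT partno) > 2
"""
q1_rows = conn.execute(Q1).fetchall()
q2_rows = conn.execute(Q2).fetchall()
```

On these rows only PJ23 qualifies for the second query: it has five different parts (P050, P101, P102, P103, P104), while PJ15 has two.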

Other SQL Statements
In addition to retrieval, SQL supports data manipulation operations that build on its query capabilities:
    INSERT INTO table [ (column-commalist) ]
        { VALUES row-constr-commalist | table-exp | DEFAULT VALUES }
    UPDATE table SET update-assignment-commalist [ WHERE cond-exp ]
    DELETE FROM table [ WHERE cond-exp ]
In addition, there are statements for data definition, data control, and embedding SQL into host languages.
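Each form of the grammar above can be exercised in a few statements. A minimal sketch with Python's sqlite3: an INSERT with VALUES, an INSERT whose rows come from a table expression, an UPDATE and a DELETE with WHERE conditions, and an INSERT ... DEFAULT VALUES. The archive table and its default values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (projectno TEXT PRIMARY KEY, manager TEXT, description TEXT, budget INTEGER)")
# Hypothetical table whose columns have defaults, to show DEFAULT VALUES.
conn.execute("CREATE TABLE archive (projectno TEXT DEFAULT 'n/a', budget INTEGER DEFAULT 0)")

# INSERT ... VALUES with an explicit column list.
conn.executemany(
    "INSERT INTO projects (projectno, manager, description, budget) VALUES (?, ?, ?, ?)",
    [
        ("PJ23", "Miller", "main bodywork team", 1_000_000),
        ("PJ15", "Maynard", "specialized wings", 100_000),
        ("PJ47", "Morris", "electronics", 500_000),
    ],
)
# INSERT ... table-exp: the inserted rows are the result of a query.
conn.execute("INSERT INTO archive SELECT projectno, budget FROM projects WHERE budget < 600000")
# UPDATE with a WHERE condition: double the budget of Maynard's projects.
conn.execute("UPDATE projects SET budget = budget * 2 WHERE manager = 'Maynard'")
# DELETE with a WHERE condition.
conn.execute("DELETE FROM projects WHERE projectno = 'PJ47'")
# INSERT ... DEFAULT VALUES: every column receives its declared default.
conn.execute("INSERT INTO archive DEFAULT VALUES")
conn.commit()
```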

Other Important Concepts
Some other important concepts in SQL:1999:
- View: a virtual table that is formed by a query expression and does not physically exist.
- Routine: a procedure, function, or method that is known (in some cases also stored) by the system. It can be written in SQL or in an external host language.
- Trigger: allows actions to be taken when data is inserted, updated, or deleted.
- Schema: a named collection of objects in the database.
- Catalog: a named collection of schemas in a database.
- User: an authorization identifier to control access to the database.
- Privilege: defines the allowed operations for each user.
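A trigger is the least obvious of these concepts, so here is a minimal sketch with Python's sqlite3. The table budget_log and the trigger name are hypothetical; the trigger records every change of a project budget, and its WHEN condition skips updates that leave the value unchanged. SQLite's trigger syntax is close to, but not identical with, SQL:1999.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (projectno TEXT PRIMARY KEY, manager TEXT, budget INTEGER);
CREATE TABLE budget_log (projectno TEXT, old_budget INTEGER, new_budget INTEGER);

-- Fires after each updated row whose budget actually changed.
CREATE TRIGGER log_budget_change
AFTER UPDATE OF budget ON projects
FOR EACH ROW
WHEN OLD.budget <> NEW.budget
BEGIN
    INSERT INTO budget_log VALUES (OLD.projectno, OLD.budget, NEW.budget);
END;
""")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [("PJ23", "Miller", 1_000_000), ("PJ15", "Maynard", 100_000), ("PJ47", "Morris", 500_000)],
)
conn.execute("UPDATE projects SET budget = 600000 WHERE projectno = 'PJ15'")  # logged
conn.execute("UPDATE projects SET budget = 500000 WHERE projectno = 'PJ47'")  # same value, not logged
conn.commit()
log_rows = conn.execute("SELECT * FROM budget_log").fetchall()
```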

What We're Skipping
- Access methods: disk layout for tuples and pages; indexes (B-trees, linear hashing)
- Query optimization: how to map a declarative query (SQL) to a query plan (relational algebra + implementations)
- Query processing algorithms: sort, hash, and join algorithms
- Transaction concepts and processing: atomicity, consistency, isolation, and durability


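Transaction Processing
Example: a cash withdrawal. The transaction program, running in the transaction system, asks for card, PIN, account, and order, and then has the database system execute a transaction:
    BOT
    UPDATE accounts SET balance = balance - 3 WHERE A# = 03874;
    EOT
The balance of account 03874 changes from 17 to 14, and the program outputs OK. (BOT = begin of transaction, EOT = end of transaction.)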
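The withdrawal can be sketched as a transaction with Python's sqlite3, where the connection's context manager plays the role of BOT/EOT: it commits if the block completes and rolls back if it raises. The function names, the text type for account numbers (to keep the leading zero of 03874), and the rule that a balance must not become negative are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Account numbers stored as text so that the leading zero survives (an assumption).
conn.execute('CREATE TABLE accounts ("A#" TEXT PRIMARY KEY, balance INTEGER)')
conn.execute("INSERT INTO accounts VALUES ('03874', 17)")
conn.commit()


def withdraw(account: str, amount: int) -> str:
    """Run the withdrawal as one transaction: all of its changes persist, or none do."""
    try:
        with conn:  # BOT; commit on normal exit (EOT), rollback if an exception escapes
            cur = conn.execute(
                'UPDATE accounts SET balance = balance - ? WHERE "A#" = ?', (amount, account)
            )
            if cur.rowcount != 1:
                raise ValueError("unknown account")
            (balance,) = conn.execute(
                'SELECT balance FROM accounts WHERE "A#" = ?', (account,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # hypothetical business rule
        return "OK"
    except ValueError:
        return "REJECTED"


def balance_of(account: str) -> int:
    return conn.execute('SELECT balance FROM accounts WHERE "A#" = ?', (account,)).fetchone()[0]
```

Usage: withdraw("03874", 3) returns "OK" and leaves the balance at 14; a second call withdraw("03874", 20) is rejected, and the rollback undoes its UPDATE, so the balance stays 14.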

Business Intelligence
Levels, from operational via planning to strategic use:
- Transaction processing (OLTP): heterogeneous information systems, isolated information islands, constantly increasing data sets (operational)
- Data warehouse
- OLAP (planning)
- Data mining (strategic)
Architecture: an OLAP engine and a data mining engine operate on top of the data warehouse.


Optimization of Multi Queries
Example information request: Find the top 25 products that show the highest increase in turnover in the last quarter of 2000 compared to the preceding quarter.
A query generator uses metadata to translate such an information request into a set of SQL queries (Query 1 to Query 8). The OLAP engine evaluates them against the data warehouse, and the partial results are preprocessed into the final result. Open question: SQL, OLAP + data mining?
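For comparison, the example request can also be written as one SQL query. A minimal sketch with Python's sqlite3, assuming a hypothetical sales(product, sale_date, amount) table, and reading "last quarter of 2000" as October to December 2000 and "preceding quarter" as July to September 2000; a real warehouse would usually use a time dimension instead of raw dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("A", "2000-08-15", 100), ("A", "2000-11-02", 150),
        ("B", "2000-07-01", 200), ("B", "2000-12-31", 180),
        ("C", "2000-10-20", 30),
        ("A", "2000-03-01", 999),  # outside both quarters, filtered out by WHERE
    ],
)

# Turnover in Q4 2000 minus turnover in Q3 2000, per product, best first.
TOP_RISE = """
SELECT product,
       SUM(CASE WHEN sale_date BETWEEN '2000-10-01' AND '2000-12-31' THEN amount ELSE 0 END)
     - SUM(CASE WHEN sale_date BETWEEN '2000-07-01' AND '2000-09-30' THEN amount ELSE 0 END)
       AS rise
FROM sales
WHERE sale_date BETWEEN '2000-07-01' AND '2000-12-31'
GROUP BY product
ORDER BY rise DESC
LIMIT ?
"""


def top_products(n: int = 25):
    """Return up to n (product, rise) pairs; ISO date strings compare correctly as text."""
    return conn.execute(TOP_RISE, (n,)).fetchall()
```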

Data Mining Techniques
- Association rule discovery. Example application: store layout. Example rule: {beer, nappies} ⇒ {potato chips} with support = 0.04 and confidence = 0.81.
- Classification. Example application: insurance risk prediction.
- Clustering/segmentation. Example application: customer mailings.
- Regression. Example application: customer ranking.
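Support and confidence follow the usual definitions: the support of X ⇒ Y is the fraction of transactions containing all items of X and Y, and its confidence is that support divided by the support of X. A minimal sketch in Python over a handful of made-up baskets; the function names and data are hypothetical, so the numbers differ from the slide's.

```python
# Hypothetical market-basket transactions, each a set of purchased items.
TRANSACTIONS = [
    {"beer", "nappies", "potato chips"},
    {"beer", "nappies"},
    {"beer", "potato chips"},
    {"nappies", "potato chips"},
    {"beer", "nappies", "potato chips", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support and confidence of the rule antecedent => consequent."""
    rule_support = support(set(antecedent) | set(consequent), transactions)
    antecedent_support = support(antecedent, transactions)
    confidence = rule_support / antecedent_support if antecedent_support else 0.0
    return rule_support, confidence


sup, conf = rule_metrics({"beer", "nappies"}, {"potato chips"}, TRANSACTIONS)
```

Here {beer, nappies} occurs in 3 of 5 baskets and together with potato chips in 2, so the rule has support 0.4 and confidence 2/3.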

Classification: Training Phase
Given: a set of training tuples carrying a class label.
Aim: a model (classifier) that assigns a class label to a given tuple by deriving the label from the values of the tuple's attributes.
A classification algorithm derives the classifier (model) from the training data; the model can be stored in the database, e.g., as XML.

Training data:
name   age    income  credit
Mary   20-30  low     poor
James  30-40  low     fair
Bill   30-40  low     good
John   20-30  medium  fair
Marc   40-50  high    good
Anne   40-50  high    good

Classifier (model): IF age = 40-50 OR income = high THEN credit = good

Classification: Test Phase
The classifier (model) is applied to test data whose class labels are known, and its predictions are compared with those labels:
name   age    income  credit  prediction
Paul   20-30  high    good    fair
Jenny  40-50  low     fair    fair
Rick   30-40  high    fair    good
If there is a significant discrepancy between the expected (and by definition correct) result and the result predicted by the model, it may be necessary to adapt or rebuild the model.
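The test phase boils down to comparing predictions with known labels. A minimal sketch in Python that applies only the rule shown in the training phase, read as a yes/no decision on credit = good, to the training and test tuples above; its outputs are therefore computed from that rule alone and need not coincide with the prediction column of the test table. The function names and the 0.8 rebuild threshold are hypothetical.

```python
# (name, age, income, credit) tuples from the training and test tables.
TRAINING = [
    ("Mary", "20-30", "low", "poor"),
    ("James", "30-40", "low", "fair"),
    ("Bill", "30-40", "low", "good"),
    ("John", "20-30", "medium", "fair"),
    ("Marc", "40-50", "high", "good"),
    ("Anne", "40-50", "high", "good"),
]
TEST = [
    ("Paul", "20-30", "high", "good"),
    ("Jenny", "40-50", "low", "fair"),
    ("Rick", "30-40", "high", "fair"),
]

REBUILD_THRESHOLD = 0.8  # hypothetical minimum agreement before the model is rebuilt


def predicts_good(age: str, income: str) -> bool:
    """The rule: IF age = 40-50 OR income = high THEN credit = good."""
    return age == "40-50" or income == "high"


def agreement(rows) -> float:
    """Fraction of labelled tuples where the rule's good/not-good decision matches the label."""
    hits = sum(predicts_good(age, income) == (credit == "good") for _, age, income, credit in rows)
    return hits / len(rows)


needs_rebuild = agreement(TEST) < REBUILD_THRESHOLD
```

On these tuples the rule agrees with 5 of 6 training labels but only 1 of 3 test labels, the kind of discrepancy after which the model would be adapted or rebuilt.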

Classification: Application Phase
The classifier (model) predicts the class label of unseen data:
name  age    income  predicted credit
Jim   20-30  high    fair
Phil  30-40  low     poor
Kate  40-50  medium  fair

Master Data Management
Problems:
- Overlapping and redundant data, applications, and infrastructure
- No single, consolidated view of critical enterprise data
- Hand-coded data integration "spaghetti"
Master Data Management (MDM): business processes plus a technical and data integration architecture to create and maintain a system of record for core business entities across disparate applications in the enterprise.
Single view of data:
- Single view of customers (cross-selling)
- Single view of suppliers (purchasing leverage)
- Single view of parts (inventory)
Without MDM, each legacy application keeps its own master data. With MDM, a master data management system supplies master data to legacy applications, historical/analytical systems, and new applications.

Geographic Information System
Query: search for areas of high traffic and high emission.
    SELECT Position
    FROM Emission E, Traffic V
    WHERE overlaps(E.polygon, V.polygon)
The spatial query is evaluated by the database system, extended by a Spatial Extender.
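How such a spatial join is evaluated can be sketched in Python. A minimal sketch, assuming each area is approximated by an axis-aligned rectangle instead of a polygon and that overlaps means the interiors intersect; both are simplifications (the OGC Overlaps predicate is stricter), and all names are hypothetical. Spatial index structures commonly use such bounding rectangles as a first filter before exact geometries are tested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for an area polygon (a simplifying assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the interiors intersect; rectangles that only touch do not count."""
    return a.xmin < b.xmax and b.xmin < a.xmax and a.ymin < b.ymax and b.ymin < a.ymax


def high_traffic_and_emission(emission: dict, traffic: dict) -> list:
    """Nested-loop join of Emission E and Traffic V on overlaps(E.polygon, V.polygon)."""
    return [
        (e_id, v_id)
        for e_id, e in emission.items()
        for v_id, v in traffic.items()
        if overlaps(e, v)
    ]
```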

Enterprise Content Management
Kinds of content:
- Operational content: statements, invoices, reports, scanned images, fax
- Rich media: video, audio
- Web content: HTML, graphic files
- Business content: workgroup documents, word processing, spreadsheets, presentations, e-mail

ECM Infrastructure
- Applications: customer service (Siebel, customer loyalty); operational productivity (SAP, vertical applications, e-records); rich media (e-commerce, e-learning, brand assets)
- Integration layer
- ECM platform: archiving, search & access, rights management, media streaming
- Content stores: Content Manager, relational data, e-mail (Exchange), legacy systems, other file systems, and other content stores

Challenges for DBMS
- Additional data models: XML, OO, ...
- Complex data structures: spatial objects, mining models, ...
- Various data types: documents, images, ...
- Integrating heterogeneous data sources
- Extended query languages needed
- Seamless integration with SQL

Overview: Object-Relational Technology
- Motivation
- Table Expressions and Recursion
- Large Objects
- Structured Types
- Hands-on Training

Object-Relational Technology
Extend the RDBMS to handle nontraditional data types (e.g., spatial data, data mining models, still images) through plug-ins that consist of routines/methods, alongside the built-in data types:
- Seamless integration with SQL and the database
- Co-existence of plug-in data with traditional data in a table
- Combined search in a single SQL statement
- Leverage existing data, applications, and skills
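The plug-in idea can be sketched with Python's sqlite3, whose create_function registers a host-language routine that SQL statements can then call. A minimal sketch; the distance function and the stores table are hypothetical, and a real extender would add its own data types and index support rather than a single scalar function.

```python
import math
import sqlite3


def distance(x1, y1, x2, y2):
    """Hypothetical plug-in routine: Euclidean distance between two points."""
    return math.hypot(x2 - x1, y2 - y1)


conn = sqlite3.connect(":memory:")
# Register the Python function under the SQL name "distance" with four arguments.
conn.create_function("distance", 4, distance)
conn.execute("CREATE TABLE stores (name TEXT, city TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO stores VALUES (?, ?, ?, ?)",
    [
        ("S1", "Stuttgart", 3.0, 4.0),
        ("S2", "Stuttgart", 1.0, 1.0),
        ("S3", "Köln", 0.0, 1.0),
        ("S4", "Stuttgart", 0.0, 4.9),
    ],
)

# One statement combines a traditional predicate (city) with the plug-in predicate (distance).
NEARBY = """
SELECT name FROM stores
WHERE city = ? AND distance(x, y, ?, ?) < ?
ORDER BY name
"""
nearby = [row[0] for row in conn.execute(NEARBY, ("Stuttgart", 0.0, 0.0, 5.0))]
```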

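Hierarchy of Types and Tables
(Straßen = streets, Länge = length, Breite = width, Autobahn = motorway, Gebühr = toll, Ortsstraßen = town streets, Ort = town)
    CREATE TYPE Straßen_T AS (
        Name   CHAR(40),
        Länge  DECIMAL(9,2),
        Breite DECIMAL(5,2))
    NOT FINAL;

    CREATE TYPE Autobahn_T UNDER Straßen_T AS (
        Gebühr Money_T)
    NOT FINAL;

    CREATE TYPE Ortsstraßen_T UNDER Straßen_T AS (
        Ort Orte_T)
    NOT FINAL;

    SELECT * FROM Straßen WHERE Breite > 7.50

The tables Autobahn and Ortsstraßen are subtables of Straßen (is_a):

Straßen
OID  Name         Länge   Breite
O21  Schillerweg  7.75    5.00

Autobahn (is_a Straßen)
OID  Name  Länge   Breite  Gebühr
O08  A6    324.00  18.20   20
O71  A8    564.50  20.10   10

Ortsstraßen (is_a Straßen)
OID  Name          Länge  Breite  Ort
O12  Schillerstr.  2.50   7.50    Köln
O13  Mozartstr.    3.25   8.75    München

A query on a supertable also ranges over the rows of its subtables, so the SELECT returns A6, A8, and Mozartstr. with the columns of Straßen_T.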
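Substitutability in the type hierarchy can be emulated with class inheritance. A minimal sketch in Python, assuming one dataclass per structured type (with ASCII names for the German identifiers, and Money_T and Orte_T simplified to int and str) and a single list of all rows; the extent of a table is every row whose type is that type or one of its subtypes. This is an emulation, not how a DBMS stores typed tables.

```python
from dataclasses import dataclass


@dataclass
class Strasse:               # Straßen_T
    oid: str
    name: str
    laenge: float            # Länge
    breite: float            # Breite


@dataclass
class Autobahn(Strasse):     # Autobahn_T UNDER Straßen_T
    gebuehr: int             # Gebühr (Money_T simplified to int)


@dataclass
class Ortsstrasse(Strasse):  # Ortsstraßen_T UNDER Straßen_T
    ort: str                 # Ort (Orte_T simplified to str)


# All rows of the table hierarchy Straßen / Autobahn / Ortsstraßen.
ALL_ROWS = [
    Strasse("O21", "Schillerweg", 7.75, 5.00),
    Autobahn("O08", "A6", 324.00, 18.20, 20),
    Autobahn("O71", "A8", 564.50, 20.10, 10),
    Ortsstrasse("O12", "Schillerstr.", 2.50, 7.50, "Köln"),
    Ortsstrasse("O13", "Mozartstr.", 3.25, 8.75, "München"),
]


def extent(row_type):
    """Rows of a table including all of its subtables (is_a via subclassing)."""
    return [r for r in ALL_ROWS if isinstance(r, row_type)]


def wide_streets(min_width):
    """SELECT * FROM Straßen WHERE Breite > min_width, returning only Straßen_T columns."""
    return [(r.oid, r.name, r.laenge, r.breite) for r in extent(Strasse) if r.breite > min_width]
```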

Object-Relational Extensions
- Improve application development productivity
- Encapsulate specialist expertise for use by non-experts
  - Vendors develop and support plug-ins so that you don't have to
  - Consistent semantics, ready for use
- Open architecture: write your own plug-ins
- Provide performance and scalability (combined with security, availability, ...): optimization, parallelism, industrial strength

Overview: XML and Databases
- XML Data Model
- Path Expressions
- XQuery
- XML Support in DBMSs
- XML Processing
- Hands-on Training

library.xsd: Example XML Schema and XML Document
XML schema document (library.xsd):
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="name" type="xs:string"/>
      <xs:element name="born" type="xs:date"/>
      <xs:element name="dead" type="xs:date"/>
      <xs:attribute name="id" type="xs:ID"/>
      <xs:element name="author">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="name"/>
            <xs:element ref="born"/>
            <xs:element ref="dead" minOccurs="0"/>
          </xs:sequence>
          <xs:attribute ref="id"/>
        </xs:complexType>
      </xs:element>
    </xs:schema>
XML document:
    <?xml version="1.0"?>
    <author id="cms">
      <name>Charles M Schulz</name>
      <born>1922-11-26</born>
      <dead>2000-02-12</dead>
    </author>

XQuery: FLWOR Expression
A FLWOR expression consists of FOR and LET clauses, an optional WHERE clause, an optional ORDER BY clause, and a RETURN clause.
- FOR and LET clauses generate an order-preserving list of tuples of bound variables: a FOR clause iterates over a set of nodes (possibly specified by a path expression), and a LET clause binds a variable to the result of an expression.
- The WHERE clause applies a predicate to filter the tuples produced by FOR/LET.
- The ORDER BY clause imposes an order on the surviving tuples.
- The RETURN clause is executed for each surviving tuple and generates an ordered list of outputs.

XQuery: FLWOR Expression (Example)
    for $x in doc("bank2.xml")/bank/account
    let $acctno := $x/@account-number
    where $x/balance > 400
    return <account-number>{ string($acctno) }</account-number>
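The FLWOR clauses map onto an ordinary loop. A minimal sketch using Python's xml.etree.ElementTree, assuming hypothetical content for bank2.xml since the slide does not show the document: the for clause iterates over the account elements, let binds the attribute value, where filters on the balance (true if any balance child exceeds the limit, like XQuery's general comparison), and return builds one new element per surviving tuple.

```python
import xml.etree.ElementTree as ET

# Hypothetical content of bank2.xml; the slide does not show the document.
BANK2_XML = """
<bank>
  <account account-number="A-101"><balance>500</balance></account>
  <account account-number="A-102"><balance>400</balance></account>
  <account account-number="A-215"><balance>700</balance></account>
</bank>
"""


def accounts_over(root: ET.Element, limit: float) -> list:
    results = []
    for x in root.findall("account"):                              # for $x in .../bank/account
        acctno = x.get("account-number", "")                       # let $acctno := $x/@account-number
        if any(float(b.text) > limit for b in x.findall("balance")):  # where $x/balance > limit
            out = ET.Element("account-number")                     # return <account-number>...
            out.text = acctno                                      # ... { string($acctno) }
            results.append(out)
    return results


root = ET.fromstring(BANK2_XML.strip())  # root is the <bank> element
matches = accounts_over(root, 400)
```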

XML and Databases
Layers of an XML-capable DBMS, from top to bottom: query languages (XPath, XQuery); Query Optimization and Execution; Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB.
Topics: describing the schema (XML Schema), query languages (XQuery), storage technologies, indexing technologies.
Approaches: native XML DBMSs and hybrid systems (relational + XML).

Overview: Content Management
- Introduction and Basics
- Information Retrieval Technology
  - Indexing
  - Search
- Hands-on Training

Elements of an ECM System
- APIs, application development, workflow components
- Functions: capture, index, search/access, create, rights management, workflow
- Library server, device support, storage management, resource manager, management utilities

Overview: Applications Development
- Data-intensive Applications
- Web-based Applications
- Technology Overview
  - JDBC
  - JEE/EJB
  - Web Services
- Hands-on Training