Capturing Database Transformations for Big Data Analytics


1 Capturing Database Transformations for Big Data Analytics David Sergio Matusevich University of Houston

2 Organization Introduction Classification Program Case Study Conclusions

3 Extending ER Models to Capture Database Transformations to Build Data Sets for Data Mining Analysis. Carlos Ordonez, Sofian Maabout, David Sergio Matusevich, Wellington Cabrera. Reference: Carlos Ordonez, Sofian Maabout, David Sergio Matusevich, Wellington Cabrera. Extending ER Models to Capture Database Transformations to Build Data Sets for Data Mining, Data & Knowledge Engineering (DKE), 2014, Elsevier.

4 Data Mining Projects Data mining projects usually require the preparation of a dataset created specifically to answer the particular question asked by the user. For example, given a database of a cellular phone company, we might ask: "What percentage of users will change data plans with the advent of a new smartphone?" This will require the creation of a dataset where at least one of the columns will be CHANGED_PLAN and another one could be BEFORE_NEW_DEVICE. Many of the intermediate tables created for this project might remain in the database, consuming resources.

5 Motivation: Saving Work Different users might ask similar questions, leading to the creation of tables that are virtually identical, cluttering the system. For instance, a researcher might want to answer the question: "What percentage of men between the ages of 18 and 35 will change data plans when a new device is introduced?" If the researcher is not aware of the previous user's project, some of the intermediate tables created might be exact duplicates of the ones created before.

6 Contribution In this work we present: a classification of the most common transformations used in data mining; an extension of the ER model to keep track of the intermediate tables created; and a tool designed to simplify the use of naming conventions, keep track of attributes and keys, and facilitate the recognition of duplicate tables.

7 Building a data set for data mining Building a data mining dataset involves successive rounds of aggregation and denormalization.

8 Note: If the database is not static, the transformation tables must also be updated. This could be resource intensive, so the update could be deferred: a transformation table is only updated when it is reused. We also limit ourselves to transformations that happen inside the database. Transformations happening outside the database, such as those performed by Extract-Transform-Load (ETL) tools, are not considered here.

9 Model vs Theory Entity-Relationship (ER) Model: Entities, Relationships. Relational Model: Tables, Foreign Keys. Database: $D(S, I)$, where $S = \{S_1, S_2, \ldots, S_n\}$ are tables and $I$ are integrity constraints. The set of tables $S$ is one of the inputs for the tool we wrote.

10 Well Formed Queries We define a well formed query as one that complies with the following requirements: Always produces a table with a primary key and a potentially empty set of non-key attributes. Each join operator is computed based on a foreign key and primary key from the referencing table and the referenced table, respectively.

11 Database Transformation Queries Goal: To produce a single table $X$ that will be used as input for a data mining algorithm. Given a set of source tables $S = \{S_1, S_2, \ldots, S_n\}$ and a sequence of queries $Q = q_1, q_2, \ldots, q_m$ ($X$ is the result of $q_m$), the sequence of queries will produce a set of transformation tables $T = \{T_1, T_2, \ldots, T_n\}$, where $X = T_n$.

12 Data Sets Clearly different projects will require the creation of different data sets. This results in a number of different $X_i$, each associated with a query plan $Q_i$: $Q_1 = q_1, q_2, \ldots, q_m \rightarrow X_1$; $\ldots$; $Q_k = q_1, q_2, \ldots, q_m \rightarrow X_k$.

13 The Transformation Tables In order to allow for easy reuse, transformation tables must incorporate into their metadata: the query that created them, and an indication of whether the entities come from a source table or another transformation table (provenance).
[Figure: entity "Aggregation: T9" with primary-key attributes CustomerID, Promotion, SalesOrderID, OdrMonth, StateProvinceCode, TerritoryID, MakeFlag and non-key attributes MaxProductLine, ..., Style, annotated with the SQL query that created it.]
SELECT CustomerID, Promotion, SalesOrderID, OdrMonth, StateProvinceCode, TerritoryID, MakeFlag,
  max(taxamt) AS TaxGrpByOdrID, max(freight) AS FreightGrpByOdrID, max(totaldue) AS TotalDueGrpByOdrID,
  sum(orderqty) AS OdrQtyCycleGrpByOdrID, sum(linetotal) AS LineTotGrpByOdrID, sum(standardcost) AS StdCostGrpByOdrID,
  max(productline) AS MaxProductLine, max(size) AS SizeCycle, max(class) AS ClassCycle, max(style) AS Style
INTO T9
FROM T7
GROUP BY CustomerID, Promotion, SalesOrderID, OdrMonth, StateProvinceCode, TerritoryID, MakeFlag;
/* CustomerID, Promotion, SalesOrderID, OdrMonth, TaxGrpByOdrID, FreightGrpByOdrID, TotalDueGrpByOdrID, StateProvinceCode, TerritoryID, OdrQtyCycleGrpByOdrID, LineTotGrpByOdrID, MakeFlag, StdCostGrpByOdrID, MaxProductLine, SizeCycle, ClassCycle, Style */
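The metadata described on this slide can be pictured as a small record per transformation table. The following is a minimal sketch in Python, not the tool's actual data structure (the tool itself was written in C++); the class name `TransformationTable`, its fields, and the `describe` output layout are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TransformationTable:
    """Hypothetical metadata record for one transformation table."""
    name: str
    kind: str                      # "Denormalization" or "Aggregation"
    attributes: list[str]
    primary_key: list[str]
    foreign_keys: list[str] = field(default_factory=list)
    query: str = ""                # SQL text that created the table
    sources: list[str] = field(default_factory=list)  # provenance: tables read

    def describe(self) -> str:
        """Render the record as a single line: kind, name, attributes, keys."""
        parts = list(self.attributes)
        parts.append("PK(" + ",".join(self.primary_key) + ")")
        parts += [f"FK({fk})" for fk in self.foreign_keys]
        return f"{self.kind}: {self.name}(" + ",".join(parts) + ");"

    def reads_only_sources(self, source_tables: set[str]) -> bool:
        """True if every input is a source table rather than another
        transformation table."""
        return all(s in source_tables for s in self.sources)


t2 = TransformationTable(
    name="T2",
    kind="Aggregation",
    attributes=["I", "X2", "X3", "K"],
    primary_key=["I"],
    foreign_keys=["S1.I"],
    query="SELECT I, sum(A4) AS X2, sum(A5) AS X3, max(1) AS K "
          "INTO T2 FROM T1 GROUP BY I;",
    sources=["T1"],
)
print(t2.describe())
print(t2.reads_only_sources({"S1", "S2", "S3"}))
```

Keeping the creating query and the list of input tables next to the attribute list is what makes later reuse checks possible: a table whose inputs are all source tables can be rebuilt directly, while one that reads other transformation tables depends on them being current.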

14 Transformations Proposition: Let $T = T_1 \bowtie T_2 \bowtie \cdots \bowtie T_n$ be a transformation table joined on appropriate foreign keys. Every query used to transform $T$ either: 1. includes the primary key of $T$, which comes from some $T_i$, or 2. does not include the primary key of $T$, but includes a subset of $k$ primary keys of $k$ tables $T_i$ to later compute group-by aggregations. Proof sketch: All aggregation queries are assumed to have grouping columns in order to identify the object of study. That is, they represent GROUP BY queries in SQL. Therefore every query must include the primary key of either joined table in order to produce a data set. An aggregation must use keys to group rows (otherwise, records cannot be identified and further processed) and the only available keys are foreign keys.

15 Classification of transformations We distinguish two mutually exclusive database transformations: 1. Denormalization, which brings attributes from other entities into the transformation entity or simply combines existing attributes. 2. Aggregation, which creates a new attribute by grouping rows and computing a summarization. [Figure: taxonomy tree. Transformations: Denormalization (Direct; Derivation: Expression (Arithmetic, String, Date), Case); Aggregation (Count / Sum, Max / Min).]

16 The CASE statement Example: SELECT .. CASE WHEN A1='married' OR A2='employed' THEN 1 ELSE 0 END AS binaryvariable .. FROM .. The CASE statement does not have a relational algebra translation. It derives a binary attribute not present before in the database, and might even introduce NULLs.
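To make the remark about NULLs concrete, here is a small sketch that runs the slide's CASE expression with and without the ELSE branch. It uses Python's built-in `sqlite3` module as a stand-in engine, which is an assumption (the slides use `SELECT ... INTO`, which SQLite does not support); the sample rows and the column aliases `with_else` and `without_else` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S1 (I INTEGER PRIMARY KEY, A1 TEXT, A2 TEXT)")
conn.executemany(
    "INSERT INTO S1 VALUES (?, ?, ?)",
    [
        (1, "married", "unemployed"),
        (2, "single", "employed"),
        (3, "single", "unemployed"),
        (4, None, None),  # both inputs unknown
    ],
)

# Same condition twice: once with ELSE 0, once with no ELSE branch at all.
rows = conn.execute(
    """
    SELECT I,
           CASE WHEN A1='married' OR A2='employed' THEN 1 ELSE 0 END AS with_else,
           CASE WHEN A1='married' OR A2='employed' THEN 1 END        AS without_else
    FROM S1
    ORDER BY I
    """
).fetchall()

for row in rows:
    print(row)
```

With `ELSE 0` every row receives a value, including row 4, where the condition evaluates to unknown rather than true. Without an ELSE branch, rows that fail the condition get NULL, so a CASE-derived attribute can contain NULLs even when the source columns do not.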

17 Sample Database Source tables: S1(I [PK], A1, A2, A3, K1); S2(I [PK, FK1], J [PK], A4, A5, A6, A7, K2 [FK2], K3); S3(K2 [PK], A8). In this simple example S1 could be a table of transactions, S2 a table of products and S3 could contain details about the product.

18 Sample Script
/* q0: T0, universe (entry point) */
SELECT I, /* I is the record id, or point id mathematically */
  CASE WHEN A1='married' OR A2='employed' THEN 1 ELSE 0 END AS Y, /* binary target variable */
  A3 AS X1 /* 1st variable */
INTO T0 FROM S1;
/* q1: denormalize and filter valid records */
SELECT S2.I, S2.J, A4, A5, A6, A7, K2, K3
INTO T1 FROM S1 JOIN S2 ON S1.I=S2.I WHERE A6>10;
/* q2: aggregate */
SELECT I, sum(A4) AS X2, sum(A5) AS X3, max(1) AS K /* K is FK */
INTO T2 FROM T1 GROUP BY I;
/* q3: get min, max */
SELECT 1 AS K, min(X3) AS minX3, max(X3) AS maxX3
INTO T3 FROM T2;
/* q4: math transform */
SELECT I, log(X2) AS X2, /* 2nd variable */
  (X3-minX3)/(maxX3-minX3) AS X3 /* 3rd variable, range [0,1] */
INTO T4 FROM T2 JOIN T3 ON T2.K=T3.K; /* get the min/max */
/* q5: denormalize, gather attribute from referenced table S3 */
SELECT I, J, A7, A8
INTO T5 FROM T1 JOIN S3 ON T1.K2=S3.K2;
/* q6: aggregate with CASE */
SELECT I, sum(CASE WHEN A7='Y' THEN A8 ELSE 0 END) AS X4
INTO T6 FROM T5 GROUP BY I;
/* q7: data set (output), star join; this data set can be used for logistic regression, decision tree, SVM */
SELECT T0.I, X1, X2, X3, X4, Y
INTO X FROM T0 JOIN T4 ON T0.I=T4.I JOIN T6 ON T0.I=T6.I;
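Steps q2 to q4 of the script (aggregate, compute the global min and max, then rescale by joining on the constant key K) can be traced on a few rows. The sketch below is an adaptation rather than the original script: it assumes SQLite via Python's `sqlite3` module, so `SELECT ... INTO` becomes `CREATE TABLE ... AS SELECT`, the `log(X2)` step is left out because math functions are not available in every SQLite build, and the four rows of T1 are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for the denormalized table T1 (only the columns q2 needs).
    CREATE TABLE T1 (I INTEGER, J INTEGER, A4 REAL, A5 REAL, PRIMARY KEY (I, J));
    INSERT INTO T1 VALUES (1, 1, 2.0, 10.0),
                          (1, 2, 3.0, 20.0),
                          (2, 1, 1.0,  5.0),
                          (3, 1, 4.0, 55.0);

    -- q2: aggregate per I; the constant K = 1 acts as a join key for q4.
    CREATE TABLE T2 AS
        SELECT I, sum(A4) AS X2, sum(A5) AS X3, max(1) AS K
        FROM T1 GROUP BY I;

    -- q3: one-row table holding the global min and max of X3.
    CREATE TABLE T3 AS
        SELECT 1 AS K, min(X3) AS minX3, max(X3) AS maxX3 FROM T2;

    -- q4 (without the log step): rescale X3 into [0, 1].
    CREATE TABLE T4 AS
        SELECT I, X2, (X3 - minX3) / (maxX3 - minX3) AS X3
        FROM T2 JOIN T3 ON T2.K = T3.K;
    """
)

scaled = conn.execute("SELECT I, X2, X3 FROM T4 ORDER BY I").fetchall()
for row in scaled:
    print(row)
```

The per-I sums of A5 are 30, 5 and 55, so the one-row table T3 holds min 5 and max 55, and the rescaled X3 values are 0.5, 0.0 and 1.0. Joining on the constant K is what lets a plain foreign-key join attach the single min/max row to every record.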

19 Transformations Tool The transformations tool consists of: A query parser that determines if a query is a denormalization or an aggregation, An attribute tracker, that determines the provenance of all attributes, A set of rules to determine keys and foreign keys of the transformation tables. The input for the tool is a query script and a list of source tables. The output is a list of transformation tables with type, attributes, keys and provenances clearly marked.
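The first component, deciding whether a query is a denormalization or an aggregation, can be approximated with a purely textual rule. The sketch below is not the tool's parser; it is a naive heuristic written for illustration, and the function name `classify` and the rule itself (GROUP BY or an aggregate function call implies aggregation) are assumptions. Note that this rule labels q4 of the sample script a denormalization, whereas the tool output shown in these slides lists T4 as an aggregation, so the real tool evidently uses additional information.

```python
import re

# Aggregate function *calls* only: "max(" matches, the column name "maxX3" does not.
AGG_CALL = re.compile(r"\b(sum|count|min|max|avg)\s*\(", re.IGNORECASE)
GROUP_BY = re.compile(r"\bgroup\s+by\b", re.IGNORECASE)
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
STRING = re.compile(r"'[^']*'")


def classify(query: str) -> str:
    """Naive rule: GROUP BY or an aggregate call means Aggregation."""
    # Remove comments and string literals so their contents cannot match.
    text = STRING.sub("''", COMMENT.sub(" ", query))
    if GROUP_BY.search(text) or AGG_CALL.search(text):
        return "Aggregation"
    return "Denormalization"


queries = {
    "q0": "SELECT I, CASE WHEN A1='married' OR A2='employed' THEN 1 ELSE 0 END AS Y, "
          "A3 AS X1 INTO T0 FROM S1;",
    "q1": "SELECT S2.I, S2.J, A4, A5, A6, A7, K2, K3 INTO T1 "
          "FROM S1 JOIN S2 ON S1.I=S2.I WHERE A6>10;",
    "q2": "SELECT I, sum(A4) AS X2, sum(A5) AS X3, max(1) AS K /* K is FK */ "
          "INTO T2 FROM T1 GROUP BY I;",
    "q3": "SELECT 1 AS K, min(X3) AS minX3, max(X3) AS maxX3 INTO T3 FROM T2;",
}

for name, sql in queries.items():
    print(name, classify(sql))
```

Stripping comments and literals first matters: a comment such as `/* get the min/max */` or a literal such as `'count'` should not influence the decision.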

20 Remarks The tool was written in C++. It requires that the queries be written as regular expressions (for simplicity). The tool does not provide any feedback to users regarding the queries they are using. In a future iteration we plan to incorporate suggestions regarding naming and existence of similar tables.

21 Tool Development [Figure: flow chart of the tool, with steps marked as planned or finished. Nodes: Start; Input script $Q = \{q_1, q_2, \ldots, q_m\}$; Parse query $q_i$; Determine type; Normalize names; Check DB for similar table; decision "Exists?" (Yes / No); Rename table and entities (if needed); Execute query; Write new query to log; Update Entity Model with new table; decision "Last query?" (Yes / No); Create Query Plan Flow Chart; Finish with X.] The program should create a database of queries.

22 Tool Output
Denormalization: T0(I,Y,X1, PK(I), FK(S1.I));
Denormalization: T1(I,J,A4,A5,A6,A7,K2,K3, PK(I,J), FK(S2.I,S2.J),FK(S3.K2));
Aggregation: T2(I,X2,X3,K, PK(I), FK(S1.I));
Aggregation: T3(K,minX3,maxX3,PK(K));
Aggregation: T4(I,X2,X3,PK(I),FK(S1.I));
Denormalization: T5(I,J,A7,A8, PK(I,J), FK(S2.I,S2.J));
Aggregation: T6(I,X4, PK(I), FK(S1.I));
Denormalization: X(I,X1,X2,X3,X4,Y, PK(I), FK(S1.I));

23 Program Detail Script: SELECT I, CASE WHEN (A1='married' OR A2='employed') THEN 1 ELSE 0 END AS Y, A3 AS X1 INTO TABLE0 FROM S1; Output: Denormalization: T0(I,Y,X1,PK(I),FK(S1.I)); The output of the code identifies the type of transformation (denormalization or aggregation), the attributes present in the new table, as well as information about keys and foreign keys. Furthermore, it changes the name of the table to a normalized name.

24 Denormalization [Figure: source tables S1(I [PK], A1, A2, A3, K1), S2(I [PK, FK1], J [PK], A4, A5, A6, A7, K2 [FK2], K3) and S3(K2 [PK], A8), each linked to the transformation entity derived from it by the SQL query shown.]
Denormalization: T0(I [PK, FK1], Y, X1), created by:
SELECT I, CASE WHEN A1='married' OR A2='employed' THEN 1 ELSE 0 END AS Y, A3 AS X1 INTO T0 FROM S1;
Denormalization: T1(I [PK, FK1], J [PK, FK1], A4, A5, A6, A7, K2 [FK2], K3), created by:
SELECT S2.I, S2.J, A4, A5, A6, A7, K2, K3 INTO T1 FROM S1 JOIN S2 ON S1.I = S2.I WHERE A6>10;

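25 Aggregation [Figure: source tables S1(I [PK], A1, A2, A3, K1), S2(I [PK, FK1], J [PK], A4, A5, A6, A7, K2 [FK2], K3) and S3(K2 [PK], A8), with the two aggregation entities below and the SQL queries that created them.]
Aggregation: T2(I [PK, FK1], K, X2, X3), created by:
SELECT I, sum(A4) AS X2, sum(A5) AS X3, max(1) AS K INTO T2 FROM T1 GROUP BY I;
Aggregation: T6(I [PK, FK1], X4), created by:
SELECT I, sum(CASE WHEN A7='Y' THEN A8 ELSE 0 END) AS X4 INTO T6 FROM T5 GROUP BY I;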

26 Future Extensions We need to extend the program to search the database for transformation tables that might have already been created. Incorporate it as a plugin of a major. This would allow considerable savings in time and resources when preparing datasets. Create a plugin for a modeling software to show the new tables created, as well as the metadata stored when using the program. Introduce a work-flow chart for the query plan.
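One simple way to picture the first extension, searching for transformation tables that already exist, is to fingerprint each creating query and look the fingerprint up in a catalog. The sketch below is only an illustration of that idea under strong assumptions: the names `fingerprint` and `find_duplicate` and the in-memory `known` catalog are hypothetical, and the comparison is purely textual, so two queries that are equivalent but written differently (reordered columns, different aliases) will not be matched.

```python
import hashlib
import re
from typing import Optional

# A token is a quoted literal, a word, or a single punctuation character.
TOKEN = re.compile(r"'[^']*'|\w+|[^\w\s]")


def fingerprint(query: str) -> str:
    """Hash a query after removing differences that do not change its result."""
    text = re.sub(r"/\*.*?\*/", " ", query, flags=re.DOTALL)         # drop comments
    text = re.sub(r"\binto\s+\w+", " ", text, flags=re.IGNORECASE)   # ignore target name
    # Lowercase everything except string literals, whose case is significant.
    tokens = [t if t.startswith("'") else t.lower() for t in TOKEN.findall(text)]
    if tokens and tokens[-1] == ";":
        tokens.pop()
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()


def find_duplicate(query: str, known: dict) -> Optional[str]:
    """Return the name of an existing table built by the same query, if any."""
    return known.get(fingerprint(query))


# Hypothetical catalog: fingerprint of the creating query -> table name.
known = {fingerprint("SELECT I, sum(A4) AS X2 INTO T2 FROM T1 GROUP BY I;"): "T2"}

candidate = "select I,  SUM(a4) as x2 /* same aggregate */ into MyAgg from T1 group by I"
print(find_duplicate(candidate, known))
```

Ignoring the `INTO` target is deliberate: two users who run the same query under different table names are exactly the duplicates described in the motivation.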

27 Conclusions Minimal extension to the ER model to represent data transformations in an ER diagram. Introduced an algorithm to extend an existing ER model, keeping the data set in mind as the final goal. Help analysts reuse existing tables or views. Help understanding complex SQL queries at a high level. Our work bridges the gap between a logical database model represented by a standard ER model and a physical database model represented by SQL queries.

28 The AdventureWorks Database

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner 24 Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner Rekha S. Nyaykhor M. Tech, Dept. Of CSE, Priyadarshini Bhagwati College of Engineering, Nagpur, India

More information

Instant SQL Programming

Instant SQL Programming Instant SQL Programming Joe Celko Wrox Press Ltd. INSTANT Table of Contents Introduction 1 What Can SQL Do for Me? 2 Who Should Use This Book? 2 How To Use This Book 3 What You Should Know 3 Conventions

More information

Database Design for the Uninitiated CDS Brownbag Series CDS

Database Design for the Uninitiated CDS Brownbag Series CDS Database Design for the Uninitiated Paul Litwin FHCRC Collaborative Data Services 1 CDS Brownbag Series This is the ninth in a series of seminars Materials for the series can be downloaded from www.deeptraining.com/fhcrc

More information

SQL SELECT Query: Intermediate

SQL SELECT Query: Intermediate SQL SELECT Query: Intermediate IT 4153 Advanced Database J.G. Zheng Spring 2012 Overview SQL Select Expression Alias revisit Aggregate functions - complete Table join - complete Sub-query in where Limiting

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Physical Design. Meeting the needs of the users is the gold standard against which we measure our success in creating a database.

Physical Design. Meeting the needs of the users is the gold standard against which we measure our success in creating a database. Physical Design Physical Database Design (Defined): Process of producing a description of the implementation of the database on secondary storage; it describes the base relations, file organizations, and

More information

In This Lecture. SQL Data Definition SQL SQL. Notes. Non-Procedural Programming. Database Systems Lecture 5 Natasha Alechina

In This Lecture. SQL Data Definition SQL SQL. Notes. Non-Procedural Programming. Database Systems Lecture 5 Natasha Alechina This Lecture Database Systems Lecture 5 Natasha Alechina The language, the relational model, and E/R diagrams CREATE TABLE Columns Primary Keys Foreign Keys For more information Connolly and Begg chapter

More information

Normalization. Reduces the liklihood of anomolies

Normalization. Reduces the liklihood of anomolies Normalization Normalization Tables are important, but properly designing them is even more important so the DBMS can do its job Normalization the process for evaluating and correcting table structures

More information

Fundamentals of Database Design

Fundamentals of Database Design Fundamentals of Database Design Zornitsa Zaharieva CERN Data Management Section - Controls Group Accelerators and Beams Department /AB-CO-DM/ 23-FEB-2005 Contents : Introduction to Databases : Main Database

More information

WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math

WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math Textbook Correlation WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math Following Directions Unit FIRST QUARTER AND SECOND QUARTER Logic Unit

More information

SQL Server. 2012 for developers. murach's TRAINING & REFERENCE. Bryan Syverson. Mike Murach & Associates, Inc. Joel Murach

SQL Server. 2012 for developers. murach's TRAINING & REFERENCE. Bryan Syverson. Mike Murach & Associates, Inc. Joel Murach TRAINING & REFERENCE murach's SQL Server 2012 for developers Bryan Syverson Joel Murach Mike Murach & Associates, Inc. 4340 N. Knoll Ave. Fresno, CA 93722 www.murach.com murachbooks@murach.com Expanded

More information

Reflections on Agile DW by a Business Analytics Practitioner. Werner Engelen Principal Business Analytics Architect

Reflections on Agile DW by a Business Analytics Practitioner. Werner Engelen Principal Business Analytics Architect Reflections on Agile DW by a Business Analytics Practitioner Werner Engelen Principal Business Analytics Architect Introduction Werner Engelen Active in BI & DW since 1998 + 6 years at element61 Previously:

More information

Creation of Datasets for Data Mining Analysis by Using Horizontal Aggregation in SQL

Creation of Datasets for Data Mining Analysis by Using Horizontal Aggregation in SQL International Journal of Computer Applications in Engineering Sciences [VOL III, ISSUE I, MARCH 2013] [ISSN: 2231-4946] Creation of Datasets for Data Mining Analysis by Using Horizontal Aggregation in

More information

SQL SERVER TRAINING CURRICULUM

SQL SERVER TRAINING CURRICULUM SQL SERVER TRAINING CURRICULUM Complete SQL Server 2000/2005 for Developers Management and Administration Overview Creating databases and transaction logs Managing the file system Server and database configuration

More information

Chapter 7: Data Mining

Chapter 7: Data Mining Chapter 7: Data Mining Overview Topics discussed: The Need for Data Mining and Business Value The Data Mining Process: Define Business Objectives Get Raw Data Identify Relevant Predictive Variables Gain

More information

IT2304: Database Systems 1 (DBS 1)

IT2304: Database Systems 1 (DBS 1) : Database Systems 1 (DBS 1) (Compulsory) 1. OUTLINE OF SYLLABUS Topic Minimum number of hours Introduction to DBMS 07 Relational Data Model 03 Data manipulation using Relational Algebra 06 Data manipulation

More information

Handling Missing Values in the SQL Procedure

Handling Missing Values in the SQL Procedure Handling Missing Values in the SQL Procedure Danbo Yi, Abt Associates Inc., Cambridge, MA Lei Zhang, Domain Solutions Corp., Cambridge, MA ABSTRACT PROC SQL as a powerful database management tool provides

More information

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets

More information

SQL SERVER SELF-SERVICE BI WITH MICROSOFT EXCEL

SQL SERVER SELF-SERVICE BI WITH MICROSOFT EXCEL SQL SERVER SELF-SERVICE BI WITH MICROSOFT EXCEL JULY 2, 2015 SLIDE 1 Data Sources OVERVIEW OF AN ENTERPRISE BI SOLUTION Reporting and Analysis Data Cleansi ng Data Models JULY 2, 2015 SLIDE 2 Master Data

More information

Tips and techniques to improve DB2 Web Query for i performance and productivity

Tips and techniques to improve DB2 Web Query for i performance and productivity Tips and techniques to improve DB2 Web Query for i performance and productivity Jackie Jansen Information Builders jackie_jansen@ibi.com 2012 Wellesley Information Services. All rights reserved. Agenda

More information

TIM 50 - Business Information Systems

TIM 50 - Business Information Systems TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz March 1, 2015 The Database Approach to Data Management Database: Collection of related files containing records on people, places, or things.

More information

MOC 20461C: Querying Microsoft SQL Server. Course Overview

MOC 20461C: Querying Microsoft SQL Server. Course Overview MOC 20461C: Querying Microsoft SQL Server Course Overview This course provides students with the knowledge and skills to query Microsoft SQL Server. Students will learn about T-SQL querying, SQL Server

More information

BO Universe Design Best Practices

BO Universe Design Best Practices White Paper BO Universe Design Best Practices Abstract According to the book of Genesis, the universe was created in seven days. The following day-by-day analogy provides a best practices guide for the

More information

BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig

BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig Contents Acknowledgements... 1 Introduction to Hive and Pig... 2 Setup... 2 Exercise 1 Load Avro data into HDFS... 2 Exercise 2 Define an

More information

Beginning C# 5.0. Databases. Vidya Vrat Agarwal. Second Edition

Beginning C# 5.0. Databases. Vidya Vrat Agarwal. Second Edition Beginning C# 5.0 Databases Second Edition Vidya Vrat Agarwal Contents J About the Author About the Technical Reviewer Acknowledgments Introduction xviii xix xx xxi Part I: Understanding Tools and Fundamentals

More information

Oracle Data Miner (Extension of SQL Developer 4.0)

Oracle Data Miner (Extension of SQL Developer 4.0) An Oracle White Paper October 2013 Oracle Data Miner (Extension of SQL Developer 4.0) Generate a PL/SQL script for workflow deployment Denny Wong Oracle Data Mining Technologies 10 Van de Graff Drive Burlington,

More information

IT2305 Database Systems I (Compulsory)

IT2305 Database Systems I (Compulsory) Database Systems I (Compulsory) INTRODUCTION This is one of the 4 modules designed for Semester 2 of Bachelor of Information Technology Degree program. CREDITS: 04 LEARNING OUTCOMES On completion of this

More information

3. Relational Model and Relational Algebra

3. Relational Model and Relational Algebra ECS-165A WQ 11 36 3. Relational Model and Relational Algebra Contents Fundamental Concepts of the Relational Model Integrity Constraints Translation ER schema Relational Database Schema Relational Algebra

More information

Lecture #11 Relational Database Systems KTH ROYAL INSTITUTE OF TECHNOLOGY

Lecture #11 Relational Database Systems KTH ROYAL INSTITUTE OF TECHNOLOGY Lecture #11 Relational Database Systems KTH ROYAL INSTITUTE OF TECHNOLOGY Contents Storing data Relational Database Systems Entity Relationship diagrams Normalisation of ER diagrams Tuple Relational Calculus

More information

Basics of Dimensional Modeling

Basics of Dimensional Modeling Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimensional

More information

The software shall provide the necessary tools to allow a user to create a Dashboard based on the queries created.

The software shall provide the necessary tools to allow a user to create a Dashboard based on the queries created. IWS BI Dashboard Template User Guide Introduction This document describes the features of the Dashboard Template application, and contains a manual the user can follow to use the application, connecting

More information

1. Dimensional Data Design - Data Mart Life Cycle

1. Dimensional Data Design - Data Mart Life Cycle 1. Dimensional Data Design - Data Mart Life Cycle 1.1. Introduction A data mart is a persistent physical store of operational and aggregated data statistically processed data that supports businesspeople

More information

Relational Database: Additional Operations on Relations; SQL

Relational Database: Additional Operations on Relations; SQL Relational Database: Additional Operations on Relations; SQL Greg Plaxton Theory in Programming Practice, Fall 2005 Department of Computer Science University of Texas at Austin Overview The course packet

More information

Design and Implementation

Design and Implementation Pro SQL Server 2012 Relational Database Design and Implementation Louis Davidson with Jessica M. Moss Apress- Contents Foreword About the Author About the Technical Reviewer Acknowledgments Introduction

More information

Wave Analytics Data Integration

Wave Analytics Data Integration Wave Analytics Data Integration Salesforce, Spring 16 @salesforcedocs Last updated: April 28, 2016 Copyright 2000 2016 salesforce.com, inc. All rights reserved. Salesforce is a registered trademark of

More information

Vendor: Crystal Decisions Product: Crystal Reports and Crystal Enterprise

Vendor: Crystal Decisions Product: Crystal Reports and Crystal Enterprise 1 Ability to access the database platforms desired (text, spreadsheet, Oracle, Sybase and other databases, OLAP engines.) Y Y 2 Ability to access relational data base Y Y 3 Ability to access dimensional

More information

Relational Database Basics Review

Relational Database Basics Review Relational Database Basics Review IT 4153 Advanced Database J.G. Zheng Spring 2012 Overview Database approach Database system Relational model Database development 2 File Processing Approaches Based on

More information

DBMS / Business Intelligence, SQL Server

DBMS / Business Intelligence, SQL Server DBMS / Business Intelligence, SQL Server Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the needs of IT professionals.

More information

IBM WebSphere DataStage Online training from Yes-M Systems

IBM WebSphere DataStage Online training from Yes-M Systems Yes-M Systems offers the unique opportunity to aspiring fresher s and experienced professionals to get real time experience in ETL Data warehouse tool IBM DataStage. Course Description With this training

More information

Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot

Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot www.etidaho.com (208) 327-0768 Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot 3 Days About this Course This course is designed for the end users and analysts that

More information

LearnFromGuru Polish your knowledge

LearnFromGuru Polish your knowledge SQL SERVER 2008 R2 /2012 (TSQL/SSIS/ SSRS/ SSAS BI Developer TRAINING) Module: I T-SQL Programming and Database Design An Overview of SQL Server 2008 R2 / 2012 Available Features and Tools New Capabilities

More information

CHAPTER 6 DATABASE MANAGEMENT SYSTEMS. Learning Objectives

CHAPTER 6 DATABASE MANAGEMENT SYSTEMS. Learning Objectives CHAPTER 6 DATABASE MANAGEMENT SYSTEMS Management Information Systems, 10 th edition, By Raymond McLeod, Jr. and George P. Schell 2007, Prentice Hall, Inc. 1 Learning Objectives Understand the hierarchy

More information

ETL PROCESS IN DATA WAREHOUSE

ETL PROCESS IN DATA WAREHOUSE ETL PROCESS IN DATA WAREHOUSE OUTLINE ETL : Extraction, Transformation, Loading Capture/Extract Scrub or data cleansing Transform Load and Index ETL OVERVIEW Extraction Transformation Loading ETL ETL is

More information

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution Warehouse and Business Intelligence : Challenges, Best Practices & the Solution Prepared by datagaps http://www.datagaps.com http://www.youtube.com/datagaps http://www.twitter.com/datagaps Contact contact@datagaps.com

More information

Building In-Database Predictive Scoring Model: Check Fraud Detection Case Study

Building In-Database Predictive Scoring Model: Check Fraud Detection Case Study Building In-Database Predictive Scoring Model: Check Fraud Detection Case Study Jay Zhou, Ph.D. Business Data Miners, LLC 978-726-3182 jzhou@businessdataminers.com Web Site: www.businessdataminers.com

More information

Graphical Web based Tool for Generating Query from Star Schema

Graphical Web based Tool for Generating Query from Star Schema Graphical Web based Tool for Generating Query from Star Schema Mohammed Anbar a, Ku Ruhana Ku-Mahamud b a College of Arts and Sciences Universiti Utara Malaysia, 0600 Sintok, Kedah, Malaysia Tel: 604-2449604

More information

The Analyst's Perspective: Advanced BI with PowerPivot DAX, SharePoint Dashboards, and SQL Data Mining

The Analyst's Perspective: Advanced BI with PowerPivot DAX, SharePoint Dashboards, and SQL Data Mining The Analyst's Perspective: Advanced BI with PowerPivot DAX, SharePoint Dashboards, and SQL Data Mining Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.com 1 1 Objectives

More information

Oracle SQL. Course Summary. Duration. Objectives

Oracle SQL. Course Summary. Duration. Objectives Oracle SQL Course Summary Identify the major structural components of the Oracle Database 11g Create reports of aggregated data Write SELECT statements that include queries Retrieve row and column data

More information

Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL

Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL Jasna S MTech Student TKM College of engineering Kollam Manu J Pillai Assistant Professor

More information

Relational Databases

Relational Databases Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 18 Relational data model Domain domain: predefined set of atomic values: integers, strings,... every attribute

More information

Distributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu

Distributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu Distributed Aggregation in Cloud Databases By: Aparna Tiwari tiwaria@umail.iu.edu ABSTRACT Data intensive applications rely heavily on aggregation functions for extraction of data according to user requirements.

More information

Business Intelligence Tutorial: Introduction to the Data Warehouse Center

Business Intelligence Tutorial: Introduction to the Data Warehouse Center IBM DB2 Universal Database Business Intelligence Tutorial: Introduction to the Data Warehouse Center Version 8 IBM DB2 Universal Database Business Intelligence Tutorial: Introduction to the Data Warehouse

More information
