Capturing Database Transformations for Big Data Analytics
1 Capturing Database Transformations for Big Data Analytics David Sergio Matusevich University of Houston
2 Organization: Introduction, Classification, Program, Case Study, Conclusions
3 Extending ER Models to Capture Database Transformations to Build Data Sets for Data Mining Analysis
Carlos Ordonez, Sofian Maabout, David Sergio Matusevich, Wellington Cabrera
Reference: Carlos Ordonez, Sofian Maabout, David Sergio Matusevich, Wellington Cabrera. Extending ER Models to Capture Database Transformations to Build Data Sets for Data Mining, Data & Knowledge Engineering (DKE), 2014, Elsevier.
4 Data Mining Projects Data mining projects usually require the preparation of a data set created specifically to answer the particular question asked by the user. For example, given the database of a cellular phone company, we might ask: What percentage of users will change data plans with the advent of a new smartphone? This will require the creation of a data set where at least one of the columns will be CHANGED_PLAN and another one could be BEFORE_NEW_DEVICE. Many of the intermediate tables created for this project might remain in the database, using resources.
5 Motivation: Saving Work Different users might ask similar questions, leading to the creation of tables that are virtually identical, cluttering the system. For instance, a researcher might want to answer the question: What percentage of men between the ages of 18 and 35 will change data plans when a new device is introduced? If the researcher is not aware of the previous user's project, some of the intermediate tables created might be exact duplicates of the ones created before.
6 Contribution
In this work we:
- Present a classification of the most common transformations used in data mining,
- Propose an extension of the ER model to keep track of the intermediate tables created, and
- Introduce a tool designed to simplify the use of naming conventions, keep track of attributes and keys, and facilitate the recognition of duplicate tables.
7 Building a data set for data mining Building a data mining dataset involves successive rounds of aggregation and denormalization.
8 Note: If the database is not static, the transformation tables must also be updated. This could be resource intensive, and could be left to the last minute, that is, a transformation table is only updated when it is reused. We also limit ourselves to transformations that happen inside the database. Transformations happening outside the database, such as those performed by Extract-Transform-Load (ETL) tools, are not considered here.
9 Model vs Theory
Entity-Relationship (ER) Model: Entities, Relationships
Relational Model: Tables, Foreign Keys
Database: $D(S, I)$, where $S = \{S_1, S_2, \ldots, S_n\}$ are tables and $I$ are integrity constraints.
The set of tables $S$ is one of the inputs for the tool we wrote.
10 Well Formed Queries
We define a well-formed query as one that complies with the following requirements:
- It always produces a table with a primary key and a potentially empty set of non-key attributes.
- Each join operator is computed based on a foreign key from the referencing table and a primary key from the referenced table.
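The join requirement above can be illustrated with a minimal Python sketch. The `Table` class and `is_well_formed_join` function are hypothetical names introduced only for illustration, and the foreign keys assumed for S2 follow the sample database used later in the slides; this is not the tool's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical schema description used only for this illustration."""
    name: str
    primary_key: tuple
    # Maps a tuple of local columns to (referenced table name, referenced columns).
    foreign_keys: dict = field(default_factory=dict)


def is_well_formed_join(referencing, local_cols, referenced, ref_cols):
    """Return True if the join pairs a declared foreign key of the referencing
    table with the primary key of the referenced table."""
    target = referencing.foreign_keys.get(tuple(local_cols))
    if target is None:
        return False  # the join columns are not a declared foreign key
    ref_name, fk_ref_cols = target
    return (
        ref_name == referenced.name
        and tuple(ref_cols) == fk_ref_cols
        and tuple(ref_cols) == referenced.primary_key
    )


# Assumed schema, loosely following the sample database in the slides.
S1 = Table("S1", ("I",))
S3 = Table("S3", ("K2",))
S2 = Table(
    "S2",
    ("I", "J"),
    {("I",): ("S1", ("I",)), ("K2",): ("S3", ("K2",))},
)
```

Under these assumptions, `is_well_formed_join(S2, ["I"], S1, ["I"])` holds, while a join on `S2.J = S1.I` is rejected because `J` is not a foreign key.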
11 Database Transformation Queries
Goal: To produce a single table $X$ that will be used as input for a data mining algorithm.
Given $S = \{S_1, S_2, \ldots, S_n\}$, a set of source tables, and $Q = q_1, q_2, \ldots, q_m$, a sequence of queries ($X$ is the result of $q_m$), the sequence of queries will produce a set of transformation tables $T_1, T_2, \ldots, T_n$, where $X = T_n$.
12 Data Sets
Clearly, different projects will require the creation of different data sets. This results in a number of different $X_i$, each associated with a query plan $Q_i$:
$Q_1 = q_1, q_2, \ldots, q_m \rightarrow X_1$
...
$Q_k = q_1, q_2, \ldots, q_m \rightarrow X_k$
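As a rough illustration of a query sequence producing transformation tables, the sketch below runs a two-step plan in an in-memory SQLite database. SQLite has no `SELECT ... INTO`, so `CREATE TABLE ... AS SELECT` is swapped in; the function names, the tiny S1 table and the plan itself are assumptions made for this example.

```python
import sqlite3


def run_plan(conn, plan):
    """Execute the queries in order; each one materializes a transformation
    table. Returns the names of the created tables (the last one is X)."""
    created = []
    for table_name, select_sql in plan:
        # CREATE TABLE ... AS SELECT stands in for T-SQL's SELECT ... INTO.
        conn.execute(f"CREATE TABLE {table_name} AS {select_sql}")
        created.append(table_name)
    return created


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical source table with three rows.
    conn.execute("CREATE TABLE S1 (I INTEGER PRIMARY KEY, A3 REAL)")
    conn.executemany(
        "INSERT INTO S1 VALUES (?, ?)", [(1, 2.0), (2, 4.0), (3, 6.0)]
    )
    plan = [
        ("T1", "SELECT I, A3 FROM S1 WHERE A3 > 2"),   # filter
        ("X", "SELECT I, A3 * 10 AS X1 FROM T1"),      # derive final data set
    ]
    tables = run_plan(conn, plan)
    rows = conn.execute("SELECT I, X1 FROM X ORDER BY I").fetchall()
    conn.close()
    return tables, rows
```

Every intermediate table (here T1) stays in the database after the plan finishes, which is the situation the slides describe as cluttering the system.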
13 The Transformation Tables
In order to allow for easy reuse, transformation tables must incorporate into their metadata:
- The query that created them
- An indication of whether the entities come from a source table or another transformation table (provenance).
[Figure: ER entity "Aggregation: T9" with attributes CustomerID, Promotion, SalesOrderID, OdrMonth, StateProvinceCode, TerritoryID, MakeFlag, MaxProductLine, ..., Style, annotated with the SQL that created it.]
SELECT CustomerID, Promotion, SalesOrderID, OdrMonth, StateProvinceCode, TerritoryID, MakeFlag,
    max(taxamt) AS TaxGrpByOdrID,
    max(freight) AS FreightGrpByOdrID,
    max(totaldue) AS TotalDueGrpByOdrID,
    sum(orderqty) AS OdrQtyCycleGrpByOdrID,
    sum(linetotal) AS LineTotGrpByOdrID,
    sum(standardcost) AS StdCostGrpByOdrID,
    max(productline) AS MaxProductLine,
    max(size) AS SizeCycle,
    max(class) AS ClassCycle,
    max(style) AS Style
INTO T9
FROM T7
GROUP BY CustomerID, Promotion, SalesOrderID, OdrMonth, StateProvinceCode, TerritoryID, MakeFlag;
/* CustomerID, Promotion, SalesOrderID, OdrMonth, TaxGrpByOdrID, FreightGrpByOdrID, TotalDueGrpByOdrID, StateProvinceCode, TerritoryID, OdrQtyCycleGrpByOdrID, LineTotGrpByOdrID, MakeFlag, StdCostGrpByOdrID, MaxProductLine, SizeCycle, ClassCycle, Style */
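One possible way to hold the two pieces of metadata named above (the creating query and the provenance of each attribute) is sketched below. The `TransformationTable` class and its field names are hypothetical and are not taken from the tool.

```python
from dataclasses import dataclass, field


@dataclass
class TransformationTable:
    """Hypothetical metadata record for one transformation table."""
    name: str
    query: str  # the SQL text that created the table
    # attribute name -> (origin table, "source" or "transformation")
    provenance: dict = field(default_factory=dict)

    def add_attribute(self, attr, origin_table, source_tables):
        """Record where an attribute comes from, labelling the origin as a
        source table or as another transformation table."""
        kind = "source" if origin_table in source_tables else "transformation"
        self.provenance[attr] = (origin_table, kind)


# Usage with names from the T9 example; the source table set is an assumption.
sources = {"S1", "S2", "S3"}
t9 = TransformationTable("T9", "SELECT ... INTO T9 FROM T7 GROUP BY ...")
t9.add_attribute("CustomerID", "T7", sources)
t9.add_attribute("MaxProductLine", "T7", sources)
```

Keeping the query text next to the provenance map means a later user can both re-run the table and see which upstream tables it depends on.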
14 Transformations
Proposition: Let $T = T_1 \bowtie T_2 \bowtie \cdots \bowtie T_n$ be a transformation joining tables on appropriate foreign keys. Every query used to transform $T$ either:
1. includes the primary key of $T$, which comes from some $T_i$, or
2. does not include the primary key of $T$, but includes a subset of $k$ primary keys of $k$ tables $T_i$ to later compute group-by aggregations.
Proof sketch: All aggregation queries are assumed to have grouping columns in order to identify the object of study. That is, they represent GROUP BY queries in SQL. Therefore every query must include the primary key of either joined table in order to produce a data set. An aggregation must use keys to group rows (otherwise, records cannot be identified and further processed) and the only available keys are foreign keys.
15 Classification of transformations
We distinguish two mutually exclusive database transformations:
1. Denormalization, which brings attributes from other entities into the transformation entity or simply combines existing attributes.
2. Aggregation, which creates a new attribute by grouping rows and computing a summarization.
[Figure: classification tree. Transformations split into Denormalization and Aggregation; lower-level labels as transcribed: Direct Derivation Expression Case Count / Sum Max / Min Arithmetic String Date.]
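A very rough heuristic for the two-way split can be sketched in Python: treat a query as an aggregation if it has a GROUP BY clause or calls an aggregate function, and as a denormalization otherwise. This is an assumption made for illustration, not the tool's rule set; for instance, it would label the q4 query of the sample script a denormalization, whereas the tool output shown on a later slide lists T4 as an aggregation.

```python
import re

# Aggregate function calls such as sum(...), max(...); column names like
# maxX3 do not match because no parenthesis follows the function name.
AGGREGATE_CALL = re.compile(r"\b(sum|count|min|max|avg)\s*\(", re.IGNORECASE)
GROUP_BY = re.compile(r"\bgroup\s+by\b", re.IGNORECASE)


def classify(query):
    """Heuristically label a query as 'Aggregation' or 'Denormalization'."""
    if GROUP_BY.search(query) or AGGREGATE_CALL.search(query):
        return "Aggregation"
    return "Denormalization"
```

A keyword test like this is easy to fool (for example by aggregate calls inside comments or subqueries), so it should be read as a starting point only.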
16 The CASE statement
Example: SELECT .. CASE WHEN A1='married' or A2='employed' THEN 1 ELSE 0 END AS binaryvariable .. FROM ..
The CASE statement does not have a relational algebra translation. It derives a binary attribute not present before in the database, and might even introduce NULLs.
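One way a CASE expression can introduce NULLs is by omitting the ELSE branch. The sketch below shows this with Python's built-in sqlite3 module; the table contents are made up for the example, and the slide's example is reused with and without ELSE.

```python
import sqlite3


def case_demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical rows for the S1 table used in the slides.
    conn.execute("CREATE TABLE S1 (I INTEGER PRIMARY KEY, A1 TEXT, A2 TEXT)")
    conn.executemany(
        "INSERT INTO S1 VALUES (?, ?, ?)",
        [
            (1, "married", "unemployed"),
            (2, "single", "employed"),
            (3, "single", "unemployed"),
        ],
    )
    # With ELSE: every row gets 0 or 1.
    with_else = conn.execute(
        "SELECT I, CASE WHEN A1='married' OR A2='employed' THEN 1 ELSE 0 END AS Y "
        "FROM S1 ORDER BY I"
    ).fetchall()
    # Without ELSE: rows matching no WHEN branch get NULL (None in Python).
    without_else = conn.execute(
        "SELECT I, CASE WHEN A1='married' OR A2='employed' THEN 1 END AS Y "
        "FROM S1 ORDER BY I"
    ).fetchall()
    conn.close()
    return with_else, without_else
```

Row 3 matches no WHEN branch, so it yields 0 in the first query and NULL in the second.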
17 Sample Database
Source: S1 (I [PK], A1, A2, A3, K1)
Source: S2 (I [PK, FK1], J [PK], A4, A5, A6, A7, K2 [FK2], K3)
Source: S3 (K2 [PK], A8)
In this simple example S1 could be a table of transactions, S2 a table of products, and S3 could contain details about the product.
18 Sample Script (entry point: q0; output: X)
/* q0: T0, universe */
SELECT I, /* I is the record id, or point id mathematically */
    CASE WHEN A1='married' or A2='employed' THEN 1 ELSE 0 END AS Y, /* binary target variable */
    A3 AS X1 /* 1st variable */
INTO T0 FROM S1;
/* q1: denormalize and filter valid records */
SELECT S2.I,S2.J,A4,A5,A6,A7,K2,K3
INTO T1 FROM S1 JOIN S2 ON S1.I=S2.I WHERE A6>10;
/* q2: aggregate */
SELECT I, sum(A4) AS X2, sum(A5) AS X3, max(1) AS k /* k is FK */
INTO T2 FROM T1 GROUP BY I;
/* q3: get min, max */
SELECT 1 AS k, min(X3) AS minX3, max(X3) AS maxX3
INTO T3 FROM T2;
/* q4: math transform */
SELECT I, log(X2) AS X2, /* 2nd variable */
    (X3-minX3)/(maxX3-minX3) AS X3 /* 3rd variable, range [0,1] */
INTO T4 FROM T2 JOIN T3 ON T2.K=T3.K; /* get the min/max */
/* q5: denormalize, gather attribute from referenced table S3 */
SELECT I,J,A7,A8
INTO T5 FROM T1 JOIN S3 ON T1.K2=S3.K2;
/* q6: aggregate with CASE */
SELECT I, sum(CASE WHEN A7='Y' THEN A8 ELSE 0 END) AS X4
INTO T6 FROM T5 GROUP BY I;
/* q7: data set, star join; this data set can be used for: logistic regression, decision tree, SVM */
SELECT T0.I,X1,X2,X3,X4,Y
INTO X FROM T0 JOIN T4 ON T0.I=T4.I JOIN T6 ON T0.I=T6.I;
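The min/max step (q3) and the range scaling in q4 can be tried in isolation. The sketch below uses Python's sqlite3 module with `CREATE TABLE ... AS SELECT` standing in for `SELECT ... INTO`; the three T2 rows are invented for the example and the `log` transform is left out, since its availability depends on how SQLite was built.

```python
import sqlite3


def minmax_demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical contents of T2; K is the constant join key from q2.
    conn.execute("CREATE TABLE T2 (I INTEGER PRIMARY KEY, X3 REAL, K INTEGER)")
    conn.executemany(
        "INSERT INTO T2 VALUES (?, ?, ?)",
        [(1, 10.0, 1), (2, 20.0, 1), (3, 30.0, 1)],
    )
    # q3: a one-row table holding the global min and max, keyed by K = 1.
    conn.execute(
        "CREATE TABLE T3 AS "
        "SELECT 1 AS K, min(X3) AS minX3, max(X3) AS maxX3 FROM T2"
    )
    # q4 (scaling part only): join every row to the single min/max row.
    rows = conn.execute(
        "SELECT I, (X3 - minX3) / (maxX3 - minX3) AS X3 "
        "FROM T2 JOIN T3 ON T2.K = T3.K ORDER BY I"
    ).fetchall()
    conn.close()
    return rows
```

The constant key K lets a single-row summary table be joined back to every row, which is why q2 computes `max(1) AS k`.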
19 Transformations Tool The transformations tool consists of: A query parser that determines if a query is a denormalization or an aggregation, An attribute tracker, that determines the provenance of all attributes, A set of rules to determine keys and foreign keys of the transformation tables. The input for the tool is a query script and a list of source tables. The output is a list of transformation tables with type, attributes, keys and provenances clearly marked.
20 Remarks The tool was written in C++. It requires that the queries be written as regular expressions (for simplicity). The tool does not provide any feedback to the user regarding the queries they are using. In a future iteration we plan to incorporate suggestions regarding naming and the existence of similar tables.
21 Tool Development
[Figure: flow chart of the tool. Nodes: Start; Input Script $Q = \{q_1, q_2, \ldots, q_m\}$; Parse query $q_i$; Determine type; Normalize names; Check DB for similar table (Exists? Yes / No); Rename table and entities (if needed); Execute query; Write new query to log; Update Entity Model with new table; Last query? (Yes / No); Create Query Plan; Finish, X. Legend: Planned, Finished.]
The program should create a database of queries.
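The "check DB for similar table" step could, under simple assumptions, be approximated by fingerprinting the normalized query text and keeping a log of fingerprints. The sketch below is hypothetical (the names `normalize_query`, `fingerprint` and `QueryLog` are not from the tool) and only detects textual duplicates, not semantically equivalent queries.

```python
import hashlib
import re


def normalize_query(sql):
    """Strip comments, the INTO target, extra whitespace and letter case
    (outside string literals) so trivially different queries compare equal."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    parts = re.split(r"('[^']*')", sql)  # odd indices are string literals
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(part)  # keep literals such as 'Y' untouched
        else:
            part = re.sub(r"\binto\s+\w+", " ", part, flags=re.IGNORECASE)
            part = re.sub(r"\s+", " ", part)
            out.append(part.lower())
    return "".join(out).strip().rstrip(";").strip()


def fingerprint(sql):
    return hashlib.sha256(normalize_query(sql).encode("utf-8")).hexdigest()


class QueryLog:
    """Remembers which table each distinct query produced."""

    def __init__(self):
        self._by_fingerprint = {}

    def register(self, table_name, sql):
        """Return the existing table for a duplicate query, otherwise record
        the new table and return its name."""
        key = fingerprint(sql)
        existing = self._by_fingerprint.get(key)
        if existing is not None:
            return existing
        self._by_fingerprint[key] = table_name
        return table_name
```

For example, registering `SELECT I, sum(A4) AS X2 INTO T2 FROM T1 GROUP BY I;` and then the same query written in lower case with a different target table returns `T2` the second time, signalling that the table already exists.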
22 Tool Output
Denormalization: T0(I,Y,X1, PK(I), FK(S1.I));
Denormalization: T1(I,J,A4,A5,A6,A7,K2,K3, PK(I,J), FK(S2.I,S2.J),FK(S3.K2));
Aggregation: T2(I,X2,X3,K, PK(I), FK(S1.I));
Aggregation: T3(K,minX3,maxX3,PK(K));
Aggregation: T4(I,X2,X3,PK(I),FK(S1.I));
Denormalization: T5(I,J,A7,A8, PK(I,J), FK(S2.I,S2.J));
Aggregation: T6(I,X4, PK(I), FK(S1.I));
Denormalization: X(I,X1,X2,X3,X4,Y, PK(I), FK(S1.I));
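The output format above is regular enough to be generated from a small record. The following Python sketch shows one way to do so; `format_entry` is a hypothetical helper written for this illustration and it emits the compact variant without spaces after commas.

```python
def format_entry(kind, name, attributes, primary_key, foreign_keys=()):
    """Render one line in the style 'Type: Table(attrs,PK(...),FK(...));'.

    kind         -- "Denormalization" or "Aggregation"
    attributes   -- attribute names in output order
    primary_key  -- primary key columns
    foreign_keys -- iterable of column lists, one per foreign key
    """
    parts = list(attributes)
    parts.append("PK(" + ",".join(primary_key) + ")")
    for fk in foreign_keys:
        parts.append("FK(" + ",".join(fk) + ")")
    return f"{kind}: {name}(" + ",".join(parts) + ");"


line = format_entry("Denormalization", "T0", ["I", "Y", "X1"], ["I"], [["S1.I"]])
```

With these arguments `line` is `Denormalization: T0(I,Y,X1,PK(I),FK(S1.I));`.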
23 Program Detail
Script:
SELECT I, CASE WHEN (A1='married' or A2='employed') THEN 1 ELSE 0 END AS Y, A3 AS X1 INTO TABLE0 FROM S1;
Output:
Denormalization: T0(I,Y,X1,PK(I),FK(S1.I));
The output of the code identifies the type of transformation (denormalization or aggregation), the attributes present in the new table, and information about keys and foreign keys. Furthermore, it changes the name of the table to a normalized name.
24 Denormalization
[Figure: ER diagram of source entities S1 (PK I; A1, A2, A3, K1), S2 (PK,FK1 I; PK J; FK2 K2; A4, A5, A6, A7, K3) and S3 (PK K2; A8), with two transformation entities, each annotated with its SQL: Denormalization: T0 (PK,FK1 I; Y, X1) and Denormalization: T1 (PK,FK1 I; PK,FK1 J; FK2 K2; A4, A5, A6, A7, K3).]
SELECT I, CASE WHEN A1='married' or A2='employed' THEN 1 ELSE 0 END AS Y, A3 AS X1 INTO T0 FROM S1;
SELECT S2.I, S2.J, A4, A5, A6, A7, K2, K3 INTO T1 FROM S1 JOIN S2 ON S1.I = S2.I WHERE A6>10;
25 Aggregation
[Figure: ER diagram of source entities S1 (PK I; A1, A2, A3, K1), S2 (PK,FK1 I; PK J; FK2 K2; A4, A5, A6, A7, K3) and S3 (PK K2; A8), with two transformation entities, each annotated with its SQL: Aggregation: T2 (PK,FK1 I; K, X2, X3) and Aggregation: T6 (PK,FK1 I; X4).]
SELECT I, sum(A4) AS X2, sum(A5) AS X3, max(1) AS K INTO T2 FROM T1 GROUP BY I;
SELECT I, sum(CASE WHEN A7='Y' THEN A8 ELSE 0 END) AS X4 INTO T6 FROM T5 GROUP BY I;
26 Future Extensions We need to extend the program to search the database for transformation tables that might have already been created. Incorporate it as a plugin of a major. This would allow considerable savings in time and resources when preparing datasets. Create a plugin for a modeling software to show the new tables created, as well as the metadata stored when using the program. Introduce a work-flow chart for the query plan.
27 Conclusions Minimal extension to the ER model to represent data transformations in an ER diagram. Introduced an algorithm to extend an existing ER model, keeping the data set in mind as the final goal. Help analysts reuse existing tables or views. Help understanding complex SQL queries at a high level. Our work bridges the gap between a logical database model represented by a standard ER model and a physical database model represented by SQL queries.
28 The AdventureWorks Database
More informationData Hierarchy. Traditional File based Approach. Hierarchy of Data for a Computer-Based File
Management Information Systems Data and Knowledge Management Dr. Shankar Sundaresan (Adapted from Introduction to IS, Rainer and Turban) LEARNING OBJECTIVES Recognize the importance of data, issues involved
More informationTechnology WHITE PAPER
Technology WHITE PAPER What We Do Neota Logic builds software with which the knowledge of experts can be delivered in an operationally useful form as applications embedded in business systems or consulted
More informationSQL Server 2008 Core Skills. Gary Young 2011
SQL Server 2008 Core Skills Gary Young 2011 Confucius I hear and I forget I see and I remember I do and I understand Core Skills Syllabus Theory of relational databases SQL Server tools Getting help Data
More informationPowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.
PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions A Technical Whitepaper from Sybase, Inc. Table of Contents Section I: The Need for Data Warehouse Modeling.....................................4
More informationWhite Paper. Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices.
White Paper Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices. Contents Data Management: Why It s So Essential... 1 The Basics of Data Preparation... 1 1: Simplify Access
More informationBRIO QUERY FUNCTIONALITY IN COMPARISION TO CRYSTAL REPORTS
BRIO QUERY FUNCTIONALITY IN COMPARISION TO CRYSTAL REPORTS Category Downstream Analysis Nested Queries Brio Functionality Ability to create data sets Ability to create tables and upload tables Available
More informationMicrosoft Access 3: Understanding and Creating Queries
Microsoft Access 3: Understanding and Creating Queries In Access Level 2, we learned how to perform basic data retrievals by using Search & Replace functions and Sort & Filter functions. For more complex
More informationUsing SQL Server Management Studio
Using SQL Server Management Studio Microsoft SQL Server Management Studio 2005 is a graphical tool for database designer or programmer. With SQL Server Management Studio 2005 you can: Create databases
More informationDistance Learning and Examining Systems
Lodz University of Technology Distance Learning and Examining Systems - Theory and Applications edited by Sławomir Wiak Konrad Szumigaj HUMAN CAPITAL - THE BEST INVESTMENT The project is part-financed
More informationSAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24. Data Federation Administration Tool Guide
SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24 Data Federation Administration Tool Guide Content 1 What's new in the.... 5 2 Introduction to administration
More informationFiles. Files. Files. Files. Files. File Organisation. What s it all about? What s in a file?
Files What s it all about? Information being stored about anything important to the business/individual keeping the files. The simple concepts used in the operation of manual files are often a good guide
More informationFraming Business Problems as Data Mining Problems
Framing Business Problems as Data Mining Problems Asoka Diggs Data Scientist, Intel IT January 21, 2016 Legal Notices This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS
More informationICAB4136B Use structured query language to create database structures and manipulate data
ICAB4136B Use structured query language to create database structures and manipulate data Release: 1 ICAB4136B Use structured query language to create database structures and manipulate data Modification
More informationOracle Database: SQL and PL/SQL Fundamentals
Oracle University Contact Us: 1.800.529.0165 Oracle Database: SQL and PL/SQL Fundamentals Duration: 5 Days What you will learn This course is designed to deliver the fundamentals of SQL and PL/SQL along
More informationProgramming with SQL
Unit 43: Programming with SQL Learning Outcomes A candidate following a programme of learning leading to this unit will be able to: Create queries to retrieve information from relational databases using
More informationCOSC 6397 Big Data Analytics. 2 nd homework assignment Pig and Hive. Edgar Gabriel Spring 2015
COSC 6397 Big Data Analytics 2 nd homework assignment Pig and Hive Edgar Gabriel Spring 2015 2 nd Homework Rules Each student should deliver Source code (.java files) Documentation (.pdf,.doc,.tex or.txt
More informationDatabase Design and Database Programming with SQL - 5 Day In Class Event Day 1 Activity Start Time Length
Database Design and Database Programming with SQL - 5 Day In Class Event Day 1 Welcome & Introductions 9:00 AM 20 Lecture 9:20 AM 40 Practice 10:00 AM 20 Lecture 10:20 AM 40 Practice 11:15 AM 30 Lecture
More informationExperience, Not Metrics
Part 7: Consolidating Test Results User Experience, Not Metrics by: R. Scott Barber You ve been running this test for weeks and sending me charts almost every day, but what does it all mean?!? If your
More informationModel Deployment. Dr. Saed Sayad. University of Toronto 2010 saed.sayad@utoronto.ca. http://chem-eng.utoronto.ca/~datamining/
Model Deployment Dr. Saed Sayad University of Toronto 2010 saed.sayad@utoronto.ca http://chem-eng.utoronto.ca/~datamining/ 1 Model Deployment Creation of the model is generally not the end of the project.
More informationDatabase Design Standards. U.S. Small Business Administration Office of the Chief Information Officer Office of Information Systems Support
Database Design Standards U.S. Small Business Administration Office of the Chief Information Officer Office of Information Systems Support TABLE OF CONTENTS CHAPTER PAGE NO 1. Standards and Conventions
More informationEuropean Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project
European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project Janet Delve, University of Portsmouth Kuldar Aas, National Archives of Estonia Rainer Schmidt, Austrian Institute
More informationTypes of Software Testing (E-ams) For MBA Programs and CAD12
GRADE 12 Grade 12: Term 1-10 weeks/40 hours Data and Information Management: Database design and concepts (±1 week/4 hours) - Relational database overview o Normalisation (overview and purpose) to reduce
More information