Business Intelligence: Recent Experiences in Canada



Similar documents
Data Quality in Information Integration and Business Intelligence

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

Web-Based Genomic Information Integration with Gene Ontology

Report on the Dagstuhl Seminar Data Quality on the Web

MDM and Data Warehousing Complement Each Other

Analance Data Integration Technical Whitepaper

JOURNAL OF OBJECT TECHNOLOGY

The Benefits of Data Modeling in Data Warehousing

Preferred Strategies: Business Intelligence for JD Edwards

Analance Data Integration Technical Whitepaper

Security Issues for the Semantic Web

Management Decision Making. Hadi Hosseini CS 330 David R. Cheriton School of Computer Science University of Waterloo July 14, 2011

Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM)

A Knowledge Management Framework Using Business Intelligence Solutions

MULTI AGENT-BASED DISTRIBUTED DATA MINING

Database Marketing, Business Intelligence and Knowledge Discovery

Data Warehouses in the Path from Databases to Archives

A Model-based Software Architecture for XML Data and Metadata Integration in Data Warehouse Systems

ECS 165A: Introduction to Database Systems

Semantic Business Process Management Lectuer 1 - Introduction

SOLUTION BRIEF CA ERWIN MODELING. How Can I Manage Data Complexity and Improve Business Agility?

Automatic Timeline Construction For Computer Forensics Purposes

Chapter 1: Introduction

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

SOLUTION BRIEF CA ERwin Modeling. How can I understand, manage and govern complex data assets and improve business agility?

CSE 132A. Database Systems Principles

Chapter 1: Introduction. Database Management System (DBMS) University Database Example

Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques

Master of Science in Health Information Technology Degree Curriculum

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers

Context Capture in Software Development

A Comprehensive Approach to Master Data Management Testing

UltraQuest Cloud Server. White Paper Version 1.0

TRENDS IN THE DEVELOPMENT OF BUSINESS INTELLIGENCE SYSTEMS

Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI

On the general structure of ontologies of instructional models

Data Warehouse: Introduction

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

TDWI strives to provide course books that are content-rich and that serve as useful reference documents after a class has ended.

Make the right decisions with Distribution Intelligence

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Business-driven governance: Managing policies for data retention

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:

Service Oriented Data Management

Introduction to Datawarehousing

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

IBM Cognos TM1 Enterprise Planning, Budgeting and Analytics

Big Data Analytics with IBM Cognos BI Dynamic Query IBM Redbooks Solution Guide

Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring

White Paper February How IBM Cognos 8 Business Intelligence meets the needs of financial and business analysts

Foundations of Business Intelligence: Databases and Information Management

a Data Science Univ. Piraeus [GR]

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Semantic Variability Modeling for Multi-staged Service Composition

Effecting Data Quality Improvement through Data Virtualization

A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

Data Integration and ETL Process

QAD Business Intelligence Data Warehouse Demonstration Guide. May 2015 BI 3.11

Leveraging unstructured data for improved decision making: A retail banking perspective

Business Intelligence and Service Oriented Architectures. An Oracle White Paper May 2007

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

BUSINESS INTELLIGENCE AS SUPPORT TO KNOWLEDGE MANAGEMENT

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Data Warehouse Overview. Srini Rengarajan

Cúram Business Intelligence Reporting Developer Guide

Industry Models and Information Server

GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington

Data Validation with OWL Integrity Constraints

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC

Cúram Business Intelligence and Analytics Guide

Enterprise Data Quality

A View Integration Approach to Dynamic Composition of Web Services

SENG 520, Experience with a high-level programming language. (304) , Jeff.Edgell@comcast.net

70% Fuel for HR Careers

Data warehouse and Business Intelligence Collateral

Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms

72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

5 Best Practices for SAP Master Data Governance

POLAR IT SERVICES. Business Intelligence Project Methodology

A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

Data Governance Best Practice

Advanced Analytics. The Way Forward for Businesses. Dr. Sujatha R Upadhyaya

Business Process Models as Design Artefacts in ERP Development

CHAPTER SIX DATA. Business Intelligence The McGraw-Hill Companies, All Rights Reserved

HOW TO DO A SMART DATA PROJECT

Exploring the Synergistic Relationships Between BPC, BW and HANA

Framework for Data warehouse architectural components

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Transcription:

Business Intelligence: Recent Experiences in Canada Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada : Faculty Fellow of the IBM Center for Advanced Studies

2 Business Intelligence and Data Management Business Intelligence (BI) requires modeling of different and complex operational aspects of business activities of an enterprise This requires understanding those business activities and actors, and learning from experience and data Business activities have become more complex: distributed tasks, interaction of different computational and human processes, increasing amount and complexity of data,... It becomes essential to be able to automate tasks that are related with data management (DM) Many of them for automating and/or supporting decision making in organizations

3 Almost every area and problem of BI touches DM problems The community of data management at large has been doing research in- and developing techniques and tools for BI There are many open research problems in DM that are motivated, or made more relevant and urgent by BI needs The database (data management) research community has been traditionally strong in Canada There have been successful BI companies in Canada, even before the term was coined, e.g. Cognos, Business Objects, IBM (first IBM CAS created in Toronto),... The Canadian DM research community decided to put together a large BI research project

The BIN Initiative The genesis of the The NSERC Strategic Network on Data Management for Business Intelligence (BIN) In 2007 around 20 academic researchers across Canada started putting together a large research project in BI Potential industrial partners were approached and workshops with them were held, most prominently IBM, Cognos (later bought by IBM), Business Objects (later bought by SAP), etc. 15 principal investigators (PIs) were designated for a proposal submission Universities: Toronto, Ottawa, Carleton (Ottawa), British Columbia (Vancouver), Dalhousie (Halifax), Alberta (Edmonton), Univ. of Waterloo 4

Proposal submitted to NSERC (Canada s main research funding agency) for an NSERC Strategic Network 5

Eventually, among many participants, two networks were selected, one of them our BIN Network Started in 2009, for 5 years Approx. $1 million per year from NSERC $0.5 million per year from industrial partners, main contributors: IBM, SAP A Board of Directors was nominated, with external members from academia (USA, Germany,...) and companies, in particular industrial partners An executive committee was created: network director, four theme leaders, and representatives from industrial partners (also acting as interface to the academic partners) 6

Every year, PIs submit reports and proposals, including request for funding Funds come from the NSERC portion Industrial partners top-up if project is particularly relevant to industry There is room for both basic and applied research Funds are used mostly to pay graduate students, and postdocs; and exchanges among network members 7

8 The Network s Themes Network director: Renee Miller (Univ. of Toronto) 1. Strategy and Policy Management Theme leader: John Mylopoulos (Univ. of Toronto) 2. Capitalizing on Document Assets Theme leader: Frank Tompa (Univ. Waterloo) 3. Adaptive Data Cleaning Theme leader: L. B. (Carleton Univ.) 4. Business-Driven Data Integration Theme leader: Iluju Kiringa (Univ. of Ottawa) Other PIs, researchers, graduate students gather around a theme or several or them

9 1. Strategy and Policy Management: Modeling business processes Modeling and representing business policies (business rules) How to specify and integrate BPs and wokflows with data management How to automate the process from the specification How to run and monitor it Reasoning about the BP Does it guarantee that the desired business goals will be achieved? Or undesirable behavior always avoided?

10 New expense claim filed Awaiting Approval Reviewed by supervisor Paid Payment completed Reviewed by Finance Dept. Awaiting Payment A business process (BP) Different ways of modeling BPs, involving states, agents, policies, workflows, data management Newer approach: Data-Centric Business Processes, inparticular IBM s Business Artifacts...

11 Business Model policy Database Model constraints Stages of business objects in a BP are represented through database states, and their changes as database updates Creation and elimination of business objects represented as database operations The BP can be monitored and run via the underlying data management process, through database states, queries, integrity constraints, triggers,... The data layer becomes a footprint of the business layer An idea to have into account when modeling both layers and mapping them...

12 2. Capitalizing on Document Assets: Learning from data, e.g. machine learning, data mining Increasing amount of semi- or non-structured data Information retrieval Non-classical data mining: web data, and web documents in general text files blogs emails social networks (learning about/from on-line communities) sentiment analysis recommender systems...

13 4. Business-Driven Data Integration: Integration of data from multiple and heterogeneous data sources, in different ways: Materialized integration, e.g. data warehouses

Virtual integration, e.g. mediator-based data integration systems 14 2 3 2 3 2 3 data sources mediator 1 4 Integration of data from WWW, and more complex and diverse data spaces Enterprise integration systems: local database systems (operational, DWHs), work-flows, external data sources, application programs, ERP systems, WWW, etc.

15 Data exchange target ICs D schema mappings??? a database source schema S and an instance a database target schema T and NO instance How to materialize an instance for the target schema What are the semantically correct data at the target? Querying data sources through ontologies Data stay underneath, and may not represent the way the user sees the business Ontology is metadata or an explicit, formal ER model, etc.

16 user queries/answers ontology captures business view/model of data data mappings low-level data Integration of data management with higher-level reasoning systems: intelligent information systems, knowledge bases, ontologies, semantic web, etc. How the business model drives the processes of data integration in any of the above forms?

17 3. Adaptive Data Cleaning: Making the right business inferences and decisions requires quality data Not always clear how to assess the quality of data or clean data Too many dimensions of data quality: accuracy, completeness, redundancy, freshness, consistency, certainty, redundancy,sense, etc. Actually, not even clear how to characterize clean data within a data repository Consistency of Databases:

18 A database instance D is a model of an outside reality An integrity constraint on D is a condition that D is expected to satisfy in order to capture the semantics of the application domain AsetIC of integrity constraints (ICs) helps specify/maintain the correspondence between D and that reality What If the database is inconsistent?

Bringing it back to a consistent state may be difficult, impossible, nondeterministic, undesirable, unmaintainable, etc. We may have to live with inconsistent data... The database (the model) is departing from the outside reality that is being modeled However, the information is not all semantically incorrect Most likely most of the data in the database is still consistent Idea: (a) Keep the database as it is (b) Obtain semantically meaningful information at query time; dealing with inconsistencies on-the-fly Particularly appealing in virtual data integration... (no direct access to the data sources) 19

This requires: (a) logically characterizing consistent data within an inconsistent database (b) Developing algorithms for retrieving the consistent data 20 Why adaptive data cleaning? We want to go beyond verticals Most commonly data cleaning solutions are specific, domain dependent, and vertical Not extendible or adaptable Start from scratch for every problem and application

More generic approaches and techniques? 21 Declarative data cleaning Specification of data quality assessment criteria Specification of data quality properties/requirements What properties should be satisfied for the data to be clean? (not necessarily explicitly how to clean) Specification of data cleaning activities? Possibly derived from data quality requirements Generic and parameterizable solutions New ways of understanding and modeling data quality Quantify and assess data and metadata quality

22 Many problems, more specifically: What can be extracted, abstracted out from different domains, problems and techniques in DC? Identification of relevant parameters and dimensions of data quality assessment and cleaning Specification of quality data Identification and characterization of quality constraints Specification of quality constraints (particular case: ICs) More recently: entity-resolution, duplicate detection/resolution Specification languages? Definition of quality predicates What kind of primitives?

Obtaining quality data: data cleaning vs. obtaining clean data at query or application time Characterization and retrieval of clean data in/from a dirty data source Data quality in data integration Materialized, virtual,... Data quality assessment and data cleaning are context dependent Identification, characterization, specification and applications of contexts for data cleaning Specification and implementation of data cleaning activities Characterization of semantically clean data in data spaces with data accessible through search queries 23

Conclusions Data management touches almost every aspect of BI 24 General and flexible solutions to old problems become crucial New solutions to old problems becomes necessary New problems have appeared that need solutions In DC many stimulating scientific and technical challenges Many fundamental problems have not been (properly) addressed There is a lack of basic research There is much room for both basic and applied research Room for exciting interdisciplinary research: business and data management