Beyond GIS: Spatial On-Line Analytical Processing and Big Data. University of Maine Orono, USA

Similar documents
Beyond GIS: Spatial On-Line Analytical Processing and Big Data

DATA WAREHOUSING - OLAP

Open source geospatial Business Intelligence (BI) in action!

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Data W a Ware r house house and and OLAP II Week 6 1

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Data Warehouse: Introduction

Building Geospatial Business Intelligence Solutions with Free and Open Source Components

Business Intelligence, Analytics & Reporting: Glossary of Terms

CHAPTER 4 Data Warehouse Architecture

Implementing Data Models and Reports with Microsoft SQL Server 20466C; 5 Days

Part 22. Data Warehousing

Enabling geospatial Business Intelligence

Multidimensional Management of Geospatial Data Quality Information for its Dynamic Use Within GIS

Data Warehouse design

BUILDING OLAP TOOLS OVER LARGE DATABASES

BUSINESS ANALYTICS AND DATA VISUALIZATION. ITM-761 Business Intelligence ดร. สล ล บ ญพราหมณ

Introducing CXAIR. E development and performance

Microsoft Implementing Data Models and Reports with Microsoft SQL Server

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

Data Warehousing. Overview, Terminology, and Research Issues. Joachim Hammer. Joachim Hammer

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Monitoring Genebanks using Datamarts based in an Open Source Tool

An Architectural Review Of Integrating MicroStrategy With SAP BW

Management Consulting Systems Integration Managed Services WHITE PAPER DATA DISCOVERY VS ENTERPRISE BUSINESS INTELLIGENCE

GeoKettle: A powerful open source spatial ETL tool

Implementing Data Models and Reports with Microsoft SQL Server 2012 MOC 10778

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Anwendersoftware Anwendungssoftwares a. Data-Warehouse-, Data-Mining- and OLAP-Technologies. Online Analytic Processing

University of Gaziantep, Department of Business Administration

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

IST722 Data Warehousing

Business Intelligence & Product Analytics

SQL Server 2012 Business Intelligence Boot Camp

Cincom Business Intelligence Solutions

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Self-Service Business Intelligence

Implementing Data Models and Reports with Microsoft SQL Server

DATA WAREHOUSING AND OLAP TECHNOLOGY

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance

Data Warehousing Systems: Foundations and Architectures

Self-Service Business Intelligence: The hunt for real insights in hidden knowledge Whitepaper

Business Intelligence Solutions. Cognos BI 8. by Adis Terzić

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Adobe Insight, powered by Omniture

Safe Harbor Statement

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities

The difference between. BI and CPM. A white paper prepared by Prophix Software

CS2032 Data warehousing and Data Mining Unit II Page 1

Implementing Data Models and Reports with Microsoft SQL Server

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

By Makesh Kannaiyan 8/27/2011 1

14. Data Warehousing & Data Mining

RESEARCH ON THE FRAMEWORK OF SPATIO-TEMPORAL DATA WAREHOUSE

Developing Business Intelligence and Data Visualization Applications with Web Maps

CHAPTER 4: BUSINESS ANALYTICS

Hybrid Support Systems: a Business Intelligence Approach

M Designing and Implementing OLAP Solutions Using Microsoft SQL Server Day Course

Big Data Visualization and Dashboards

CONCEPTUALIZING BUSINESS INTELLIGENCE ARCHITECTURE MOHAMMAD SHARIAT, Florida A&M University ROSCOE HIGHTOWER, JR., Florida A&M University

Dimensional Modeling for Data Warehouse

IBM Cognos Performance Management Solutions for Oracle

Microsoft Business Intelligence

A Design and implementation of a data warehouse for research administration universities

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

Exploring the Synergistic Relationships Between BPC, BW and HANA

When to consider OLAP?

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

IBM Cognos Express Essential BI and planning for midsize companies

W H I T E P A P E R B u s i n e s s I n t e l l i g e n c e S o lutions from the Microsoft and Teradata Partnership

Designing a Dimensional Model

Oracle Business Intelligence 11g Business Dashboard Management

Sterling Business Intelligence

Whitepaper. Innovations in Business Intelligence Database Technology.

Open Source Business Intelligence Intro

QlikView Business Discovery Platform. Algol Consulting Srl

Endeca Introduction to Big Data Analytics

Introduction to Data Warehousing. Ms Swapnil Shrivastava

Integrating GIS within the Enterprise Options, Considerations and Experiences

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

Business Intelligence and Healthcare

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes

Learning Objectives. Definition of OLAP Data cubes OLAP operations MDX OLAP servers

Week 3 lecture slides

Cognos 8 Best Practices

ProClarity Analytics Family

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

B.Sc (Computer Science) Database Management Systems UNIT-V

LEARNING SOLUTIONS website milner.com/learning phone

GEOGRAPHIC INFORMATION SYSTEMS

A Service-oriented Architecture for Business Intelligence

2. Metadata Modeling Best Practices with Cognos Framework Manager

Transcription:

Beyond GIS: Spatial On-Line Analytical Processing and Big Data Yvan Bedard, PhD, P.Eng. Independant consultant Strategic and Scientific Advisor, Intelli3 Researcher: Centre for Research in Geomatics, Laval University, Quebec University of Maine Orono, USA 1

Presentation Origin of SOLAP Nature Evolution Examples of applications State-of-the-art for today s technology Challenges that remain SOLAP and Big Data 2

ORIGINS OF SOLAP 3

Origins Organisations worldwide invest hundreds of millions of dollars annually to acquire large amounts of data about the land, its resources and uses These data however prove difficult to use by managers who need: aggregated information spatial comparisons correlations fast synthesis over time queries interactive exploration geogr. knowledge discovery etc. - trends analysis - space-time - unexpected - crosstab analysis - hypothesis dev.

Barriers to make analysis with transactional systems GIS and DBMS design are transactional by nature Oriented towards data acquisition, storing, updating, integrity checking, simple querying Transactional databases are usually normalized so duplication of data is kept to a minimum : To preserve data integrity and simplify data update A strong normalization makes the analysis of data more complex : High number of tables, therefore high number of joins between tables (less efficient). Long processing time Development of complex queries 5

Analytical approach vs transactional approach No unique data structure is good for BOTH managing transactions and supporting complex queries. Therefore, two categories of databases must co-exist: transactional and analytical (E.F. Codd). Example of co-existence: one source -> several datacubes Source Legacy transactional database Read-only ETL Analytical Data cubes Restructured & aggregated data 6

BI Market Business Intelligence exists since the early 1990s and its market is larger that the GIS market. However, it didn t address spatial data until recently. BI and GIS evolved in different silos for many years. 7 Eckerson, 2007

Today s Level of Integration Integrating GIS and BI is a recent field with a lot of potential Spatially-enabling BI is becoming more common Larger smiley = more has been achieved Larger lightning = more difficult challenge 8

SOLAP Epochs 1996-2000: pionneering early prototypes in universities Laval U. - Simon Fraser U. - U. Minnesota 2001-2004: early adopters advanced prototypes in universities first applications in industry 2005-... : maturing larger number of ad hoc applications First commercial SOLAP technologies 2010- : wide adoption About 40 commercial products

NATURE OF SOLAP 10

A Natural Evolution Nature of geospatial data Spatial Non-spatial GIS SOLAP DBMS OLAP Details Synthesis Decisional Nature of data Add capabilities to existing systems, don't aim at replacing them Add value to existing data, no attempt to manage these data

Analytical System Architectures (ex. standard data warehouse) OLAP Dashboards Reporting - - Legacy OLTP systems DW Datamarts 12

Analytical System Architectures (ex. direct, without data warehouse) Legacy transactional databases Datacubes Most projects we do have such an architecture: simpler, faster, less costly Requires highly open SOLAP technology to connect to a variety of legacy systems (DBMS, BI, GIS, CAD, Big Data engines, etc.) 13

Datacube Concepts Dimension = axis of analysis organized hierarchically Ex: a Time dimension Members 2008 All years 2009 2010 Years Levels Months Days Hypercube = N dimensions Datacube = casual name = hypercube 1/4

Datacube Concepts Structure of datacubes Dimension 1 (ex. balanced) Dimension 2 (ex. simpler) Dimension N (ex. N:N paths) Measures (ex. sales) Dimension 3 Dimension 4 (ex. inconsistent) (ex. unbalanced) Members = filters (similar to independant variables) Measures = result (similar to dependant variables) 2/4

Datacube Concepts Fine-grained analysis Dimension 1 (ex. balanced) Dimension 2 (ex. simpler) Dimension N (ex. N:N paths) Measures (ex. sales) Dimension 3 Dimension 4 (ex. inconsistent) (ex. unbalanced) Uses detailed members of dimensions hierarchies 2/4

Datacube Concepts Global analysis Dimension 1 (ex. balanced) Dimension 2 (ex. simpler) Dimension N (ex. N:N paths) Measures (ex. sales) Dimension 3 Dimension 4 (ex. inconsistent) (ex. unbalanced) Uses highly-aggregated members As fast as fine-grained analysis (always <10 sec.) Requires only a few mouse clics (no query language) 2/4

Datacube Concepts Cube (hypercube) = all facts Provinces Cities Montrea l Ottawa Toronto Levis Quebec Ontario A "sales territory" dimension A "sales" data cube Fact: each unique combination of fine-grained or aggregated members and of their resulting measures 5M 2M 1M 2M 2M 3M 2M 1M 3M 2M 2M 2M 3M 1M 2M 2M 1M 2008 2009 A "time" 2010 dimension blouses shirts tops jeans trousers pants A "product" dimension Ex.: sold for 2M$ of shirts in Ottawa in 2010 Ex. : sold for 8M$ of pants in Ontario in 2010 Ex. : sold for 5M$ of jeans in Montreal in 2008 Item level Category level 3/4

Datacube Concepts Data structures (MOLAP, ROLAP, HOLAP): Multidimensional (proprietary) Relational implementation of datacubes Client and server provides the multidimensional view l Star schemas, snowflake schemas, constellation schemas Hybrid solutions Query languages: SQL = standard for transactional database Used in ROLAP MDX = standard for datacubes Used in MOLAP 19

Spatial Datacube Concepts Spatial dimensions Geometric spatial dimension Non-geometric spatial dimension Canada Canada CB Québec Montréal Mixed spatial dimension NB Québec N.B. more concepts exist

Spatial Datacube Concepts Spatial measures Metric operators Distance Area Perimeter Topological operators Adjacent Within Intersect Spatial dimension 1 Spatial dimension 2 N.B. more concepts exist 3/3

Spatial Datacube and SOLAP Spatial OLAP (On-Line Analytical Processing) SOLAP is the most widely used tool to harness the power of spatial datacubes It provides operators that don t exist in GIS SOLAP = generic software supporting rapid and easy navigation within spatial datacubes for the interactive exploration of spatio-temporal data having many levels of information granularity, themes, epochs and display modes which are synchronized or not: maps, tables and diagrams 22

Characteristics of SOLAP Provides a high level of interactivity response times < 10 seconds independently of the level of data aggregation today's vs historic or future data measured vs simulated data Ease-of-use and intuitiveness requires no SQL-type query language no need to know the underlying data structure Supports intuitive, interactive and synchronized exploration of spatio-temporal data for different levels of granularity in maps, tables and charts that are synchronized at will

The Power of SOLAP Lies on its Capability to Support Fast and Easy Interactive Exploration of Spatial Data Select 1 year -> Select all years -> Select 4 years -> Multimap View: 7 clicks, 5 seconds

The Power of SOLAP Lies on its Capability to Support Fast and Easy Interactive Exploration of Spatial Data Select all regions -> Drill-down on one region -> Roll-up -> Show Synchronized Views: 6 clicks, 5 seconds

The Power of SOLAP Lies on its Capability to Support Fast and Easy Interactive Exploration of Spatial Data Change data -> Roll-up -> Roll-up -> Pivot : 6 click, 5 seconds

Functionalities: Exploration-oriented ü Visualization and synchronized displays An operation on one type of display (e.g. drill, pivot or filter) must automatically replicate on all other types of display Drill Down (when enabled). All data per country Canada All data per province Canada Canada Provinces Provinces Provinces 27

Functionalities: Exploration-oriented Visualization ü and intelligent automatic mapping Intelligent automatic mapping: ü Supports user s knowledge ü Generates coherent maps by using predefined display rules in accordance to the user s selection ü Instantaneous display ü No SQL involved ü Manual processing: ü ü Involve specific knowledge by the user (database, semiology, mapping) Is time-consuming What color, symbol, pattern? Which advanced map? map thematic classification display type 16

EVOLUTION OF SOLAP and TODAY S STATE OF THE ART 29

Approaches to Develop SOLAP Applications Ad hoc, proprietary programming specific to one application Combining GIS + OLAP capabilities GIS-centric OLAP-centric Integrated SOLAP -The dominant tool offers its full capabilities but gets minimal capabilities from the other tool -GUI provided by the dominant tool Ad hoc programming (ex. using diverse open-source softwares) SOLAP technology (the most efficient)

Off-the-Shelf Integrated SOLAP Facilitates the deployment of a SOLAP application by offering built-in elements (e.g. Framework, operators, unique GUI) or P in I S A om G OL D Loosely coupled ü ü ü ü a nt n it o a ng r i e g ut g m t n b in m -l i le ir ra ta ab equ rog o T cap r p ra g te ) n l i TS a t CO o T ( Strongly coupled 2 GUI vs common and unique GUI Built-in integration framework (no need to program the solution) Offers built-in functionalities to visualize and explore data No dominant component 31 31 n it o

Commercial Offerings About 40 SOLAP-like products exist Most : Run with only one GIS or DBMS or BI software Are OLAP-centric or GIS-centric Are limited to one type of datacube (ROLAP or MOLAP) Have limited cartographic capabilities l Geometry: l number of spatial dimensions l types of spatial dimensions (ex. alternate) l Types of geometry (ex. lines, aggregated shapes) l Intelligent mapping rules for efficient geovisualization l Often ignore ISO or OGC standards The technology that came out of our lab, 32 Map4Decision, doesn t suffer from these limitations

EXAMPLES OF SOLAP APPLICATIONS 33

Experiences since 1996 Besides developing theoretical concepts, we have experimented with several technologies to build SOLAP applications and test concepts Experimentations in: forestry transport recruitment climatology - agriculture - public health search & rescue - sports archeology - infrastructures erosion - etc. Experimentations with: MapX - ArcGIS - Geomedia - SoftMap Oracle - Access - SQL-Server - MySQL Proclarity - Cognos - etc.

Example: Road Safety Analysis (Transport Quebec)

Example: Origin-Destination Analysis (Cities + Transport Quebec) 36

Example: Marine Transportation (Transport Quebec) 37

Example: Managing Infrastructures (Port of Montreal) 38

Example: Coastal erosion management (Transport Quebec) 39

Example: Coastal Erosion Management (Transport Quebec) 40

Example: Coastal Erosion Management (Transport Quebec) Ø Criteria to assess the risk of erosion and landslide 1. 2. 3. 4. 5. 6. Distance between road and bank Type of bank Height of the bank Average slope of parcels Presence of watercourses, surface water spillinganalyse de chaque parcelle pour chaque Presence and quality of protection critère infrastructures 7. Distance between the bank and 5m waterline 8. Land use and occupation Ø Divided the coastal zone into parcels Ø Each parcel has values for each criteria 41

Example of Measured Benefits in a Project for Transport Quebec M.J.Proulx, Intelli3 (2009) Annual Report : Solution géodécisionnelle : 150 maps and tables Static data 200 000 maps and tables Dynamic applications Analysis & page editing (3 months-person) Updating (1 month-person) Data structuration (15 days-person) Updating ( 5 dayx-person) Ad hoc queries continuously Delays to produce outputs Application in intranet Fast response Depend upon an expert in cartography Easy user interface 42

Benefits of SOLAP applications In our projects, positive results in many applications have been achieved, such as: cutting by a factor of 10 the time required to produce maps and reports that summarize key information allowing new users having never heard of GIS to produce hundred of thousands of synchronized maps, reports and tables on demand with only three hours of training providing keyboardless access to geospatial data at different levels of detail with a facility never achieved before

SOLAP AND BIG DATA 44

Big Data Big Data characteristics (the Vs) Volume Velocity Variability And 5 more Vs Value, Validity, Veracity, Vulnerability and Visualization These characteristics are happening at an unprecedented pace. Examples include: Mining social networks (ex. Facebook) Monitoring web surfing (ex. Google) Tracing users interactions (ex. Amazon) Exploring smartphone usages (ex. Apple apps) 45

Business Intelligence and SOLAP BI transforms large volumes of structured raw data into meaningful information for more effective decision-making BI provide historical, current, and predictive views Over the last 20 years, BI has developed a strong data analytics culture, powerful data visualization solutions and proven methods to integrate with organizations structured database ecosystems Business Analytics (BA) has been used recently to highlight analytical capabilities OLAP is widely used for Business Analytics 46

BI -> Big Data SOLAP has its roots in BI An important part of the Big Data discourse is similar to the discourse of BI However, Big Data is not BI with bigger data The main differences are in velocity, variety and the underlying technology to tackle these two characteristics Another difference is that Big Data often comes from outside, typically from the cloud. 47

BI -> Big Data While some see Big Data as the new generation of BI, others see it as a different family of products The boundary between Big Data and BI is not clear as there exist two groups of technologies: Big Data core technologies vs. Big-Data-enabled technologies History repeats itself: Spatially-enabled DBMS vs. GIS BI-enabled DBMS vs. genuine BI technology Database-enabled CAD vs. GIS 3D-enabled GIS vs. real 3D software 48 Spatially-aware Big Data vs Big Earth Data

Big GeoData Two categories: Geolocalized Big Data Location simply as one additional, accessory data Sources: mostly points (smartphones GPS position, web surfing IP address position, Amazon s clients addresses, etc.) Spatially-centered Big Data Location, shape, size, orientation, spatial relationships are core data, a «raison d être» Sources: ITS, sensor networks, high-resolution imagery (drones, satellites) raw data, interpreted imagery polygonal and line data, terrestrial 3D laser scans, LIDAR, etc. 49

SOLAP and Big GeoData Today s SOLAP is Spatially-centered and Big Data-aware More powerful than simple point location analysis Integrates well in geospatial dataflow ecosystems Fast analysis of large Volumes Does Just-in-Time, very high Velocity expected soon Excellent tool to analyse Veracity The move to Variable, unstructured data hasn t been done yet but is possible (ex. text) 50

Conclusion GIS and BI have evolved in silos for many years R&D bridging both universes started mid-90s Market is reaching maturity A scientific community exists as well as products R&D will bring stronger bridges with Big GeoData We live in complex technological ecosystems where data (geodata) has the potential to deliver new powerful insights

Food for thought As the IT infrastructure inevitably changes over time, analysts and vendors (especially new entrants) become uncomfortable with what increasingly strikes them as a dated term, and want to change it for a newer term that they think will differentiate their coverage/products... When people introduce a new term, they inevitably (and deliberately, cynically?) dismiss the old one as just technology driven and backward looking, while the new term is business oriented and actionable (Elliott, 2011). 52

Thank you! More info at these web sites: http://sirs.scg.ulaval.ca/yvanbedard/ Technology transfer = Map4Decision ( www.intelli3.com )