Reinventing Business Intelligence through Big Data



Similar documents
The Value of Advanced Data Integration in a Big Data Services Company. Presenter: Flavio Villanustre, VP Technology September 2014

90% of your Big Data problem isn t Big Data.

Mastering Big Data. Steve Hoskin, VP and Chief Architect INFORMATICA MDM. October 2015

Risk Solutions. Industrial Big Data Analytics Lessons from the trenches Flavio Villanustre LexisNexis Risk Solutions

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Big Data & Analytics. The. Deal. About. Jacob Büchler jbuechler@dk.ibm.com Cand. Polit. IBM Denmark, Solution Exec IBM Corporation

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Chapter ML:XI. XI. Cluster Analysis

Database Marketing, Business Intelligence and Knowledge Discovery

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Predictive Analytics Certificate Program

Get More Scalability and Flexibility for Big Data

Data Warehousing and Data Mining in Business Applications

Driving Value From Big Data

IBM Big Data in Government

Integrating a Big Data Platform into Government:

BIG DATA COURSE 1 DATA QUALITY STRATEGIES - CUSTOMIZED TRAINING OUTLINE. Prepared by:

This Symposium brought to you by

Value of. Clinical and Business Data Analytics for. Healthcare Payers NOUS INFOSYSTEMS LEVERAGING INTELLECT

IBM Content Analytics adds value to Cognos BI

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

SQL Server 2005 Features Comparison

CHAPTER SIX DATA. Business Intelligence The McGraw-Hill Companies, All Rights Reserved

Navigating Big Data business analytics

Understanding Data Warehouse Needs Session #1568 Trends, Issues and Capabilities

Keywords : Data Warehouse, Data Warehouse Testing, Lifecycle based Testing

How To Make Sense Of Data With Altilia

Architected Blended Big Data with Pentaho

Luncheon Webinar Series May 13, 2013

Ganzheitliches Datenmanagement

SURVEY REPORT DATA SCIENCE SOCIETY 2014

The Big Data Paradigm Shift. Insight Through Automation

Fusion Applications Overview of Business Intelligence and Reporting components

Predictive Analytics: Turn Information into Insights

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

III JORNADAS DE DATA MINING

How To Understand Business Intelligence

Build a Streamlined Data Refinery. An enterprise solution for blended data that is governed, analytics-ready, and on-demand

Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA

Big Data Architect Certification Self-Study Kit Bundle

Hexaware E-book on Predictive Analytics

Solve your toughest challenges with data mining

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

5 Big Data Use Cases to Understand Your Customer Journey CUSTOMER ANALYTICS EBOOK

Descriptive to Predictive to Prescriptive Analytics: Move Up the Value Chain. Suren Nathan CTO

Data Warehouse (DW) Maturity Assessment Questionnaire

The Business Value of Predictive Analytics

Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage

ANALYTICS STRATEGY: creating a roadmap for success

Outlines. Business Intelligence. What Is Business Intelligence? Data mining life cycle

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

Demystifying Big Data Government Agencies & The Big Data Phenomenon

Management Decision Making. Hadi Hosseini CS 330 David R. Cheriton School of Computer Science University of Waterloo July 14, 2011

Information Management course

How the oil and gas industry can gain value from Big Data?

Web Data Mining: A Case Study. Abstract. Introduction

The Future of Business Analytics is Now! 2013 IBM Corporation

Data Discovery, Analytics, and the Enterprise Data Hub

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8

Big Data 101: Harvest Real Value & Avoid Hollow Hype

Big Data and Healthcare Payers WHITE PAPER

Data Mining for Successful Healthcare Organizations

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Advanced In-Database Analytics

Solve Your Toughest Challenges with Data Mining

Formal Methods for Preserving Privacy for Big Data Extraction Software

Prediction of Heart Disease Using Naïve Bayes Algorithm

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

Data Mining Analytics for Business Intelligence and Decision Support

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

Big Data and Your Data Warehouse Philip Russom

European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project

Where is... How do I get to...

Data Warehousing in the Age of Big Data

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

BIG Data Analytics Move to Competitive Advantage

The University of Jordan

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

The Next Wave of Data Management. Is Big Data The New Normal?

Data Analytics in Health Care

The Data Mining Process

NEWLY EMERGING BEST PRACTICES FOR BIG DATA

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success

Session 805 -End-to-End SAP Lumira: Desktop to On-Premise, Cloud, and Mobile

MDM for the Enterprise: Complementing and extending your Active Data Warehousing strategy. Satish Krishnaswamy VP MDM Solutions - Teradata

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities

Transcription:

Reinventing Business Intelligence through Big Data Dr. Flavio Villanustre VP, Technology and lead of the Open Source HPCC Systems initiative LexisNexis Risk Solutions Reed Elsevier LEXISNEXIS From RISK SOLUTIONS Analytics to Action Symposium March 26 th, 2014

A brief history of business intelligence 1958 The term BI is coined 1980 s DSS, Data Warehouses, Executive Information Systems, OLAP 2010 s BI is evolving once again 1960 s Beginnings as Decision Support Systems 2000 s Big Data brings new challenges (integration, cleansing, sematic analysis, new analytics) 2

LexisNexis Risk Solutions: About us Part of Reed Elsevier: worldwide digital information company Provider of data and analytics-based solutions Helps clients across all industries assess, predict and manage risk Leading positions in insurance, financial, legal and collections Products delivered using state of the art technology and analytics Designed and developed its own Big Data analytical platform (the HPCC Systems Big Data Platform) over the past 15 years Released the HPCC Systems Big Data Platform as an Open Source initiative in 2011 Continues to innovate through higher level abstractions, in areas of data integration and linkage, graph analytics, machine learning, data exploration and data visualization 3

The modern BI Customer data scattered among different business units Products: airlines, hotels, bank, insurance Channels: captive, independent or cross-sale agents, call center, internet Functions: marketing, sales, pricing, services Integrating customer data based on Big Data technologies Customer Identity Resolution Customer Relationship Identification Enrich customer profiles with external data Consumption Wealth Credit Social media Single Customer View across entire business Cross-sales Customer 360 view o Current customers from other Divisions o Past customers o Known fraudsters o Black-listed persons o Family members of existing customers o Customers of discounted affinity groups Fraud detection Total customer value Influencers Disrupters Attrition models Up-sale predictors Pricing estimators 4

Big Data, big challenges Data Warehousing: Sources are no longer just internal Data is dirty, incomplete, untrustworthy and structure doesn t reflect semantics anymore Data volumes grow exponentially Real time analytics are still required for operational BI Exploration drives new applications Data Analysis Feature extraction in unstructured data requires novel treatments which may not be easy to generalize Moving the data from the warehouse to the analytical system is no longer an option: in-place analytics Graph Analytics: Certain problems are better represented as graphs, but graph engines can quickly show scalability limitations 5

The modern Data Warehouse Must leverage Big Data and offer in-place analytics Must be able to manage unstructured data (NLP, for example) Must support advanced data mining and analytics Must offer near real time analysis Must be scalable and flexible Based on the TDWI checklist report: THE MODERN DATA WAREHOUSE 6

Challenges of large-scale data integration Data profiling and exploratory data analysis (EDA) are required to assess the semantics and value Quality controls, cleansing, parsing and normalization/standardization are key Rules based data integration show significant limitations: probabilistic record linkage is becoming the standard It is no longer just about Big Data; it s about Linked Data Linking goes beyond connecting attributes with entities: it must also identify links among entities Iterative exploration is paramount 7

Linked data from Big Data Data Sources Profiling Parsing Cleansing Normalization Standardization Data Preparation Processes (ETL) Matching Weights and Threshold Computation Blocking/ Searching Weight Assignment and Record Comparison Record Match Decision Linked Data File Additional Data Ingest Linking Iterations Record Linkage Processes 8

Probabilistic record linkage Others Rules-based matching: Based on logic (IF/ELSE or SWITCH statements) Example: If field values 1, 2 and 5 from source a are equivalent to values 3, 6 and 7 in source b, respectively, then declare a match. SALT (Scalable Automated Linking Technology) Probability-based matching: Based on computation of weights and thresholds; a match is declared only when the sum of all weights surpasses a certain threshold Example: Source A Threshold = 29 Source B Sum of Individual Field Specificity Values 9

Advanced analytics and machine learning OLAP alone is not sufficient anymore Machine learning is becoming prevalent Exploratory data analysis Correlation analysis Unsupervised and semi-supervised learning (clustering, distributions) Supervised learning (regression and classification models) Feature extraction can be challenging (think binary data) Naïve approaches to feature extraction lend themselves to high dimensional data: Manifold hypothesis Automated, generalizable and hierarchical feature extraction through deep learning NLP is integral part of a number of applications (sentiment analysis, entity/relationship extraction, etc.) 10

Graph traversal and analysis Many real world business problems can be modeled as graphs Influence/disruption graphs are important to BI: 360 customer view applications, business risk management, etc. Social Network Analytics is useful for fraud detection Social graphs can help with entity disambiguation (using the topology of the local graph as another attribute) Recommendation systems leverage multiple intertwined graphs (web traversal, customer/object, customer/customer and object/object relationships, etc.) 11

A social graph example Relationships are from public records (nonobvious in the healthcare data domain) MIKE JONES MD Is the prescribing doctor who prescribed Vicodin to patients in the target social cluster (James Anderson) He is a member of the same social cluster Also personally filled a vicodin prescription for himself. Question: Is it normal for you and 15 of your associates to all receive a prescription for vicodin within the same short timespan? 12

Beyond Naïve When referring to graphs, you may think about triple stores and just-in-time traversal and analytics These triple stores tend to require linear and continuous memory spaces But graphs can be also expressed as adjacency matrices Adjacency matrices can be pre-processed to speed up query time: turn O(n^2) or O(n^3) into O(1) or O(log n) And similar operations can be done through more traditional data centric approaches (data join operations, for example) Exponentiation of an adjacency matrix returning a new matrix with all paths of that degree 13

Putting it all together at LexisNexis Linked data at the core on HPCC Thor handles data profiling, parsing, cleansing, standardization and linkage, while Roxie handles real-time linking, analytics and delivery Data integration and entity linkage handled by SALT Regression, classification, clustering using ECL-ML Graph analytics with KEL (Knowledge Engineering Language) Visualization layer for interactive exploration and reporting 14

Final thoughts BI is undergoing a significant change, fuelled by Big Data Horizontal scalability and in-place analytics are increasingly important, particularly for iterative exploration New disciplines are required to make sense of the data: More sophisticated data integration Machine Learning Graph theory OLAP is not going away but proactive analytics are key While the challenge can be significant, the possibilities are endless 15

Questions? The Open Source HPCC Systems platform: http://hpccsystems.com LexisNexis Risk Solutions: http://lexisnexis.com/risk Machine Learning portal: http://hpccsystems.com/ml The HPCC Systems blog: http://hpccsystems.com/blog Community Forums: http://hpccsystems.com/bb Our GitHub portal: https://github.com/hpcc-systems 16