Big Data Big Business - Achieving advantage through technology


1 Big Data Big Business - Achieving advantage through technology Prof. Dr. Michael Feindt, Karlsruhe Institute of Technology KIT Chief Scientific Advisor, Blue Yonder GmbH & Co KG European Life & Health Tour, London, October 12, 2012

2 Big Data: For Google and Facebook it means unstructured data from the web and MapReduce. For many users it means something quite different: gigantic databases; technology at CERN and other particle accelerators; grid computing; predictive analytics with NeuroBayes; data-driven decision making in companies. Also very interesting for insurers! Statistically relevant insight instead of gut feeling.

3 Predictive Analytics: the IT topic of the coming years. Gigantic value lies in the data stored in data warehouses. Optimisation and automation of strategic and, especially, regularly recurring operational decisions. Predictive analytics software uses the information in company databases, combines it with external data sources, and processes it with state-of-the-art mathematical methods to calculate predictions about the future in the form of probability densities, and on this basis makes optimal decisions.

4

5 Big Data at CERN: 40 million collisions per second. 1 PByte = 10^15 bytes of data per second. Prof. Dr. Michael Feindt, KIT and Blue Yonder, Swiss Re Life and Health Insurance Tour, London, Oct. 12, 2012

6 Data rates. Trigger: data reduction to 1 in 10 million. 1 PB per year is stored and made available to thousands of physicists worldwide → GRID

7 1 PetaByte = 10^15 Byte. If 1 bit corresponds to one leaf, 1 PByte corresponds to all leaves on earth.
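The figures quoted on slides 5 to 7 can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check; the variable names are illustrative, and the trigger reduction of 1 in 10 million is taken from slide 6.

```python
# Back-of-the-envelope check of the data rates quoted on slides 5 to 7.
COLLISIONS_PER_SECOND = 40_000_000   # slide 5
TRIGGER_REDUCTION = 10_000_000       # slide 6: 1 event kept per 10 million
BYTES_PER_PETABYTE = 10**15          # slide 7
BITS_PER_BYTE = 8

# Events that survive the trigger each second.
events_kept_per_second = COLLISIONS_PER_SECOND / TRIGGER_REDUCTION

# Number of bits (the "leaves" of slide 7) in one petabyte.
bits_per_petabyte = BYTES_PER_PETABYTE * BITS_PER_BYTE

print(f"Events kept by the trigger per second: {events_kept_per_second:g}")
print(f"Bits in one petabyte: {bits_per_petabyte:.1e}")
```

Under these numbers only a handful of events per second survive the trigger, which is why a reduction step comes before any storage on the grid.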

8 NeuroBayes: A high-tech algorithm from experimental high energy physics that learns complex dependencies from companies' historical databases and uses them to predict the future. It is based on an artificial neural network, but is much more. Extremely high generalisation ability (i.e. future reality is well described by the predicted probability density).

9 NeuroBayes in elementary particle physics: Discovery of new particles and new reactions. Used in the online trigger at LHCb. Full automation of complex scientific analyses (KEK: equivalent of about 500 PhD theses with 72 NeuroBayes networks; efficiency +100% compared to the manual work of 400 scientists in 10 years).

10 Knowledge from world-class research in high energy physics (CERN, Fermilab, KEK), applied to problems in the economy. Particle collisions at the largest particle accelerators in the world, counted in collisions per second. Gigantic data volumes: petabytes of raw data, to be distributed worldwide. 1 interesting event per 10 million collisions.

11 Brick-and-mortar, mail-order, and online retail: Better sales predictions and optimised merchandise planning for articles, to improve the ability to deliver and to reduce unsold articles at the end of the season. What if fashion in the right colour and size were never sold out?

12 1000 stores, 5000 fresh articles. Up to 5 million sales predictions per day. Up to 1.5 billion sales predictions per year. Fully automatic calculation of the optimal purchase quantity per store, article, and day, with automatic ordering. What if a large retailer knew exactly how much fruit of each sort would be sold on which day?

13 Millions of customers, hundreds of millions of historical transactions. Fair, risk-adjusted tariffs. Precise overview of an insurer's risk. What if an insurer knew individual risks months and years into the future?

14 A company with strong brains. Founded in 2008, Blue Yonder with its NeuroBayes Suite is among the leading vendors of prediction and pattern recognition software (predictive analytics). » Unique predictive analytics suite combining statistical algorithms and highly optimised neural nets. » The development team consists of distinguished and experienced physicists and information scientists from renowned research institutes such as CERN. » NeuroBayes has its origin in experimental particle physics and was developed over more than 400 man-years.

15 The difference: better algorithms / employees / results. Awards: Cyberchampion Award, Data Mining Cup, Top Product Trading, Retail Technology Award, bwcon: CyberOne Award Winner High Potentials. CyberChampions honours young and expanding companies in the technology region around Karlsruhe. Data Mining Cup: Winner 2006, auction prices (eBay); Winner 2009, sales prediction (Libri); Winner 2010, intelligent couponing (Amazon). Top Product Trading: after bronze in 2011, readers of the journal handelsjournal voted NeuroBayes silver in 2012 in the category cost effectiveness. Retail Technology Award: Best Enterprise Solution for Blue Yonder customer OTTO. 2012: Blue Yonder most innovative among medium-sized/growth companies in Baden-Württemberg.

16 Neural Networks: The information (the knowledge, the expertise) is in the connections between the nerve cells. Each neuron takes fuzzy decisions (fuzzy logic). NeuroBayes > learns extremely fast from historical data (weeks → minutes) > is extremely robust > suppresses statistical noise (high generalisability) > can make binary decisions (classify) > can calculate complete probability densities > can predict the future reliably

17 Prediction of the complete probability density: expectation value, mode, standard deviation (volatility), deviation from normal distribution (heavy tail).
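Once a complete probability density is available, the quantities listed on slide 17 follow directly from it. The sketch below assumes the density is given on a discrete grid of values; the function name `summarise_density`, the example numbers, and the use of excess kurtosis as a heavy-tail indicator are illustrative assumptions, not part of NeuroBayes.

```python
import math

def summarise_density(values, probabilities):
    """Summary statistics of a discretised probability density."""
    total = sum(probabilities)
    probs = [p / total for p in probabilities]  # normalise to 1
    mean = sum(v * p for v, p in zip(values, probs))
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    fourth_moment = sum(p * (v - mean) ** 4 for v, p in zip(values, probs))
    mode = values[probs.index(max(probs))]      # most probable value
    return {
        "mean": mean,
        "mode": mode,
        "std": math.sqrt(variance),
        # Assumption: positive excess kurtosis flags a tail heavier than normal.
        "excess_kurtosis": fourth_moment / variance ** 2 - 3.0,
    }

# Hypothetical density with a small probability of a very large value.
example_values = [0, 1, 2, 3, 10]
example_probabilities = [0.10, 0.40, 0.30, 0.15, 0.05]
summary = summarise_density(example_values, example_probabilities)
print(summary)
```

In this example the mean lies well above the mode because of the rare large value, which is the kind of information a single point estimate hides.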

18 Turnaround prediction for distance selling company

19

20

21

22 The <phi-t> mouse game, or: even your "free will" is predictable

23 Technology overview: NeuroBayes » System integration on the basis of standard protocols and interfaces » High-performance, scalable data processing architecture » Handles data in batch or real-time streaming mode » Training at run time possible without degradation of performance

24 NeuroBayes applications for insurances

25 E.g. individual risk predictions for car insurance: accident probability, claims distribution, large claim prediction, contract cancellation prediction. Successfully implemented at

26 Correlations to target variable Ramler II-Plot

27 NeuroBayes constructs risk-optimal tariff systems. The majority of good customers pay too much and thus subsidise the bad customers, who do not pay enough: 40% pay too much, 60% pay too little. NeuroBayes adjusts the premium to the individual risk (at constant overall premium): 57% pay less than before, 43% pay more. Increase of the new NeuroBayes premium by 10%: 50% pay less than before, 50% pay more. [Figures: number of customers vs. risk/premium ratio (premium too high, premium too low); number of customers vs. normalised premium]
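The idea of adjusting premiums to individual risk at constant overall premium can be illustrated with a minimal sketch. It assumes that each customer's new premium is simply proportional to a predicted expected claim; the function names and numbers are hypothetical, and the actual NeuroBayes tariff construction is not described in the slides.

```python
def risk_adjusted_premiums(old_premiums, predicted_risks):
    """Redistribute the total premium in proportion to predicted risk."""
    total_premium = sum(old_premiums)   # kept constant by construction
    total_risk = sum(predicted_risks)
    return [total_premium * risk / total_risk for risk in predicted_risks]

def share_paying_less(old_premiums, new_premiums):
    """Fraction of customers whose premium decreases."""
    paying_less = sum(1 for old, new in zip(old_premiums, new_premiums) if new < old)
    return paying_less / len(old_premiums)

# Hypothetical portfolio: a flat tariff, but very different predicted risks.
old = [100.0, 100.0, 100.0, 100.0]
risks = [50.0, 80.0, 120.0, 150.0]
new = risk_adjusted_premiums(old, risks)

print("New premiums:", new)
print("Total premium unchanged:", abs(sum(new) - sum(old)) < 1e-9)
print("Share paying less:", share_paying_less(old, new))
```

Multiplying every new premium by a common factor, as in the 10% increase on the slide, shifts the share of customers who pay less without changing the risk ordering.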

28 NeuroBayes delivers precise predictions of the customer-individual number and size of claims. Premium differentiation: NeuroBayes adjusts the premium to customer-individual risk. Customer structure optimisation: retain your good customers and take the bad customers. Profitability improvement: simultaneously increase your total premium volume and decrease your claims rate with a fairer tariff system. [Figures: premium volume and claims rate, old tariff vs. NeuroBayes; number of customers vs. normalised premium, previous tariff vs. NeuroBayes]

29 Private health insurance: claims per year are anything but normally distributed. NeuroBayes has the solution for difficult distributions of type $f(t) = (1 - P)\,\delta(t) + P\,f(t \mid t > 0)$. Many insured persons (fraction $1 - P$) do not generate any claim. When there is at least one claim (fraction $P$), the claims are distributed according to $f(t \mid t > 0)$. This distribution has fat tails (extremely high claims) and is difficult to handle by classical methods.

30 NeuroBayes calculates for each insured person x the individualised Bayesian probability density. NeuroBayes has the solution for difficult distributions of type $f(t \mid x) = (1 - P(x))\,\delta(t) + P(x)\,f(t \mid t > 0, x)$. Insured person x will have no claims with probability $1 - P(x)$. If insured person x has any claim, the costs will be distributed according to $f(t \mid t > 0, x)$. $\delta(t)$ = Dirac delta "function" (distribution).
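The mixture of a point mass at zero and a fat-tailed distribution for positive claims can be simulated in a few lines. The sketch below assumes, purely for illustration, a log-normal shape for $f(t \mid t > 0)$; the slides do not state the actual shape, and the function names and parameter values are hypothetical.

```python
import math
import random

def sample_annual_claim(p_claim, mu, sigma, rng):
    """Draw one annual claim amount t from the zero-inflated mixture."""
    if rng.random() >= p_claim:
        return 0.0                        # delta(t) part: no claim this year
    return rng.lognormvariate(mu, sigma)  # assumed fat-tailed f(t | t > 0)

def expected_annual_claim(p_claim, mu, sigma):
    """E[t] = P * E[t | t > 0]; the log-normal mean is exp(mu + sigma^2 / 2)."""
    return p_claim * math.exp(mu + sigma ** 2 / 2.0)

rng = random.Random(0)
P_CLAIM, MU, SIGMA = 0.3, 6.0, 1.5        # hypothetical parameters
samples = [sample_annual_claim(P_CLAIM, MU, SIGMA, rng) for _ in range(20_000)]
zero_fraction = sum(1 for s in samples if s == 0.0) / len(samples)

print(f"Fraction without any claim: {zero_fraction:.3f}")
print(f"Expected annual claim: {expected_annual_claim(P_CLAIM, MU, SIGMA):.1f}")
```

In the individualised version, `p_claim`, `mu`, and `sigma` would differ for each insured person x; the mixture structure stays the same.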

31 Evaluation of prediction methods. Typical classical prediction methods, e.g. generalised linear models, return one value (a point estimator). Often the interpretation is not unique. Often the value must be calibrated. In most cases no uncertainty, or distribution of the truth around this value, is predicted. NeuroBayes can do more: from the probability density, the economically optimal point estimator can be determined. Often used quality criterion: mean square deviation or R². It is not robust for distributions with fat tails, and it is economically irrelevant (the insurer pays money, not money squared). Much better: median absolute deviation (MAD). The quality of prediction results should be evaluated by the following criteria: > high individualisation > generalisability (no overtraining, i.e. the individualisation turns out to be correct) > correct prediction of expectation values (autocalibration) > correct prediction of uncertainty in the form of credibility intervals.
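The robustness argument against the mean square deviation can be made concrete with a small example. The sketch below assumes that "median absolute deviation" means the median of the absolute differences between truth and prediction; the function names and the claim amounts are hypothetical.

```python
import statistics

def mean_square_deviation(truths, predictions):
    """Average of squared errors; dominated by the largest error."""
    return sum((t - p) ** 2 for t, p in zip(truths, predictions)) / len(truths)

def median_absolute_deviation(truths, predictions):
    """Median of absolute errors; insensitive to a single extreme value."""
    return statistics.median(abs(t - p) for t, p in zip(truths, predictions))

# Hypothetical claims with one extreme value (fat tail) and a flat prediction.
truths = [0, 0, 100, 200, 50_000]
predictions = [150, 150, 150, 150, 150]

msd = mean_square_deviation(truths, predictions)
mad = median_absolute_deviation(truths, predictions)
print(f"Mean square deviation:     {msd:,.0f}")
print(f"Median absolute deviation: {mad:,.0f}")
```

A single extreme claim determines almost the entire mean square deviation here, while the median absolute deviation stays in the same units and the same order of magnitude as a typical error.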

32 Individualisation as large as possible: the area between the lift chart and the diagonal (Gini coefficient) should be as large as possible. Sort customers according to predicted risk. Select the fraction x with the largest predictions. y = fraction of cumulated claims in this selection. NeuroBayes: Gini = 0.41. Classical model: Gini = 0.32. Maximum area = 0.47. The NeuroBayes prediction individualises better! Prof. Dr. Michael Feindt, NeuroBayes-Prognosen als Basis für risikogerechte Wechselmodelle in der PKV
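The lift-chart construction on slide 32 translates directly into code. The sketch below follows the slide's definition (area between the lift chart and the diagonal, without any further normalisation) and integrates with the trapezoidal rule; the function name and the toy data are illustrative assumptions.

```python
def lift_chart_area(predicted_risk, true_claims):
    """Area between the lift chart and the diagonal.

    Customers are sorted by predicted risk (largest first); y is the
    cumulated fraction of true claims in the selected fraction x.
    """
    n = len(predicted_risk)
    order = sorted(range(n), key=lambda i: predicted_risk[i], reverse=True)
    total_claims = sum(true_claims)
    area_under_curve = 0.0
    cumulated = 0.0
    previous_y = 0.0
    for index in order:
        cumulated += true_claims[index]
        y = cumulated / total_claims
        area_under_curve += (previous_y + y) / 2.0 / n  # trapezoid of width 1/n
        previous_y = y
    return area_under_curve - 0.5                       # subtract the diagonal

# Toy data: one customer causes all claims.
claims = [0.0, 0.0, 0.0, 10.0]
good_sorting = lift_chart_area([1.0, 2.0, 3.0, 4.0], claims)
bad_sorting = lift_chart_area([4.0, 3.0, 2.0, 1.0], claims)
no_information = lift_chart_area([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
print(good_sorting, bad_sorting, no_information)
```

A negative area corresponds to a sorting that is worse than random, and the largest attainable area is the one obtained by sorting on the true claims themselves.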

33 Judgement of prediction methods: good generalisation ability (no overtraining). The Gini coefficient (area between lift chart and diagonal) on the test sample (green) is compatible with the expectation from the training sample. Lift chart for true claims: the forbidden regions correspond to a sorting power better than the truth (impossible) or worse than random. The area in this case is compatible with the expectation (even a bit better). [Panels: test data set; training data set]

34 Check of calibration (NeuroBayes diagonal plot), test sample. NeuroBayes expectation value prediction: the mean value of the truth for all insured persons with a mean prediction in the "100" bin really is 100, so the prediction is correct!
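The calibration check behind such a diagonal plot amounts to binning the test sample by predicted mean and comparing the mean prediction with the mean truth in each bin. The sketch below is a minimal version under that assumption; the function name, bin width, and data are hypothetical.

```python
def calibration_table(predictions, truths, bin_width):
    """Per-bin comparison of mean prediction and mean truth.

    Returns rows (bin_start, mean_prediction, mean_truth, count),
    ordered by bin. A calibrated prediction has
    mean_truth close to mean_prediction in every bin.
    """
    bins = {}
    for prediction, truth in zip(predictions, truths):
        key = int(prediction // bin_width)
        bins.setdefault(key, []).append((prediction, truth))
    rows = []
    for key in sorted(bins):
        entries = bins[key]
        mean_prediction = sum(p for p, _ in entries) / len(entries)
        mean_truth = sum(t for _, t in entries) / len(entries)
        rows.append((key * bin_width, mean_prediction, mean_truth, len(entries)))
    return rows

# Hypothetical test sample: predicted mean costs and true costs.
predictions = [100, 120, 180, 200, 250]
truths = [0, 300, 150, 0, 450]
rows = calibration_table(predictions, truths, bin_width=100)
for row in rows:
    print(row)
```

With realistic sample sizes, each row also needs a statistical uncertainty on the mean truth before a deviation from the diagonal can be called significant.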

35 Quality of prediction. The sorting of classical methods seems sensible; there is some correlation. However, without further calibration it is unusable as a prediction of the mean value. Plot legend: red points = mean value of the truth in the bin (should lie on the diagonal); green region contains 68% of entries with the given mean prediction; yellow region contains 95% of entries with the given mean prediction (should be as narrow as possible). [Panels: classical model; NeuroBayes prediction; both for data with invoice amount > 0]

36 Quality of prediction: test of individual credibility intervals. In future we will know the truth. We can already predict now that it will lie within the predicted 1σ credibility interval in 68% of all cases and within the predicted 2σ credibility interval in 95% of all cases. Typical test result: fraction of entries in the 1σ interval: 68% expected, 67.9% measured; fraction of entries in the 2σ interval: 95% expected, 94.3% measured. Tests in many very different applications: NeuroBayes credibility intervals are very reliable. Most classical methods cannot calculate reliable confidence or credibility intervals. Experience from NeuroBayes PKV (private health insurance) projects: credibility intervals for predictions two years ahead are about 9% larger than for the next year.
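The interval test described on slide 36 is a coverage count: for each case, check whether the later truth falls inside the predicted interval. The sketch below simulates this with Gaussian predictive densities, which is an assumption made only for illustration; the function name and all numbers are hypothetical.

```python
import random

def interval_coverage(truths, lower_bounds, upper_bounds):
    """Fraction of cases whose truth lies inside its predicted interval."""
    inside = sum(
        1 for t, lo, hi in zip(truths, lower_bounds, upper_bounds) if lo <= t <= hi
    )
    return inside / len(truths)

rng = random.Random(1)
N = 20_000
# Hypothetical individual predictions: a mean and a width per case.
means = [rng.uniform(100.0, 1000.0) for _ in range(N)]
sigmas = [0.2 * m for m in means]
# Simulated "future truth", drawn from the predicted (Gaussian) densities.
truths = [rng.gauss(m, s) for m, s in zip(means, sigmas)]

coverage_1sigma = interval_coverage(
    truths,
    [m - s for m, s in zip(means, sigmas)],
    [m + s for m, s in zip(means, sigmas)],
)
coverage_2sigma = interval_coverage(
    truths,
    [m - 2 * s for m, s in zip(means, sigmas)],
    [m + 2 * s for m, s in zip(means, sigmas)],
)
print(f"1 sigma coverage: {coverage_1sigma:.3f}")
print(f"2 sigma coverage: {coverage_2sigma:.3f}")
```

If the predicted widths were too narrow or too wide, the measured fractions would fall below or rise above the nominal 68% and 95%.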

37 Prediction of quantiles is reliable over the complete width. Multi-quantile test: fraction of insured persons, in bins of e.g. predicted mean costs, whose true claims (future information) will lie below the predicted 30%, 20%, 10%, ... quantile. The true costs are distributed over the complete width of the predicted probability density just as predicted. The predicted quantiles are reliably reconstructed. Attention: Bayes' theorem! If tests are performed separately in subsamples, this separation must not depend on future information! Allowed are e.g. separation by sex, age, tariff, and any predicted quantities; in short, any information known at prediction time.
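A multi-quantile test of this kind counts, for each nominal level, how often the truth falls below the individually predicted quantile. The sketch below assumes exponential predictive densities with an individual mean per person, chosen only because their quantiles have a closed form; the function names and parameters are hypothetical.

```python
import math
import random

def exponential_quantile(mean, level):
    """Quantile of an exponential density with the given mean."""
    return -mean * math.log(1.0 - level)

def fraction_below(truths, predicted_quantiles):
    """Fraction of cases whose truth lies below its predicted quantile."""
    below = sum(1 for t, q in zip(truths, predicted_quantiles) if t < q)
    return below / len(truths)

rng = random.Random(2)
# Hypothetical individual predictions (mean costs) and simulated truths.
predicted_means = [rng.uniform(200.0, 2000.0) for _ in range(20_000)]
truths = [rng.expovariate(1.0 / m) for m in predicted_means]

quantile_test = {}
for level in (0.1, 0.2, 0.3):
    quantiles = [exponential_quantile(m, level) for m in predicted_means]
    quantile_test[level] = fraction_below(truths, quantiles)

for level, measured in quantile_test.items():
    print(f"nominal {level:.0%}: measured {measured:.1%}")
```

Splitting the sample by `predicted_means` before counting is allowed, since that quantity is known at prediction time; splitting by `truths` would bias the test, which is the point of the warning on the slide.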

38 Long-term prediction from anamnesis. Target here: probability that an insured person will claim more than the average for his or her age and sex. Simple NeuroBayes text analysis of the anamnesis at the start of the contract (regularised naive Bayes ansatz). Measure for classical modelling: percentage risk loading. Comparison of sorting power for different time horizons: Bayes analysis of the anamnesis makes significant predictions even more than 10 years in advance. The usual risk loading factors have almost no correlation to the truth; in the long term they are even worse than random.

39 Usage of NeuroBayes allows large improvements. Customer-individual claims distribution: the probability distributions of individual insured persons can vary considerably. NeuroBayes calculates an individual probability distribution for each single customer. This allows a prediction of all relevant quantiles, thresholds, and other statistical quantities. Sorting power: NeuroBayes successfully sorts insured persons according to expected claims. NeuroBayes shows by far the best sorting power. The rank correlation coefficient is 3 times larger compared to classical (GLM) prediction models. r_sp: Spearman rank correlation coefficient.
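The Spearman rank correlation coefficient used as the measure of sorting power is the Pearson correlation of the ranks. The sketch below is a plain standard-library implementation with average ranks for ties; the function names and test data are illustrative.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def spearman_rank_correlation(x, y):
    """Pearson correlation coefficient computed on the ranks of x and y."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    mean_x = sum(rank_x) / len(rank_x)
    mean_y = sum(rank_y) / len(rank_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    variance_x = sum((a - mean_x) ** 2 for a in rank_x)
    variance_y = sum((b - mean_y) ** 2 for b in rank_y)
    return covariance / math.sqrt(variance_x * variance_y)

# Any monotonic relation gives 1, a reversed ordering gives -1.
print(spearman_rank_correlation([1, 2, 3, 4], [1, 8, 27, 64]))
print(spearman_rank_correlation([1, 2, 3, 4], [4, 3, 2, 1]))
```

Because only ranks enter, the coefficient is unaffected by the fat tails of the claim distribution, which makes it a natural companion to the lift chart.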

40 Big Data and Predictive Analytics: data-driven individualisation of reliable predictions is possible using state-of-the-art statistical methods and software on large data sets. Prediction of individual risks. Fairer tariffs that are more profitable and, at the same time, more attractive for the majority of clients. Individual customer scoring. Individual optimisation of cross-selling. Contract cancellation predictions. Churn management. Insurers use only a small part of the treasure sleeping in their databases. Gigantic economic opportunities!


More information

Real-time PCR: Understanding C t

Real-time PCR: Understanding C t APPLICATION NOTE Real-Time PCR Real-time PCR: Understanding C t Real-time PCR, also called quantitative PCR or qpcr, can provide a simple and elegant method for determining the amount of a target sequence

More information

Big Data: Rethinking Text Visualization

Big Data: Rethinking Text Visualization Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important

More information

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

More information

Big Data and utility function in bank services. Nikolay K. Vitanov 1

Big Data and utility function in bank services. Nikolay K. Vitanov 1 Big Data and utility function in bank services Selected aspects Nikolay K. Vitanov 1 1 Institute of Mechanics, Bulgarian Academy of Sciences Sofia, 16. 06. 2015 Vitanov (BAS) Big Data and utility function

More information

EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials

EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials 5th August 2015 Contents Introduction 1 Dendrogram 2 Tree Map 3 Heat Map 4 Raw Group Data 5 For an online, interactive version of the visualisations

More information
