The Best from Two Worlds The Blue Yonder View on Data Analytics

Similar documents
NeuroBayes Big Data Predictive Analytics for High Energy Physics & "Real Life

Maximum Likelihood vs. Least Squares

Neural networks in data analysis

Blue Yonder Research Papers

How To Use Blue Yonder'S Predictive Analytics Software

Update following the publication of the Bank of England Stress Test. 16 December 2014

SAP Makes Big Data Real Real Time. Real Results.

Big Impacts from Big Data UNION SQUARE ADVISORS LLC

Automated decision-making along the product life cycle saves OTTO millions

CURATING THE ASSORTMENT. Capital Markets Day 25 March 2015

ORACLE UTILITIES ANALYTICS

Among the sponsors: Predictive Analytics in the Manufacturing Industry

Should Costing Version 1.1

Using Predictive Maintenance to Approach Zero Downtime

Gain Contextual Awareness for a Smarter Digital Enterprise with SAP HANA Vora

the 3 keys to achieving real-time visibility of your customer s experience

Software for data analysis and accurate forecasting. Forecasts for Guaranteed Profits. The Predictive Analytics Software for Insurance Companies

Big Data simplified. SAPSA Impuls, Stockholm Martin Faiss & Niklas Packendorff, SAP

Using In-Memory Data Fabric Architecture from SAP to Create Your Data Advantage

SAP HANA Enterprise Cloud

From Big Data to Smart Data Thomas Hahn

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

Voya Financial Advisors, Inc. Registered Representative s Website Terms of Use

Understanding the Value of In-Memory in the IT Landscape

RockWare Click-Wrap Software License Agreement ( License )

WEBSITE TERMS OF USE

Integrated Sales and Operations Business Planning for Chemicals

An Oracle White Paper November Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

CLSA GLOBAL PORTFOLIO TRADING SERVICES ANNEX. In this Annex, the following capitalised terms have the following meanings:

An Oracle White Paper June High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

The 4 Pillars of Technosoft s Big Data Practice

Confidently Anticipate and Drive Better Business Outcomes

SAP HANA Vora : Gain Contextual Awareness for a Smarter Digital Enterprise

Spin-Off from Physics Research to Business

UBS Electronic Trading Agreement Global Markets

Big Data. Fast Forward. Putting data to productive use

Certificate Program in Applied Big Data Analytics in Dubai. A Collaborative Program offered by INSOFE and Synergy-BI

SAP SE - Legal Requirements and Requirements

Business Intelligence and Decision Support Systems

The Scientific Data Mining Process

Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems

SURVEY REPORT DATA SCIENCE SOCIETY 2014

Outline. What is Big data and where they come from? How we deal with Big data?

How To Use Merrimack Web Site

1 Performance Moves to the Forefront for Data Warehouse Initiatives. 2 Real-Time Data Gets Real

BIG DATA What it is and how to use?

Introducing Oracle Exalytics In-Memory Machine

The digital future and dealing with disruption

SOFTWARE LICENSE AGREEMENT

App Terms and Conditions!

TERMS & CONDITIONS: LIMITED LICENSE:

Use Advanced Analytics to Guide Your Business to Financial Success

Annual Financial Results Presentation for year ended 30 June October 2014

Driving Insurance World through Science Murli D. Buluswar Chief Science Officer

EUROPE S LEADING ONLINE FASHION DESTINATION Capital Markets Day: Exciting the Customer in March 2015

MACHINE LEARNING IN HIGH ENERGY PHYSICS

The Business Case for Information Management An Oracle Thought Leadership White Paper December 2008

MAGNAVIEW SOFTWARE SUPPORT & MAINTENANCE. TERMS & CONDITIONS September 3, 2015 version

INVESTOR PRESENTATION. Third Quarter 2014

Licence Agreement. Document filename. HSCIC Licence Agreement. Directorate / Programme. Solution, Design, Assurance and Standards. Status.

The Edge Editions of SAP InfiniteInsight Overview

The Bayesian Approach to Forecasting. An Oracle White Paper Updated September 2006

EUROPE S LEADING ONLINE FASHION DESTINATION Capital Markets Day: Advertising Services 25 March 2015

decisions that are better-informed leading to long-term competitive advantage Business Intelligence solutions

ANALYTICS BUILT FOR INTERNET OF THINGS

Oracle Data Integrator and Oracle Warehouse Builder Statement of Direction

These terms and conditions are appended to the General Terms and Conditions

Turning Data into Actionable Insights: Predictive Analytics with MATLAB WHITE PAPER

MASTER SERVICES AGREEMENT - DIGITAL ADVERTISING SERVICES

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

Formulate Winning Sales and Operations Strategies Through Integrated Planning

Apache Hadoop in the Enterprise. Dr. Amr Awadallah,

The NEC Group: Optimizing the Visitor Experience with Analytics and Business Intelligence Solutions from SAP

A Hurwitz white paper. Inventing the Future. Judith Hurwitz President and CEO. Sponsored by Hitachi

Analecta Vol. 8, No. 2 ISSN

INVESTOR PRESENTATION. First Quarter 2014

Demand more from your retail marketing. HP Retail Promotion Manager

ELECTRONIC TRADING FACILITIES SUPPLEMENTAL TERMS AND CONDITIONS OF TRADING

The London School of Architecture Website Terms & Conditions

Terms and Conditions of Use of the Mediastore / Data Asset Management Platform of Konica Minolta Business Solutions Europe GmbH

Corporate Brochure. The best forecasts with Big Data. Software for data analysis and accurate forecasting

ETPL Extract, Transform, Predict and Load

Loss Prevention Data Mining Using big data, predictive and prescriptive analytics to enpower loss prevention

Xtract Resources Plc (AIM:XTR) Chepica Gold Mine: Quarterly Results Presentation

Referral Agency and Packaging Agency Agreement

SAP Audit Management A Preview

Infineon Technologies North America Corp. Terms and Conditions of Sale

AUTOMOTIVE AND SERVICE PARTS

Transcription:

The Best from Two Worlds The Blue Yonder View on Data Analytics Prof. Dr. Michael Feindt IEKP, Karlsruhe Institute of Technology Founder, Phi-T GmbH Founder & Chief Scientific Advisor, Blue Yonder GmbH

Personal Vita 1982-1991 DESY (PLUTO, CELLO at PETRA) 1991-1997 CERN (DELPHI at LEP) since 1997 Professor at Univ. Karlsruhe (now KIT) since 1997 CDF II at Fermilab / DELPHI at LEP / (CMS at LHC) since 2008 Belle, Belle II at KEK 1999/2000 invention of NeuroBayes algorithm 2002 foundation of Phi-T 2008 foundation of Blue Yonder, with officesvita in Karlsruhe, Hamburg, London Personal

Blue Yonder provides predictions --- based on data (scientifically sound with quantified uncertainty -- testable and falsifiable, as predictive as possible) Data Mining and conventional Business Intelligence Predictive Analytics WHAT HAS HAPPENED? WHY? WHAT WILL HAPPEN? DATA PATTERN PATTERN PREDICTIONS Influence on business performance

Blue Yonder fastest growing BI company in Germany The Business Application Research Center (BARC) found that Blue Yonder is the fastest growing BI software company in Germany. With 175% turnaround increase in 2012 Blue Yonder is leading the field compared to 250 competitors in the area of Business Intelligence and data management. Career opportunities in Karlsruhe, Hamburg and London: www.blue-yonder.com/unternehmen/ueber-blue-yonder/karriere.html www.blue-yonder.com/en/company/about-blue-yonder/careers.html

Blue Yonder: Awards for Big Science Startup DLD 2013: Best Enterprise Solution Retail Technology Award 2012 3 time winner of the Data Mining Cup Michael Feindt bwcon Hightech Award 2012 CERN OpenLab Analytics Workshop Feb 20, 2014 Finalist 2012 Finalist 2013 Special Prize Deutsche Boerse 2012

Blue Yonder is unique Predictive Analytics Suite with a combination of statistical algorithms and neural networks Experienced, award-winning physicists and computer scientists from renowned institutions such as CERN make up the development team : >80 Ph.D.s Optimal order quantity Probability densityp E(X) 100 120 Sales

General Overview Fundamental Research Insurance Premium Optimization... Sales Forecasts Sports event predictions Fraud Detection NeuroBayes Patient s treatment optimization

Our roots: 30 years of elementary particle physics, peaking at the LHC at CERN. Built to understand how exactly our universe works. Schreiben Sie hier Ihren Text CERN (1960) CERN (2005) LHC: 27km circumference Photo: CERN

Our Background: High Energy Physics Fundamental research at the forefront of science The Large Hadron Collider 100m under ground, 27km circumference Very strong worldwide competition on getting results from data à very strong statistical methods: fast & robust multivariate algos Photo: CERN

NeuroBayes - the basis of Blue Yonder Fundamental research at the forefront of science Significance control Input Preprocessing Postprocessing Output Invented in 2000 for reconstruction of b quark fragmentation in DELPHI experiment. Further development in Phi-T, later Blue Yonder. Several hundred successful applications in DELPHI, CDF II, Belle, CMS, ATLAS, LHCb, H1, AMS experiments. More than 400 men-years development. Robust and fast algorithm for reconstruction (= prediction) of conditional probability densities classifications with extreme generalization ability by means of Bayesian regularization.

NeuroBayes example: The LHCb trigger very fast intelligent decisions with NeuroBayes At the LHC (CERN) per experiment: 40 000 000 events per second, which translates into 1 PetaByte (1,000,000,000,000,000 Byte) per second raw data But only 1 PB of interesting data per year can be stored. Need online reduction by 1 : 10,000,000 At the LHCb experiment 30 000 instances of NeuroBayes running real-time 24/7 filter out the interesting events without introducing lifetime bias Photo: CERN Michael Feindt CERN OpenLab Analytics Workshop Feb 20, 2014

NeuroBayes example: Full reconstruction of B mesons at the Japanese B factory experiment Belle Fundamental research at the forefront of science Ø Belle experiment at KEK/Japan Ø 400 physicists from whole world Ø 10 years of data taking and analysis Ø World record luminosity Ø > 400 publications Ø Automatic hierarchical reconstruction system built from 72 NeuroBayes networks reconstructed about 1100 different reactions with a factor 2 larger efficiency than all analyses before Ø Much cleaner signal Ø Work performed by 2 PhD and 1 master student Ø Corresponds to 500 normal PhD theses Ø Corresponds to another 10 years of data taking

NeuroBayes from Science to Industry Predictive Analytics in High Energy Physics Energy Momentum 10 0 50 Direction 90 E(X) Type Sub-Detector Kaon Calo propabilityp Particle Property Distance 200... Use all available and relevant information as input, e.g. measurements from the various sub-detectors, NeuroBayes will extract statistically significant patterns in the data to derive the prediction. Prediction will return the best estimator for a measurement including a statistically sound estimation of the expected spread.

NeuroBayes from Science to Industry Predictive Analytics in industry Article size Picture size M 21% e.g. Retail colour red E(X) Previous sales brand 24 171 propabilityp Prediction sales price 19,9... Use all available and relevant information as input, e.g. article properties, previous sales, etc NeuroBayes will extract statistically significant patterns in the data to derive the prediction. Prediction will return e.g. the most probable sales rate including a statistically sound estimation of the expected spread. NeuroBayes allows data-driven analysis and forecasts both in science and industry

We are familiar with your questions BIG DATA PREDICTIVE ANALYTICS Each day, companies receive giant quantities of structured and unstructured data from different sources Sample recognition of data for the prediction of risks and development to provide forward-looking bases for decision-making Growth of the data volume MCKINSEY MCKINSEY Increase of operating margins + 40% + 60%

We are the pioneers SOCIAL MEDIA PRODUCT RECOMMENDATION RETARGETING PREDICTIVE MAINTENANCE AREAS OF APPLICATION PREDICTION OF RISKS SALES PREDICTIONS DYNAMIC PRICING CUSTOMER ANALYSIS MERCHANDISE PLANNING

NeuroBayes System Working principle Historic Data Record a =... b =... c =...... t =! NeuroBayes Teacher Expert System Expertise Probability density function for target quantity t Current Data Record a =... b =... c =...... t =? NeuroBayes Expert

Forecasts conditional on many features The forecast depends on several features which themselves have several manifestations Features may be arbitrarily correlated Features can be ordered / unordered sets and continuous variables This results in a complex & high-dimensional space which is impossible to treat with classical methods Example: What s the right dose for a patient, if she is 56 years old, slightly overweight, works out on 2 days a week, enjoys late dinners, has been treated for 2 other diseases already, etc, etc,

Preprocessing Automatic preprocessing is an extremely important process before training a network It involves steps like: Ø smoothing out statistical fluctuations and outliers in input variables Ø transforming variables to unified characteristics (mean, width) Ø Decorrelate variables Ø Find variables with significant impact / throw out others The benefits of a powerful preprocessing algorithm involve Ø increased robustness Ø increased network training results (minima easier to find) Ø increased training speed

NeuroBayes output interpretable as a Bayesian posterior Likelihood Prior Posterior Evidence NB1: Posterior ist the probability that the theory is right under the given data NB2: Pior distributions need to applied carefully. A non-informative prior distribution is not flat (tax authorities know about that). Bayesian regularisation at each analysis step in order to avoid overfitting and select models with good generalisation properties is essential!

Technology Overview OPERATIONAL SYSTEM (Environment) WEB UI & SERVICES NEUROBAYES SYSTEM Simulator (industry-specific) STREAM- BASED INPUT BATCH INPUT High Performance Data Transformation R > R > Teacher Expert Predictions Industryspecific Output REAL-TIME FEED OUTPUT HISTORICAL TRANSACTION DATA EXPERTISE (Industry-specific models) 7/24 operation needed - high safety Software as a Service

Sales Forecast Fashion Example: OTTO Group Per item:» Sales forecast» Two estimates on spread (68% and 95% confidence intervals) Sales [units]

Perishable goods in Supermarkets Meat, fruit & veg, bread, diary,. Around 7% of all perishable foods (e.g. meat, fruit & veg., etc) have to be disposed of in German supermarkets. That s about 89M tons of food wasted per year

Insurance e.g. Individual risk predictions for car insurances: Accident probability Claims distribution Large claim prediction Contract cancellation prediction è Successfully implemented at

Correlations to target variable Ramler II-Plot

NeuroBayes delivers precise prognoses for the customer-individual number and height of claims Premium differentiation: NeuroBayes adjusts premium to customer-individual risk Customer structure optimisation Bind your good customers and take the bad customers Rentability improvement: Simultaneously increase your total premium volume and decrease your claims rate with a more just tariff system Risiko Premium volume Anzahl Kunden Alter Tarif NeuroBayes Claims rate Bisheriger Tarif Prämie, normiert Alter Tarif NeuroBayes

Private health insurance claims per year anything but normally distributed... NeuroBayes has the solution for difficult distributions of type f (t) = (1 P) δ(t) + P f (t t > 0) Many insured persons (fraction1-p) do not generate any claim When there is at least one claim, (fraction P), these are distributed according to f(t t>0). This distribution has fat tails (extremely high claims). t Difficult to handle by classical methods

NeuroBayes calculates for each insured person x the individualised Bayesian probability density. NeuroBayes has the solution for difficult distributions of type f (t x) = (1 P( x)) δ(t)+ P( x) f (t t > 0, x ) Insured person x will have no claims with probability 1-P(x) If insured person x will have any claim, the costs will be distributed according to f(t t>0,x) t δ(t) = Dirac- delta-,,function (distribution)

Healthcare insurance long term prediction from anamnesis NeuroBayes Expert Estimation (risk premium loading) Ø Expert estimations are at best random for patients with a long history even systematically wrong. Ø NeuroBayes forecasts costs correctly and significantly beats expert estimations more than 10 years into the future

Revenue Forecast Example: dm Large German drug-store chain Forecasts for individual stores» Prediction of the full probability density function.» Precise forecast of the expected revenue including exptected spread (68% and 95% confidence intervals) Probability density function revenue

Just released: First big data predictive analytics standard software solution Precise sales forecasts for retail and CPG Access to big data predictive analytics for B2B end users Easy handling through intuitive web-ui Software-as-a-Service allows usage of Forward Demand without high inadvance investments into software, infrastructure or highly skilled personell Michael Feindt CERN OpenLab Analytics Workshop www.blue-yonder.com/forwarddemand Feb 20, 2014

Future (2015): intelligent decisions directly on sensor (Belle II pixel detector), before big data reaches any computer Goal: find (classify) all relevant pixel information Big Data challenge: process approx. 10G bit per s Solution: NeuroBayes on Hardware Blue Yonder

NeuroBayes @ hardware*: 200 million decisions per second à 5ns for one decision *features dedicated hardware board: BELLE II experiment :» NeuroBayes on FPGA» Field Programmable Gate Array: (XILINX Virtex6 VLX75T) utilizes 40 boards: à 8 billion intelligent decisions per second Blue Yonder» Clock frequency: 250 MHz 1 decision per clock cycle» Approx. (fully pipelined architecture)» Probability decision output possible

Just launched: www.datascienceacademy.eu Michael Feindt CERN OpenLab Analytics Workshop Feb 20, 2014

Blue Yonder trends in data analytics... Very important and very difficult to bridge enormous gap between nerdish superscientists and ordinary business people à European Data Science Academy Build solutions that are easy to use by non-scientists and as automatic as possible. Leave strategic decisions to customer, but automate everyday detail decisions. Not single data mining projects, but integrate automated decisions into the customer s work flow

Blue Yonder trends in data analytics: data bases Excel is standard in many departments in most companies Graphical data representations are largely unavailable / unknown SQL already is advanced for many users Many data warehouses are write-only, analysis access too slow / restricted The advanced stuff is: Vertical data bases! (columnar ntuples for 25 years already in HEP) In-memory data bases (mostly SQL-based) Not one optimal solution for all problems of the world. MAP/REDUCE (Hadoop)

Blue Yonder trends in data analytics... Fast interactive parallel data analysis / model development important for professional data scientists: PIAF/PROOF MAP/REDUCE too much forgotten not the answer to everything (good: CPU at data) Combine goodies of both... root / C++ too complicated and not flexible enough (e.g. store new columns) Simplicity (important even for super data scientists): C++ à Python Python: numpy, pandas etc very effective, CPU time is (largely) NOT a limitation Machine Learning community often thinks too deterministic, no good understanding of (no interface for) statistical errrors and weights

Blue Yonder trends in data analytics... Article Why cutting edge technology matters for Blue Yonder solutions http://www.blue-yonder.com/en/resource-center/research-papers.html gives overview on our view on algorithms. We consider as important keywords: To know what makes a good prediction Domain knowledge in different industries NeuroBayes Neural Networks Bayesian statistics Deep learning Reinforcement learning Big data

Disclaimer This Presentation (the Presentation) has been prepared by Blue Yonder GmbH & Co KG (collectively, with any officer, director, employee, advisor or agent of any of them, the Preparers) for the purpose of setting out certain confidential information in respect of Blue Yonder s business activities and strategy. References to the Presentation includes any information which has been or may be supplied in writing or orally in connection with the Presentation or in connection with any further inquiries in respect of the Presentation. This Presentation is for the exclusive use of the recipients to whom it is addressed. This Presentation and the information contained herein is confidential. In addition to the terms of any confidentiality undertaking that a recipient may have entered into with Blue Yonder, by its acceptance of the Presentation, each recipient agrees that it will not, and it will procure that each of its agents, representatives, advisors, directors or employees (collectively, Representatives), will not, and will not permit any third party to, copy, reproduce or distribute to others this Presentation, in whole or in part, at any time without the prior written consent of Blue Yonder, and that it will keep confidential all information contained herein not already in the public domain and will use this Presentation for the sole purpose of setting out [familiarizing itself with] certain limited background information concerning Blue Yonder and its business strategy and activities. The foregoing confidentiality obligation shall be legally binding for the recipient infinitely. This Presentation is not intended to serve as basis for any investment decision. If a recipient has signed a confidentiality undertaking with Blue Yonder, this Presentation also constitutes Confidential Information for the purposes of such undertaking. While the information contained in this Presentation is believed to be accurate, the Preparers have not conducted any investigation with respect to such information. The Preparers expressly disclaim any and all liability for representations or warranties, expressed or implied, contained in, or for omissions from, this Presentation or any other written or oral communication transmitted to any interested party in connection with this Presentation so far as is permitted by law. In particular, but without limitation, no representation or warranty is given as to the achievement or reasonableness of, and no reliance should be placed on, any projections, estimates, forecasts, analyses or forward looking statements contained in this Presentation which involve by their nature a number of risks, uncertainties or assumptions that could cause actual results or events to differ materially from those expressed or implied in this Presentation. Only those particular representations and warranties which may be made in a definitive written agreement, when and if one is executed, and subject to such limitations and restrictions as may be specified in such agreement, shall have any legal effect. By its acceptance hereof, each recipient agrees that none of the Preparers nor any of their respective Representatives shall be liable for any direct, indirect or consequential loss or damages suffered by any person as a result of relying on any statement in or omission from this Presentation, along with other information furnished in connection therewith, and any such liability is expressly disclaimed. Except to the extent otherwise indicated, this Presentation presents information as of the date hereof. The delivery of this Presentation shall not, under any circumstances, create any implication that there will be no change in the affairs of Blue Yonder after the date hereof. In furnishing this Presentation, the Preparers reserve the right to amend or replace this Presentation at any time and undertake no obligation to update any of the information contained in the Presentation or to correct any inaccuracies that may become apparent. This Presentation shall remain the property of Blue Yonder. Blue Yonder may, at any time, request any recipient, or its Representatives, shall promptly deliver to Blue Yonder or, if directed in writing by Blue Yonder, destroy all confidential information relating to this Presentation received in written, electronic or other tangible form whatsoever, including without limitation all copies, reproductions, computer diskettes or written materials which contain such confidential information. At such time, all other notes, analyses or compilations constituting or containing confidential information in the recipient s, or their Representatives, possession shall be destroyed. Such destruction shall be certified to Blue Yonder by the recipient in writing. Neither the dissemination of this Presentation nor any part of its contents is to be taken as any form of commitment on the part of the Preparers or any of their respective affiliates to enter any contract or otherwise create any legally binding obligation or commitment. The Preparers expressly reserve the right, in their absolute discretion, without prior notice and without any liability to any recipient to terminate discussions with any recipient or any other parties. The distribution of this Presentation in certain jurisdictions may be restricted by law and, accordingly, recipients of this Presentation represent that they are able to receive this Presentation without contravention of any unfulfilled registration requirements or other legal restrictions in the jurisdiction in which they reside or conduct business.