Spin-Off from Physics Research to Business



Similar documents
Maximum Likelihood vs. Least Squares

Neural networks in data analysis

Measurement of the Mass of the Top Quark in the l+ Jets Channel Using the Matrix Element Method

NeuroBayes Big Data Predictive Analytics for High Energy Physics & "Real Life

The Best from Two Worlds The Blue Yonder View on Data Analytics

Calorimetry in particle physics experiments

Top-Quark Studies at CMS

PHYSICS WITH LHC EARLY DATA

Study of the B D* ℓ ν with the Partial Reconstruction Technique

How To Teach Physics At The Lhc

High Energy Physics. Lecture 4 More kinematics and a picture show of particle collisions

A new inclusive secondary vertex algorithm for b-jet tagging in ATLAS

Theoretical Particle Physics FYTN04: Oral Exam Questions, version ht15

Measurement of Neutralino Mass Differences with CMS in Dilepton Final States at the Benchmark Point LM9

Top rediscovery at ATLAS and CMS

variables to investigate Monte Carlo methods of t t production

Data Mining Algorithms Part 1. Dejan Sarka

INTELLIGENT ENERGY MANAGEMENT OF ELECTRICAL POWER SYSTEMS WITH DISTRIBUTED FEEDING ON THE BASIS OF FORECASTS OF DEMAND AND GENERATION Chr.

Bounding the Higgs width at the LHC

Azure Machine Learning, SQL Data Mining and R

Theory versus Experiment. Prof. Jorgen D Hondt Vrije Universiteit Brussel jodhondt@vub.ac.be

How To Find The Higgs Boson

Blue Yonder Research Papers

Jets energy calibration in ATLAS

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Real Time Tracking with ATLAS Silicon Detectors and its Applications to Beauty Hadron Physics

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK

Representing Uncertainty by Probability and Possibility What s the Difference?

Organizing Your Approach to a Data Analysis

Jet Reconstruction in CMS using Charged Tracks only

Physik des Higgs Bosons. Higgs decays V( ) Re( ) Im( ) Figures and calculations from A. Djouadi, Phys.Rept. 457 (2008) 1-216

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Progress in understanding quarkonium polarization measurements

Risk and return (1) Class 9 Financial Management,

A Guide to Detectors Particle Physics Masterclass. M. van Dijk

A Process for ATLAS Software Development

Using Predictive Analytics to Detect Fraudulent Claims

ATLAS NOTE ATLAS-CONF July 21, Search for top pair candidate events in ATLAS at s = 7 TeV. The ATLAS Collaboration.

Non Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization

Particle Physics. Bryan Webber University of Cambridge. IMPRS, Munich November 2007

Protein Protein Interaction Networks

Tracking/Vertexing/BeamSpot/b-tag Results from First Collisions (TRK )

Advanced analytics at your hands

Quantitative Methods for Finance

Data Mining + Business Intelligence. Integration, Design and Implementation

Image Processing Techniques applied to Liquid Argon Time Projection Chamber(LArTPC) Data

FCC JGU WBS_v0034.xlsm

Predictive Modeling Techniques in Insurance

Data Mining mit der JMSL Numerical Library for Java Applications

Towards running complex models on big data

Algorithmic Trading Session 1 Introduction. Oliver Steinki, CFA, FRM

Highlights of Recent CMS Results. Dmytro Kovalskyi (UCSB)

Information and Decision Sciences (IDS)

Pattern Recognition and Prediction in Equity Market

Cross section, Flux, Luminosity, Scattering Rates

Gerard Mc Nulty Systems Optimisation Ltd BA.,B.A.I.,C.Eng.,F.I.E.I

DYNAMIC PROJECT MANAGEMENT WITH COST AND SCHEDULE RISK

Among the sponsors: Predictive Analytics in the Manufacturing Industry

SAS Certificate Applied Statistics and SAS Programming

Software for data analysis and accurate forecasting. Forecasts for Guaranteed Profits. The Predictive Analytics Software for Insurance Companies

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Interstellar Cosmic-Ray Spectrum from Gamma Rays and Synchrotron

A three dimensional stochastic Model for Claim Reserving

The TOTEM experiment at the LHC: results and perspective

Concepts in Theoretical Physics

Simulation and Risk Analysis

The OPERA Emulsions. Jan Lenkeit. Hamburg Student Seminar, 12 June Institut für Experimentalphysik Forschungsgruppe Neutrinophysik

An Introduction to Using WinBUGS for Cost-Effectiveness Analyses in Health Economics

Higher Diploma Programme in Financial Services. Syllabus

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

Java Modules for Time Series Analysis

Dealing with large datasets

Implications of CMS searches for the Constrained MSSM A Bayesian approach

Portfolio Management for institutional investors

DIRECT MAIL: MEASURING VOLATILITY WITH CRYSTAL BALL

Statistics in Medicine Research Lecture Series CSMC Fall 2014

Predictive modelling around the world

APPLIED MATHEMATICS ADVANCED LEVEL

MSc Finance and Economics detailed module information

Masses in Atomic Units

MICROSOFT EXCEL FORECASTING AND DATA ANALYSIS

Some Essential Statistics The Lure of Statistics

Using simulation to calculate the NPV of a project

Quantitative Inventory Uncertainty

Customer Classification And Prediction Based On Data Mining Technique

Prediction of Stock Performance Using Analytical Techniques

Gamma Distribution Fitting

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Chapter 12 Discovering New Knowledge Data Mining

TDWI Best Practice BI & DW Predictive Analytics & Data Mining

Transcription:

From Delphi to Phi-T Spin-Off from Physics Research to Business Prof. Dr. Michael Feindt KCETA - Centrum für Elementarteilchen- und Astroteilchenphysik IEKP, Universität Karlsruhe, Karlsruhe Institute of Technology KIT & Phi-T GmbH, Karlsruhe DELPHI Closing Meeting CERN, May 29, 2008

HPC C( (1991-92) Initially energy and spatial resolutions catastrophic Sl Solve problems by looking IN DETAIL (144 modules ) into the real DATA. Correct by software

HPC design performance almost reached, physics analysis possible

Introduction of fsoftware Tasks e.g. ELEPHANT, MAMMOTH packages

(Inclusively reconstr.) B physics B* B**

Start to work with neural networks (in1993) BSAURUS: b flavour tagging g MACRIB: Combined Particle ID

Gi Gained dlt lots of experience with neural network applications in DELPHI (1993-2003) Electron id (ELEPHANT) Kaon, proton id (MACRIB) (BSAURUS:: Optimisation of resolutions inclusive B- E,,, Q-value Inclusive B production flavour tagging Inclusive B decay flavour tagging Inclusive B+/B0/Bs/Lambda_b separation B**, B s ** enrichment B fragmentation function Limit it on B s -mixing ii B 0 -mixing B- F/B-asymmetry B-> >wrongsigncharm)

Neural reconstruction ction of a direction Resolution of azimuthal angle of inclusively reconstructed B-hadrons in the DELPHI- detector first neural reconstruction of a direction NeuroBayes phi-direction Best classical" chi**2- fit (BSAURUS) After selection cut on estimated error: No selection: Resolution massively improved, no tails Improved resolution ==> allows reliable selection of good events

Nï Naïve neural networks and criticizm itii We ve tried that but it didn t give good results - Stuck in local minimum - Learning not robust We ve tried that but it was worse than our 100 person-years analytical high tech algorithm - Selected too naive input variables - Use your fancy algorithm as INPUT! We ve tried that but the predictions were wrong - Overtraining: the net learned statistical fluctuations Yeah but how can you estimate systematic errors? - How can you with cuts when variables are correlated? - Tests on data, data/mc agreement possible and done.

Address all these topics and build a professional robust and flexible neural network package for physics and general prediction tasks: (2000-2002) 2002) NeuroBayes turned out to be a very strong tool

After having looked outside the ivory tower, realized: These methods are not only applicable in physics <phi-t>: Foundation of private company out of University of Karlsruhe, initially sponsored by exist-seedprogramme of the federal ministery for Education and Research BMBF

History 2000-2002 NeuroBayes -specialisation for economy at the University of Karlsruhe Oct. 2002: GmbH founded, first industrial i projects June 2003: Removal into new office 199 sqm IT-Portal Karlsruhe May 2008: Expansion to 700 sqm office Foundation of sub-company Phi-T products&services Exclusive rights for NeuroBayes Staff all physicists (almost all from HEP) Continuous further development of NeuroBayes

Customers: (among others): BGV and VKB car insurances AXA and Central health insurances Lupus Alpha Asset Management dm drogerie markt (drugstore chain) Otto Versand (mail order business) Libri (book wholesale) Thyssen Krupp (steel industry)

NeuroBayes es principle Input Preprocessing NeuroBayes Teacher: Learning of complex relationships from existing data bases (e.g. Monte Carlo) NeuroBayes Expert: Prognosis for unknown data Signific cance co ontrol Postprocessing Output

How it works: training and application Historic or simulated data Data set a =... b =... c =...... t =! NeuroBayes Teacher Actual (new real) data Expert system Expertise Probability that hypothesis is correct (classification) or probability density for variable t Data set a =... b =... c =...... t =? NeuroBayes Expert f t t

NeuroBayes es task 1: Classifications Classification: Binary targets: Each single outcome will be yes or no NeuroBayes output is the Bayesian posterior probability that answer is yes (given that inclusive i rates are the same in training i and test sample, otherwise simple transformation necessary). Examples: > This elementary particle is a K meson. > This event is a Higgs candidate. > Germany will become soccer world champion in 2010. > Customer Meier will have liquidity problems next year. > This equity price will rise.

NeuroBayes es task 2: Conditional probability densities Probability density for real valued targets: For each possible (real) value a probability (density) is given. From that all statistical quantities like mean value, median, mode, standard deviation, percentiles etc can be deduced. Examples: > Energy of an elementary particle (e.g a semileptonically decaying B meson with missing neutrino) > Q value of a decay > Lifetime i of a decay > Price change of an equity or option > Company turnaround or earnings

Prediction of the complete probability distribution event by event unfolding - f r ( t x) Expectation ti value Standard deviation volatility Mode Deviations from normal distribution, e.g. crash probability t

Conditional probability densities in particle physics What is the probability density of the true B momentum in this semileptonic B candidate event taken with the CDF II detector with these n tracks with those momenta and rapidities in the hemisphere, which are forming this secondary vertex with this decay length and probability, this invariant mass and dtransverse momentum, this lepton information, this missing transverse momentum, this difference in Phi and Theta between momentum sum and vertex topology, etc pp f t r x r ( t x)

<phi-t> NeuroBayes es > is based on neural 2nd generation algorithms, Bayesian regularisation, optimised preprocessing with transformations and decorrelation of input variables and linear correlation to output. > learns extremely fast due to 2nd order methods and 0-iteration mode > can train with weights and background subtraction ti > is extremly robust against outliers > is immune against learning by heart statistical noise > tll tells you ifthere is nothing relevant tto be learned > delivers sensible prognoses already with small statistics > has advanced boost and cross validation features > is steadily further developed d

Ramler-plot (extended correlation matrix)

Ramler-II-plot (visualize correlation to target)

Visualisation of single input-variables

Visualisation of correlation matrix Variable 1: Training target

Visualisation of network performance Purity vs. efficiency Signal-effiziency vs. total efficiency (Lift chart)

Visualisation of NeuroBayes network topology

Successful applications in High Energy Physics: at CDF II: Electron ID, muon ID, kaon/proton ID Optimisation of resonance reconstruction in many channels (X, Y, D, D s, D s **, B, B s, B**,BB s **) Spin parity analysis of X(3182) Inclusion of NB output in likelihood fits B-tagging for high pt physics (top, Higgs, etc.) Single top quark production search Higgs search B-Flavour Tagging for mixing analyses (new combined tagging) B0, B s -lifetime, DG, mixing, CP violation

Further NeuroBayes applications in HEP: AMS: Particle ID design studies CMS (ongoing) : B-tagging Belle (ongoing) : Continuum suppression H1 (ongoing) : Hadron calorimeter energy resolution optimisation

More than 35 diploma and PhD Ph.D. theses from experiments DELPHI, CDF II, AMS and CMS used NeuroBayes or predecessors very successfully. Many of these can be found at www.phi-t.de Wissenschaft NeuroBayes Talks about NeuroBayes and applications: www-ekp.physik.uni-karlsruhe.de/~feindt i k h i dt Forschung

Just a few examples NeuroBayes soft electron identification for CDF II (Ulrich Kerzel, Michael Milnik, ik M.F.) Thesis U. Kerzel: on basis of Soft Electron Collection (much more efficient than cut selection or JetNet with same inputs) - after clever preprocessing p by hand and careful learning parameter choice this could also be as good as NeuroBayes

Just a few examples NeuroBayes selection

Just a few examples First observation of B_s1 and most precise of B_s2* Selection using NeuroBayes

Presse (Die Welt vom 21. April 2006)

Successful in competition with other data-miningmethods World s largest students competion: Dt Data-Mining-Cup i 2005: Fraud detection in internet trading 2006: price prediction in ebay auctions 2007: coupon redemption prediction

Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA Phi-T: Applications of NeuroBayes in Economy > Medicine and Pharma research e.g. effects and undesirable effects of drugs early tumor recognition > Banks e.g. credit-scoring (Basel II), finance time series prediction, valuation of derivates, risk minimised trading strategies, client valuation > Insurances e.g. risk and cost prediction for individual clients, probability of contract cancellation, fraud recognition, justice in tariffs > Trading chain stores: turnover prognosis for individual articles/stores Necessary prerequisite: Historic or simulated data must be available! 37

Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA Individual risk prognoses for car insurances: Accident probability Cost probability bilit distribution ib ti Large damage prognosis Contract cancellation prob. very successful at 38

Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA The unjustice of insurance premiums ms An nzahl Kun nden Ratio of the accident risk calculated using NeuroBayes to premium paid (normalised to same total premium sum): The majority of customers (with low risk) are paying too much. Less than half of the customers (with larger risk) do not pay enough, some by far not enough. These are currently subsidised by the more careful customers. Risiko/Prämie 39

Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA 40

Turnover prognosis for mail order business Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA 41

Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA Presse Physicists modernise sales management 42

Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA Test Test 43

Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA Prognosis of individual health costs Pilot project for a large private health insurance Prognosis of costs in following year for each person insured with confidence intervals 4 years of training, test on following year Results: Probability density for each customer/tarif combination Kunde N. 00000 Mann, 44 Tarif XYZ123 seit ca. 17 Jahre Very good test results! Has potential for a real and objective cost reduction in health management 44

Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA Prognosis of financial markets NeuroBayes based risk averse market neutral fonds for institutional investors Lupus Alpha NeuroBayes Short Term Trading Fonds (up to now 20 Mio, aim: 250 Mio Test end of 2008) VDI-Nachrichten, 9.3.2007 Test Börsenzeitung, 6.2.2008 45

Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA The <phi-t> mouse game: or: even your ``free will is predictable //www.phi-t.de/mousegame 46

Bindings and Licenses NeuroBayes is commercial software. Special rates for public research. Essentially free for high energy physics research. License files needed. Separately for expert and teacher. Please contact Phi-T. NeuroBayes core code written in Fortran. Libraries for many platforms (Linux, Windows, ) available. Bindings exist for C++, Fortran, java, lisp, etc. Code generator for easy usage exists for Fortran and C++. New: Interface to root-tmva available (classification only) Integration into root planned

Documentation Basics: M. Feindt, A Neural Bayesian Estimator for Conditional Probability Densities, E-preprint-archive physics 0402093 M. Feindt, U. Kerzel, The NeuroBayes Neural Network Package, NIM A 559(2006) 190 Web Sites: www.phi-t.de (Company web site, German & English) www.neurobayes.de (English site on physics results with NeuroBayes & all diploma and PhD theses using NeuroBayes) www-ekp.physik.uni-karlsruhe.de/~feindt (some NeuroBayes talks can be found here under -> Forschung)

Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA At DELPHI I have learned how to extract information and learn from data. With a physicist s pragmatic analytical and quantitative thinking, data-analysis and programming expertise and NeuroBayes, we have achieved a lot in physics and with Phi-T in very different areas of industry and economics. 49

Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA It is not physics, but the understanding, forecasting and steering of complex systems, with the tools of a physicist. It is (after( many years of being a physicist at least) ) equally fascinating. What is absolutely great: our influence is larger! First years were hard and risky: you have to sell! There is no public money just coming in But with many very successful projects we now have many contacts to high-ranked people e convinced of our capabilities. 50

Start Idee NeuroBayes Idee Hintergrund Ziele NeuroBayes f(t x) Beispiele Historie Anwendung Prinzip Funktion Beispiel Konkurrenz Projekt l Forschung Projekt ll Ablauf Spiel Summary A BA In general: HEP community very effective and well-trained. Standard and motivation much higher than in industry. Phi-T, NeuroBayes and its industrial applications are a real spin-off from High Energy Physics with its origin in DELPHI. btw: There is much to do! Fascinating and challenging projects and permanent positions available. Phi-T will grow! 51