Analyzing large and complex data sets in fusion: Modern tools from probability theory and pattern recognition
Transcription
1 FACULTY OF ENGINEERING AND ARCHITECTURE Analyzing large and complex data sets in fusion: Modern tools from probability theory and pattern recognition Geert Verdoolaege Department of Applied Physics Ghent University, Ghent, Belgium Laboratory for Plasma Physics, Royal Military Academy, Brussels, Belgium PRL Seminar Canberra, December 18, 2013
2 Overview: (1) Data science. (2) Bayesian methods and integrated analysis: motivation, Bayesian methodology, integrated data analysis, example applications. (3) Pattern recognition in stochastic data sets: probabilistic manifolds, confinement regime visualization and identification, scaling laws, disruption prediction. (4) Conclusion and outlook.
4 Evolution of science (1). 1000 years ago: experimental science. Last few hundred years: theoretical science.
5 Evolution of science (2). Last few decades: computational science. Today: data science.
6 The data deluge. Big data: high-throughput scientific experiments (telescopes, particle accelerators, fusion devices); sensor networks and satellite surveys; simulations; biology, e.g. the human genome; sociology; digital archives; ...
7 The fourth paradigm (1). Jim Gray, Microsoft Research (Turing Award, 1998), on the fourth paradigm: "The new model is for the data to be captured by instruments or generated by simulations before being processed by software and for the resulting information or knowledge to be stored in computers. Scientists only get to look at their data fairly late in this pipeline. The techniques and technologies for such data-intensive science are so different that it is worth distinguishing data-intensive science from computational science as a new, fourth paradigm for scientific exploration."
8 The fourth paradigm (2) The Fourth Paradigm: Data-Intensive Scientific Discovery, T. Hey, S. Tansley, K. Tolle, eds., Microsoft Research, Redmond, WA,
9 Data science. Challenges: large data sets; heterogeneous sources and formats; complex nonlinear dependencies. Activities: data analysis (extracting useful information); data visualization; data archiving and retrieval.
10 What is a data scientist? Opinions vary: Hilary Mason, chief data scientist at bitly; Harvard Business Review, 2012: "The sexiest job of the 21st century".
11 Data analysis. Level: low, intermediate, high. Descriptive statistics for data visualization; analysis of time series, images, videos and fields (e.g. magnetic); resampling, (motion) correction, ...; Fourier/wavelet analysis; inverse problems (Abel inversion, tomography, ...); estimation and prediction; event detection, object recognition, object tracking, shape analysis; extracting patterns in data spaces (clusters, regression lines/surfaces); probability theory and statistics for modeling, estimation, hypothesis testing, prediction, error analysis, ...
12 Fusion data characteristics (challenge: remedy/opportunity).
Massive databases; real-time requirements (plasma control): clever and fast algorithms.
Substantial and heterogeneous uncertainties, stochasticity: probability theory; error propagation studies, e.g. Bayesian probability theory.
Redundancy: pattern recognition ((nonlinear) relations, cluster structure).
High dimensionality: dimensionality reduction; data visualization.
15 Motivation and objectives. Methods: Bayesian probability theory (BPT); integrated data analysis (IDA). Motivation and objectives: enhance reliability and robustness of physics results; identify and reduce uncertainty sources; study non-Gaussian error propagation; model and quantify systematic uncertainty; integrate heterogeneous data sets (exploit redundancy and interdependencies); diagnostic design. ITER and fusion reactors: reduced space (limited data); reduced access (systematic uncertainty); in situ calibration.
17 Bayesian recipe. Forward model: link the (physical) model with the data x, e.g. $x = f(\theta)$. Assume a statistical (random) measurement error, often Gaussian, e.g. $x = f(\theta) + \nu$ with $\nu \sim \mathcal{N}(0, \sigma_\nu^2)$, which gives the likelihood $p(x \mid \theta) = \frac{1}{\sqrt{2\pi}\,\sigma_\nu} \exp\left\{-\frac{[x - f(\theta)]^2}{2\sigma_\nu^2}\right\}$. Propose prior information $p(\theta)$, possibly uninformative (e.g. uniform). Apply Bayes' theorem: solve the inverse problem (also profile reconstruction!).
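18 Bayes' theorem: $p(\theta \mid x, I) = \frac{p(x \mid \theta, I)\, p(\theta \mid I)}{p(x \mid I)}$, with x = data vector, $\theta$ = parameter vector, I = implicit assumptions. Posterior: $p(\theta \mid x, I)$. Likelihood $p(x \mid \theta, I)$: misfit between model and data. Prior $p(\theta \mid I)$: expert or diffuse knowledge. Evidence $p(x \mid I)$: normalization.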
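The recipe and Bayes' theorem can be sketched numerically on a one-parameter problem. The block below is a minimal illustration, not the method used in the talk: the straight-line forward model, the data values, the error level and the grid are all hypothetical, and the evidence is approximated by a sum over the grid.

```python
import math

def forward_model(theta, t):
    """Hypothetical forward model f(theta): a straight line through the origin."""
    return theta * t

def log_likelihood(theta, ts, xs, sigma_nu):
    """Gaussian log-likelihood ln p(x | theta) for independent errors nu ~ N(0, sigma_nu^2)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma_nu**2)
        - (x - forward_model(theta, t)) ** 2 / (2.0 * sigma_nu**2)
        for t, x in zip(ts, xs)
    )

def grid_posterior(ts, xs, sigma_nu, theta_grid):
    """Posterior p(theta | x) on a grid, assuming a uniform prior over the grid."""
    log_post = [log_likelihood(th, ts, xs, sigma_nu) for th in theta_grid]
    peak = max(log_post)  # subtract the peak to avoid underflow in exp
    weights = [math.exp(lp - peak) for lp in log_post]
    evidence = sum(weights)  # discrete stand-in for the normalization p(x | I)
    return [w / evidence for w in weights]

# Hypothetical measurements x taken at settings t, with assumed sigma_nu = 0.2.
ts = [1.0, 2.0, 3.0, 4.0]
xs = [2.1, 3.9, 6.2, 7.8]
theta_grid = [1.0 + 0.01 * k for k in range(201)]  # 1.00, 1.01, ..., 3.00
posterior = grid_posterior(ts, xs, 0.2, theta_grid)
theta_map = theta_grid[posterior.index(max(posterior))]
print(f"most probable theta on the grid: {theta_map:.2f}")
```

A grid is only practical for one or two parameters; for larger parameter vectors the posterior is usually sampled instead.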
23 Principle of IDA. Integrate heterogeneous (conflicting?) data sets: likelihoods. Model statistical and systematic uncertainties in the measured data, the calibration and the underlying physical model. Estimate model parameters using Bayes' theorem. Sample the posterior: Markov chain Monte Carlo.
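As a rough sketch of the IDA principle, the block below multiplies the likelihoods of two diagnostics that observe the same quantity and samples the joint posterior with a random-walk Metropolis algorithm. The two measurements, their error bars, the flat prior and the sampler settings are hypothetical choices for illustration only; real IDA applications involve full forward models per diagnostic.

```python
import math
import random

def log_posterior(theta, datasets):
    """Sum of Gaussian log-likelihoods from several diagnostics (flat prior on theta)."""
    return sum(-0.5 * ((value - theta) / sigma) ** 2 for value, sigma in datasets)

def metropolis(log_target, start, step, n_samples, rng):
    """Random-walk Metropolis sampler for a one-dimensional target density."""
    samples = []
    current = start
    current_lp = log_target(current)
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

# Hypothetical (value, standard deviation) pairs from two diagnostics.
datasets = [(2.0, 0.5), (3.0, 1.0)]
rng = random.Random(42)
chain = metropolis(lambda th: log_posterior(th, datasets), 0.0, 1.0, 20000, rng)
kept = chain[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean: {posterior_mean:.2f}")
```

For this Gaussian toy case the exact posterior mean is the precision-weighted average of the two measurements, (2.0/0.25 + 3.0/1.0) / (1/0.25 + 1/1.0) = 2.2, which gives a simple check on the sampler.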
24 Example applications. $n_e$ from Thomson scattering, soft X-ray and interferometry. $n_e$ profiles from Li-beam and interferometry: factor 400 increase in temporal resolution! Magnetic equilibrium reconstruction: current filaments + Grad-Shafranov. Impurity concentrations from CXS + BES. Antenna design. EM wave propagation via stochastic FDTD: uncertainty in boundary conditions, medium properties, etc.
26 Thomson scattering + soft X-ray + interferometer Fischer et al., Plasma Phys. Control. Fusion, 44, pp ,
27 Bremsstrahlung + CX impurity lines. [Figure: probability densities p plotted against $n_{e,1}$ and $n_{e,2}$ (cm$^{-3}$), $Z_{eff}$ and related parameters.] Verdoolaege et al., IEEE Trans. Plasma Sci.,, pp.,
28 CXS + BES. Helium concentration profile at ITER: 10% error. Spectrometer at TEXTOR and later AUG. BES: emission from beam. Error reduction from 40% to 10%. He concentration profile with 68% credibility bounds (TEXTOR). Planned at AUG.
30 Pattern recognition opportunities. Clustering/classification: grouping of data points. Dimensionality reduction: data visualization, better learning efficiency. Regression: (nonlinear) deterministic relation between variables. Objectives: (1) contribute to physics studies by extracting patterns, structure and relations from data; (2) contribute to plasma control through real-time data interpretation. Note: both model-based and purely data-driven approaches are possible; one approach does not exclude the other.
32 A different view on measurement. Traditional measurement: value + error bar. Uncertainty is quantified by probability. Measurement = sample from underlying (non-Gaussian?) probability distribution. Goal of measurement = probing the underlying distribution. Stochastic component of the data: descriptive model = probability density function (PDF). Examples: $y = \underbrace{\beta_0 + \beta_1 x}_{\text{deterministic component}} + \underbrace{\epsilon}_{\text{stochastic component}}$ and $y = \beta_0 + \beta_1 (x + \underbrace{\eta}_{\text{error in (independent) variable}}) + \epsilon$. The PDF contains all information about the measurement!
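Why an error in the independent variable matters can be seen in a small simulation. The sketch below assumes the classical errors-in-variables setting, in which the recorded x equals the true value plus independent Gaussian noise; all parameter values and sample sizes are hypothetical. Ordinary least squares then underestimates the slope by roughly the factor var(true x) / (var(true x) + var(noise)).

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)
beta0, beta1 = 1.0, 2.0  # hypothetical regression coefficients
true_x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Stochastic component epsilon on the dependent variable.
ys = [beta0 + beta1 * x + rng.gauss(0.0, 0.1) for x in true_x]
# Error eta on the independent variable, same variance as true_x.
observed_x = [x + rng.gauss(0.0, 1.0) for x in true_x]

slope_clean = ols_slope(true_x, ys)      # close to beta1
slope_noisy = ols_slope(observed_x, ys)  # attenuated towards beta1 / 2
print(f"slope without x-error: {slope_clean:.2f}, with x-error: {slope_noisy:.2f}")
```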
33 The challenge. The probabilistic nature of data: the fundamental object resulting from a measurement is a probability distribution. Any further processing of the data (statistical inference, pattern recognition) should respect this inherent probabilistic nature. Pattern recognition in PDF spaces: pattern recognition is based on geometry, primarily distance, hence the geometry of probability distributions. Obtain the PDF from repeated measurements, (Bayesian) probability theory, ...
34 Probability + geometry: a happy marriage. Probabilistic manifold: PDF = point on manifold; coordinates = PDF parameters. Distance between PDFs? Information geometry.
35 Information geometry. Riemannian differential geometry. Parametric probability model: $p(x \mid \theta)$, with $\theta$ an N-dimensional parameter vector. Fisher information = unique metric tensor: $g_{\mu\nu}(\theta) = -\mathrm{E}\left[\frac{\partial^2}{\partial\theta^\mu\, \partial\theta^\nu} \ln p(x \mid \theta)\right]$, $\mu, \nu = 1, \ldots, N$. Line element: $ds^2 = g_{\mu\nu}\, d\theta^\mu\, d\theta^\nu$. Minimum-length curve: geodesic, giving the geodesic distance (GD). Natural and theoretically well-motivated distance between PDFs.
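36 Univariate Gaussian distribution. PDF: $p(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$. Line element: $ds^2 = \frac{d\mu^2}{\sigma^2} + 2\,\frac{d\sigma^2}{\sigma^2}$. Hyperbolic geometry: Poincaré half-plane model.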
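The Gaussian line element corresponds to the Fisher metric diag(1/σ², 2/σ²) in the coordinates (µ, σ). A quick Monte Carlo check is sketched below. It relies on the standard identity that, under regularity conditions, the Fisher information also equals the expected outer product of the score (the gradient of ln p); the parameter values, sample size and seed are arbitrary choices.

```python
import random

def gaussian_score(x, mu, sigma):
    """Gradient of ln p(x | mu, sigma) with respect to (mu, sigma)."""
    z = (x - mu) / sigma
    return (z / sigma, (z * z - 1.0) / sigma)

def fisher_metric_mc(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the 2x2 Fisher metric as E[score score^T]."""
    rng = random.Random(seed)
    g = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_samples):
        s = gaussian_score(rng.gauss(mu, sigma), mu, sigma)
        for a in range(2):
            for b in range(2):
                g[a][b] += s[a] * s[b] / n_samples
    return g

mu, sigma = 4.0, 0.7
g = fisher_metric_mc(mu, sigma)
print("estimated g_mu_mu      :", round(g[0][0], 3), "expected", round(1.0 / sigma**2, 3))
print("estimated g_sigma_sigma:", round(g[1][1], 3), "expected", round(2.0 / sigma**2, 3))
```

The off-diagonal estimate should be close to zero, consistent with the absence of a dµ dσ cross term in the line element.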
37 Poincaré half-plane p 1 : µ 1 = 4, σ 1 = 0.7; p 2 : µ 2 = 3, σ 2 =
40 ITPA Confinement Database. ITPA Global H-Mode Confinement Database (DB3), ITER H-Mode Database Working Group; D.C. McDonald et al., Nucl. Fusion 47, pp , entries from 19 tokamaks. Approximate error estimates: limited information on PDF! Assume standard deviations: Gaussian PDFs (maximum entropy). Different machines have different error estimates: difficult to handle using the classic approach!
41 Confinement regime classification. Distinguish between L- and H-mode: 3845 L and 6207 H. Identify edge-localized mode (ELM) behavior. 8 global engineering variables: $I_p$, $B_t$, $n_e$, $P_{loss}$, $R$, $a$, $M_{eff}$, $\kappa$. Variables statistically independent: product of Gaussians. Note: this does not prevent the variables from being related through a deterministic relation!
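42 Gaussian product manifold. Plasma and machine variables: components $x^i$ of $x$, $i = 1, \ldots, 8$. Distribution parameters: $\mu^i$, $\sigma^i$. Gaussian product distribution: $p(x \mid \mu^1, \ldots, \mu^8, \sigma^1, \ldots, \sigma^8) = \prod_{i=1}^{8} \mathcal{N}(x^i \mid \mu^i, \sigma^i)$. Measurements A and B: $\mu_{A,B} = (\mu^1_{A,B}, \ldots, \mu^8_{A,B})$, $\sigma_{A,B} = (\sigma^1_{A,B}, \ldots, \sigma^8_{A,B})$. GD in closed form: $\mathrm{GD}(\mu_A, \sigma_A \,\|\, \mu_B, \sigma_B) = \sqrt{2} \left[\sum_{i=1}^{8} \ln^2 \frac{1 + \delta^i_{AB}}{1 - \delta^i_{AB}}\right]^{1/2}$, with $\delta^i_{AB} = \left[\frac{(\mu^i_A - \mu^i_B)^2 + 2(\sigma^i_A - \sigma^i_B)^2}{(\mu^i_A - \mu^i_B)^2 + 2(\sigma^i_A + \sigma^i_B)^2}\right]^{1/2}$.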
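A direct transcription of the closed-form GD on the Gaussian product manifold might look as follows. The function names are illustrative, and the sketch assumes the standard result that, for independent variables, the squared per-variable geodesic distances add.

```python
import math

def delta(mu_a, sigma_a, mu_b, sigma_b):
    """Per-variable quantity delta_AB^i entering the closed-form GD."""
    num = (mu_a - mu_b) ** 2 + 2.0 * (sigma_a - sigma_b) ** 2
    den = (mu_a - mu_b) ** 2 + 2.0 * (sigma_a + sigma_b) ** 2
    return math.sqrt(num / den)

def product_gaussian_gd(mus_a, sigmas_a, mus_b, sigmas_b):
    """Geodesic distance between two products of independent univariate Gaussians."""
    total = 0.0
    for mu_a, s_a, mu_b, s_b in zip(mus_a, sigmas_a, mus_b, sigmas_b):
        d = delta(mu_a, s_a, mu_b, s_b)
        total += math.log((1.0 + d) / (1.0 - d)) ** 2
    return math.sqrt(2.0 * total)

# Hypothetical two-variable example: equal means, standard deviations differing by a factor e.
gd = product_gaussian_gd([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [math.e, math.e])
print(f"GD = {gd:.3f}")
```

With equal means each variable contributes ln(σ_B/σ_A) = 1 inside the sum, so the example evaluates to sqrt(2 * 2) = 2, in line with the term 2 dσ²/σ² of the univariate line element.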
43 Dimensionality reduction for visualization. Step 1: calculate all pairwise GDs, giving the proximity matrix $[D_{ij}]$. Step 2: plot points arbitrarily in 2D Euclidean space. Step 3: calculate the Euclidean proximity matrix $[E_{ij}]$. Step 4: minimize $\sum_{i,j} (D_{ij} - E_{ij})^2$ (multidimensional scaling). Step 5: plot the final configuration.
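The five steps can be sketched with a plain gradient descent on the squared mismatch between target and Euclidean distances. This is only one simple way to carry out the minimization; the step size, iteration count and the small target matrix are hypothetical, and the talk does not state which optimizer was used.

```python
import math
import random

def euclidean_matrix(points):
    """Step 3: Euclidean proximity matrix [E_ij] of a 2D configuration."""
    return [[math.dist(p, q) for q in points] for p in points]

def stress(target, points):
    """Sum over pairs i < j of (D_ij - E_ij)^2."""
    current = euclidean_matrix(points)
    n = len(points)
    return sum((target[i][j] - current[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def mds(target, points, n_iter=500, step=0.05):
    """Step 4: gradient descent on the stress, starting from the given configuration."""
    n = len(target)
    points = [list(p) for p in points]
    for _ in range(n_iter):
        current = euclidean_matrix(points)
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j or current[i][j] == 0.0:
                    continue
                coeff = -2.0 * (target[i][j] - current[i][j]) / current[i][j]
                grads[i][0] += coeff * (points[i][0] - points[j][0])
                grads[i][1] += coeff * (points[i][1] - points[j][1])
        points = [[p[0] - step * g[0], p[1] - step * g[1]]
                  for p, g in zip(points, grads)]
    return points

# Step 1 (stand-in): a hypothetical proximity matrix; in practice it would hold pairwise GDs.
r2 = math.sqrt(2.0)
D = [[0.0, 1.0, r2, 1.0], [1.0, 0.0, 1.0, r2], [r2, 1.0, 0.0, 1.0], [1.0, r2, 1.0, 0.0]]
rng = random.Random(0)
initial = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in D]  # Step 2
final = mds(D, initial)                                                   # Step 4
initial_stress, final_stress = stress(D, initial), stress(D, final)
print(f"stress before: {initial_stress:.3f}, after: {final_stress:.3f}")  # Step 5 would plot `final`
```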
44 Confinement visualization. [Figure: panels labeled by tokamak and by confinement regime, comparing Euclidean distance without errors and GD with errors.] Verdoolaege et al., Fusion Sci. Technol., 62, pp ,
45 ELM behavior Verdoolaege et al., Rev. Sci. Instrum., 83, art. no. 10D715,
46 k-nearest neighbor classification. Confinement mode identification. Training: 5%, testing: 95%. k = 1: nearest neighbor. Correct classification rates (%) for modes L and H, with columns: Euclidean w/o errors; GD with errors; GD with random errors.
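A k-nearest neighbor classifier only needs a distance function, so the GD can be plugged in directly. The sketch below is a toy illustration with invented (µ, σ) values and labels; it is not the database or the exact procedure of the talk.

```python
import math
from collections import Counter

def gd_product(a, b):
    """GD between two measurements, each given as a list of (mu, sigma) pairs."""
    total = 0.0
    for (mu_a, s_a), (mu_b, s_b) in zip(a, b):
        num = (mu_a - mu_b) ** 2 + 2.0 * (s_a - s_b) ** 2
        den = (mu_a - mu_b) ** 2 + 2.0 * (s_a + s_b) ** 2
        # 2 * atanh(d) equals ln((1 + d) / (1 - d)).
        total += (2.0 * math.atanh(math.sqrt(num / den))) ** 2
    return math.sqrt(2.0 * total)

def knn_classify(query, training, labels, distance, k=1):
    """Majority vote among the k training items closest to the query."""
    ranked = sorted(range(len(training)), key=lambda i: distance(query, training[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical one-variable training set: (mu, sigma) per measurement, with regime labels.
training = [[(1.0, 0.1)], [(2.0, 0.1)]]
labels = ["L", "H"]
query = [(1.2, 0.1)]
predicted = knn_classify(query, training, labels, gd_product, k=1)
print("predicted regime:", predicted)
```

Swapping `gd_product` for a Euclidean distance on the means alone gives the error-free baseline that the slide compares against.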
48 Geodesic regression. [Figure with labels $p_y$ and $p_x$.] Minimize the sum of squared GDs. Use the geodesic centroid.
49 Synthetic data. [Figure: y versus x; legend entries: Original, Data, Simple regression, Errors in variables, Geodesic regression.]
50 Regression results. ITPA global confinement. [Figure: $\tau_E$ (s) versus $\tau_{reg}$ (s) for simple regression and geodesic regression.] $R^2_{OLS} = 0.44$, $R^2_{EIV} = 0.52$, $R^2_{GR} = 0.71$. GR yields full probability distributions. Precise error estimates are not required, but may improve estimates.
52 Wavelet distributions. 1D/2D discrete wavelet transform (DWT). Retain time information. Spectral energy distribution identifies signal/image. Histograms: zero-mean + heavy-tailed. Generalized Gaussian distribution (GGD): $p(x \mid \alpha, \beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)} \exp\left[-\left(\frac{|x|}{\alpha}\right)^{\beta}\right]$: features $\alpha$ and $\beta$.
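53 Geodesic distance for GGDs. Gaussian ($\beta = 2$): $\mathrm{GD}[\mathcal{N}(0, \sigma_1) \,\|\, \mathcal{N}(0, \sigma_2)] = \sqrt{2}\, \ln\left(\frac{\alpha_2}{\alpha_1}\right)$. Laplace ($\beta = 1$): $\mathrm{GD}[\mathcal{L}(0, \alpha_1) \,\|\, \mathcal{L}(0, \alpha_2)] = \ln\left(\frac{\alpha_2}{\alpha_1}\right)$.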
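For zero-mean GGDs that share the same shape β, the Fisher information of the scale is β/α², so the GD along the scale direction is sqrt(β) |ln(α₂/α₁)|. This general-β expression is a derivation added here as an assumption, not a formula quoted from the slides; for β = 1 it reduces to the Laplace expression ln(α₂/α₁), and for β = 2 it agrees with the term 2 dσ²/σ² of the Gaussian line element. The sketch also estimates the Laplace scale from a set of coefficients by maximum likelihood (the mean absolute value); the coefficient values are invented.

```python
import math

def laplace_scale_mle(coefficients):
    """Maximum-likelihood Laplace scale alpha for zero-mean data: the mean absolute value."""
    return sum(abs(c) for c in coefficients) / len(coefficients)

def ggd_geodesic_distance(alpha1, alpha2, beta):
    """GD between two zero-mean GGDs with common shape beta (assumed closed form)."""
    return math.sqrt(beta) * abs(math.log(alpha2 / alpha1))

# Hypothetical wavelet detail coefficients from two signal windows.
window_a = [0.5, -1.5, 1.0, -1.0]
window_b = [2.0, -3.0, 1.0, -2.0]
alpha_a = laplace_scale_mle(window_a)
alpha_b = laplace_scale_mle(window_b)
gd_laplace = ggd_geodesic_distance(alpha_a, alpha_b, beta=1.0)
print(f"alpha_a = {alpha_a}, alpha_b = {alpha_b}, GD = {gd_laplace:.3f}")
```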
54 Experiment 1: setup (1). JET campaigns C15 to C20. Disruptive shots: 334. Known time of disruption (ToD). Time series for 13 plasma variables. Sliding window: 30 ms. Regular features: until 1 s before ToD. Disruptive features: from 210 ms before ToD. Similar to APODIS: Rattá et al., Nucl. Fusion 50, ,
55 Experiment 1: setup (2). Fourier vs. wavelet. Fourier: power spectrum; standard deviation; Euclidean distance and GD. Wavelet: detail coefficients; Daubechies 4, 3 scales; Laplace $\alpha$; GD. k-nearest neighbor classifier. Training: 65%, testing: 35%. Repeated 20 times.
56 Experiment 1: results. TPR = true positive rate; FPR = false positive rate; MA = missed alarms; FA = false alarms; SR = success rate (= 1 − (MA + FA)); AVG = average detection time before ToD.
Performance measure | Fourier Euclidean | Fourier GD | Wavelet GD
TPR (%) | 76.5 ± | ± | ± 1.3
FPR (%) | 17.4 ± | ± | ± 0.7
MA (%) | 0.6 ± | ± | ± 0.7
FA (%) | 48.0 ± | ± | ± 2.9
SR (%) | 51.4 ± | ± | ± 2.8
AVG (ms) | 61.0 ± | ± | ± 3.2
Verdoolaege et al., Fusion Sci. Technol., 62, pp , 2012; Verdoolaege et al., SOFT
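The performance measures can be computed from classifier output with a few lines. The sketch below assumes that TPR and FPR are evaluated per classified feature window and that MA and FA are already available as percentages; the label strings and the toy predictions are hypothetical.

```python
def rates(true_labels, predicted_labels, positive="disruptive"):
    """True positive rate and false positive rate, both in percent."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return 100.0 * tp / (tp + fn), 100.0 * fp / (fp + tn)

def success_rate(missed_alarms, false_alarms):
    """SR = 1 - (MA + FA), with all quantities expressed in percent."""
    return 100.0 - (missed_alarms + false_alarms)

truth = ["disruptive", "disruptive", "regular", "regular", "regular"]
pred = ["disruptive", "regular", "regular", "disruptive", "regular"]
tpr, fpr = rates(truth, pred)
print(f"TPR = {tpr:.1f}%, FPR = {fpr:.1f}%")
print(f"SR for MA = 0.6%, FA = 48.0%: {success_rate(0.6, 48.0):.1f}%")
```

The last line uses the MA and FA values that survive in the first column of the results and reproduces the SR of 51.4% listed there.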
57 Experiment 2: generalization. Generalize to campaigns C21 to C27.
58 Disruptive trajectory. Landmark MDS: real-time projections. Visual tool in control room. JET # disrupted at s due to NTM:
60 Conclusion. Emerging field of data science. Huge potential for probability theory and pattern recognition in fusion: physics studies; plasma control; quantification and reduction of uncertainties. Probability distributions are maximally informative. Full probability structure determines patterns. Patterns reflect physics.
More informationPS 271B: Quantitative Methods II. Lecture Notes
PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.
More informationStatistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
More informationUSING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu
More informationA Simple Feature Extraction Technique of a Pattern By Hopfield Network
A Simple Feature Extraction Technique of a Pattern By Hopfield Network A.Nag!, S. Biswas *, D. Sarkar *, P.P. Sarkar *, B. Gupta **! Academy of Technology, Hoogly - 722 *USIC, University of Kalyani, Kalyani
More informationOutline. Multitemporal high-resolution image classification
IGARSS-2011 Vancouver, Canada, July 24-29, 29, 2011 Multitemporal Region-Based Classification of High-Resolution Images by Markov Random Fields and Multiscale Segmentation Gabriele Moser Sebastiano B.
More informationMaster s thesis tutorial: part III
for the Autonomous Compliant Research group Tinne De Laet, Wilm Decré, Diederik Verscheure Katholieke Universiteit Leuven, Department of Mechanical Engineering, PMA Division 30 oktober 2006 Outline General
More informationLecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
More informationDetector-related. related software development in the HEPP project. Are Strandlie Gjøvik University College and University of Oslo
Detector-related related software development in the HEPP project Are Strandlie Gjøvik University College and University of Oslo Outline Introduction The ATLAS New Tracking project HEPP contributions Summary
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationDetection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup
Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationCell Phone based Activity Detection using Markov Logic Network
Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart
More informationPHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUMBER OF REFERENCE SYMBOLS
PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUM OF REFERENCE SYMBOLS Benjamin R. Wiederholt The MITRE Corporation Bedford, MA and Mario A. Blanco The MITRE
More information3F3: Signal and Pattern Processing
3F3: Signal and Pattern Processing Lecture 3: Classification Zoubin Ghahramani zoubin@eng.cam.ac.uk Department of Engineering University of Cambridge Lent Term Classification We will represent data by
More informationJPEG compression of monochrome 2D-barcode images using DCT coefficient distributions
Edith Cowan University Research Online ECU Publications Pre. JPEG compression of monochrome D-barcode images using DCT coefficient distributions Keng Teong Tan Hong Kong Baptist University Douglas Chai
More informationCS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major
More informationA Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
More informationBig Data, Statistics, and the Internet
Big Data, Statistics, and the Internet Steven L. Scott April, 4 Steve Scott (Google) Big Data, Statistics, and the Internet April, 4 / 39 Summary Big data live on more than one machine. Computing takes
More informationMachine Learning in Statistical Arbitrage
Machine Learning in Statistical Arbitrage Xing Fu, Avinash Patra December 11, 2009 Abstract We apply machine learning methods to obtain an index arbitrage strategy. In particular, we employ linear regression
More informationVEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS
VEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS Aswin C Sankaranayanan, Qinfen Zheng, Rama Chellappa University of Maryland College Park, MD - 277 {aswch, qinfen, rama}@cfar.umd.edu Volkan Cevher, James
More informationProbing Dark Energy with Baryon Acoustic Oscillations from Future Large Galaxy Redshift Surveys
Probing Dark Energy with Baryon Acoustic Oscillations from Future Large Galaxy Redshift Surveys Hee-Jong Seo (Steward Observatory) Daniel J. Eisenstein (Steward Observatory) Martin White, Edwin Sirko,
More informationA Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca ablancogo@upsa.es Spain Manuel Martín-Merino Universidad
More informationA Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails
12th International Congress on Insurance: Mathematics and Economics July 16-18, 2008 A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails XUEMIAO HAO (Based on a joint
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationINDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationTracking Groups of Pedestrians in Video Sequences
Tracking Groups of Pedestrians in Video Sequences Jorge S. Marques Pedro M. Jorge Arnaldo J. Abrantes J. M. Lemos IST / ISR ISEL / IST ISEL INESC-ID / IST Lisbon, Portugal Lisbon, Portugal Lisbon, Portugal
More informationTail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification
Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification Presented by Work done with Roland Bürgi and Roger Iles New Views on Extreme Events: Coupled Networks, Dragon
More informationEvaluation of Machine Learning Techniques for Green Energy Prediction
arxiv:1406.3726v1 [cs.lg] 14 Jun 2014 Evaluation of Machine Learning Techniques for Green Energy Prediction 1 Objective Ankur Sahai University of Mainz, Germany We evaluate Machine Learning techniques
More informationNon-Inductive Startup and Flux Compression in the Pegasus Toroidal Experiment
Non-Inductive Startup and Flux Compression in the Pegasus Toroidal Experiment John B. O Bryan University of Wisconsin Madison NIMROD Team Meeting July 31, 2009 Outline 1 Introduction and Motivation 2 Modeling
More informationNeural Networks Lesson 5 - Cluster Analysis
Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29
More informationPrivate Equity Fund Valuation and Systematic Risk
An Equilibrium Approach and Empirical Evidence Axel Buchner 1, Christoph Kaserer 2, Niklas Wagner 3 Santa Clara University, March 3th 29 1 Munich University of Technology 2 Munich University of Technology
More informationWeb-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
More information99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm
Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the
More informationNEURAL NETWORKS A Comprehensive Foundation
NEURAL NETWORKS A Comprehensive Foundation Second Edition Simon Haykin McMaster University Hamilton, Ontario, Canada Prentice Hall Prentice Hall Upper Saddle River; New Jersey 07458 Preface xii Acknowledgments
More informationFunctional Data Analysis of MALDI TOF Protein Spectra
Functional Data Analysis of MALDI TOF Protein Spectra Dean Billheimer dean.billheimer@vanderbilt.edu. Department of Biostatistics Vanderbilt University Vanderbilt Ingram Cancer Center FDA for MALDI TOF
More informationProbabilistic Latent Semantic Analysis (plsa)
Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References
More informationPrediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationSupplement to Call Centers with Delay Information: Models and Insights
Supplement to Call Centers with Delay Information: Models and Insights Oualid Jouini 1 Zeynep Akşin 2 Yves Dallery 1 1 Laboratoire Genie Industriel, Ecole Centrale Paris, Grande Voie des Vignes, 92290
More informationExponential Random Graph Models for Social Network Analysis. Danny Wyatt 590AI March 6, 2009
Exponential Random Graph Models for Social Network Analysis Danny Wyatt 590AI March 6, 2009 Traditional Social Network Analysis Covered by Eytan Traditional SNA uses descriptive statistics Path lengths
More information