Searching for Gravitational Waves from the Coalescence of High Mass Black Hole Binaries
2015 SURE Presentation, September 22nd, 2015
Lau Ka Tung, Department of Physics, The Chinese University of Hong Kong
Mentors: Surabhi Sachdev, Tjonnie Li, Kent Blackburn, Alan Weinstein
LIGO Laboratory, California Institute of Technology
LIGO Scientific Collaboration
Objectives
Improve signal-to-noise discrimination using machine learning.
Machine Learning
The computer is presented with example inputs and their desired outputs, given by a teacher, and the goal is to learn a general rule that maps inputs to outputs.
The essence of machine learning: a pattern exists; we cannot pin it down mathematically; we have data on it.
Example of machine learning
Question: 1 or 5?
Features: intensity, symmetry
Training:
  Intensity | Symmetry | 1 or 5
  3.3       | 0.5      | 5
  0.8       | 4.5      | 1
  5.6       | 0.3      | 5
The classifier is trained on these examples, then evaluated on unlabeled data:
Evaluation:
  Intensity | Symmetry | 1 or 5
  0.6       | 3.8      | ?
  2.7       | 1.2      | ?
(The slide then reveals the learned labels: 1 and 5, respectively.)
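A minimal sketch of this workflow in Python, using scikit-learn's DecisionTreeClassifier as a stand-in for the classifier. The feature values are the slide's illustrative numbers; a real application would extract intensity and symmetry from the digit images:

```python
from sklearn.tree import DecisionTreeClassifier

# Training set: [intensity, symmetry] -> digit label
X_train = [[3.3, 0.5], [0.8, 4.5], [5.6, 0.3]]
y_train = [5, 1, 5]

clf = DecisionTreeClassifier().fit(X_train, y_train)

# Evaluation set with unknown labels
X_eval = [[0.6, 3.8], [2.7, 1.2]]
print(clf.predict(X_eval))  # learned rule maps each feature vector to 1 or 5
```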
Use machine learning for signal-to-noise discrimination
Question: signal (1) or noise (0)?
Features: trigger parameters such as mass1 and SNR.
Training:
  mass1 | SNR  | …   | 0/1
  1.5   | 18.5 | 0.9 | 1
  15.7  | 35.6 | 3.5 | 1
  7.2   | 5.4  | 4.2 | 0
Classifier: learns to separate background from signal.
Evaluation:
  mass1 | SNR  | …   | 0/1
  27.8  | 37.3 | 3.8 | ?
  3.4   | 6.5  | 6.7 | ?
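The same workflow on trigger features, sketched with scikit-learn's RandomForestClassifier. The numbers are the slide's illustrative values; the third feature column is kept unnamed because its header did not survive extraction:

```python
from sklearn.ensemble import RandomForestClassifier

# Training set: [mass1, SNR, unnamed third feature] -> signal (1) / noise (0)
X_train = [[1.5, 18.5, 0.9],
           [15.7, 35.6, 3.5],
           [7.2, 5.4, 4.2]]
y_train = [1, 1, 0]

clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

X_eval = [[27.8, 37.3, 3.8], [3.4, 6.5, 6.7]]
print(clf.predict(X_eval))        # hard 0/1 decision
print(clf.predict_proba(X_eval))  # probability usable as a ranking statistic
```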
gstlal pipeline
(figure: schematic of the gstlal pipeline)
Workflow of machine learning in ranking
(figure: workflow diagram)
Training data
Signals: simulated signal injections are used as the signal training set.
Background: coincident triggers constructed from single-detector triggers are used as the background training set (see the sketch after the tables).
  H1 triggers:
  t   | m1   | m2   | s1   | s2
  1.3 | 4.2  | 3.6  | -0.2 | -0.3
  3.7 | 5.3  | 1.9  | -0.1 | 0.1
  4.2 | 10.3 | 6.4  | 0.8  | 0.0
  4.9 | 27.3 | 17.5 | 0.1  | 0.4
  5.8 | 3.2  | 3.1  | -0.4 | -0.9
  7.4 | 8.9  | 6.4  | 0.2  | 0.3
  L1 triggers:
  t   | m1   | m2   | s1   | s2
  0.3 | 8.7  | 3.4  | -0.4 | -0.5
  1.6 | 3.2  | 3.1  | -0.4 | -0.9
  2.9 | 6.9  | 3.2  | 0.2  | -0.6
  3.2 | 15.7 | 12.4 | 0.8  | 0.9
  4.5 | 5.3  | 1.9  | -0.1 | 0.1
  6.4 | 35.4 | 18.5 | -0.3 | -0.2
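A hedged sketch of one way to form such coincidences: pair H1 and L1 single-detector triggers whose arrival times fall within a coincidence window. The 0.5 s window matches the toy times above and is an assumption for illustration only; the real pipeline uses its own coincidence criteria:

```python
import numpy as np

h1_times = np.array([1.3, 3.7, 4.2, 4.9, 5.8, 7.4])
l1_times = np.array([0.3, 1.6, 2.9, 3.2, 4.5, 6.4])
window = 0.5  # seconds; assumed for illustration

# Keep every H1/L1 pair of triggers that arrive within the window of each other
coincs = [(th, tl) for th in h1_times for tl in l1_times
          if abs(th - tl) <= window]
print(coincs)  # [(1.3, 1.6), (3.7, 3.2), (4.2, 4.5), (4.9, 4.5)]
```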
Learning algorithm (classifier) candidates:
- Artificial Neural Network
- Support Vector Machine
- Random Forest
Decision Tree
Start with the full training set; each event is described by a feature vector. At each node, find a feature and threshold that optimize some criterion and split the set (the figure shows example cuts such as > 4 and > 8). A node becomes a leaf when splitting no longer optimizes the criterion, or when the number of events at the node falls below a minimum ℓ. A leaf contains a mixture of signals and background. (See the split-finding sketch below.)
(figure: example decision tree)
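An illustrative sketch of the node-splitting step: scan every feature and candidate threshold, keeping the split that most reduces the Gini impurity. `gini` and `best_split` are hypothetical helper names, not pipeline code:

```python
import numpy as np

def gini(labels):
    # Two-class Gini impurity of a set of 0/1 labels
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)          # fraction of signal events
    return 2.0 * p * (1.0 - p)

def best_split(X, y):
    # X: 2D NumPy feature matrix, y: 0/1 NumPy label vector
    best = (None, None, np.inf)  # (feature, threshold, weighted impurity)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if imp < best[2]:
                best = (f, t, imp)
    return best

X = np.array([[1.5, 18.5], [15.7, 35.6], [7.2, 5.4]])
y = np.array([1, 1, 0])
print(best_split(X, y))  # (1, 5.4, 0.0): cut on SNR separates the classes
```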
Random Forest of Bootstrap-aggregated Decision Trees (RFBDT) algorithm
One decision tree is a weak classifier; many trees (a forest) make a good classifier.
Bootstrap AGGregatING (bagging): each tree trains on a random subset of the whole data set.
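A minimal bagging sketch, assuming NumPy arrays for X and y: each tree trains on a bootstrap sample drawn with replacement, and the forest averages the tree votes:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_forest(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)   # bootstrap sample (with replacement)
        forest.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return forest

def forest_score(forest, X):
    # Average the individual tree votes -> score in [0, 1]
    return np.mean([tree.predict(X) for tree in forest], axis=0)
```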
Ranking statistic from RFBDT
Probability of the event given signal, estimated from the training events in the leaf where the event lands: e.g. a leaf with 8 signal and 3 background events gives p ≈ 0.73.
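The arithmetic behind the example leaf, for concreteness:

```python
# Example leaf from the slide: 8 signal and 3 background training events
n_signal, n_background = 8, 3
p = n_signal / (n_signal + n_background)
print(round(p, 2))  # 0.73
```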
Tuning of RFBDT
  Parameter                  | Description
  Features                   | Characteristics of an event.
  Number of decision trees   | The number of decision trees in a forest.
  Number of sampled features | The number of parameters sampled randomly to form a subset of the original feature vector.
  Minimal entries per leaf   | When the number of events in a node reaches the minimum leaf size, the node stops splitting and becomes a leaf.
Binary Classification
A threshold on the classifier output (e.g. 0.4) assigns each event to class 1 (signal) or class 0 (background). Comparing with the true class gives:
  True class     | Classified 0        | Classified 1
  Background (0) | True Negative (TN)  | False Positive (FP)
  Signal (1)     | False Negative (FN) | True Positive (TP)
False alarm probability: FP / (FP + TN).
True positive probability (efficiency): TP / (TP + FN).
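A small sketch computing both figures of merit at a given threshold (0.4, as on the slide); `fap_and_efficiency` is a hypothetical helper name:

```python
import numpy as np

def fap_and_efficiency(scores, true_class, threshold=0.4):
    scores = np.asarray(scores)
    true_class = np.asarray(true_class)
    predicted = scores >= threshold          # classify by thresholding the output
    fp = np.sum(predicted & (true_class == 0))
    tn = np.sum(~predicted & (true_class == 0))
    tp = np.sum(predicted & (true_class == 1))
    fn = np.sum(~predicted & (true_class == 1))
    return fp / (fp + tn), tp / (tp + fn)    # (false alarm prob., efficiency)

print(fap_and_efficiency([0.9, 0.2, 0.6, 0.3], [1, 0, 1, 0]))  # (0.0, 1.0)
```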
Expand feature space by transforming to features with physical meaning
Comparison using ROC curves
(figure: ROC curves for feature sets of 9, 12, 14, and 16 features)
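A sketch of how such a comparison could be produced with scikit-learn's roc_curve; `feature_sets` and `X_tests` are hypothetical containers holding the design matrix for each feature set:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

def compare_feature_sets(feature_sets, y_train, X_tests, y_test):
    # feature_sets: {"9 features": X_train_9, ...}; X_tests keyed the same way
    for name, X_train in feature_sets.items():
        clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
        scores = clf.predict_proba(X_tests[name])[:, 1]
        fap, eff, _ = roc_curve(y_test, scores)  # false alarm prob. vs. efficiency
        plt.plot(fap, eff, label=name)
    plt.xlabel("False alarm probability")
    plt.ylabel("Efficiency (true positive probability)")
    plt.legend()
    plt.show()
```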
Additional tuning
(figures: performance as a function of number of trees, number of sampled features, minimal entries per leaf, and optimization criterion)
Chosen values:
  Number of trees            | 100
  Number of sampled features | 4
  Minimal entries per leaf   | 5
  Optimization criterion     | Gini index
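For orientation, how the chosen options would map onto scikit-learn's RandomForestClassifier; this is an analogy only, since the project used an RFBDT implementation rather than sklearn:

```python
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    n_estimators=100,    # number of decision trees
    max_features=4,      # number of sampled features per split
    min_samples_leaf=5,  # minimal entries per leaf
    criterion="gini",    # optimization criterion: Gini index
)
```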
Pipeline for calculating likelihood from RFBDT
(figure: pipeline diagram)
Future work
- Compare the performance of the RFBDT ranking with the likelihood-ratio ranking currently used in gstlal.
- Find a systematic way to select features, e.g. ReliefF.
- Principal component analysis (PCA): extract a linear transform of the original features (sketched below).
- Choose RFBDT options automatically by validation in the pipeline.
- Include data quality information from other channels.
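A minimal sketch of the PCA idea from the list above, with a stand-in feature matrix and an assumed number of kept components:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 16))  # stand-in feature matrix
pca = PCA(n_components=4)   # number of kept components is an assumption
X_new = pca.fit_transform(X)
print(X_new.shape)          # (100, 4): linear combinations of the originals
```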
Acknowledgements
I would like to thank my mentors: Tjonnie, Alan, Surabhi, Kent.
I would like to thank Prof. Chu.
LIGO Scientific Collaboration | Caltech SURF | NSF | Department of Physics, CUHK
Backup Slides
Small in-sample error, huge out-of-sample error
Data: a 2nd-order polynomial plus noise, fit with a 10th-order polynomial.
Overtraining/overfitting: the fitting curve passes through all points, so the in-sample error is 0, but the out-of-sample error is huge: poor generalization.
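A sketch reproducing this example with NumPy; the polynomial coefficients and noise level are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 11)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

# 10th-order fit: 11 parameters for 11 points -> exact interpolation
coeffs = np.polyfit(x, y, deg=10)
in_sample = np.abs(np.polyval(coeffs, x) - y).max()

# Evaluate between the training points against the true quadratic
x_new = np.linspace(-1, 1, 201)
y_true = 1.0 + 2.0 * x_new - 3.0 * x_new**2
out_sample = np.abs(np.polyval(coeffs, x_new) - y_true).max()
print(in_sample, out_sample)  # near zero vs. much larger
```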
Optimization Criterion: Gini index
The Gini index is large when a node contains roughly equal numbers of signal and background events. For classification we want a node to contain only one class (signal or background), so we minimize the Gini index.
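In standard notation (a general fact about the criterion, not taken from the slide): for a node where a fraction p of the events are signal, the two-class Gini index is G = 1 - p^2 - (1 - p)^2 = 2p(1 - p). It peaks at p = 1/2 (an equal mix) and vanishes for a pure node, which is why classification minimizes it.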
Tunable options in RFBDT
(table of options; see the tuning slide above)