ealtime Cloud Screening with VIRIS-NG



Similar documents
Learning from Big Data in

Conquering the Astronomical Data Flood through Machine

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

LSST and the Cloud: Astro Collaboration in 2016 Tim Axelrod LSST Data Management Scientist

Knowledge-based systems and the need for learning

Making the Most of Missing Values: Object Clustering with Partial Data in Astronomy

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Introduction to Data Mining

Specific Usage of Visual Data Analysis Techniques

The Tonnabytes Big Data Challenge: Transforming Science and Education. Kirk Borne George Mason University

A PHOTOGRAMMETRIC APPRAOCH FOR AUTOMATIC TRAFFIC ASSESSMENT USING CONVENTIONAL CCTV CAMERA

Adaptive Anomaly Detection for Network Security

Migration Scenario: Migrating Batch Processes to the AWS Cloud

New development of automation for agricultural machinery

Social Media Mining. Data Mining Essentials

Final Project Report

Introduction to Data Mining

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

User Authentication/Identification From Web Browsing Behavior

A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM

Machine Learning: Overview

Intelligent Heuristic Construction with Active Learning

Transform-based Domain Adaptation for Big Data

SIERRA COLLEGE OBSERVATIONAL ASTRONOMY LABORATORY EXERCISE NUMBER III.F.a. TITLE: ASTEROID ASTROMETRY: BLINK IDENTIFICATION

CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Pallas Ludens. We inject human intelligence precisely where automation fails. Jonas Andrulis, Daniel Kondermann

Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning

Distributed forests for MapReduce-based machine learning

Einstein Rings: Nature s Gravitational Lenses

Pattern Insight Clone Detection

PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management

Theme 1 Software Processes. Software Configuration Management

Probabilistic topic models for sentiment analysis on the Web

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

The Cyber Threat Profiler

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

PSG College of Technology, Coimbatore Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

The Scientific Data Mining Process

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

Information Management course

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing



Copyright 1980 by the Association for Supervision and Curriculum Development. All rights reserved.




Galaxy Morphological Classification

The Minor Planet Center Status Report

Big Data Challenges for Large Radio Arrays

False alarm in outdoor environments

Introduction to Pattern Recognition

Data Mining Applications in Higher Education

Automatic Labeling of Lane Markings for Autonomous Vehicles

Detecting client-side e-banking fraud using a heuristic model

Data Mining and Machine Learning in Bioinformatics

Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information

A Survey on Product Aspect Ranking

Summary of Data Management Principles Dark Energy Survey V2.1, 7/16/15

Applying Co-Training Methods to Statistical Parsing. Anoop Sarkar anoop/

Online Semi-Supervised Learning

Linear Threshold Units

Foundations of Artificial Intelligence. Introduction to Data Mining

Machine Learning CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Machine Learning.

Real-time Monitoring Platform to Improve the Drinking Water Network Efficiency

Data Mining Part 5. Prediction

People centric sensing Leveraging mobile technologies to infer human activities. Applications. History of Sensing Platforms

NASA s Software Architecture Review Board s (SARB) Findings from the Review of GSFC s core Flight Executive/Core Flight Software (cfe/cfs)

Application of Predictive Analytics for Better Alignment of Business and IT

Maschinelles Lernen mit MATLAB

Big Data Era Challenges and Opportunities in Astronomy: How SOM/LVQ and Related Learning Methods Can Contribute?

Sensor Integration in the Security Domain

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Domain Classification of Technical Terms Using the Web

B A S I C S C I E N C E S

DATA MANAGEMENT FOR THE INTERNET OF THINGS

Bootstrapping Big Data

1 What is Machine Learning?

EXAM PREPARATION GUIDE

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

BIG. Big Data Analysis John Domingue (STI International and The Open University) Big Data Public Private Forum

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Transcription:

pyright 2013 California Institute of Technology. Jet Propulsion Laboratory California Institute of Technology Leveraging Annotated Archival Data with Domain Adaptation to Improve Data Triage in Optical Astronomy ealtime Cloud Screening with VIRIS-NG Brian D. Bue, Umaa D. Rebbapragada vid R. Thompson, Robert O. Green, Didier Keymeulen, rah K. Machine Lundeen, Learning Yasha and Mouradi, Instrument Daniel Autonomy Nunes, Group Rebecca staño, Jet Propulsion Steve A. Chien Laboratory, California Institute of Technology Tools for Astronomical Big Data Tucson, AZ. March 9, 2015 Propulsion Laboratory California Institute of Technology Copyright 2014-2015 California Institute of Technology. All Rights Reserved. Government sponsorship acknowledged.

Automated Triage of Astronomical Data Data Stream Candidate Detections Data Triage (Classifier) Human Follow-up N~10e22+ f: X! Y X: extracted features Y: {+1:REAL, -1:BOGUS} N~10e6+ N~100 N~10 2

Supervised Learning for Data Triage Training Data (labeled) (X, Y ) train = {(x i,y i )} N i=1 x i 2 R n (features) y i 2 {real, bogus} Test Data (unlabeled) (X, Y ) test = {(x i,y i )} M x i 2 R n i=1 y i unknown Trained Classifier f(x)! y Test Predictions {y i } M i=1 Follow-up Real Bogus 3

Assumption: (X,Y ) train, (X,Y ) test drawn i.i.d. from the same distribution P(X,Y ) Violated when measurement conditions change Due to: sensor modifications / changes to imaging pipeline How can we use existing labeled data when such changes occur? Applications Bootstrap detection systems for new surveys using old data Provide data continuity for ongoing surveys 4

Domain Adaptation Proposal: use domain adaptation to adapt data/classifiers from earlier surveys with similar science goals Given: labeled source (training) samples D S P S (X,Y ) (mostly) unlabeled target (test) samples D T P T (X,Y ) Compute: mapping between source and target feature spaces Source Domain Feature Space Target Domain Feature Space Common Feature Space Unlabeled Sample Labeled Sample 5

Case study: Transient Detection with the Intermediate Palomar Transient Factory (iptf) Fully-automated, wide-field optical transient survey Focus on supernovae (short duration), variable stars (persistent) iptf succeeded Palomar Transient Factory (PTF) in January 2013 Palomar P48 + Large field camera captures 7.8 deg 2 on the sky Detects 500K 1M candidate sources per night 0.1% of all point source detections real transients http://ptf.caltech.edu/iptf/ 6

iptf Candidate Detection + Classification Point source detection: via image subtraction pipeline New Image Reference Image Subtraction Real-Bogus classifier: (Bloom et al., 2011; Brink et al., 2012) Classifies point source detections; prioritizes high-confidence detections for human review/confirmation Trained using PTF data iptf upgrade changed image processing pipeline produced suboptimal predictions on new iptf data 7

Feature drift due to iptf pipeline upgrade 1.0 Correlation 0.8 0.6 0.4 0.2 0.0 a_image a_ref b_image b_ref ellipticity ellipticity_ref flux flux_err fwhm lmt_mg lmt_mg_ref mag mag_err mag_limit mag_ref mag_ref_err medsky medsky_ref n2sig3 n2sig5 n3sig3 n3sig5 norm_fwhm norm_fwhm_ref objs_ext objs_sav seeing seeing_ratio seeing_ref skysig skysig_ref Correlation between PTF and iptf features for real (spectroscopically-verified) transient sources 8

Experiment: PTF iptf Transient Detection Goal: adapt PTF-trained classifier to iptf domain Proof of concept for future surveys PTF/iPTF Zwicky Transient Facility (ZTF, 2017) Validation Data PTF: 37,028 verified reals, 39,613 bogus iptf: 18,433 verified reals, 19,000 bogus Features (31 total) candidate: mag, mag_err, flux, flux_err, a_image, b_image, fwhm, mag_ref, mag_ref_err, a_ref, b_ref, n2sig3, n3sig3, n2sig5, n3sig5 subtraction: objs_extracted, objs_saved, lmt_mg_new, lmt_mg_ref, medsky_new, medsky_ref, seeing_new, seeing_ref, skysig_new, skysig_ref misc: mag_from_limit, ellipticity, ellipticity_ref, normalized_fwhm, normalized_fwhm_ref, seeing_ratio 9

PTF iptf Algorithm Comparison Geodesic Flow Kernel (GFK, Gong et al., CVPR12) Computes domain-invariant features from intermediate subspaces between source & target domains Unsupervised: requires no labeled target data Co-Training for Domain Adaptation (CODA, Chen et al., NIPS11) Partitions feature space, trains a classifier for each partition Adds high-confidence predictions on target data to training set Learns informative features across domains x: xa: xb: Source Target Figure credit: Gong et al., 2012 Figure credit: Chen et al., 2011 f(xa) f(xb) 10

PTF iptf Prediction Accuracy 11

Summary Automated transient detection accuracy suffers when training/test distributions differ Domain adaptation can help GFK and CODA show 5-20% improvements in accuracy over baseline Ongoing/future work Diagnose iptf image processing pipeline artifacts Use PTF/iPTF data to bootstrap Real-Bogus classifier for Zwicky Transient Facility bbue@jpl.nasa.gov http://imbue.jpl.nasa.gov 12