Commentary on Techniques for Massive-Data Machine Learning in Astronomy


Nick Ball
Herzberg Institute of Astrophysics, Victoria, Canada

The Problem
- Astronomy faces enormous datasets
- Their size, dimensionality, and complexity require intelligent, automated investigation
- Data volumes are growing exponentially, so algorithms cannot afford to scale worse than O(N log N)
- Most data mining algorithms naïvely scale as O(N^2) or worse
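To make the scaling argument concrete, the short Python sketch below compares the number of pairs a naive all-pairs algorithm must examine, N(N-1)/2, with N log2 N. It counts idealized operations only (constant factors, memory, and I/O are ignored), so the numbers indicate orders of magnitude, not run times. The function names are hypothetical.

```python
import math


def pair_count(n: int) -> int:
    """Distinct pairs examined by a naive all-pairs algorithm: n(n-1)/2, i.e. O(N^2)."""
    return n * (n - 1) // 2


def n_log_n(n: int) -> float:
    """Work of order N log2 N, e.g. building a balanced tree over N points."""
    return n * math.log2(n)


# Compare the two growth rates for a million and a billion objects.
for n in (10**6, 10**9):
    ratio = pair_count(n) / n_log_n(n)
    print(f"N = {n:.0e}: N(N-1)/2 ~ {pair_count(n):.2e}, "
          f"N log2 N ~ {n_log_n(n):.2e}, ratio ~ {ratio:.1e}")
```

For a billion-object catalogue the quadratic approach needs roughly ten million times more operations, which is why the O(N log N) requirement is not negotiable at survey scale.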

The Solution
- Make data mining algorithms that scale as O(N log N) (or better)!
- This may mean compromising accuracy slightly
- Deploy them so that astronomers are willing and able to use them
- They must work on real astronomical data

Collaboration is Vital
- Successful use of astrostatistics and data mining requires expertise in computer science, statistics, and astronomy
- Collaboration enables novelty that would not arise from a single group
- So it is excellent that computer scientists are supplying algorithms in this way

But
- On combining expertise in computer science, statistics, and astronomy: successful collaborations have involved astronomers who are themselves experts in computing/statistics, or who work closely with these experts over an extended period

And
Astronomy data are messy:
- Large, complex, increasingly high-dimensional, time-domain
- Missing data: non-observation or non-detection
- Heteroscedastic, non-gaussian, underestimated errors
- Outliers, artifacts, false detections, systematic effects
- Correlated inputs
- Etc.
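The slide lists problems rather than remedies. As one illustration (not a method from the talk), the standard inverse-variance weighted mean below combines measurements that each carry their own error bar (heteroscedastic errors) and skips missing entries. It assumes independent gaussian errors that are correctly estimated, which, as the slide warns, real astronomical data often violate. The function name is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def weighted_mean(values: Sequence[Optional[float]],
                  sigmas: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Inverse-variance weighted mean and its 1-sigma uncertainty.

    Entries whose value or sigma is None or NaN (e.g. a non-observation) are
    skipped, as are non-positive sigmas. Assumes independent gaussian errors.
    """
    num = 0.0
    wsum = 0.0
    for x, s in zip(values, sigmas):
        if x is None or s is None or math.isnan(x) or math.isnan(s) or s <= 0:
            continue  # missing or unusable measurement
        w = 1.0 / (s * s)  # weight = 1 / variance
        num += w * x
        wsum += w
    if wsum == 0.0:
        raise ValueError("no usable measurements")
    return num / wsum, math.sqrt(1.0 / wsum)
```

If the quoted errors are known to be underestimated, a common adjustment is to inflate every sigma by a single scale factor before weighting; that factor must be estimated from the data and is not something this sketch provides.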

An Example
How do you apply astrostatistics and fast algorithms to this?


The Next Generation Virgo Cluster Survey (NGVS)
- 10σ point-source limiting magnitude g = 25.7 (faint!)
- Photometric (few spectra), ~100 deg², 5 bands (ugriz, like Sloan)
- … galaxies, 2.6 terabytes of data
- 40 people at 23 institutions in Canada, France, etc. (PI Laura …, HIA)

Virgo is an actual cluster of galaxies, the nearest large one to us

NGVS Statistical Challenges
- Object detection and classification
- Photometric redshifts (photo-z)
- Virgo cluster membership vs. background objects
- Missing data
- Field-to-field variation
- Multi-wavelength data
- Completeness (as a function of magnitude, surface brightness, etc.)

Object detection: low-surface-brightness galaxies

Cluster membership: photometric redshifts using k nearest neighbours
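A minimal brute-force sketch of k-nearest-neighbour photo-z estimation follows. It assumes Euclidean distance in colour space and takes the mean of the neighbours' spectroscopic redshifts as the estimate; the talk does not specify the features, k, or the averaging rule, and the name knn_photoz is hypothetical. Each query here costs O(N); a tree structure is what brings a whole catalogue down towards O(N log N), at least in low dimensions.

```python
import heapq
import math
from typing import Sequence


def knn_photoz(query: Sequence[float],
               train_colours: Sequence[Sequence[float]],
               train_z: Sequence[float],
               k: int = 5) -> float:
    """Estimate a photometric redshift as the mean spectroscopic redshift of the
    k training objects nearest in colour space (brute force, O(N) per query)."""
    if not 1 <= k <= len(train_colours):
        raise ValueError("k must be between 1 and the training-set size")
    # Pair each training object's distance with its redshift; keep the k smallest.
    nearest = heapq.nsmallest(
        k,
        ((math.dist(query, c), z) for c, z in zip(train_colours, train_z)),
        key=lambda dz: dz[0],
    )
    return sum(z for _, z in nearest) / k
```

Cluster membership would then follow from a cut on the estimated redshift close to that of Virgo; the talk gives no threshold, so any particular cut is a further assumption.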

Missing data: not all NGVS fields (not final) contain all 5 bands (ugriz)
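When some bands are absent, the distance between two objects in magnitude or colour space is not defined as it stands. One common workaround, sometimes called the partial distance strategy and shown here only as an illustration (the talk does not say how NGVS handles this), is to sum over the bands both objects share and rescale by the fraction of bands used. None marks an unobserved band; the function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def masked_distance(a: Sequence[Optional[float]],
                    b: Sequence[Optional[float]]) -> float:
    """Euclidean distance over the features present (not None) in both a and b,
    scaled by sqrt(total / used) so distances computed from different numbers
    of bands are roughly comparable."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    sq = 0.0
    used = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # band not observed for one of the two objects
        sq += (x - y) ** 2
        used += 1
    if used == 0:
        return math.inf  # no shared bands: treat as infinitely far apart
    return math.sqrt(sq * len(a) / used)
```

The rescaling implicitly assumes the missing bands would have contributed like the observed ones, which is a strong assumption when, for example, u-band depth differs systematically between fields.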

Multi-wavelength data

Canadian Astronomy Data Centre
- CADC is one of the world's largest astronomy data centres
- ~500 terabytes of data (will grow to petabytes)
- Uses Virtual Observatory standards
- Staffed by astronomers and computer specialists, but not statisticians

CANFAR
- Canadian Advanced Network for Astronomical Research, at CADC
- Combines cluster job scheduling with cloud computing resources
- Users manage their own virtual machines

So
- Put fast data mining tools on the CANFAR infrastructure... but it is early days, and there is not much to say yet

Guide to Data Mining in Astronomy
- Virtual Observatory KDD-IG guide: IvoaKDDguide
- Emphasizes data mining, which is part of astroinformatics
- But this overlaps with astrostatistics, making the guide a potential outreach channel to the wider community

knn Quasar Photometric Redshifts
- Use a kd-tree for fast knn assignment of photo-zs to Sloan Digital Sky Survey quasars
- Use a single neighbour, and perturb the input features to build a PDF in redshift
- Removing objects with multi-peaked PDFs removes almost all catastrophic outliers
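The recipe on this slide can be sketched in pure Python as follows. Everything beyond the recipe itself is an assumption for illustration: the kd-tree is a textbook median-split tree, input errors are taken as independent gaussians, redshifts are binned at width 0.1, and a "peak" is defined (hypothetically) as a run of adjacent bins each holding at least 5% of the draws. The names build_kdtree, nearest, photoz_pdf, and count_peaks are invented for this sketch.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point               # feature vector (e.g. colours) of a training quasar
    z: float                   # its spectroscopic redshift
    axis: int                  # coordinate this node splits on
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(items: List[Tuple[Point, float]], depth: int = 0) -> Optional[KDNode]:
    """Build a kd-tree by splitting at the median along cycling axes.
    Re-sorting at every level makes this O(N log^2 N), acceptable for a sketch."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda pz: pz[0][axis])
    mid = len(items) // 2
    return KDNode(items[mid][0], items[mid][1], axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))


def nearest(node: Optional[KDNode], target: Sequence[float],
            best: Optional[Tuple[float, float]] = None) -> Optional[Tuple[float, float]]:
    """Return (squared distance, redshift) of the single nearest training object."""
    if node is None:
        return best
    d2 = sum((a - b) ** 2 for a, b in zip(node.point, target))
    if best is None or d2 < best[0]:
        best = (d2, node.z)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Search the other side only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


def photoz_pdf(tree: KDNode, colours: Sequence[float], sigmas: Sequence[float],
               n_draws: int = 200, bin_width: float = 0.1, seed: int = 0) -> Counter:
    """Monte Carlo redshift PDF: perturb the inputs by their (assumed gaussian)
    errors, give each draw its single nearest neighbour's redshift, and histogram."""
    rng = random.Random(seed)
    hist: Counter = Counter()
    for _ in range(n_draws):
        draw = [rng.gauss(c, s) for c, s in zip(colours, sigmas)]
        _d2, z = nearest(tree, draw)
        hist[math.floor(z / bin_width)] += 1
    return hist


def count_peaks(hist: Counter, min_frac: float = 0.05) -> int:
    """Count runs of adjacent redshift bins each holding >= min_frac of the draws;
    runs separated by at least one empty or sparse bin count as separate peaks."""
    total = sum(hist.values())
    strong = sorted(b for b, n in hist.items() if n >= min_frac * total)
    peaks, prev = 0, None
    for b in strong:
        if prev is None or b > prev + 1:
            peaks += 1
        prev = b
    return peaks
```

Objects for which count_peaks returns more than 1 correspond to the multi-peaked cases the slide suggests removing. At survey scale one would use an optimized, non-recursive tree implementation rather than this version.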

knn Quasar Photometric Redshifts
[Figure: photometric redshift z_mean versus spectroscopic redshift z_spec, with the line z_mean = z_spec]

knn Quasar Photometric Redshifts
[Figure: photometric redshift z_one peak versus spectroscopic redshift z_spec, with the line z_one peak = z_spec]

Questions
- Can we overcome the problems of real data?
- Will there be data of high intrinsic dimension?
- Will astronomers be able to deploy the algorithms?
- Where do GPUs fit? (Could GPUs plus brute force be just as fast?)

Conclusions
Provided the data can be suitably prepared, and the science-driven usage of the algorithms intelligently motivated, the fast algorithms presented here have excellent potential for advancing astronomical research.


More information

locuz.com Big Data Services

locuz.com Big Data Services locuz.com Big Data Services Big Data At Locuz, we help the enterprise move from being a data-limited to a data-driven one, thereby enabling smarter, faster decisions that result in better business outcome.

More information

Data Mining and Analytics in Realizeit

Data Mining and Analytics in Realizeit Data Mining and Analytics in Realizeit November 4, 2013 Dr. Colm P. Howlin Data mining is the process of discovering patterns in large data sets. It draws on a wide range of disciplines, including statistics,

More information

CADC and CANFAR: Extending the role of the data centre. Séverin Gaudet Canadian Astronomy Data Centre

CADC and CANFAR: Extending the role of the data centre. Séverin Gaudet Canadian Astronomy Data Centre CADC and CANFAR: Extending the role of the data centre Séverin Gaudet Canadian Astronomy Data Centre February 2012 Canadian Astronomy Data Centre Heterogeneous collection: Multiple missions, facilities

More information

Data Mining: Overview. What is Data Mining?

Data Mining: Overview. What is Data Mining? Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

More information

Big Data. Fast Forward. Putting data to productive use

Big Data. Fast Forward. Putting data to productive use Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize

More information

Astrophysics Syllabus

Astrophysics Syllabus Astrophysics Syllabus Center for Talented Youth Johns Hopkins University Text: Astronomy Today: Stars and Galaxies, Volume II Author: Chaisson and McMillan Course Objective: The purpose of this course

More information

Background on Elastic Compute Cloud (EC2) AMI s to choose from including servers hosted on different Linux distros

Background on Elastic Compute Cloud (EC2) AMI s to choose from including servers hosted on different Linux distros David Moses January 2014 Paper on Cloud Computing I Background on Tools and Technologies in Amazon Web Services (AWS) In this paper I will highlight the technologies from the AWS cloud which enable you

More information

Data Management Plan Extended Baryon Oscillation Spectroscopic Survey

Data Management Plan Extended Baryon Oscillation Spectroscopic Survey Data Management Plan Extended Baryon Oscillation Spectroscopic Survey Experiment description: eboss is the cosmological component of the fourth generation of the Sloan Digital Sky Survey (SDSS-IV) located

More information

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

Yuji Shirasaki (JVO NAOJ)

Yuji Shirasaki (JVO NAOJ) Yuji Shirasaki (JVO NAOJ) A big table : 20 billions of photometric data from various survey SDSS, TWOMASS, USNO-b1.0,GSC2.3,Rosat, UKIDSS, SDS(Subaru Deep Survey), VVDS (VLT), GDDS (Gemini), RXTE, GOODS,

More information

DAME: A Distributed Data Mining & Exploration Framework. within the Virtual Observatory

DAME: A Distributed Data Mining & Exploration Framework. within the Virtual Observatory DAME: A Distributed Data Mining & Exploration Framework within the Virtual Observatory Massimo Brescia a*, Stefano Cavuoti b Longo b Raffaele D Abrusco c, Omar Laurino d, Giuseppe a INAF Osservatorio Astronomico

More information

LSST and the Cloud: Astro Collaboration in 2016 Tim Axelrod LSST Data Management Scientist

LSST and the Cloud: Astro Collaboration in 2016 Tim Axelrod LSST Data Management Scientist LSST and the Cloud: Astro Collaboration in 2016 Tim Axelrod LSST Data Management Scientist DERCAP Sydney, Australia, 2009 Overview of Presentation LSST - a large-scale Southern hemisphere optical survey

More information

Adaptive Optics (AO) TMT Partner Institutions Collaborating Institution Acknowledgements

Adaptive Optics (AO) TMT Partner Institutions Collaborating Institution Acknowledgements THIRTY METER TELESCOPE The past century of astronomy research has yielded remarkable insights into the nature and origin of the Universe. This scientific advancement has been fueled by progressively larger

More information

Computational Science and Informatics (Data Science) Programs at GMU

Computational Science and Informatics (Data Science) Programs at GMU Computational Science and Informatics (Data Science) Programs at GMU Kirk Borne George Mason University School of Physics, Astronomy, & Computational Sciences http://spacs.gmu.edu/ Outline Graduate Program

More information

Big Data Analytics. Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs

Big Data Analytics. Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs 1 Big Data Analytics Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs Montevideo, 22 nd November 4 th December, 2015 INFORMATIQUE

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

Alignment and Preprocessing for Data Analysis

Alignment and Preprocessing for Data Analysis Alignment and Preprocessing for Data Analysis Preprocessing tools for chromatography Basics of alignment GC FID (D) data and issues PCA F Ratios GC MS (D) data and issues PCA F Ratios PARAFAC Piecewise

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information