# Learning from Big Data in Astronomy: an overview

## Transcription

1 Learning from Big Data in Astronomy: an overview. Kirk Borne, George Mason University, School of Physics, Astronomy, & Computational Sciences

3 to Big Data Astronomy is a Big Challenge

4 Scary News: Big Data is taking us to a Tipping Point

5 Good News: Big Data is Sexy

6 Characteristics of Big Data

If the only distinguishing characteristic were that we have lots of data, we would call it "Lots of Data". Big Data characteristics: the 3+n V's:

1. Volume (lots of data = "Tonnabytes")
2. Variety (complexity, curse of dimensionality)
3. Velocity (rate of data and information flow)
4. Veracity (verifying inference-based models from comprehensive data collections)
5. Variability
6. Venue
7. Vocabulary

7 Characteristics of Big Data

Of the V's listed above, Variety is the one that helps us to discriminate subtle new classes (= Class Discovery).

8 Insufficient Variety: stars & galaxies are not separated in this parameter

9 Sufficient Variety: stars & galaxies are separated in this parameter

10 Class Discovery: feature separation and discrimination of classes

Reference: The separation of classes improves when the correct features are chosen for investigation, as in the following star-galaxy discrimination test (the Star-Galaxy Separation Problem).

[Figure: two panels of the star-galaxy separation test, labeled "Not good" and "Good"]
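The point about choosing the correct features can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the catalog is synthetic, the two feature names (`brightness`, `concentration`) are hypothetical, and the separation score (squared difference of class means divided by the sum of class variances) is one common choice, not necessarily the measure used in the original slide.

```python
import random
import statistics


def fisher_ratio(a, b):
    """Squared gap between class means, divided by the sum of class variances."""
    gap = statistics.fmean(a) - statistics.fmean(b)
    return gap * gap / (statistics.variance(a) + statistics.variance(b))


rng = random.Random(42)

# Synthetic objects (not survey data): each is (brightness, concentration).
# By construction, brightness overlaps heavily between the two classes,
# while concentration does not.
stars = [(rng.gauss(20.0, 1.5), rng.gauss(0.9, 0.05)) for _ in range(500)]
galaxies = [(rng.gauss(20.5, 1.5), rng.gauss(0.5, 0.10)) for _ in range(500)]

feature_names = ["brightness", "concentration"]
scores = {
    name: fisher_ratio([s[i] for s in stars], [g[i] for g in galaxies])
    for i, name in enumerate(feature_names)
}

for name, score in scores.items():
    print(f"{name:>13}: separation score = {score:.2f}")
```

A score well below 1 corresponds to the "Not good" case, where the two class distributions sit on top of each other; a score well above 1 corresponds to the "Good" case.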

11 3 Categories of Scientific Knowledge Discovery in Databases

- Class Discovery: clustering of scientific parameters; finding new classes of objects and behaviors
- Novelty Discovery: finding new, rare, one-in-a-million (billion) (trillion) objects and events
- Correlation Discovery: finding new patterns and dependencies, which reveal new physics or new scientific principles

12 Big Data Science: Scientific KDD (Knowledge Discovery from Data)

- Characterize the known (clustering, unsupervised learning)
- Assign the new (classification, supervised learning)
- Discover the unknown (outlier detection, semi-supervised learning)

Graphic from S. G. Djorgovski.

Benefits of very large datasets:

- best statistical analysis of "typical" events
- automated search for "rare" events

13 This graphic says it all

- Clustering: examine the data and find the data clusters (clouds), without considering what the items are = Characterization!
- Classification: for each new data item, try to place it within a known class (i.e., a known category or cluster) = Classify!
- Outlier Detection: identify those data items that don't fit into the known classes or clusters = Surprise!

Graphic provided by Professor S. G. Djorgovski, Caltech
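The three steps named in the graphic can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method behind the original figure: the data are two synthetic 2-D clouds, clustering is plain k-means, classification is nearest-centroid, and the outlier rule (more than 3 times the typical distance to a centroid) is an arbitrary threshold chosen for the example.

```python
import math
import random


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances[best]


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)[0]].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def characterize(points, k):
    """Clustering step: find centroids and an (assumed) outlier threshold."""
    centroids = kmeans(points, k)
    typical = sum(nearest(p, centroids)[1] for p in points) / len(points)
    return centroids, 3.0 * typical


def classify(point, centroids, threshold):
    """Classification step: cluster index, or None when the item is an outlier."""
    label, distance = nearest(point, centroids)
    return label if distance <= threshold else None


rng = random.Random(1)
known = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
known += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(100)]

centroids, threshold = characterize(known, k=2)
typical_label = classify((4.6, 5.2), centroids, threshold)    # falls in a known cloud
surprise_label = classify((10.0, -3.0), centroids, threshold)  # fits no known cloud

print("centroids:", centroids)
print("typical item ->", typical_label, "| surprising item ->", surprise_label)
```

Returning `None` for items far from every centroid keeps the "Surprise!" case explicit instead of forcing every new item into a known class.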

14 Astroinformatics

Astroinformatics is data-oriented astronomy for research and education (similar to Bioinformatics, Geoinformatics, Cheminformatics, and many others). Informatics is the discipline of organizing, accessing, mining, & analyzing information describing complex systems (e.g., the human genome, the Earth, or the Universe).

X-informatics: key enabler of scientific discovery in the era of Big Data science (X = Bio, Geo, Astro, ...) (Jim Gray, KDD-2003).

Intelligent data discovery, integration, mining, and analysis tools will enable exponential knowledge discovery within exponentially growing data collections.

15 Astroinformatics Research paper available! Addresses the data science challenges, research agenda, application areas, use cases, and recommendations for the new science of Astroinformatics. Borne (2010): Astroinformatics: Data-Oriented Astronomy Research and Education, Journal of Earth Science Informatics, vol. 3, pp See also

16 Astronomy Data Environment: Sky Surveys

To avoid biases caused by limited samples, astronomers now study the sky systematically = Sky Surveys. Surveys are used to measure and collect data from all objects that are contained in large regions of the sky, in a systematic, controlled, repeatable fashion.

Brief(!!) history of surveys (... this is just a subset):

- MACHO and related surveys for dark matter objects: ~1 Terabyte
- Digitized Palomar Sky Survey: 3 Terabytes
- 2MASS (2-Micron All-Sky Survey): 10 Terabytes
- GALEX (ultraviolet all-sky survey): 30 Terabytes
- Sloan Digital Sky Survey (1/4 of the sky): 40 Terabytes
- Pan-STARRS: 40 Petabytes!

Leading up to the big survey next decade:

- LSST (Large Synoptic Survey Telescope): 100 Petabytes!

17 LSST = Large Synoptic Survey Telescope 8.4-meter diameter primary mirror = 10 square degrees! Hello! Petabyte image archive Petabyte database catalog

18 Observing Strategy: One pair of images every 40 seconds for each spot on the sky, then continue across the sky continuously every night for 10 years (~ ), with time domain sampling in log(time) intervals (to capture dynamic range of transients).

LSST (Large Synoptic Survey Telescope): Ten-year time series imaging of the night sky, mapping the Universe! ~2,000,000 events each night: anything that goes bump in the night! Cosmic Cinematography!

Education and Public Outreach have been an integral and key feature of the project since the beginning. The EPO program includes formal Ed, informal Ed, Citizen Science projects, and Science Centers / Planetaria.

19 LSST Key Science Drivers: Mapping the Dynamic Universe

- Solar System Inventory (moving objects, NEOs, asteroids: census & tracking)
- Nature of Dark Energy (distant supernovae, weak lensing, cosmology)
- Optical transients (of all kinds, with alert notifications within 60 seconds)
- Digital Milky Way (proper motions, parallaxes, star streams, dark matter)

LSST in time and space: When? ~ Where? Cerro Pachon, Chile

[Figure: architect's design of LSST Observatory]

20 LSST Summary

- 3-Gigapixel camera
- One 6-Gigabyte image every 20 seconds
- 30 Terabytes every night for 10 years
- 100-Petabyte final image data archive anticipated (all data are public!!!)
- 20-Petabyte final database catalog anticipated
- Real-Time Event Mining: 1-10 million events per night, every night, for 10 yrs; follow-up observations required to classify these
- Repeat images of the entire night sky every 3 nights: Celestial Cinematography
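The summary figures can be cross-checked with simple arithmetic. The sketch below takes the nightly volume, the survey length, and the nightly event range from the slide; the number of observing nights per year (330) is an assumption introduced here, since the slide does not state it.

```python
TB_PER_NIGHT = 30        # from the summary above
SURVEY_YEARS = 10        # from the summary above
NIGHTS_PER_YEAR = 330    # assumption: not stated in the source


def archive_petabytes(tb_per_night, nights_per_year, years):
    """Total image volume in petabytes, using 1 PB = 1000 TB."""
    return tb_per_night * nights_per_year * years / 1000


def events_over_survey(events_per_night, nights_per_year, years):
    """Total number of events accumulated over the whole survey."""
    return events_per_night * nights_per_year * years


archive_pb = archive_petabytes(TB_PER_NIGHT, NIGHTS_PER_YEAR, SURVEY_YEARS)
events_low = events_over_survey(1_000_000, NIGHTS_PER_YEAR, SURVEY_YEARS)
events_high = events_over_survey(10_000_000, NIGHTS_PER_YEAR, SURVEY_YEARS)

print(f"image archive: ~{archive_pb:.0f} PB")
print(f"events over the survey: {events_low:.1e} to {events_high:.1e}")
```

Under that assumption the nightly rate lands close to the anticipated 100-Petabyte archive; using every calendar night (365) instead gives roughly 110 PB, so the conclusion is not sensitive to the exact choice.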

21 LSST Science Collaborations

The original 10 teams:

1. Solar System
2. Milky Way & the Local Volume
3. Transients/Variable Stars
4. Stellar Populations
5. Supernovae
6. Galaxies
7. Active Galactic Nuclei (AGN, quasars)
8. Strong Lensing
9. Weak Lensing
10. Large-scale Structure / Baryon Oscillations

22 LSST Science Collaborations

The original 10 teams listed on the previous slide, and now one more: the ISSC

23 The LSST ISSC: Informatics and Statistics Science Collaboration

Core Leadership team: Kirk Borne, Jogesh Babu, Tom Loredo, Eric Feigelson, Alex Gray

~50 members include astronomers, statisticians, and machine learning data miners

24 The LSST ISSC research focus

In discussing the data-related research challenges posed by LSST, we identified several research areas:

- Statistics
- Data & Information Visualization
- Data mining (machine learning)
- Data-intensive computing & analysis
- Informatics
- Large-scale scientific data management
- Semantics and Ontologies

These areas represent Statistics and the science of Informatics (Astroinformatics) = Data-intensive Science = the 4th Paradigm of Scientific Research

- Addressing the LSST "Data to Knowledge" challenge
- Helping to discover the unknown unknowns

25 The LSST Big Data Challenges

1. Massive data stream: ~2 Terabytes of image data per hour that must be mined in real time (for 10 years).
2. Massive 20-Petabyte database: more than 50 billion objects need to be classified, and most will be monitored for important variations in real time.
3. Massive event stream: knowledge extraction in real time for ~2,000,000 events each night.

Challenge #1 includes both the static data mining aspects of #2 and the dynamic data mining aspects of #3. Look at #3 in more detail...
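To get a feel for what "real time" means for challenges #1 and #3, the quoted rates can be converted to per-second figures. The data rate and the nightly event count come from the slide; the 10-hour observing night is an assumption made here for illustration only.

```python
TB_PER_HOUR = 2               # from challenge #1
EVENTS_PER_NIGHT = 2_000_000  # from challenge #3
NIGHT_HOURS = 10              # assumption: not stated in the source


def sustained_rate_mb_per_s(terabytes_per_hour):
    """Convert TB/hour to MB/s using decimal units (1 TB = 1,000,000 MB)."""
    return terabytes_per_hour * 1_000_000 / 3600


def seconds_per_event(events_per_night, night_hours):
    """Average wall-clock time available per event if handled one at a time."""
    return night_hours * 3600 / events_per_night


rate = sustained_rate_mb_per_s(TB_PER_HOUR)
budget = seconds_per_event(EVENTS_PER_NIGHT, NIGHT_HOURS)

print(f"sustained image stream: ~{rate:.0f} MB/s")
print(f"time budget per event: ~{budget * 1000:.0f} ms")
```

With these inputs the stream is a little over half a gigabyte per second, and a strictly serial pipeline would have on the order of tens of milliseconds per event.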

26 LSST data mining challenge #3

Approximately 2,000,000 times each night for 10 years LSST will obtain the following data on a new sky event, and we will be challenged with classifying these data:

[Figure: light curve of a new event, flux versus time]

27 LSST data mining challenge #3: more data points help!

[Figure: the same flux versus time plot with more data points]

28 LSST data mining challenge #3: more data points help!

[Figure: the same flux versus time plot with more data points]

29 LSST data mining challenge #3: What will we do with all of these events?

30 LSST data mining challenge #3: What will we do with all of these events? Characterize first! (Unsupervised Learning) Classify later.

31 Characterization includes

Feature Detection and Extraction:

- Identifying and describing features in the data via machine algorithms or human inspection (including the potentially huge contributions from Citizen Science)
- Extracting feature descriptors from the data
- Curating these features for search, re-use, & discovery
- Finding other parameters and features from other archives, other databases, other information sources, and using those to help characterize (ultimately classify) each new event.

This introduces more variety and complexity to the data... but this should help us to discriminate subtle distinctions between classes and hopefully discover new classes!
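Extracting feature descriptors from a flux-versus-time series might look like the sketch below. The function name, the particular descriptors, and the sample light curve are all illustrative assumptions; they are not the feature set used by LSST or by the author.

```python
import statistics


def light_curve_features(times, fluxes):
    """Summarize a flux-versus-time series with a few simple descriptors."""
    mean_flux = statistics.fmean(fluxes)
    peak_index = max(range(len(fluxes)), key=fluxes.__getitem__)
    return {
        "n_points": len(fluxes),
        "mean_flux": mean_flux,
        "amplitude": max(fluxes) - min(fluxes),
        "scatter": statistics.pstdev(fluxes),
        "time_of_peak": times[peak_index],
        # Fraction of samples brighter than the mean: small for a brief flare
        # on a flat baseline, closer to 0.5 for symmetric variability.
        "fraction_above_mean": sum(f > mean_flux for f in fluxes) / len(fluxes),
    }


# Hypothetical event: a flat baseline with one brightening episode.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
fluxes = [10.0, 10.0, 30.0, 18.0, 12.0, 10.0]

features = light_curve_features(times, fluxes)
for name, value in features.items():
    print(f"{name:>20}: {value}")
```

Each event is reduced to a fixed-length record, which is what makes the clustering, dimension reduction, and outlier detection steps on the following slides applicable to a stream of irregularly sampled light curves.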

32 Data-driven Discovery (Unsupervised Learning)

i.e., What can I do with characterizations?

1. Class Discovery: Clustering
2. Principal Component Analysis: Dimension Reduction
3. Outlier (Anomaly / Deviation / Novelty) Detection: Surprise Discovery
4. Link Analysis: Association Analysis, Network Analysis
5. and more.

33 Data-driven Discovery (Unsupervised Learning)

- Class Discovery: Clustering
  - Distinguish different classes of behavior or different types of objects
  - Find new classes of behavior or new types of objects
  - Describe a large data collection by a small number of condensed representations
- Principal Component Analysis: Dimension Reduction
  - Find the dominant features among all of the data attributes
  - Generate low-dimensional descriptions of events and behaviors, while revealing correlations and dependencies among parameters
  - Addresses the Curse of Dimensionality
- Outlier Detection: Surprise / Anomaly / Deviation / Novelty Discovery
  - Find the unknown unknowns (the rare one-in-a-billion or one-in-a-trillion event)
  - Find objects and events that are outside the bounds of our expectations
  - These could be garbage (erroneous measurements) or true discoveries
  - Used for data quality assurance and/or for discovery of new / rare / interesting data items
- Link Analysis: Association Analysis, Network Analysis
  - Identify connections between different events (or objects)
  - Find unusual (improbable) co-occurring combinations of data attribute values
  - Find data items that have much fewer than "6 degrees of separation"
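As an illustration of the dimension-reduction item, the sketch below finds the first principal component of a small synthetic table by power iteration on its covariance matrix. The data and the function names are assumptions made for the example; a production analysis would normally rely on a numerical library instead of hand-written linear algebra.

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration.

    Returns the unit vector and the fraction of total variance it explains.
    """
    cov = covariance_matrix(rows)
    d = len(cov)
    vector = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    variance = sum(
        vector[i] * sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)
    )
    total = sum(cov[i][i] for i in range(d))
    return vector, variance / total


# Synthetic table with three attributes: the second is about twice the first,
# and the third is pure small-amplitude noise.
rng = random.Random(7)
rows = []
for _ in range(300):
    t = rng.gauss(0.0, 1.0)
    rows.append([t, 2.0 * t + rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)])

component, explained = first_principal_component(rows)
print("first component:", [round(x, 3) for x in component])
print(f"fraction of variance explained: {explained:.3f}")
```

Here one direction carries nearly all of the variance, so three attributes collapse to a single descriptor, and the component's weights expose the dependency between the first two attributes.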

34 Why do all of this? ... for 4 very simple reasons:

1. Any real data table may consist of thousands, or millions, or billions of rows of numbers.
2. Any real data table will probably have many more (perhaps hundreds more) attributes (features), not just two.
3. Humans can make mistakes when staring for hours at long lists of numbers, especially in a dynamic data stream.
4. The use of a data-driven model provides an objective, scientific, rational, and justifiable test of a hypothesis.

35 Why do all of this? ... for 4 very simple reasons:

1. Any real data table may consist of thousands, or millions, or billions of rows of numbers. (Volume)
2. Any real data table will probably have many more (perhaps hundreds more) attributes (features), not just two. (Variety)
3. Humans can make mistakes when staring for hours at long lists of numbers, especially in a dynamic data stream. (Velocity)
4. The use of a data-driven model provides an objective, scientific, rational, and justifiable test of a hypothesis. (Veracity)

36 Why do all of this? ... the same 4 reasons, in short:

1. Volume: It is too much!
2. Variety: It is too complex!
3. Velocity: It keeps on coming!
4. Veracity: Can you prove your results?

37 Rationale for BIG DATA (1)

Consequently, if we collect a thorough set of parameters (high-dimensional data) for a complete set of items within our domain of study, then we would have a "perfect" statistical model for that domain. In other words, Big Data becomes the model for a domain X = we call this X-informatics. Anything we want to know about that domain is specified and encoded within the data. The goal of Big Data Science is to find those encodings, patterns, and knowledge nuggets.

Recall what we said before...

38 Rationale for BIG DATA (2)

... one of the two major benefits of BIG DATA is to provide the best statistical analysis ever(!) for the domain of study.

Benefits of very large datasets:

1. best statistical analysis of "typical" events
2. automated search for "rare" events

39 The other great rationale for Big Data

Outlier detection:

- Anomaly detection / novelty discovery
- Rare event discovery
- Surprise Discovery
- Finding the objects and events that are outside the bounds of our expectations (outside known clusters)
- Discovering the unknown unknowns
- These may be real scientific discoveries or garbage

Outlier detection is therefore useful for:

- Anomaly Detection: is the detector system working?
- Data Quality Assurance: is the data pipeline working?
- Novelty Discovery: is my Nobel prize waiting?

40 Data Science addresses the Data-to-Knowledge Challenges: Finding order in Big Data and Learning from Big Data


### Big Data a threat or a chance?

Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but

### Big Data, Enormous Opportunity

Big Data, Enormous Opportunity Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington The 27th EllioH Organick Memorial Lectures University of Utah April 2014

### ealtime Cloud Screening with VIRIS-NG

pyright 2013 California Institute of Technology. Jet Propulsion Laboratory California Institute of Technology Leveraging Annotated Archival Data with Domain Adaptation to Improve Data Triage in Optical

### Size and Scale of the Universe

Size and Scale of the Universe (Teacher Guide) Overview: The Universe is very, very big. But just how big it is and how we fit into the grand scheme can be quite difficult for a person to grasp. The distances

### Is Big Data a Big Deal? What Big Data Does to Science

Is Big Data a Big Deal? What Big Data Does to Science Netherlands escience Center Wilco Hazeleger Wilco Hazeleger Student @ Wageningen University and Reading University Meteorology PhD @ Utrecht University,

### Machine Learning and Data Mining. Fundamentals, robotics, recognition

Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,

### Migrating a (Large) Science Database to the Cloud

The Sloan Digital Sky Survey Migrating a (Large) Science Database to the Cloud Ani Thakar Alex Szalay Center for Astrophysical Sciences and Institute for Data Intensive Engineering and Science (IDIES)

### INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer

### CHAPTER 1 INTRODUCTION

1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

### and the VO-Science Francisco Jiménez Esteban Suffolk University

The Spanish-VO and the VO-Science Francisco Jiménez Esteban CAB / SVO (INTA-CSIC) Suffolk University The Spanish-VO (SVO) IVOA was created in June 2002 with the mission to facilitate the international

### CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

### The Data Mining Process

Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

### MODULE P7: FURTHER PHYSICS OBSERVING THE UNIVERSE OVERVIEW

OVERVIEW More than ever before, Physics in the Twenty First Century has become an example of international cooperation, particularly in the areas of astronomy and cosmology. Astronomers work in a number

### International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

### Challenges and Solutions for Big Data in the Public Sector:

Challenges and Solutions for Big Data in the Public Sector: Digital Government Institute s Annual Big Data Conference, October 9, Washington, DC Reagan Building Dr. Brand Niemann Director and Senior Data

### International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

### Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions

### LSST All Hands Meeting SLAC, December 4-8 2006 (MAP)

LSST All Hands Meeting SLAC, December 4-8 2006 (MAP) Monday, December 4 th Plenary Session Day One, Kavli Auditorium Project Status 1:00 Welcome; Project and MREFC Status D. Sweeney 1:40 Directors Report

### What Are Stars? continued. What Are Stars? How are stars formed? Stars are powered by nuclear fusion reactions.

What Are Stars? How are stars formed? Stars are formed from clouds of dust and gas, or nebulas, and go through different stages as they age. star: a large celestial body that is composed of gas and emits

### Data Mining: Overview. What is Data Mining?

Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

### Big Data Processing and Analytics for Mouse Embryo Images

Big Data Processing and Analytics for Mouse Embryo Images liangxiu han Zheng xie, Richard Baldock The AGILE Project team FUNDS Research Group - Future Networks and Distributed Systems School of Computing,

### Science@ESA vodcast series. Script for Episode 6 Charting the Galaxy - from Hipparcos to Gaia

Science@ESA vodcast series Script for Episode 6 Charting the Galaxy - from Hipparcos to Gaia Available to download from http://sci.esa.int/gaia/vodcast Hello, I m Rebecca Barnes and welcome to the Science@ESA

### The Cosmic Origins of the Universe - the Stars, the Elements, and Earth.

The Cosmic Origins of the Universe - the Stars, the Elements, and Earth. Pomona College Astronomy 9 Dr. Bryan Penprase, Frank P. Brackett Professor of Astronomy, Pomona College Department of Physics and

### 4 Formation of the Universe

CHAPTER 2 4 Formation of the Universe SECTION Stars, Galaxies, and the Universe BEFORE YOU READ After you read this section, you should be able to answer these questions: What is the big bang theory? How

### Data Mining: Introduction. Lecture Notes for Chapter 1. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Data Mining: Introduction Lecture Notes for Chapter 1 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused - Web

### LSST Data Management System Applications Layer Simulated Data Needs Description: Simulation Needs for DC3

LSST Data Management System Applications Layer Simulated Data Needs Description: Simulation Needs for DC3 Draft 25 September 2008 A joint document from the LSST Data Management Team and Image Simulation