Lecture 5b: Data Mining. Peter Wheatley
|
|
|
- Daniel Jefferson
- 9 years ago
- Views:
Transcription
1 Lecture 5b: Data Mining Peter Wheatley
2 Data archives Most astronomical data now available via archives Raw data and high-level products usually available Data reduction software often specific to individual missions/instruments MAST Multimission Archive at STScI HST and ultraviolet missions HEASARC - High Energy Astrophysics Science Archive Research Center X-ray and gamma-ray missions ESO Science Archive Facility All ESO instruments And many others
3
4
5
6 ASCII text files Data file formats Commonly used for small / simple datasets FITS - Flexible Image Transport System The most common format for astronomical data Ascii headers and binary data storage Designed for images, but extended for 2d (and even 3d) tables HDS - Hierarchical Data System UK Starlink standard, still used by many starlink tools Highly versatile, with arbitrarily complex data structures No longer widely used VOTable virtual observatory table XML standard for astronomical data tables Designed for easy exchange of data for Virtual Observatory tools Software can understand rather than just read (e.g. defn of exposure time)
7 Accessing FITS files cfitsio - set of sub-routines for reading/writing fits files Often built into scripting languages, e.g. python FV- Useful tool for viewing contents of FITS files Part of the HEASOFT software, or available stand-alone
8 Astronomical catalogues CDS Simbad data and bibliography for objects mentioned in papers
9 Astronomical catalogues CDS Vizier inc. sky survey catalogues and tables from papers
10 Image and catalogue display ds9 Image display and comparison Image and catalogue download Catalogue overlay
11 Image and catalogue display Gaia Starlink software Includes photometry and astrometry software
12 Lecture 5 Assignment, Part 1 of 2 Use SIMBAD to determine the nature of the star HD and any companion objects Download from HEASARC a ROSAT PSPC image of the star HD Display the ROSAT image aligned with an optical image of the same field (downloaded from within ds9 or Gaia) Overlay SIMBAD sources (also downloaded within ds9/gaia) Send us a screenshot Determine the name and nature of the famous nebula nearby
13 TOPCAT Java GUI h)p:// STIL library for dealing with tabular data h)p:// STILTS command- line tool h)p:// TOPCAT & STIL(TS) Key features: Import/export catalogs in various formats Cross- matching and merging PloGng and manipula?ng data Search and retrieve data from archives
14 TOPCAT : VO retrieval
15 add custom columns Table Data
16 3 tables loaded TOPCAT Example sca)er plot sky matched pairs table matching
17 TOPCAT sky plot, colour coded subsets
18 SQL Queries SQL refers to query syntax allowing you to search and interact with large databases stored by a SQL-compatible server The Structured Query Language is very popular, originating outside of astronomy, and can handle very large databases plus many clients (commercial and open source implementations available) Many astronomical databases powered by SQL even if you may query it by other means (a webform or VO query) When simpler frontends fail, or once tables involved become very large, you may need to resort to performing your own SQL queries
19 Database lingo Tables with rows and columns = Database with records and fields ID Source Class Magnitude Filter Records in separate tables may be connected to each other: Class ID Value 1 Star 2 Galaxy 3 Unknown Relational database Filter ID 1 U 2 B 3 V Value 4 R
20 SQL Queries : SDSS SDSS prime and early example of a large astronomical data resource that offered multiple ways of retrieving data, including custom SQL queries Tutorial covering performing SQL queries on SDSS data:
21 Table browser Schema Browser Need to know what Tables are available and which fields are stored in those tables
22 Basic Query SELECT ra,dec FROM specobj WHERE ra BETWEEN 140 and 141 AND dec BETWEEN 20 and 21 AND class='star' which fields do you want returned? from which table? boolean constraints on fields that records must sa?sfy
23 Relational Query SELECT specobj.fiberid, PhotoObj.modelMag_u, PhotoObj.modelMag_g, PhotoObj.modelMag_r, PhotoObj.modelMag_i, PhotoObj.modelMag_z, PhotoObj.ra, PhotoObj.dec, specobj.z, PhotoObj.ObjID FROM PhotoObj, specobj WHERE specobj.bestobjid = PhotoObj.ObjID AND specobj.class = 'qso' AND specobj.zwarning = 0 AND specobj.z between 0.3 and 0.4 which fields do you want returned? [Table.Field] from which tables? need to ensure we compare same object boolean constraints on fields that records must sa?sfy [Table.field = Value]
24 Result
25 Powerful Queries SELECT TOP 10 run,camcol,rerun,field,objid,ra,dec FROM Galaxy WHERE ( ( flags & (dbo.fphotoflags('binned1') dbo.fphotoflags('binned2') dbo.fphotoflags('binned4')) ) > 0 and ( flags & (dbo.fphotoflags('blended') dbo.fphotoflags('nodeblend') dbo.fphotoflags('child')) )!= dbo.fphotoflags('blended') and ( flags & (dbo.fphotoflags('edge') dbo.fphotoflags('saturated')) ) = 0 and petromag_i > 17.5 and (petromag_r > 15.5 or petror50_r > 2) and (petromag_r > 0 and g > 0 and r > 0 and i > 0) and ( (petromag_r-extinction_r) < 19.2 and (petromag_r - extinction_r < ( (7/3) * (dered_g - dered_r) + 4 * (dered_r - dered_i) - 4 * 0.18) ) and ( (dered_r - dered_i - (dered_g - dered_r)/4-0.18) < 0.2) and ( (dered_r - dered_i - (dered_g - dered_r)/4-0.18) > -0.2) -- dered_ quantities already include reddening and ( (petromag_r - extinction_r * LOG10(2 * * petror50_r * petror50_r)) < 24.2) ) or ( (petromag_r - extinction_r < 19.5) and ( (dered_r - dered_i - (dered_g - dered_r)/4-0.18) > ( * (dered_g - dered_r)) ) and ( (dered_g - dered_r) > ( * (dered_r - dered_i)) ) ) and ( (petromag_r - extinction_r * LOG10(2 * * petror50_r * petror50_r) ) < 23.3 ) )
26 CasJobs No limit on number of records returned or the time it takes for your Query to run You can also upload and create your own Tables, then join those in SQL queries Note: it is very easy to write a very complex Query so appreciate what you are asking from the server The SDSS/SQL model is used by others, such as GALEX, MAST, VISTA etc
27 Other example ; WFAU
28 Lecture 5 Assignment (part 2 of 2) Using TOPCAT Retrieve a 5 cone of data from the SDSS DR7 photometry (sdssdr7-sda service, PhotoObjAll catalog) centered on 01:30:56, +43:55:49 Merge the resulting table with 2MASS (2mass-psc) using a sky matching radius of 2 Make a scatter plot of SDSS (u-g) colour versus 2MASS (J-K) colour, limited to sources brighter than SDSS g=20 Consider the source that is most red in the near-infrared and comment on its optical colour relative to other sources in the field
Exploring Gaia data with TOPCAT and the Virtual Observatory
Exploring Gaia data with TOPCAT and the Virtual Observatory Mark Taylor (University of Bristol) Gaia and the Unseen Brown Dwarf Question GREAT-ESF Workshop Torino University 26 March 2014 $Id: tcvo.tex,v
The Astronomical Data Warehouse
The Astronomical Data Warehouse Clive G. Page Department of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, UK Abstract. The idea of the astronomical data warehouse has arisen as the
MAST: The Mikulski Archive for Space Telescopes
MAST: The Mikulski Archive for Space Telescopes Richard L. White Space Telescope Science Institute 2015 April 1, NRC Space Science Week/CBPSS A model for open access The NASA astrophysics data archives
Organization of VizieR's Catalogs Archival
Organization of VizieR's Catalogs Archival Organization of VizieR's Catalogs Archival Table of Contents Foreword...2 Environment applied to VizieR archives...3 The archive... 3 The producer...3 The user...3
VisIVO, an open source, interoperable visualization tool for the Virtual Observatory
Claudio Gheller (CINECA) 1, Ugo Becciani (OACt) 2, Marco Comparato (OACt) 3 Alessandro Costa (OACt) 4 VisIVO, an open source, interoperable visualization tool for the Virtual Observatory 1: [email protected]
AST 4723 Lab 3: Data Archives and Image Viewers
AST 4723 Lab 3: Data Archives and Image Viewers Objective: The purpose of the lab this week is for you to learn how to acquire images from online archives, display images using a standard image viewer
Software Development for Virtual Observatories
Software Development for Virtual Observatories BRAVO Workshop February 2007 Rafael Santos 1 Warning! This presentation is biased. I'll talk about VO software development, including some under the hood
This work was done in collaboration with the Centre de Données astronomiques de Strasbourg (CDS) and in particular with F. X. Pineau.
Chapter 5 Database and website This work was done in collaboration with the Centre de Données astronomiques de Strasbourg (CDS) and in particular with F. X. Pineau. 5.1 Introduction The X-shooter Spectral
The Virtual Observatory: What is it and how can it help me? Enrique Solano LAEFF / INTA Spanish Virtual Observatory
The Virtual Observatory: What is it and how can it help me? Enrique Solano LAEFF / INTA Spanish Virtual Observatory Astronomy in the XXI century The Internet revolution (the dot com boom ) has transformed
How To Understand And Understand The Science Of Astronomy
Introduction to the VO [email protected] ESAVO ESA/ESAC Madrid, Spain The way Astronomy works Telescopes (ground- and space-based, covering the full electromagnetic spectrum) Observatories Instruments
ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS)
ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS) Jessica Chapman, Data Workshop March 2013 ASKAP Science Data Archive Talk outline Data flow in brief Some radio
Fermi LAT and GBM Data Access, File Structure and the Standard Tools. Fermi Solar Analysis Workshop GSFC, August 2012 Elizabeth Ferrara - FSSC 1
Fermi LAT and GBM Data Access, File Structure and the Standard Tools Fermi Solar Analysis Workshop GSFC, August 2012 Elizabeth Ferrara - FSSC 1 Fermi Science Support Center Home Current News, FAQ, Helpdesk
on the establishment of a Brazilian Science Data Center (BSDC) General Guidelines
on the establishment of a Brazilian Science Data Center (BSDC) General Guidelines 1 Introduction Since the entrance of Brazil in ICRANet a variety of projects have been started to be developed a) in the
A reduction pipeline for the polarimeter of the IAG
A reduction pipeline for the polarimeter of the IAG Edgar A. Ramírez December 18, 2015 Abstract: This is an informal documentation for an interactive data language (idl) reduction pipeline (version 16.1)
VisIVO, a VO-Enabled tool for Scientific Visualization and Data Analysis: Overview and Demo
Claudio Gheller (CINECA), Marco Comparato (OACt), Ugo Becciani (OACt) VisIVO, a VO-Enabled tool for Scientific Visualization and Data Analysis: Overview and Demo VisIVO: Visualization Interface for the
MultiAlign Software. Windows GUI. Console Application. MultiAlign Software Website. Test Data
MultiAlign Software This documentation describes MultiAlign and its features. This serves as a quick guide for starting to use MultiAlign. MultiAlign comes in two forms: as a graphical user interface (GUI)
Oracle Database 10g Express
Oracle Database 10g Express This tutorial prepares the Oracle Database 10g Express Edition Developer to perform common development and administrative tasks of Oracle Database 10g Express Edition. Objectives
Multidimensional Data in the Virtual Observatory
IX Reunión Científica de la SEA Madrid- 15/09/2010 Red Temática SVO Multidimensional Data in the Virtual Observatory José Enrique Ruiz Grupo AMIGA Instituto de Astrofísica de Andalucía CSIC Contextual
INSPIRE Dashboard. Technical scenario
INSPIRE Dashboard Technical scenario Technical scenarios #1 : GeoNetwork catalogue (include CSW harvester) + custom dashboard #2 : SOLR + Banana dashboard + CSW harvester #3 : EU GeoPortal +? #4 :? + EEA
Visualization of Semantic Windows with SciDB Integration
Visualization of Semantic Windows with SciDB Integration Hasan Tuna Icingir Department of Computer Science Brown University Providence, RI 02912 [email protected] February 6, 2013 Abstract Interactive Data
SSIS Training: Introduction to SQL Server Integration Services Duration: 3 days
SSIS Training: Introduction to SQL Server Integration Services Duration: 3 days SSIS Training Prerequisites All SSIS training attendees should have prior experience working with SQL Server. Hands-on/Lecture
SQL Server 2012 Business Intelligence Boot Camp
SQL Server 2012 Business Intelligence Boot Camp Length: 5 Days Technology: Microsoft SQL Server 2012 Delivery Method: Instructor-led (classroom) About this Course Data warehousing is a solution organizations
Migrating a (Large) Science Database to the Cloud
The Sloan Digital Sky Survey Migrating a (Large) Science Database to the Cloud Ani Thakar Alex Szalay Center for Astrophysical Sciences and Institute for Data Intensive Engineering and Science (IDIES)
IBM DB2 XML support. How to Configure the IBM DB2 Support in oxygen
Table of Contents IBM DB2 XML support About this Tutorial... 1 How to Configure the IBM DB2 Support in oxygen... 1 Database Explorer View... 3 Table Explorer View... 5 Editing XML Content of the XMLType
Constructing the Subaru Advanced Data and Analysis Service on VO
Constructing the Subaru Advanced Data and Analysis Service on VO Yuji Shirasaki on behalf of ADC National Astronomical Observatory of Japan Astronomy Data Center Contents Subaru Telescope and Instruments
MSSQL quick start guide
C u s t o m e r S u p p o r t MSSQL quick start guide This guide will help you: Add a MS SQL database to your account. Find your database. Add additional users. Set your user permissions Upload your database
Audit TM. The Security Auditing Component of. Out-of-the-Box
Audit TM The Security Auditing Component of Out-of-the-Box This guide is intended to provide a quick reference and tutorial to the principal features of Audit. Please refer to the User Manual for more
NaviCell Data Visualization Python API
NaviCell Data Visualization Python API Tutorial - Version 1.0 The NaviCell Data Visualization Python API is a Python module that let computational biologists write programs to interact with the molecular
Mass limit on Nemesis
Bull. Astr. Soc. India (2005) 33, 27 33 Mass limit on Nemesis Varun Bhalerao 1 and M.N. Vahia 2 1 Indian Institute of Technology Bombay, Powai, Mumbai 400 076, India 2 Tata Institute of Fundamental Research,
Description of the Dark Energy Survey for Astronomers
Description of the Dark Energy Survey for Astronomers May 1, 2012 Abstract The Dark Energy Survey (DES) will use 525 nights on the CTIO Blanco 4-meter telescope with the new Dark Energy Camera built by
Reduced data products in the ESO Phase 3 archive (Status: 15 May 2015)
Reduced data products in the ESO Phase 3 archive (Status: 15 May 2015) The ESO Phase 3 archive provides access to reduced and calibrated data products. All those data are stored in standard formats. The
MOC HEALPix Multi-Order Coverage map Version 1.0
International Virtual Observatory Alliance MOC HEALPix Multi-Order Coverage map Version 1.0 IVOA Recommendation 2 June 2014 This version: 1.0: Recommendation 2014-06-02 Previous version(s): None Interest/Working
MySQL for Beginners Ed 3
Oracle University Contact Us: 1.800.529.0165 MySQL for Beginners Ed 3 Duration: 4 Days What you will learn The MySQL for Beginners course helps you learn about the world's most popular open source database.
Visualizing Data: Scalable Interactivity
Visualizing Data: Scalable Interactivity The best data visualizations illustrate hidden information and structure contained in a data set. As access to large data sets has grown, so has the need for interactive
Implementing a Data Warehouse with Microsoft SQL Server
This course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse 2014, implement ETL with SQL Server Integration Services, and
Astrophysics with Terabyte Datasets. Alex Szalay, JHU and Jim Gray, Microsoft Research
Astrophysics with Terabyte Datasets Alex Szalay, JHU and Jim Gray, Microsoft Research Living in an Exponential World Astronomers have a few hundred TB now 1 pixel (byte) / sq arc second ~ 4TB Multi-spectral,
Introduction To Hive
Introduction To Hive How to use Hive in Amazon EC2 CS 341: Project in Mining Massive Data Sets Hyung Jin(Evion) Kim Stanford University References: Cloudera Tutorials, CS345a session slides, Hadoop - The
Welcome to PowerClaim Net Services!
Welcome to PowerClaim Net Services! PowerClaim Net Services provides a convenient means to manage your claims over the internet and provides detailed reporting services. You can access PowerClaim Net Services
Planning and Scheduling Software for the Hobby Eberly Telescope. Niall I. Gaffney Hobby Eberly Telescope Mark E. Cornell McDonald Observatory
Planning and Scheduling Software for the Hobby Eberly Telescope Niall I. Gaffney Hobby Eberly Telescope Mark E. Cornell McDonald Observatory What is the HET project?! Joint Project: University of Texas
Implementing a Data Warehouse with Microsoft SQL Server
Page 1 of 7 Overview This course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse with Microsoft SQL 2014, implement ETL
OECD.Stat Web Browser User Guide
OECD.Stat Web Browser User Guide May 2013 May 2013 1 p.10 Search by keyword across themes and datasets p.31 View and save combined queries p.11 Customise dimensions: select variables, change table layout;
How To Contact The Author... 4. Introduction... 5. What You Will Need... 6. The Basic Report... 7. Adding the Parameter... 9
2011 Maximo Query with Open Source BIRT Sometimes an organisation needs to provide certain users with simple query tools over their key applications rather than complete access to use the application.
COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER
Page 1 of 8 ABOUT THIS COURSE This 5 day course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse with Microsoft SQL Server
Data analysis of L2-L3 products
Data analysis of L2-L3 products Emmanuel Gangler UBP Clermont-Ferrand (France) Emmanuel Gangler BIDS 14 1/13 Data management is a pillar of the project : L3 Telescope Caméra Data Management Outreach L1
LABSHEET 1: creating a table, primary keys and data types
LABSHEET 1: creating a table, primary keys and data types Before you begin, you may want to take a look at the following links to remind yourself of the basics of MySQL and the SQL language. MySQL 5.7
Oracle Data Integrator: Administration and Development
Oracle Data Integrator: Administration and Development What you will learn: In this course you will get an overview of the Active Integration Platform Architecture, and a complete-walk through of the steps
How to Design and Create Your Own Custom Ext Rep
Combinatorial Block Designs 2009-04-15 Outline Project Intro External Representation Design Database System Deployment System Overview Conclusions 1. Since the project is a specific application in Combinatorial
COURSE SYLLABUS COURSE TITLE:
1 COURSE SYLLABUS COURSE TITLE: FORMAT: CERTIFICATION EXAMS: 55043AC Microsoft End to End Business Intelligence Boot Camp Instructor-led None This course syllabus should be used to determine whether the
Galaxy Survey data analysis using SDSS-III as an example
Galaxy Survey data analysis using SDSS-III as an example Will Percival (University of Portsmouth) showing work by the BOSS galaxy clustering working group" Cosmology from Spectroscopic Galaxy Surveys"
Pivot Charting in SharePoint with Nevron Chart for SharePoint
Pivot Charting in SharePoint Page 1 of 10 Pivot Charting in SharePoint with Nevron Chart for SharePoint The need for Pivot Charting in SharePoint... 1 Pivot Data Analysis... 2 Functional Division of Pivot
Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days
Lincoln Land Community College Capital City Training Center 130 West Mason Springfield, IL 62702 217-782-7436 www.llcc.edu/cctc Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days Course
OSKAR 2.4.0 Example Revision: 8
OSKAR Example Version history: Revision Date Modification 1 2012-04-24 Creation 2 2012-05-15 Updated figures and text for correct polarisation order. 3 2012-06-13 Updated figures and text to reflect changes
The World-Wide Telescope, an Archetype for Online Science
The World-Wide Telescope, an Archetype for Online Science Jim Gray, Microsoft Research Alex Szalay, Johns Hopkins University June 2002 Technical Report MSR-TR-2002-75 Microsoft Research Microsoft Corporation
Netezza Workbench Documentation
Netezza Workbench Documentation Table of Contents Tour of the Work Bench... 2 Database Object Browser... 2 Edit Comments... 3 Script Database:... 3 Data Review Show Top 100... 4 Data Review Find Duplicates...
Tutorial 2 Online and offline Ship Visualization tool Table of Contents
Tutorial 2 Online and offline Ship Visualization tool Table of Contents 1.Tutorial objective...2 1.1.Standard that will be used over this document...2 2. The online tool...2 2.1.View all records...3 2.2.Search
Adam Rauch Partner, LabKey Software [email protected]. Extending LabKey Server Part 1: Retrieving and Presenting Data
Adam Rauch Partner, LabKey Software [email protected] Extending LabKey Server Part 1: Retrieving and Presenting Data Extending LabKey Server LabKey Server is a large system that combines an extensive set
What s new in TIBCO Spotfire 6.5
What s new in TIBCO Spotfire 6.5 Contents Introduction... 3 TIBCO Spotfire Analyst... 3 Location Analytics... 3 Support for adding new map layer from WMS Server... 3 Map projections systems support...
EA-ARC ALMA ARCHIVE DATA USER GUIDEBOOK
EA-ARC ALMA ARCHIVE DATA USER GUIDEBOOK Prepared by James O. Chibueze NAOJ Chile Observatory Purpose of this handbook: This handbook is aimed at providing fundamental information on how and where to access
Information Server Documentation SIMATIC. Information Server V8.0 Update 1 Information Server Documentation. Introduction 1. Web application basics 2
Introduction 1 Web application basics 2 SIMATIC Information Server V8.0 Update 1 System Manual Office add-ins basics 3 Time specifications 4 Report templates 5 Working with the Web application 6 Working
Top 10 Things to Know about WRDS
Top 10 Things to Know about WRDS 1. Do I need special software to use WRDS? WRDS was built to allow users to use standard and popular software. There is no WRDSspecific software to install. For example,
ICE Trade Vault. Public User & Technology Guide June 6, 2014
ICE Trade Vault Public User & Technology Guide June 6, 2014 This material may not be reproduced or redistributed in whole or in part without the express, prior written consent of IntercontinentalExchange,
1 File Processing Systems
COMP 378 Database Systems Notes for Chapter 1 of Database System Concepts Introduction A database management system (DBMS) is a collection of data and an integrated set of programs that access that data.
Chapter 24: Creating Reports and Extracting Data
Chapter 24: Creating Reports and Extracting Data SEER*DMS includes an integrated reporting and extract module to create pre-defined system reports and extracts. Ad hoc listings and extracts can be generated
How to Manage a Calibration Database
OGIP Calibration Memo CAL/GEN/92-015 (CALDB Management Guide) i OGIP Calibration Memo CAL/GEN/92-015 How to Manage a Calibration Database Michael F. Corcoran Code 662, NASA/GSFC, Greenbelt, MD 20771 Version:
The Bamberg photographic plate archive
The Bamberg photographic plate archive The digitizing project H. Edelmann, N. Jansen, U. Heber, H. Drechsel, J. Wilms, I. Kreykenbohm Dr. Karl Remeis-Observatory & ECAP Astronomical Institute Friedrich-Alexander-University
Galaxy Morphological Classification
Galaxy Morphological Classification Jordan Duprey and James Kolano Abstract To solve the issue of galaxy morphological classification according to a classification scheme modelled off of the Hubble Sequence,
