Bringing Big Data to the Solar System Paulo Penteado Northern Arizona University, Flagstaff (visiting David Trilling) pp.penteado@gmail.com http://www.ppenteado.net
What is Big Data and why do we care?
What is Big Data and why do we care? Science special edition (2011). Data explosion: both in science and industry, available data grows exponentially with time. We have more data than humans will ever be able to examine directly. Data-intensive science is now possible, but requires new methods. http://www.sciencemag.org/site/special/data/
The Fourth Paradigm: Data-Intensive Scientific Discovery. Tony Hey, Stewart Tansley, Kristin Tolle (editors), 2009. http://www.amazon.com/the-fourth-paradigm-data-intensive-scientific/dp/0982544200/ PDF (free): http://research.microsoft.com/en-us/collaboration/fourthparadigm/
Big Data in Astronomy
Big Data in Astronomy Old-style archives: lots of files on hard drives; no standard to organize the data; no standard way to explore the data. Virtual Observatory (VO) era: data is processed and organized into databases; data is directly accessible online; no need to learn how to find and calibrate data.
Big Data in Astronomy The Virtual Observatory: online services, archives and tools that provide the means to find, explore and analyze data. VO is a concept, not a specific institution or service. The International Virtual Observatory Alliance (IVOA) only promotes and coordinates efforts and defines standards.
http://www.ivoa.net
Big Data enabling astronomy CasJobs: used by SDSS, GALEX, Kepler. Online system to perform complex queries on catalogs and tables with the survey data products. Queries based on SQL. User space to store query results and uploaded data.
CasJobs interface
Big Data enabling astronomy MAST (Mikulski Archive for Space Telescopes): HST, JWST, Kepler, FUSE, GALEX,... Varied search capabilities to find, visualize and produce calibrated data. VizieR: online access to tables and catalogs, big and small. Tools to find tables and cross-match them.
The VO is not just for observations
Big Data in Planetary Sciences The VO does not work well for the Solar System: current archives are based on sky positions. Even name queries are translated to RA/Dec. Solar System objects move up to 360°/yr. Indexing must be by names.
Big Data in Planetary Sciences Some objects are visited by spacecraft. These observations (remote sensing) need cartography.
Big Data in Planetary Sciences
How we are making this happen
The Planetary Archive: for astronomical observations of Solar System bodies. Identify and catalog archived observations that contain Solar System bodies. Enable queries by names, with complex cross-matches from different data sources.
Planetary Mapping: for remote sensing (spacecraft) observations. Data is mapped on the surface of the target. Queries need data on the observation perspective.
A VO for the Solar System Several surveys have imaged large fractions of the sky:
A VO for the Solar System Archives must contain many observations with a Solar System body in the field of view. How many? Which observations? Unknown. Currently, no easy way to answer. Archives index observations by position in the sky. A Solar System object is just another point source. Some archives even filter out anything that moves in the images.
The Planetary Archive VO functionality for Solar System objects. We search all observations in an archive to find the images taken when a known object was in the field of view. Matches are recorded into a database, indexed by object name. Magnitudes of these sources are retrieved from catalogs. Development with D. Trilling, A. Szalay, T. Budavári, C. Fuentes
Implementation challenges There are 640 000 known Solar System objects. Some surveys have 10^7 images. We need to calculate where each object was when each image was taken.
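A rough sense of scale, reading the image count as 10^7 and pairing every object with every image. The cost per position evaluation is an assumed placeholder, not a measured value; a real pipeline would likely prune pairs before computing positions.

```python
# Object and image counts are from the slide.
n_objects = 640_000          # known Solar System objects
n_images = 10**7             # images in a large survey

# Brute force: one position evaluation per (object, image) pair.
pairs = n_objects * n_images

# ASSUMPTION for illustration only: 1 microsecond per position evaluation.
seconds_per_pair = 1e-6
cpu_days = pairs * seconds_per_pair / 86_400

print(f"{pairs:.1e} object-image pairs")
print(f"about {cpu_days:.0f} CPU-days at the assumed cost per pair")
```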
Implementation challenges Matching positions to image footprints is a computationally heavy task. Matches must be evaluated to exclude spurious matches to non-Solar System objects.
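A minimal sketch of the position-to-footprint test, assuming a circular field of view and predicted positions already computed elsewhere (the orbit propagation is not shown). The image center, radius and object names are hypothetical; real footprints are generally not circular, and this is not necessarily how the archive does it.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))

def in_footprint(obj_ra, obj_dec, img_ra, img_dec, fov_radius_deg):
    """True if the predicted position lies inside a circular field of view."""
    return angular_separation_deg(obj_ra, obj_dec, img_ra, img_dec) <= fov_radius_deg

# Hypothetical image: center (RA, Dec) and radius in degrees.
image = {"ra": 150.0, "dec": 2.0, "radius": 0.6}

# Hypothetical predicted positions at the image epoch.
predicted = {"object A": (150.3, 2.2), "object B": (151.5, 2.0)}

matches = [name for name, (ra, dec) in predicted.items()
           if in_footprint(ra, dec, image["ra"], image["dec"], image["radius"])]
print(matches)
```

Each candidate match would still need to be checked against the source catalogs, since a star or galaxy at the same position would pass this geometric test.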
Functionality Other object data needed to make useful queries:
Dynamical: orbital elements, orbit type (Near Earth, Main Belt, Trans Neptunian,...), family membership, orbit quality,...
Taxonomic: spectral classification.
Physical: albedo, diameter, rotation period,...
Name resolution: objects have many names, and get new names. Must find any kind of name, with any formatting.
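One way name resolution could work is a normalized alias index. This is a sketch under assumptions: the normalization rule and the tiny alias table are illustrative, not the archive's actual implementation.

```python
import re

def normalize(name):
    """Canonical lookup key: lowercase, with spaces, underscores and brackets removed."""
    return re.sub(r"[\s_()\[\]]+", "", name).lower()

# Illustrative alias table: primary name -> other names the object goes by.
ALIASES = {
    "Ceres": ["1", "1 Ceres", "(1) Ceres"],
    "Pluto": ["134340", "134340 Pluto", "(134340) Pluto"],
    "Eris": ["136199", "136199 Eris", "2003 UB313"],
}

# Every alias (and the primary name itself) points to the primary name.
INDEX = {normalize(alias): primary
         for primary, aliases in ALIASES.items()
         for alias in aliases + [primary]}

def resolve(name):
    """Return the primary name for any known alias, or None if unknown."""
    return INDEX.get(normalize(name))

print(resolve("(1) ceres"), resolve("2003_ub313"), resolve("Vesta"))
```

When an object gets a new name, only the alias table changes; rows indexed by the primary name stay valid.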
Accessibility Different interfaces to be implemented: Online interactive system for complex queries (prototype being built at http://www.ppenteado.net/pp_ssvo). HTTP API for automated queries (VO protocol). CasJobs interface.
Example query Find all objects observed by both SDSS and 2MASS, and join with the JPL database (orbits, albedo, diameter, taxonomy)
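A sketch of how such a query might look in SQL, run here against an in-memory SQLite database. The table names, column names and all rows are hypothetical placeholders, not the archive's schema or real measurements.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical schema and placeholder rows, for illustration only.
con.executescript("""
CREATE TABLE jpl_objects (name TEXT PRIMARY KEY, albedo REAL,
                          diameter_km REAL, taxonomy TEXT);
CREATE TABLE sdss_obs    (object_name TEXT, image_id TEXT);
CREATE TABLE twomass_obs (object_name TEXT, image_id TEXT);

INSERT INTO jpl_objects VALUES ('obj_a', 0.10, 50.0, 'C'),
                               ('obj_b', 0.25, 12.0, 'S'),
                               ('obj_c', 0.05, 80.0, 'D');
INSERT INTO sdss_obs    VALUES ('obj_a', 's1'), ('obj_a', 's2'), ('obj_b', 's3');
INSERT INTO twomass_obs VALUES ('obj_a', 't1'), ('obj_c', 't2');
""")

# Objects present in both survey match tables, joined with their JPL properties.
rows = con.execute("""
    SELECT j.name, j.albedo, j.diameter_km, j.taxonomy
    FROM jpl_objects AS j
    WHERE j.name IN (SELECT object_name FROM sdss_obs)
      AND j.name IN (SELECT object_name FROM twomass_obs)
    ORDER BY j.name
""").fetchall()

print(rows)
```

Because matches are indexed by object name, the cross-match between surveys is a join on names rather than on sky positions.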
Data ingestion NASA/JPL and IAU/MPC databases, for orbital, physical and taxonomic data. First surveys to be processed:
1) GALEX: 2 filters (FUV: 0.15μm, NUV: 0.23μm), 1.2° field of view, 4"/pix, < ~21 mag
2) SDSS DR10: 5 filters (u: 0.36μm, g: 0.47μm, r: 0.62μm, i: 0.75μm, z: 0.89μm), 9' field of view, 0.4"/pix, < ~22 mag
3) 2MASS: 3 filters (J: 1.25μm, H: 1.65μm, K: 2.17μm), 9' field of view, 1"/pix, < ~16 mag, all-sky
Data ingestion
Data ingestion GALEX / SDSS / 2MASS cross-match with 25 000 objects:
SDSS DR10: 1006 observations, 451 unique objects.
2MASS: 720 observations, 493 unique objects.
GALEX: 54790 observations, 18445 unique objects.
SDSS x 2MASS: 220 objects observed by both.
GALEX x SDSS x 2MASS: 89 objects observed by all 3.
Extrapolating for 640 000 objects: ~2300 objects observed in 8 colors, from 0.36μm to 2.17μm.
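The extrapolation is a linear scaling from the 25 000-object sample to the full set of known objects. A short check of the arithmetic, assuming the sample is representative of the whole population:

```python
sample_size = 25_000       # objects in the test cross-match
total_known = 640_000      # known Solar System objects
scale = total_known / sample_size   # 25.6

# Counts of objects found in the sample (from the slide).
observed_in_sample = {
    "SDSS x 2MASS": 220,
    "GALEX x SDSS x 2MASS": 89,
}

# Linear scaling; valid only if the sample is representative (an assumption).
extrapolated = {k: round(v * scale) for k, v in observed_in_sample.items()}

for k, v in extrapolated.items():
    print(f"{k}: ~{v} objects")
```

Scaling the 89 objects observed by all three surveys gives about 2300; the same factor applied to the 220 objects in SDSS x 2MASS gives about 5600.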
Planetary mapping problems When an object is observed by spacecraft, it is not just an extended source.
Planetary mapping problems Observations are not 2D (RA/Dec). Perspective changes during the observations. Observations must be mapped on the object, not on the sky.
Planetary mapping problems Observations not just 2D (lat/lon): data change characteristics with varying illumination and perspective.
Planetary mapping problems Planetary bodies can change a lot in short time periods. (Figure: changes over 5 hours.)
Planetary mapping problems One of the main observation types is hyperspectral imaging:
Planetary mapping problems To be accessible and usable, remote sensing observations need an advanced archive:
Must contain lots of geometric data (lat/lon, illumination, distance,...).
For hyperspectral imaging, queries must use the spectra, not just metadata.
Cartography must be correct, on the surface and above the limb.
Calibrated data must be directly accessible online.
Complex queries, with user-defined functions, must be possible.
Visualization must be integrated, with interactive and programmatic generation of maps.
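A sketch of what a query combining geometry and a user-defined function on the spectra could look like. The record layout, the band-ratio function and all values are hypothetical; this is not the titanbrowse interface.

```python
from dataclasses import dataclass

@dataclass
class Pixel:
    lat: float          # degrees
    lon: float          # degrees
    incidence: float    # illumination angle, degrees
    emission: float     # viewing angle, degrees
    spectrum: dict      # wavelength (microns) -> calibrated value

def band_ratio(pixel, num, den):
    """User-defined function: ratio of two spectral channels."""
    return pixel.spectrum[num] / pixel.spectrum[den]

def query(pixels, predicate):
    """Return the pixels for which the user-supplied predicate is true."""
    return [p for p in pixels if predicate(p)]

# Hypothetical records, for illustration only.
pixels = [
    Pixel(-10.0, 170.0, 30.0, 20.0, {2.0: 0.02, 5.0: 0.04}),
    Pixel(-12.0, 172.0, 75.0, 20.0, {2.0: 0.02, 5.0: 0.04}),
    Pixel(40.0, 10.0, 30.0, 25.0, {2.0: 0.10, 5.0: 0.05}),
]

# Tropical, well-illuminated pixels with a high 5/2 micron ratio.
selected = query(
    pixels,
    lambda p: abs(p.lat) < 20 and p.incidence < 60 and band_ratio(p, 5.0, 2.0) > 1.5,
)
print(len(selected), selected[0].lon)
```

The point of the design is that the selection uses the spectral values themselves together with the per-pixel geometry, rather than only file-level metadata.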
Planetary Mapping We are developing a hyperspectral remote sensing archive. It will be useful for many past and current missions: Cassini (Saturn system), Galileo (Jupiter system), New Horizons (Pluto system), Dawn (Ceres, Vesta), Rosetta (Churyumov-Gerasimenko), MRO (Mars),... Development with D. Trilling, J. Barnes, V. Pasek.
Demonstration system: titanbrowse. A database and visualization system for Titan observations. A solution to most of the requirements of a planetary mapping archive. http://ppenteado.net/newhome/databases/titanbrowse/
titanbrowse interface
titanbrowse results Allowed the first detection of a tropical lake on Titan. The data had been public for years, but no one had found the lake among the 10^7 archived spectra. http://dx.doi.org/doi:10.1038/nature11165
titanbrowse results
titanbrowse results
Conclusions The Virtual Observatory already brings Big Data into Astronomy. For Solar System objects, different tools are necessary. We are building 2 archives for the Solar System: One for astronomical-type observations. One for remote sensing observations. pp.penteado@gmail.com http://www.ppenteado.net