Copernicus and Big Data: Challenges and Opportunities Alessandro Annoni European Commission Joint Research Centre www.jrc.ec.europa.eu Serving society Stimulating innovation Supporting legislation
Big Data definition
Big Data are defined as massive data sets with large, varied and complex structures that make their storage, analysis, visualization and processing difficult. Factors to be considered are:
Volume: available data volumes are now larger and outstrip traditional storage and analysis techniques.
Velocity: data is collected and continuously made available at a high rate.
Variety: big data comes from a great variety of sources, generally of three types: structured, semi-structured and unstructured.
Veracity: data sources (even in the same domain) are of widely differing quality, with significant differences in the coverage, accuracy and timeliness of the data provided.
What Big Data means for Copernicus
Can Copernicus be classified as Big Data?
Volume = about 3,000 terabytes of EO data yearly [Space Infrastructure (including ground segment)]
Velocity = several terabytes of new EO data per day [Space and Services Infrastructures]
Veracity = Sentinel data and contributing missions (virtual constellations, virtual global coverages, time series, ...) [Space Infrastructures]
Variety = depends on future implementation of Copernicus and downstream Services (e.g. role and relevance of in situ networks, citizen observatories, ...) [Services Infrastructures]
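The yearly and daily figures quoted for Volume and Velocity can be cross-checked with simple arithmetic. The sketch below assumes the yearly volume is spread evenly over a 365-day year and uses decimal units (1 PB = 1000 TB); both are assumptions made for illustration, not statements from the slide.

```python
# Cross-check of the Volume and Velocity figures quoted on the slide.
# Assumption: the yearly volume is spread evenly over a 365-day year.
YEARLY_VOLUME_TB = 3000  # "about 3,000 terabytes of EO data yearly"
DAYS_PER_YEAR = 365

daily_volume_tb = YEARLY_VOLUME_TB / DAYS_PER_YEAR
yearly_volume_pb = YEARLY_VOLUME_TB / 1000  # decimal units: 1 PB = 1000 TB

print(f"Average daily volume: {daily_volume_tb:.1f} TB/day")
print(f"Yearly volume: {yearly_volume_pb:.1f} PB/year")
```

Under these assumptions the average comes out at roughly 8 TB per day, which is consistent with "several terabytes of new EO data per day".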
EO landscape evolution: an example
Global Land-Cover Observation capacity (*)
Since the 1970s:
the share of missions failing within 3 years of launch has dropped from around 60% to less than 20%;
the average operational life of a mission has almost tripled, from 3.3 years in the 1970s to 8.6 years (and still lengthening);
the average number of satellites launched per year, by decade, has increased from 2 to 12;
spatial resolution has improved from around 80 meters to less than one meter for multispectral and less than half a meter for panchromatic imagery.
(*) Who launched what, when and why; trends in Global Land-Cover Observation capacity from civilian Earth Observation satellites. Alan S. Belward and Jon O. Skøien
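The trends quoted above can be expressed as simple ratios. This is only a sketch using the rounded numbers from the slide; the resolution ratio takes "less than one meter" as exactly 1 m, so it is a lower bound.

```python
# Ratios implied by the rounded figures quoted on the slide.
mean_life_1970s_years = 3.3
mean_life_recent_years = 8.6
launches_per_year_early = 2
launches_per_year_recent = 12
resolution_early_m = 80.0
resolution_recent_m = 1.0  # "less than one meter", so the ratio is a lower bound

life_ratio = mean_life_recent_years / mean_life_1970s_years
launch_ratio = launches_per_year_recent / launches_per_year_early
resolution_ratio = resolution_early_m / resolution_recent_m

print(f"Operational life: x{life_ratio:.1f}")
print(f"Launches per year: x{launch_ratio:.0f}")
print(f"Linear resolution: at least x{resolution_ratio:.0f}")
```

An operational-life ratio of about 2.6 matches the wording "almost tripled".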
Challenges
Copernicus is more than data
The EO user community has grown significantly: the example of Google
Which will be the future users of Copernicus?
EO Professionals
Other Private Services (e.g. Transport, Tourism, ...)
Public Authorities
Policy and Decision Makers
Research
Education (including schools)
Public (all citizens)?
Open to new market opportunities: from few to many users, BUT each category could have different requirements for data management.
An example of a professional user: JRC support to the CAP
Manage image acquisition for CAP area-based controls:
nearly 7 M euro annual budget
planning, programming of satellites, liaison with stakeholders
data acquisition, validation, QC, delivery, and storing/sharing
Community Image Data portal (CID): online archive of satellite imagery
Contains satellite imagery
Central catalogue to search data
Web services to access data
File-based access for internal use
Currently: approx. 80 TB of data
150 000 LR/MR images (km, 100s of m)
20 000 HR images (5-50 m)
7.6 M km² VHR (< 5 m)
Licensing:
Use of INSPIRE and GEOSS definitions
Click to accept EULA (End User Licence Agreement), legally and technically implemented
Copernicus and INSPIRE
The INSPIRE Directive foresees the creation of a European Spatial Data Infrastructure.
The INSPIRE Geoportal, the face of INSPIRE, is the central access point to the infrastructure and its resources (>300,000).
How to better connect Copernicus with Member State (MS) data and services?
How to use Copernicus data to create/update the information required by INSPIRE?
Extend INSPIRE to other areas and integrate with Copernicus Services
Open Research Data
Open Science requires Open Access to data.
Data should be easily accessible and usable within research infrastructures.
Copernicus Data Tsunami
Are we looking to protect ourselves? Or do we want to profit from this opportunity to produce new energy?
Opportunities
Massive diffusion of cheap sensors provides new opportunities and challenges
Mobile phones: sensing better in some fields than others (e.g. noise)
Publiclaboratory.com
Waspmotes
Drones: some limitations relating to the regulatory framework
Waspmotes need programming and raise issues of calibration and response time, but the opportunities are high.
Combining Remote Sensing with Social Sensing
Large-scale experiment at JRC to assess the quality of social network data: forest fires
More than 20 million tweets and 1 million Flickr images retrieved and analysed for fires in the south of France
Spatio-temporal clustering and analysis shows that 80% of fires were correctly detected
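The slide does not say which clustering algorithm was used in the experiment. As an illustration only, the sketch below groups geotagged posts that are linked by chains of neighbours close in both space and time (a simple single-linkage approach). The function names, the 10 km and 6 hour thresholds, and the sample posts are all assumptions made up for this example.

```python
import math
from datetime import datetime, timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def cluster_posts(posts, max_km=10.0, max_gap=timedelta(hours=6)):
    """Group posts connected by chains of neighbours close in space and time.

    posts: list of (lat, lon, timestamp) tuples.
    Returns a sorted list of clusters, each a list of indices into posts.
    The thresholds are illustrative assumptions, not values from the experiment.
    """
    parent = list(range(len(posts)))

    def find(i):
        # Union-find root lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            lat1, lon1, t1 = posts[i]
            lat2, lon2, t2 = posts[j]
            close_in_time = abs(t1 - t2) <= max_gap
            if close_in_time and haversine_km(lat1, lon1, lat2, lon2) <= max_km:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(posts)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up sample posts (lat, lon, time); not data from the JRC experiment.
posts = [
    (43.30, 5.40, datetime(2020, 8, 1, 14, 0)),
    (43.32, 5.42, datetime(2020, 8, 1, 15, 30)),
    (43.31, 5.38, datetime(2020, 8, 1, 16, 0)),
    (43.70, 7.26, datetime(2020, 8, 1, 15, 0)),  # far away in space
    (43.30, 5.40, datetime(2020, 8, 3, 9, 0)),   # same place, two days later
]

clusters = cluster_posts(posts)
candidate_events = [c for c in clusters if len(c) >= 3]
print(candidate_events)
```

Clusters with several posts could then be treated as candidate events and compared against reference fire records; the pairwise loop is quadratic, so a real analysis over millions of posts would need spatial indexing or a distributed approach.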
From Data to Processes
If you have Big Data from multiple sources, you cannot move the data for processing; you need to move the analysis and processing to the data.
To support multi-disciplinary research we also need to develop a shared understanding of questions such as: what do you do with the data? How do you frame a problem and a possible solution according to different disciplinary approaches?
This quest requires describing not just the data but also the processes or workflows, leading to new executable web services that are understood across disciplines.
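The idea of moving the processing to the data can be sketched as follows. `DataNode` and `partial_stats` are hypothetical names invented for this illustration; in practice the nodes would be remote archives exposing processing services, and only small results would cross the network.

```python
class DataNode:
    """Hypothetical archive node that keeps its data local and runs submitted functions."""

    def __init__(self, name, values):
        self.name = name
        self._values = values  # stays on the node; never shipped to the caller

    def run(self, process):
        """Execute `process` next to the data and return only its (small) result."""
        return process(self._values)


def partial_stats(values):
    """Process shipped to each node: returns (count, sum), not the raw values."""
    return len(values), sum(values)


# Made-up values standing in for large data sets held at two archives.
nodes = [
    DataNode("archive-a", [0.2, 0.4, 0.6]),
    DataNode("archive-b", [0.8, 1.0]),
]

# Only two numbers per node travel back; the caller combines them.
partials = [node.run(partial_stats) for node in nodes]
total_count = sum(count for count, _ in partials)
total_sum = sum(subtotal for _, subtotal in partials)
global_mean = total_sum / total_count

print(f"Global mean over {total_count} values: {global_mean:.2f}")
```

The design choice is that the shipped process returns mergeable partial results (count and sum), so the global statistic can be computed without any node exposing its raw data.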
Conclusions
Copernicus is a revolution for Europe, both in terms of the amount of data and for its open policies.
We need to consider a broader user community. To serve all users we need to identify new ways for data dissemination (e.g. from small to large, from pixel extraction to area-based) and processing (e.g. spatio-temporal time series, distributed geo-processing, ...).
Copernicus will have a positive impact on INSPIRE, Research and other relevant initiatives if it is able to interoperate with them.
EO Professionals could also benefit more if Copernicus does not stop at the level of pure data dissemination.
This workshop will be an excellent opportunity to identify solutions and proposals to address these Challenges and Opportunities.
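The difference between pixel extraction and area-based dissemination mentioned above can be illustrated on a toy raster. The grid values and function names below are made up for this sketch; a real service would operate on georeferenced imagery rather than nested lists.

```python
# Toy raster: rows x columns of pixel values (made-up numbers).
raster = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]


def extract_pixel(grid, row, col):
    """Pixel extraction: return the single value at (row, col)."""
    return grid[row][col]


def extract_area(grid, row_min, row_max, col_min, col_max):
    """Area-based extraction: return the sub-grid within inclusive bounds."""
    return [line[col_min:col_max + 1] for line in grid[row_min:row_max + 1]]


def area_mean(grid, row_min, row_max, col_min, col_max):
    """Summarise an area so that only one number needs to be delivered."""
    window = extract_area(grid, row_min, row_max, col_min, col_max)
    values = [value for line in window for value in line]
    return sum(values) / len(values)


pixel_value = extract_pixel(raster, 1, 2)
window = extract_area(raster, 0, 1, 1, 2)
window_mean = area_mean(raster, 0, 1, 1, 2)

print(pixel_value, window, window_mean)
```

Different user categories may need different granularities: a single pixel value, a clipped window, or just a summary statistic over an area of interest.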
Thank you for your kind attention