Big Data Challenge: Mining Heterogeneous Data
Prof. Mihai Datcu
German Aerospace Center (DLR)
Munich Aerospace Faculty

Sensing & Big Data
Big Data: challenges arise in areas such as
- Computer hardware and the Cloud
- Storage
- Processing and analyzing
- Sensor technologies
Big Data is not only a matter of size.
Source: Chris Eaton, Dirk Deroos, Tom Deutsch, George Lapis, and Paul Zikopoulos, Understanding Big Data. McGraw-Hill Companies, April 2012.

Big Data Mining
- Emerging applications
- The research areas

MULTISPECTRAL and SPATIAL INFORMATION CONTENT
WorldView, 8 bands, 2 m resolution:
- Spectral classes
- Spatial categories

Query by Example: browsing and exploring large data sets
- Image query
- Query result
- Effectiveness evaluation: Precision & Recall

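The precision and recall evaluation named on this slide can be sketched as follows. This is a minimal illustration, not the evaluation code used for the results shown: the function name and the patch identifiers are hypothetical, and a query result is treated as a plain set of retrieved patch IDs.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs returned by the query (hypothetical patch identifiers)
    relevant:  IDs that truly belong to the queried category
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # correctly retrieved items
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: 4 patches returned, 3 patches are truly relevant, 2 overlap.
precision, recall = precision_recall(
    retrieved=["p1", "p2", "p3", "p4"],
    relevant=["p2", "p4", "p7"],
)
```

Precision is the fraction of returned patches that are relevant; recall is the fraction of relevant patches that were returned.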
TerraSAR-X, 1 m resolution SAR images: HUGE DIVERSITY OF INFORMATION CONTENT

Semantic annotation based on active learning
- Positive and negative examples
- Methodology: PF (primitive feature) algorithm; classification: SVM with RF (relevance feedback)
- Annotated category
- Semantic patch collections
- Ground truth
Optimal parameters: product type (MGD), mode (High Resolution Spotlight), geometric resolution configuration (RE), patch size (160 x 160 pixels); PF algorithm (Gabor filters)

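The loop of positive and negative examples with relevance feedback can be illustrated with a small uncertainty-sampling sketch. Everything here is an assumption made for illustration: a nearest-centroid rule stands in for the SVM, one-dimensional toy vectors stand in for Gabor features, and the `oracle` callback stands in for the user who labels the queried patch.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def margin(x, pos_c, neg_c):
    """Positive when x is closer to the positive centroid than to the negative one."""
    return math.dist(x, neg_c) - math.dist(x, pos_c)


def fit(features, labels):
    """Stand-in classifier: one centroid per class (not an SVM)."""
    pos = [features[i] for i, y in labels.items() if y]
    neg = [features[i] for i, y in labels.items() if not y]
    return centroid(pos), centroid(neg)


def active_learning(features, oracle, seed_pos, seed_neg, rounds=2):
    """Start from one positive and one negative example, then repeatedly
    ask the oracle about the patch the current model is least sure of."""
    labels = {seed_pos: True, seed_neg: False}
    for _ in range(rounds):
        pos_c, neg_c = fit(features, labels)
        unlabeled = [i for i in range(len(features)) if i not in labels]
        if not unlabeled:
            break
        # Smallest absolute margin = closest to the decision boundary.
        query = min(unlabeled, key=lambda i: abs(margin(features[i], pos_c, neg_c)))
        labels[query] = oracle(query)
    pos_c, neg_c = fit(features, labels)
    predicted = [margin(x, pos_c, neg_c) > 0 for x in features]
    return predicted, labels


# Toy data: patches with feature value above 0.5 belong to the category.
features = [[0.0], [0.1], [0.45], [0.55], [0.9], [1.0]]
predicted, labels = active_learning(
    features, oracle=lambda i: features[i][0] > 0.5, seed_pos=5, seed_neg=0
)
```

With two feedback rounds the sketch queries the two patches nearest the class boundary, which is the point of active learning: the user labels only the examples that are most informative for the classifier.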
INFORMATION CONTENT EXPLORATION: Visual Analytics
1 HS TerraSAR-X scene = up to 10 000 image patches (100 x 100 m)

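Cutting a scene into fixed-size patches can be sketched as follows. The scene dimensions are made up, the 160-pixel patch size is borrowed from the annotation slide, and the slides do not say how edge remainders are handled, so this sketch simply drops incomplete patches.

```python
def patch_origins(width, height, patch=160):
    """Yield the (row, col) pixel origin of every full patch in a scene.

    Patches are non-overlapping; incomplete patches at the right and
    bottom edges are skipped (an assumption, not stated in the slides).
    """
    for row in range(0, height - patch + 1, patch):
        for col in range(0, width - patch + 1, patch):
            yield row, col


# Hypothetical scene of 480 x 320 pixels: 3 columns x 2 rows of patches.
origins = list(patch_origins(width=480, height=320))
```

For scale: at 100 x 100 m per patch, 10 000 patches correspond to 100 km² of ground coverage.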
Semantic catalogues
300 cities, 400 000 image tiles, 850 words

From CLC to EO semantic taxonomy
Legend: categories defined for Venice using the CORINE Land Cover (CLC) nomenclature:
- Marine waters coastal lagoons
- Marine waters sea and ocean
- Urban fabric
- Pastures
- Forest
- Heterogeneous agricultural areas
- Open spaces with little or no vegetation
- Industrial, commercial and transport units
- Open spaces with little or no vegetation
- Artificial, non-agricultural vegetated areas
Venice taxonomy (using TerraSAR-X data):
- Bridge
- Port
- Airport
- Water and boats
- Water and buoy
- Water
- Agriculture
- Vegetation
- Cemetery vegetation
- Railway tracks
- Urban
- Water and urban
- Breaking waves
- River deposit
- Beach area
- Vegetation and buildings
Using CLC: 10 categories; using our methodology: 17 categories.
In CLC some categories are mixed together (e.g., bridges are included in marine waters coastal lagoons).

Query by Semantics
- TerraSAR-X data model
- Semantic annotation of TerraSAR-X image content
- Example query: SELECT l.label_id, l.name FROM annotation a JOIN label l ON a.label_id = l.label_id

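The annotation/label join on this slide can be tried against a small in-memory database. This is a sketch under assumptions: only the table names `annotation` and `label` and the columns `label_id` and `name` come from the slide; the `patch_id` column, the schema details, the sample rows, and the `ORDER BY` clause are hypothetical additions for the demonstration.

```python
import sqlite3


def build_demo_db():
    """Create a toy version of the data model (schema is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE label (
            label_id INTEGER PRIMARY KEY,
            name     TEXT NOT NULL
        );
        CREATE TABLE annotation (
            patch_id INTEGER PRIMARY KEY,   -- hypothetical column
            label_id INTEGER NOT NULL REFERENCES label(label_id)
        );
        INSERT INTO label VALUES (1, 'Bridge'), (2, 'Port');
        INSERT INTO annotation VALUES (10, 2), (11, 1), (12, 2);
        """
    )
    return conn


def query_labels(conn):
    """Return the label of every annotated patch, one row per annotation."""
    sql = """
        SELECT l.label_id, l.name
        FROM annotation a
        JOIN label l ON a.label_id = l.label_id
        ORDER BY a.patch_id
    """
    return conn.execute(sql).fetchall()


rows = query_labels(build_demo_db())
```

Qualifying the columns with the table alias (`l.label_id`) avoids the ambiguity that arises because `label_id` exists in both joined tables.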
Satellite Image Time Series: Evolution Classes and Data Analytics

In-situ data: LUCAS 2009/2012

EXOGENOUS SOURCES & EO DATA ANALYTICS: LUCAS & TerraSAR-X

Features
- Volume and multimodality of data are growing
- Data and information are spatio-temporal and unstructured
- Users want knowledge, not just data
- Interactive operation is the only way of working
- Exploration is predominant
- Context is critical and relevant
- Users are interested in information and knowledge independent of conjecture

Challenges
- The cycle from theory to technology to users is too long
- More work is needed on inter-domain communication (applications, but also theory)
- More applications on real data are needed

Selected Publications
1. Blanchart, Pierre and Ferecatu, Marin and Cui, Shiyong and Datcu, Mihai (2014), Pattern retrieval in large image databases using multiscale coarse-to-fine cascaded active learning. IEEE JSTARS (in press).
2. Cerra, Daniele and Datcu, Mihai (2013), Expanding the Algorithmic Information Theory Frame for Applications to Earth Observation. Entropy, 15 (1), pp. 407-415.
3. Cui, Shiyong and Dumitru, Corneliu Octavian and Datcu, Mihai (2013), Ratio-Detector-Based Feature Extraction for Very High Resolution SAR Image Patch Indexing. IEEE Geoscience and Remote Sensing Letters, 10 (5), pp. 1175-1179.
4. Dumitru, Octavian and Datcu, Mihai (2013), Information Content of Very High Resolution SAR Images: Study of Feature Extraction and Imaging Parameters. IEEE Transactions on Geoscience and Remote Sensing, 51 (8), pp. 4591-4610.
5. Vaduva, Corina and Gavat, Inge and Datcu, Mihai (2013), Latent Dirichlet Allocation for Spatial Analysis of Satellite Images. IEEE Transactions on Geoscience and Remote Sensing, 51 (5), pp. 2770-2786.
6. Vaduva, Corina and Costachioiu, Teodor and Patrascu, Carmen and Gavat, Inge and Lazarescu, Vasile and Datcu, Mihai (2013), A Latent Analysis of Earth Surface Dynamic Evolution Using Change Map Time Series. IEEE Transactions on Geoscience and Remote Sensing, 51 (4), pp. 2105-2117.
7. Veganzones, Miguel and Datcu, Mihai and Graña, Manuel (2013), Further results on dissimilarity spaces for hyperspectral images RF-CBIR. Pattern Recognition Letters, 34 (14), pp. 1659-1668.

Publications (journals)
9. Dumitru, Corneliu Octavian and Datcu, Mihai (2013), Information Content of Very High Resolution SAR Images: Semantics, Geospatial Context, and Ontologies. JSTARS (submitted).
10. Espinoza-Molina, Daniela and Datcu, Mihai (2013), Earth-Observation Image Retrieval Based on Content, Semantics, and Metadata. IEEE Transactions on Geoscience and Remote Sensing, Early Access, pp. 1-15.
11. Singh, Jagmal and Datcu, Mihai (2013), SAR Image Categorization With Log Cumulants of the Fractional Fourier Transform Coefficients. IEEE Transactions on Geoscience and Remote Sensing, Early Access, pp. 1-10.
12. Singh, Jagmal and Espinoza-Molina, Daniela and Datcu, Mihai (2013), Evaluation of Gibbs Random Fields-based and Wavelet-based Methods for Primitive Feature Extraction in Metric-Resolution SAR Images. JSTARS (submitted).
And ca. 30 conference proceedings articles.

- Collaboration with CNES and ParisTech
- Collaboration with University Politehnica Bucharest
- A Virtual Observatory for TerraSAR-X Data (FP7 ICT)
- Earth Observation Image Librarian (ESA GSTP)

The 9th ESA-SatCen-JRC Image Information Mining Conference: the Sentinels Era
5-7 March 2014
Romanian Space Agency, Bucharest, Romania

12-14 November 2014
ESA-ESRIN, Frascati, Italy
Call for papers and participation