Dept. ISC (Informatics, Systems & Collaboration). UniGR Workshop: Big Data. «The challenge of visualizing big data». Dr Ir Benoît Otjacques, Deputy Scientific Director ISC.
The Future is Data-based. Can we help?
Who we are: about 30 members (all MSc, MEng or PhD in Computer Science), with a network of partners from Luxembourg and abroad. Funding: Ministry of Research; EU and national research programs (FNR); contract research (private/public). Outputs of ISC: fundamental and applied research, delivered as scientific papers, R&D studies, proof-of-concept prototypes and professional applications.
Mission: use computer science to ease the understanding of complex big data coming from multiple, heterogeneous sources, primarily through visual representations accessed via any type of device in various contexts of use. Scope is limited to the software application layer (hardware and network are not included).
Scope: Data Provisioning (more than data: consider metadata); Data Processing & Analysis (more than preprocessing: visual analytics); Interactive Visualization of Data (more than graphics: usable software tools); Software Tools Delivery. One of the largest teams in Europe focused on this topic (> 20 permanent positions).
What we do: CAD/CAM, Scientific Visualization, Virtual Reality, Computer Graphics, Medical Imaging
What we do: Visual Analytics, Data Analytics, Infovis, Abstract Data, Visual Data Mining
What we do: www.calluna.lu
What we do: domain agnostic
What we do: from a business/science question to a solution usable in the field. Field expert, field question: "How to analyse my network of friends?" Our group (mixed teams), generic problem: "How to analyse network data?" Potential generic solution(s), obtained by reusing, adapting or inventing: graph drawing, dynamic graphs, adjacency matrices, graph clustering. Instantiated generic solution: multi-level graph drawing with semantic labelling. Solution usable in the field: a web-based app with interactive visualization of social network contacts.
Infovis & Visual Analytics pipeline: raw data → (data acquisition) → formatted & structured data → (data analysis & mining algorithms) → processed data → (drawing & rendering algorithms) → visual representation → user with a problem to solve, who feeds back into the pipeline through user interaction. What does Big Data change?
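The pipeline above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the `name,value` record format, the threshold filter standing in for a mining algorithm, and the text bar chart standing in for rendering are all hypothetical. It illustrates one design choice: acquisition runs once and its output is cached, so a user interaction only re-runs the analysis and rendering stages.

```python
from typing import Dict, List

def acquire(raw_lines: List[str]) -> List[Dict]:
    """Data acquisition: raw 'name,value' lines -> structured records (hypothetical format)."""
    records = []
    for line in raw_lines:
        name, value = line.split(",")
        records.append({"name": name.strip(), "value": float(value)})
    return records

def analyze(records: List[Dict], threshold: float) -> List[Dict]:
    """Analysis stage: a simple threshold filter standing in for a real mining algorithm."""
    return [r for r in records if r["value"] >= threshold]

def render(processed: List[Dict], width: int = 20) -> List[str]:
    """Rendering stage: a text bar chart standing in for a visual representation."""
    if not processed:
        return []
    vmax = max(r["value"] for r in processed) or 1.0  # avoid dividing by zero
    return [f"{r['name']:<8}{'#' * round(width * r['value'] / vmax)}" for r in processed]

class VisualPipeline:
    """Caches acquired data so that user interaction only re-runs analysis and rendering."""

    def __init__(self, raw_lines: List[str]):
        self.structured = acquire(raw_lines)  # acquisition happens once

    def interact(self, threshold: float) -> List[str]:
        # A user interaction (here: changing the threshold) loops back into the pipeline.
        return render(analyze(self.structured, threshold))

pipeline = VisualPipeline(["a,1", "b,4", "c,2"])
for row in pipeline.interact(threshold=2):
    print(row)
```

With big data, the cached intermediate stage is exactly what stops fitting in memory or being recomputable in interactive time, which is where the following slides pick up.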
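What's the problem? Two major challenges in Visual Analytics: scalability (small/mid-sized vs big data) and dynamics (static vs dynamic data).
Static data, small or mid-sized: well studied.
Static data, big: open issues of type A.
Dynamic data, small or mid-sized: open issues of type B.
Dynamic data, big: highly challenging, since (A and B) >> A+B.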
What's the VA problem? It's big! Big static data: heterogeneous, high-volume data sources.
Scalability of data provisioning (HW/SW infrastructure): how to run queries on distributed systems to explore big data sets?
Scalability of mining algorithms: how to lower the time needed to run a clustering algorithm on x gigabytes?
Scalability of visual representations: how to visualize a million multivariate items on a screen?
Software engineering issues: how to design an interactive user interface that loads big data in < 1 s?
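One common answer to the "million items on one screen" question is to aggregate in screen space: instead of drawing every item, count how many fall into each pixel-sized bin and map the counts to colour. The sketch below assumes 2D numeric coordinates and a rectangular viewport; the function name and parameters are illustrative, not taken from the talk. The output size depends only on the grid resolution, not on the number of items, which is what lets the rendering step scale.

```python
import random
from typing import Iterable, List, Tuple

def bin_to_screen(points: Iterable[Tuple[float, float]], width: int, height: int,
                  x_range: Tuple[float, float], y_range: Tuple[float, float]) -> List[List[int]]:
    """Count points per cell of a width x height grid covering the viewport."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the visible viewport
        # Clamp so that points lying exactly on the upper edge land in the last cell.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid

# One million random items reduced to a 100 x 50 grid of counts.
rng = random.Random(0)
items = ((rng.random(), rng.random()) for _ in range(1_000_000))
grid = bin_to_screen(items, 100, 50, (0.0, 1.0), (0.0, 1.0))
print(sum(map(sum, grid)))  # every item falls in the unit square, so this is 1000000
```

The same loop can accumulate a sum or mean of any other attribute per cell, which is one way to carry multivariate information into the aggregated view.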
What's the VA problem? It's big! What if data processing is running in the background? What if the user wants seamless navigation in the data set? Can this map be generated in < 0.1 s on a classic laptop? [Figure: how a competing algorithm scales; 36,000 French communes on a single screen, weighted by population size, spatially constrained.]
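When processing runs in the background, one option is progressive computation: work through the data in chunks and publish a partial result after each chunk, so the view can be redrawn while the computation continues. The minimal sketch below uses a running mean as a stand-in for any aggregate that can be updated incrementally; the function name is hypothetical and the chunk size is an assumed tuning parameter that trades responsiveness against overhead.

```python
from typing import Iterator, Sequence, Tuple

def progressive_mean(values: Sequence[float], chunk_size: int) -> Iterator[Tuple[int, float]]:
    """Yield (items processed so far, running mean) after each chunk."""
    total, count = 0.0, 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        count += len(chunk)
        yield count, total / count  # partial result the UI can display immediately

for seen, estimate in progressive_mean([1, 2, 3, 4, 5], chunk_size=2):
    print(f"{seen} items processed, current estimate {estimate}")
```

In an interactive tool the generator would typically be consumed on a worker thread, with each yielded value handed to the UI thread; that wiring is left out here.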
What's the VA problem? Data changes! Dynamic mid-sized data: heterogeneous data streams.
Dynamic data provisioning (HW/SW infrastructure): how to aggregate data streams?
Evolution of mining algorithms: how to adapt clustering algorithms to consider dynamic data?
Evolution of visual representations: how to visualize a continuously changing data structure?
Software engineering issues: how to design an interactive user interface continuously fed by data?
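A simple way to aggregate several data streams is with fixed-size (tumbling) time windows: events from all streams are grouped by window and by stream, and one summary per window is emitted as soon as the window closes. The sketch assumes events arrive as (timestamp, stream_id, value) tuples in non-decreasing timestamp order; late or out-of-order events are not handled. The stream names V and W echo the clustering-of-streams figure, but the data are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str, float]  # (timestamp, stream_id, value)

def _summarize(acc: Dict[str, list]) -> Dict[str, Tuple[int, float]]:
    """Turn per-stream [count, total] accumulators into (count, mean)."""
    return {sid: (count, total / count) for sid, (count, total) in sorted(acc.items())}

def tumbling_windows(events: Iterable[Event],
                     window: float) -> Iterator[Tuple[float, Dict[str, Tuple[int, float]]]]:
    """Emit (window_start, {stream_id: (count, mean)}) each time a window closes."""
    current = None
    acc: Dict[str, list] = defaultdict(lambda: [0, 0.0])
    for ts, sid, value in events:
        start = ts - ts % window
        if current is not None and start != current:
            yield current, _summarize(acc)  # window closed: publish and reset
            acc.clear()
        current = start
        acc[sid][0] += 1
        acc[sid][1] += value
    if current is not None:
        yield current, _summarize(acc)  # flush the last, possibly partial, window

events = [(0, "V", 1.0), (3, "V", 3.0), (5, "W", 2.0), (12, "V", 4.0)]
for start, summary in tumbling_windows(events, window=10):
    print(start, summary)
```

The window length plays the role of an update frequency: shorter windows give a fresher view, longer ones a calmer view with fewer redraws.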
What's the VA problem? Data changes! [Figure: clustering of streams. Streams V and W are observed at times t_1, t_2, t_3, ..., t_i, t_{i+1}, ..., t_n; clusters C1(t_i), C2(t_i) and C3(t_i) are found at t_i, and C1 splits into C1a(t_{i+1}) and C1b(t_{i+1}) at t_{i+1}. Open questions: update frequency? mental map?] What if an MDS projection must be computed in real time to visualize the clusters? What if the user wants to adapt clustering parameters at run time? What if the connection to a data stream is lost?
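For the clustering-of-streams question, a classic starting point is sequential (online) k-means, which updates the nearest centroid each time a new point arrives instead of re-clustering the whole data set. The sketch below assumes a fixed number of clusters and hand-picked seed centroids (the class name is illustrative); it does not handle cluster splits such as C1 becoming C1a and C1b, nor forgetting of old data, both of which stream-clustering methods need in practice.

```python
import math
from typing import List, Sequence

class OnlineKMeans:
    """Sequential k-means: each arriving point moves its nearest centroid towards it."""

    def __init__(self, seeds: Sequence[Sequence[float]]):
        self.centroids: List[List[float]] = [list(map(float, s)) for s in seeds]
        self.counts = [0] * len(self.centroids)  # points absorbed by each centroid so far

    def update(self, point: Sequence[float]) -> int:
        """Assign the point to its nearest centroid, update that centroid, return its index."""
        k = min(range(len(self.centroids)), key=lambda i: math.dist(point, self.centroids[i]))
        self.counts[k] += 1
        # Step size 1/n: after its first point, each centroid is the mean of its assigned points.
        eta = 1.0 / self.counts[k]
        self.centroids[k] = [c + eta * (p - c) for c, p in zip(self.centroids[k], point)]
        return k

model = OnlineKMeans(seeds=[(0, 0), (10, 10)])
labels = [model.update(p) for p in [(1, 1), (9, 9), (3, 3), (11, 11)]]
print(labels, model.centroids)
```

How often the view is refreshed from `model.centroids` can be decoupled from how often the model is updated, which is one lever for the update-frequency and mental-map questions.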
My God! Data are big and are changing! Big dynamic data: solutions for type A and type B problems often do not work for (A and B) problems. Pre-computation (batch mode) is available for big static data sets, but what about streams? Is real-time fusion of data streams still possible with 10^n heterogeneous streams? How to keep the user's mental map stable? Which aggregation strategy for multiscale data with respect to time and to space? What if the user device is a smartphone with poor computing resources?
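One possible aggregation strategy for multiscale time data is a pre-computed pyramid: level 0 holds the raw samples and each higher level averages adjacent pairs, so a client can fetch the coarsest level that still fills its screen. This suits the smartphone case, since the device only receives as many values as it can display. The sketch assumes a single, evenly sampled numeric series; the helper names are illustrative.

```python
from typing import List, Sequence

def time_pyramid(series: Sequence[float]) -> List[List[float]]:
    """Level 0 is the raw series; each next level averages adjacent pairs of the previous one."""
    levels = [list(series)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt = [(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev) - 1, 2)]
        if len(prev) % 2:
            nxt.append(prev[-1])  # odd length: carry the last sample up unchanged
        levels.append(nxt)
    return levels

def level_for_width(levels: List[List[float]], width: int) -> List[float]:
    """Return the finest level with at most `width` values (one value per pixel column)."""
    for level in levels:
        if len(level) <= width:
            return level
    return levels[-1]

pyramid = time_pyramid([1, 3, 5, 7])
print(pyramid)  # [[1, 3, 5, 7], [2.0, 6.0], [4.0]]
print(level_for_width(pyramid, width=2))  # [2.0, 6.0]
```

For odd-length levels the carried-over sample is later averaged as if it were a full pair, so coarse values are approximate; a weighted mean would fix that at the cost of storing counts.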
My God! Data are big and are changing! How and when to update it? How and when to compute it? How not to lose the user? How to interact with it?
Enabling decisions through Visual Analytics: Big Data and Visual Analytics techniques, data provisioning (batch, interactive, streaming), big systems. Rethinking and adapting existing algorithms and techniques w.r.t. Big Data.
Collaborations: your scientific or business problem. Is data provisioning an issue? Is data visualization an issue? Is data analytics an issue? Do you need a software tool to do this? Then we should probably discuss together.
Before joining ISC, its members were there
Big Data Visualization
Conclusion: we are here today to join our respective forces to face a BIG challenge.
Contact: Dr Ir Benoît Otjacques otjacque@lippmann.lu