Web and Big Data at LIG
Marie-Christine Rousset (Professor at UJF, LIG scientific delegate)
Data and Knowledge Processing at Large Scale
Officers: Massih-Reza Amini, Jean-Pierre Chevallet
Teams: AMA, EXMO, GETALP, HADAS, MRIM, SLIDE, STEAMER
Scientific focus: data mining, natural language processing, machine learning, DBMS, GIS, information retrieval, social networks, Semantic Web, Linked Data
Distributed Systems, Parallel Computing, and Networks
Officers: Vivien Quéma, Arnaud Legrand
Teams: DRAKKAR, MESCAL, MOAIS, NANOSIM, ERODS
Scientific focus: HPC, cloud computing, Future Internet, multi-core programming, parallel and embedded systems
LIG is involved in many projects and infrastructures (Clouds/HPC) for Big Data Analytics
European projects:
  FP7 ICT Exascale Mont-Blanc 1 (2011-2014)
  FP7 ICT Exascale Mont-Blanc 2 (2013-2016)
  FP7 IRSES HPC GA (2011-2014)
  FP7 BioASQ (2012-2014): large-scale categorization and question answering for the biomedical domain
National projects:
  FUI Minalogic SoCTrace (2011-2015): analysis of execution traces produced by multi-core embedded applications
  ANR Clouds@Home (2009-2013)
  ANR SONGS (2011-2015)
  FSN OpenCloudware (2012-2014)
  PIA DATALYSE (2013-2016): intelligent warehouses for heterogeneous big data
  ANR Class-Y (2011-2015): classification in large-scale taxonomies, with application to taxonomies such as MeSH
  ANR Qualinca: methods and algorithms for quality and interoperability of large documentary catalogs
  ANR PAGODA: practical algorithms for ontology-based data access
MASTODONS projects:
  PROSPECTOM: interactive study of proteomes via statistical learning and data aggregation methods
  ARESOS: machine learning, data mining, and information access for social network analysis
  GARGANTUA: theoretical aspects of machine learning and data mining for big data
Infrastructures:
  Meso-centre Ciment (HPC platform in Grenoble)
  EMERA and Grid 5000 projects
DATALYSE (PIA: Cloud and Big Data call)
Goal: deliver a collection of efficient data processing tools, referred to as Datalysers, to prepare, transform, extract value from, and visualize Big Data
Joint work between research and industry
  Academics: LIG (HADAS, ERODS, TyRex), INRIA Saclay, LIFL, LIRMM
  Industry: Eolas, Business et Decision (B&D), STIME Mousquetaires
Timeline: started in May 2013 for a period of 42 months
  Deliverable 1: Big Data preparation datalysers
  Deliverable 2: Big Data transformation datalysers
  Deliverable 3: Big Data visualization datalysers
Datasets and platforms: real datasets ranging from User Big Data (UBD) to Monitoring Big Data (MBD)
Website: http://www.datalyse.fr
DATALYSE Use Cases
Linked/Open Data
  Provide access to clean and enriched datasets on museums in Grenoble
  Datasets: UBD
  Application: visualization layer to improve the user experience in museums
Traffic Analysis
  Interactive data center traffic statistics for different ISPs, hosted applications, geographic regions, and time periods
  Datasets: MBD
  Application: traffic anomaly detection
Digital Marketing
  Mining customer traffic on hosted websites
  Datasets: UBD
  Application: optimize the conversion rate by monitoring customer traffic
Retail
  Determining what makes customers leave the store
  Datasets: UBD
  Application: help better organize promotional offers for recurring customers
Datalyse architecture
Semantic Web and Linked Open Data
More than 31 billion RDF triples
Data linkage and enrichment (geo-localized, personalized)
Ontology-based information access and integration
Semantic search
Data disambiguation
Semantic Web technologies are now mature enough to add value to data and to support innovative applications
Example: the Living Book of Anatomy (funded by PERSYVAL-lab)
Description of anatomic objects, constraints, functions, and 3D models
  «3Dmodel1 describes the Sartorius, which is a Muscle that participates in the Flexion of the Knee»
Reasoning and querying capabilities
  «Which 3D objects refer to muscles that participate in the Flexion of the Knee?»
Evolvable and efficient tool for patient-specific 3D anatomic visualization and simulation
My Corporis Fabrica ontology
Description of anatomic objects, constraints, functions, and 3D aspects
  «3Dmodel1 describes the Sartorius, which is a Muscle that participates in the Flexion of the Knee»
Reasoning and declarative querying capabilities on knowledge
  «Which 3D objects refer to muscles that participate in the Flexion of the Knee?»
75000 classes, 11 rules, 1M RDF triples
Evolvable and efficient tool for knowledge-driven 3D anatomic
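The Sartorius statement and the knee-flexion question above can be illustrated with a small, self-contained sketch. It is not the My Corporis Fabrica implementation: the `mcf:` identifiers, the predicate names, the `mcf:ThighMuscle` class, and the single subclass rule are all hypothetical, chosen only to show how a rule can derive a fact that a declarative query then relies on.

```python
# Toy in-memory triple store. Every identifier below is a hypothetical
# stand-in, not taken from the actual My Corporis Fabrica ontology.
triples = {
    ("mcf:3Dmodel1", "mcf:describes", "mcf:Sartorius"),
    ("mcf:Sartorius", "rdf:type", "mcf:ThighMuscle"),
    ("mcf:ThighMuscle", "rdfs:subClassOf", "mcf:Muscle"),
    ("mcf:Sartorius", "mcf:participatesIn", "mcf:KneeFlexion"),
    ("mcf:3Dmodel2", "mcf:describes", "mcf:Femur"),
    ("mcf:Femur", "rdf:type", "mcf:Bone"),
}


def saturate(facts):
    """Apply one rule until nothing new is derived:
    (x rdf:type C) and (C rdfs:subClassOf D) imply (x rdf:type D)."""
    facts = set(facts)
    while True:
        inferred = {
            (x, "rdf:type", d)
            for (x, p1, c) in facts if p1 == "rdf:type"
            for (c2, p2, d) in facts if p2 == "rdfs:subClassOf" and c2 == c
        }
        if inferred <= facts:
            return facts
        facts |= inferred


def unify(term, value, binding):
    """Match one pattern term against one value; terms starting with '?' are variables."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def query(patterns, facts):
    """Evaluate a conjunction of (subject, predicate, object) patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for fact in facts:
                candidate = dict(binding)
                if all(unify(t, v, candidate) for t, v in zip(pattern, fact)):
                    extended.append(candidate)
        solutions = extended
    return solutions


# "Which 3D objects refer to muscles that participate in the Flexion of the Knee?"
patterns = [
    ("?model", "mcf:describes", "?part"),
    ("?part", "rdf:type", "mcf:Muscle"),
    ("?part", "mcf:participatesIn", "mcf:KneeFlexion"),
]

answers = sorted({b["?model"] for b in query(patterns, saturate(triples))})
print(answers)
```

In this sketch the query finds nothing on the raw triples, because the Sartorius is only declared a `mcf:ThighMuscle`; the answer appears once the subclass rule has been applied, which is the role reasoning plays alongside querying.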
Conclusion
LIG has broad, cross-cutting expertise in Web and Big Data
  Ranging from HPC and Cloud infrastructures, through large-scale data and knowledge management systems, to information visualization for human decision support (LIG's IIHM team)
  Ranging from the fundamental aspects of data science to systems and applied aspects
LIG is involved in many national and European collaborative projects on these topics