Distributed knowledge sharing and production through collaborative e-science platforms


1 Distributed knowledge sharing and production through collaborative e-science platforms
PhD Defense - Alban Gaignard
Advisor: Johan Montagnat
CNRS, University of Nice Sophia Antipolis, I3S Laboratory, MODALIS research group

2 Translational research & e-science
[Figure labels: research laboratory, healthy population, target population, scanner, medical data, data processing, legacy database]

8 Translational research & e-science
[Figure labels: research laboratory, e-science platform, (1) Sharing, (2) Processing, Knowledge]
Challenges: Heterogeneity, Distribution, Dynamicity, Scalability

12 Challenges & Hypotheses
Questions:
- Scalability/Distribution: how to efficiently search over large distributed data sources?
- Dynamicity/Heterogeneity: how to cope with legacy/non-relocatable data? How to dynamically combine several independent data sources?
- Knowledge: how to share/search for data and processing tools with high expressivity? How to better interpret results?
Hypotheses:
- H1: Domain ontologies
- H2: Data sources are distributed and autonomous
- H3: e-science platforms allow sharing and producing scientific resources
Scientific areas:
- Knowledge engineering: reasoning on semantic descriptions of data & processing tools
- e-science: computing infrastructure to process/share/re-purpose scientific resources
Semantic e-science: reducing "time-to-discovery"

14 Thesis Objectives
Coherent sharing and production of distributed knowledge in Life-Science:
- Knowledge sharing: coping with semantic data volume, distribution, heterogeneity
- Knowledge production: extracting meaningful & long-term data from large & technical datasets

15 Main contributions
1. Knowledge base federation
- Transparent, efficient, and expressive semantic federated querying
- Abstract Knowledge Graphs
- [Web Intelligence'12] [IC'12 workshop] [MICCAI'12 workshop]
2. Semantic Workflows
- Characterization of semantically annotated services (Nature and Role)
- Semantic experiment summaries
- [KEOD'11] [IC'10 workshop] [TMI'13] [CBMS'11]

18 E-Science 1: data integration
Materialized data integration (Extract - Transform - Load)
- E-T-L 1 ... E-T-L n feed a data warehouse; querying is centralized
- Criteria: Efficiency, Scalability, Dynamicity
- Hardly relocatable data?
Virtualized data integration (Distributed Query Processing)
- Federated querying, split into sub-queries sent to the sources
- Criteria: Efficiency, Scalability (Load/Volume), Dynamicity
- Data kept at source

21 E-Science 1: distributed semantic querying
[Comparison table; the per-system ratings are not recoverable from the transcript]
- Systems: DARQ, Splendid, SemWiq, Sparql-DQP, FedX, KGRAM, KGRAM-DQP
- Criteria: Distribution, Performance, Heterogeneity, Dynamicity, Expressivity
Missing expressivity (subset of SPARQL): only SELECT queries on Basic Graph Patterns, no PATH expressions, no bound subjects for SemWiq, etc.
KGRAM-DQP: balancing Expressivity & Performance

23 E-Science 1: semantic data handling with KGRAM
Representing, querying and reasoning on Knowledge Graphs
Generic engine:
- Expressivity: SPARQL 1.1 compliant
- Versatility: several data models (RDF, XML, SQL)
- Reasoning: RDFS entailments + inference rules
KGRAM abstract machine:
- Query parsing: from the query language to AQL
- Query engine: query evaluation on AQL; data matching & filtering on AKG
- Data producer: query rewriting from AQL to the native query language; transformation of native data from the data source into AKG
AQL: Abstract Query Language
AKG: Abstract Knowledge Graph

25 E-Science 2: scientific workflows
Semantic workflow (WF) environments
- METEOR-S; Taverna/FETA; BioCatalogue; BioMOBY target WF design/sharing
- WF results interpretation through provenance standards (Provenir, OPM, PROV-*)
[Figure: positioning of related work. Criteria: Standards, Scalability, Linked Data approach, Domain knowledge. Systems: e-BioInfra, NeuGrid, RDFProv, ProvBase, Linked Provenance Data, Wings/Pegasus, PaCE, Taverna/Janus, NeuSemStore]
PROV-O published as a W3C Candidate Recommendation (11 December 2012)

26 Contribution 1 Knowledge Sharing (for e-science platforms)

30 Efficient & expressive sharing of knowledge graphs
Objectives:
- Transparent federated semantic engine [Heterogeneity + Dynamicity]
- Balancing expressivity and performance [Distribution + Scalability + Knowledge]
Methods:
- Abstract Knowledge Graphs
- Distributed Query Processing techniques
- Static and dynamic optimization

32 KGRAM-DQP: distributed query processing
[Figure: architecture of the federator]
- KGRAM-DQP (Federator): a query evaluator on top of a parallel MetaProducer (itself a Producer), which drives one Web service client per remote endpoint
- Each remote SPARQL Web service endpoint wraps a data producer (#1, #2), which sends the rewritten query to its data source (#1, #2) and receives native results
Concerns:
- Cost of network communication
- Distributed query processing performance
- Service parallelism / optimizations

33 KGRAM-DQP: parallel evaluation
Algorithm 6 illustrates how to distribute the query over a set of federated knowledge bases, exploiting the parallelism of each remote producer. Results follow the SPARQL W3C recommendation and are represented as a set of Results, each of them encoding a set of Mappings between variables and values.

Algorithm 6: Fine-grained parallel distributed query processing, with an explicit wait condition.
Data: Producers, the set of SPARQL endpoints; EdgeReq, the set of edge requests forming the SPARQL query; scheduler, a thread pool allowing parallel execution.
Result: Results, the set of SPARQL results.
1 foreach (e ∈ EdgeReq) do
2     foreach (p ∈ Producers) do in parallel
3         scheduler.submit(p.getedges(e)) ;
4     wait for scheduler ;
5     foreach (task ∈ scheduler.getfinished()) do
6         Results ← Results ∪ task.getresults() ;

[Figure panels: (a) Synch. barrier, (b) Pipelining]
The principle consists in iterating over each edge request forming the initial SPARQL query (line 1). Then, for each edge request, all federated SPARQL endpoints are queried concurrently (line 3). The federator then waits for all federated endpoints to finish [...]
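The synchronization-barrier variant of Algorithm 6 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the KGRAM-DQP implementation: producers are modelled as plain callables over in-memory triples (the helper make_producer and all sample data are hypothetical), and the standard-library thread pool stands in for the scheduler.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def evaluate(edge_requests, producers, max_workers=8):
    """Send every edge request to all producers in parallel.

    A synchronization barrier (wait) separates two consecutive edge requests.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as scheduler:
        for edge in edge_requests:                      # line 1 of Algorithm 6
            futures = [scheduler.submit(p, edge)        # lines 2-3
                       for p in producers]
            wait(futures)                               # line 4: explicit wait
            for task in futures:                        # lines 5-6
                results.extend(task.result())
    return results


def make_producer(triples):
    """Hypothetical in-memory producer: returns the triples matching a predicate."""
    def get_edges(predicate):
        return [t for t in triples if t[1] == predicate]
    return get_edges


# Made-up sample data, for illustration only.
source1 = make_producer([("ex:bob", "foaf:name", "Bobby A")])
source2 = make_producer([("ex:bob", "dbpedia:birthdate", "1970-01-01"),
                         ("ex:al", "foaf:name", "Al")])

edges = evaluate(["foaf:name", "dbpedia:birthdate"], [source1, source2])
```

A pipelined variant (panel b) would consume each task as soon as it finishes instead of waiting for the slowest endpoint.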

36 Static optimization: pushing applicable FILTERs
Filter out irrelevant results as early as possible (lighter network communications): add the FILTER to each single triple pattern it applies to.
[...] results behind federated endpoints, and thus the load of their processing by the federator.

Input SPARQL query (Listing 4.3: Full SPARQL query distributed over remote KGRAM endpoints)
    PREFIX foaf: <...>
    PREFIX dbpedia: <...>
    SELECT DISTINCT ?x ?name ?date WHERE {
        ?x foaf:name ?name .
        ?x dbpedia:birthdate ?date .
        FILTER (CONTAINS (?name, "Bobby A"))
    }

Rewritten sub-query (Listing 4.4: Generated SPARQL query encapsulating a single edge request through the naive rewriting strategy)
    PREFIX foaf: <...>
    CONSTRUCT { ?x foaf:name ?name } WHERE {
        ?x foaf:name ?name .
    }

Optimized sub-query (Listing 4.5: Optimized SPARQL query encapsulating a single edge request)
    PREFIX foaf: <...>
    CONSTRUCT { ?x foaf:name ?name } WHERE {
        ?x foaf:name ?name .
        FILTER (CONTAINS (?name, "Bobby A"))
    }
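The FILTER-pushing rewriting can be sketched as below. This is a minimal illustration, assuming that a filter is "applicable" to a triple pattern when all of its variables appear in that pattern; the function names are hypothetical and the PREFIX declarations are left out because their URLs are not available here.

```python
import re


def variables(text):
    """Return the SPARQL variables (?name) occurring in a piece of query text.

    Naive: a '?word' inside a string literal would also be counted.
    """
    return set(re.findall(r"\?\w+", text))


def rewrite_edge(triple_pattern, filters):
    """Build the CONSTRUCT sub-query for one triple pattern.

    Only filters whose variables are all bound by the pattern are pushed.
    """
    bound = variables(triple_pattern)
    applicable = [f for f in filters if variables(f) <= bound]
    body = " ".join([triple_pattern + " ."] +
                    ["FILTER (%s)" % f for f in applicable])
    return "CONSTRUCT { %s } WHERE { %s }" % (triple_pattern, body)


filters = ['CONTAINS(?name, "Bobby A")']
q_name = rewrite_edge("?x foaf:name ?name", filters)
q_date = rewrite_edge("?x dbpedia:birthdate ?date", filters)
```

Here q_name carries the FILTER because ?name is bound by its pattern, while q_date is sent unfiltered.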

39 Dynamic optimization: pushing values
Avoid re-evaluation by exploiting intermediate results (the communication of already known values is saved) [Bind joins]: replace variables by their known values in each single triple pattern.

Input SPARQL query (Listing 4.3: Full SPARQL query distributed over remote KGRAM endpoints)
    PREFIX foaf: <...>
    PREFIX dbpedia: <...>
    SELECT DISTINCT ?x ?name ?date WHERE {
        ?x foaf:name ?name .
        ?x dbpedia:birthdate ?date .
        FILTER (CONTAINS (?name, "Bobby A"))
    }

Intermediate result: ?x = <...>

Rewritten sub-query (Listing 4.7)
    PREFIX dbpedia: <...>
    CONSTRUCT { ?x dbpedia:birthdate ?date } WHERE {
        ?x dbpedia:birthdate ?date
    }

Optimized sub-query (Listing 4.8)
    PREFIX dbpedia: <...>
    CONSTRUCT { <...> dbpedia:birthdate ?date } WHERE {
        <...> dbpedia:birthdate ?date
    }
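The value-pushing (bind join) step can be sketched as follows: each intermediate binding is substituted into the next triple pattern before the sub-query is sent. This is an illustration only; the function names and the example URI are hypothetical, and triple patterns are assumed to be whitespace-separated terms.

```python
def bind_pattern(triple_pattern, binding):
    """Replace every variable that already has a known value by that value."""
    return " ".join(binding.get(term, term) for term in triple_pattern.split())


def bound_subqueries(triple_pattern, bindings):
    """Generate one CONSTRUCT sub-query per intermediate binding."""
    queries = []
    for binding in bindings:
        pattern = bind_pattern(triple_pattern, binding)
        queries.append("CONSTRUCT { %s } WHERE { %s }" % (pattern, pattern))
    return queries


# Hypothetical intermediate result obtained from the first triple pattern.
intermediate = [{"?x": "<http://example.org/Bobby_A>"}]
subqueries = bound_subqueries("?x dbpedia:birthdate ?date", intermediate)
```

Since the subject is now fixed, the remote endpoint only returns triples about already known resources.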

40 Experiment: large-scale benchmarking (1/2)
Objective: performance assessment
Material and Methods:
- FedBench from the FedX team (50M triples; 7 life-science SPARQL queries)
- Grid'5000 computing infrastructure
- FedX + Fuseki endpoints
- KGRAM-DQP

FedBench Life-Science datasets (Table 8.4: Updated FedBench Life Science data collections (52M triples) fragmented ov[...])
    Data source   Linked Data collection   Size (triples)
    #1            ChEBI                    7.3M
    #2            DBpedia sub-set #1       25.3M
    #3            DBpedia sub-set #2       18.3M
    #4            DrugBank                 0.7M
    #5            KEGG Drug                1M

FedBench Life-Science query #7 (Listing 10.7: LS7 FedBench query)
    SELECT $drug $transform $mass WHERE {
        { $drug < berlin.de/drugbank/resource/drugbank/affectedorganism> "Humans and other mammals" .
          $drug < berlin.de/drugbank/resource/drugbank/casregistrynumber> $cas .
          $keggdrug < $cas .
          $keggdrug < $mass
          FILTER ( $mass > 5 )
        }
        OPTIONAL { $drug < berlin.de/drugbank/resource/drugbank/biotransformation> $transform . }
    }

Thesis excerpt (Chapter 8, p. 166; lines truncated in the transcript):
[...] was reserved for the execution of the FedX federation engine, the othe[...] reserved to expose the 5 data sources through a Fuseki SPARQL endpoin[...]. It has to be noted that due to some incompatibilities, it has not been po[...] FedX with OpenRDF-Sesame SPARQL endpoints (versions [...] and [...] described in [Schwarte et al., 2011].
To evaluate KGRAM, we deployed a similar environment as previou[...] experiment 1 through the physical federation. We reserved 6 nodes of the S[...] cluster, one of which was dedicated to the KGRAM federation engine, an[...]ing nodes exposing the 5 data sources through KGRAM endpoints.

42 Experiment: large-scale benchmarking (2/2)
Real distributed computing infrastructure
Mean evaluation time over 10 runs
[Chart annotations: 50% timeout; variability]

43 Highlights & short-term perspectives
Highlights:
- Transparent federated semantic querying: no prior knowledge of data source content [Distribution/Dynamicity]
- Performance between that of DARQ/Splendid and that of FedX [Scalability]
- Expressive approach: SPARQL 1.1 support (Optional, Negation, Property paths, Aggregates) [Knowledge]
Short-term perspectives:
- Coarse-grain DQP (dynamic triple pattern grouping in SERVICE clauses): prototype algorithm, but possibly ineffective (query planning) [Scalability]
- Relational database mediation: prototype SQL data producer in KGRAM-DQP [Heterogeneity]

44 Contribution 2 Knowledge Production (for e-science platforms)

48 Scientific workflow issues
1. Editing data-links between processes?
2. Identifying the cause of failures or atypical results
Knowledge-oriented WF environments:
- ease workflow design
- propagate knowledge on results

50 Design issues
Workflow design issue, close-up:
[Figure: two MRI inputs, one in the "reference" role and one in the "patient" role, feed Registration, which outputs a Matrix (x y a / z t b); Re-sampling takes the Matrix and an MRI and outputs an MRI]
- Several natures of treatment or data, not explicit at the technical level
- Considering only the nature leads to ambiguity: Roles are needed to relate data to processing tools!

51 Runtime issues
Results exploitation issue, close-up:
[Figure: provenance graph linking Registration and Re-sampling to their data through used and wasGeneratedBy relations, with a can-be-superimposed-with relation between data items]
Need for non-ambiguous service annotations to produce new domain-specific statements

52 Issues & Objectives
Issues:
(i) How to make the semantics of data processing explicit?
(ii) How to benefit from this knowledge at experiment design-time? At experiment runtime?
Objectives:
(i) reduce the complexity of designing an e-science experiment (workflow);
(ii) improve the exploitation of results produced during data-intensive experiments.

54 Methods
Several kinds of knowledge:
- Technical knowledge (OWL-S, OPM);
- Domain knowledge:
  1. Nature of data and services;
  2. Role of data from the service point of view.
Our contribution:
1. Domain-specific Role Taxonomy: clarifying bindings between technical service descriptions and domain concepts;
2. Producing new valuable knowledge through inferences along platform exploitation.
Supported by the OntoNeuroLOG domain ontology and the OPM provenance ontology.

56 Neuroimaging Role taxonomy
A. Gaignard, PhD defense, 15 March 2013, Sophia Antipolis
- Domain-specific extension of the OPM Role class
- Roles to disambiguate the annotation of service parameters
[Figure: Registration takes an As-reference input and an As-floating input and outputs an As-transformation matrix (x y a / z t b / 0 0 1); Re-sampling takes an As-unprocessed input and the As-transformation matrix and outputs an As-resampled result]

Thesis excerpt (Chapter 6, p. 134):
[...] concepts and Role concepts when annotating semantic service parameters by relying on a domain-specific role taxonomy. Figure 6.3 illustrates the taxonomy of roles dedicated to the characterization of the relationships between neuroimaging data and their dedicated processing. Role concepts are organized following the main classes of neuroimaging processing, similarly to the OntoNeuroLOG dataset processing ontology.
Figure 6.3: A domain-specific role taxonomy characterizing how neuroimaging data can be related to neuroimaging processing tools.
This taxonomy illustrates another example of disambiguation in the context of resampling processes. Indeed, the two roles As-affine-transformation and As-transformation-field specify how a matrix should be interpreted by a resampling process. If we consider two 3 × 3 matrices, they could share the same nature and representation format. However, one could be interpreted as a set of parameters for translation, rotation and scaling, in the context of an affine geometrical transformation, whereas the other one could be interpreted as a deformation field in the context of a non-rigid transformation.
Relying on this taxonomy of roles, we are now able to precisely annotate the input and output parameters of the image registration service considered in the running example (Figure 6.2) with both Natural and Role concepts. Both input images are characterized by the same Natural concept, T1-weighted magnetic resonance image (T1-MR). T1-MR can be considered a Natural concept because it stands on its own and does not characterize how input data are related to any other entities.
On the other hand, service input parameters can be annotated with two distinct Role concepts to characterize how input data are related to the registration process. The service input parameter interpreting data as floating (the moving data, which will finally be realigned) is annotated with the role As-floating-[...]

57 Inference rule example
Inference rules to produce semantic annotations: provenance-based knowledge propagation

Thesis excerpt (lines truncated in the transcript):
We propose in this chapter a methodology for producing and deducing new meaningful statements. If we consider the result of the registration workflow p[...] in Figure 6.1, it would be interesting to associate the atlas used as input in the registration process to the registered image produced. More generally, our approach [...] the propagation of the effect of services (or sub-parts of workflow) to the produc[...]. For instance, we would like to automate the generation of a fact saying that a[...] can be superimposed with another one, because in some cases, processing tools require that their input data are expressed in the same coordinate system, and th[...] beforehand been registered.
Figure 6.2: Linking data and processes through generic and domain-specific relations.

Rule, for a Registration process (Registration-Class) and a Resampling process (Resampling-Class):
    IF    Resampling used Matrix (As_affine_transformation)
    AND   Resampling used Atlas (As_reference_image)
    AND   Result wasGeneratedBy Resampling (As_resampled_image)
    AND   Matrix wasGeneratedBy Registration (As_affine_transformation)
    THEN  Result can_be_superimposed_with Atlas
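The rule on this slide can be mimicked on plain Python tuples, as in the sketch below. It is only an illustration of provenance-based knowledge propagation, not the rule engine used in the thesis: the tuple encoding of OPM "used" and "wasGeneratedBy" edges, the function name and all identifiers (reg1, res1, matrix1, ...) are assumptions made for the example.

```python
def infer_superimposable(used, generated, types):
    """Infer (result, 'can_be_superimposed_with', atlas) statements.

    used:      set of (process, data, role) edges
    generated: set of (data, process, role) edges
    types:     dict mapping a process identifier to its class
    """
    inferred = set()
    for process, matrix, role in used:
        if types.get(process) != "Resampling" or role != "As_affine_transformation":
            continue
        # The matrix must come from a registration process.
        from_registration = any(
            d == matrix and r == "As_affine_transformation"
            and types.get(p) == "Registration"
            for d, p, r in generated)
        if not from_registration:
            continue
        atlases = {d for p, d, r in used
                   if p == process and r == "As_reference_image"}
        results = {d for d, p, r in generated
                   if p == process and r == "As_resampled_image"}
        inferred |= {(res, "can_be_superimposed_with", atlas)
                     for res in results for atlas in atlases}
    return inferred


# Hypothetical provenance trace of one workflow run.
types = {"reg1": "Registration", "res1": "Resampling"}
used = {("res1", "matrix1", "As_affine_transformation"),
        ("res1", "atlas", "As_reference_image")}
generated = {("matrix1", "reg1", "As_affine_transformation"),
             ("result", "res1", "As_resampled_image")}

facts = infer_superimposable(used, generated, types)
```

Because the roles appear in the premises, two inputs of the same nature (for example two matrices) do not trigger the rule unless they were used in the expected role.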

59 Experiment: inferring VIP experiment summaries (real-life)
Objectives:
- Inferring meaningful experiment summaries from WF runs & domain knowledge
- Coping with provenance as distributed Linked Data
Material & Methods:
- VIP e-science platform (Moteur WF engine; OntoVIP ontology)
- Service annotations (Roles), OPM provenance, inference rules
[Figure labels: VIP Portal, VIP Execution Service, simulation workflows, VIP Data Service, organ models, simulated data, distributed computing infrastructure, VIP Platform]
Figure 1.2: The VIP platform, easing the access to medical image simulators, organ models, and leveraging the EGI distributed computing infrastructure to handle heavy simulation.

60 Inferring VIP experiment summaries (real-life)
Fine-grained & technical provenance, turned by inference rules into coarse-grained & meaningful provenance
[Figure: on one side, the fine-grained workflow trace (phantom, protocol, parsetextprotocol, compileprotocole, generatejobs, repeated sorteosingles and sorteoemission invocations, Lmf2RawSino, sinogram); on the other side, the inferred summary: phantom rdf:type PET-simulation-compatible-model; protocol rdf:type Parameter-set; simulation workflow run rdf:type Simulation; PET-Simulation is-a Simulation; sinogram rdf:type Simulated-data; PET-Sinogram is-a Simulated-data; relations derives-from-model, derives-from-parameter-set, is-a-result-of-at]
Figure 8.11: New inferred meaningful statements (dashed arrows) constituting the semantic experiment summary.

Thesis excerpt (8.4.2 Results and discussion; lines truncated in the transcript):
Semantic experiment summaries. The main result of this experiment is a [...] meaningful statements inferred from the execution of a medical image simulation experiment. These new statements provide a high-level and concise semantic experiment summary. We consider the experiment summary as a high-level description [...] only involves domain-specific classes and properties defined in the VIP ontology, compared to the generic and technical entities provided by the OPM provenance [...]. We also consider the experiment summary as concise since only 7 statements [...] produced, compared to the 15 thousand statements produced through the M[...] provenance plugin.

61 Inferring VIP experiment summaries: material & methods
Inference rule:
    PREFIX rdf: <...rdf-syntax-ns#>
    PREFIX rdfs: <...schema#>
    PREFIX opmo: <...>
    PREFIX opmv: <...>
    PREFIX ws: <...service-owl-lite.owl#>
    PREFIX iec: <...owl-lite.owl#>
    PREFIX vip-model: <...model.owl#>
    PREFIX vip-simulation: <...simulation.owl#>
    PREFIX vip-simulated-data: <...simulated-data.owl#>

    CONSTRUCT {
        ?out vip-model:derives-from-model ?inphantom
        #...
    } WHERE {
        ?agent (iec:refers-to/rdf:type) vip-simulation:image-reconstruction-simulator-component .
        ?wcb opmo:cause ?agent .
        ?wcb opmo:effect ?x .
        ?x rdf:type opmv:process .
        ?wgb opmo:cause ?x .
        ?wgb opmo:effect ?out .

        ?agent2 (iec:refers-to/rdf:type) vip-simulation:parameters-generation-simulator-component .
        ?wcb2 opmo:cause ?agent2 .
        ?wcb2 opmo:effect ?y .
        ?y rdf:type opmv:process .

        ?used1 opmo:cause ?inphantom .
        ?used1 opmo:effect ?y .
        ?used1 opmo:role/rdfs:label ?techrolephantom .
        ?agent2 ws:has-input ?inportphantom .
        ?inportphantom (iec:refers-to/rdf:type) vip-model:geometrical-phantom-object-model .
        ?inportphantom rdfs:comment ?techrolephantom .
        ?inphantom opmo:avalue ?vinphantom .
        ?vinphantom opmo:content ?cinphantom .
        #...
    }

Inferred meaningful experiment summary:
[Figure 8.11: New inferred meaningful statements (dashed arrows) constituting the semantic experiment summary. Statements: phantom rdf:type PET-simulation-compatible-model; protocol rdf:type Parameter-set; simulation workflow run rdf:type Simulation; PET-Simulation is-a Simulation; sinogram rdf:type Simulated-data; PET-Sinogram is-a Simulated-data; relations derives-from-model, derives-from-parameter-set, is-a-result-of-at]

62 Semantic experiment summaries

Inferring VIP experiment summaries: results

BIG fine-grained, meaningless provenance. The OPM provenance graph is composed of 4523 nodes and […] edges. Figure 8.10 represents a simplified graph in which some nodes have been removed, such as the unique instance of OPM Account, which allows retrieving all instances generated in the context of a single workflow execution. From this simplified graph we can distinguish two main nodes, […]oteur.processor/sorteo_singles and […], which correspond to services with a large number of invocations.

Figure 8.10: A filtered OPM provenance graph with removed rdf:type properties for the main OPM classes such as Artifact, Used, WasGeneratedBy, etc.

Due to its fine granularity and its size, the OPM model leads to complex graphs involving large amounts of generic and technical elements. Interpreting these OPM graphs is difficult.

To address this issue, we segmented the produced semantic annotations through two distinct semantic repositories. First, a short-term repository, which temporarily stores OPM statements as the necessary input data to infer new meaningful statements. Second, a long-term repository, which permanently stores the new statements resulting from inferences involving domain-specific entities provided by the VIP ontology.

FEW meaningful statements. The main result of this experiment is a set of meaningful statements inferred from the execution of a medical image simulation experiment. These new statements provide a high-level and concise semantic experiment summary. We consider the experiment summary high-level since it only involves domain-specific classes and properties defined in the VIP ontology, compared to the generic and technical entities provided by the OPM provenance […]. We also consider the experiment summary concise since only 7 statements were produced, compared to the 15 thousand statements produced through the M[…] provenance plugin.

Figure 8.11: New inferred meaningful statements (dashed arrows) constituting the semantic experiment summary. Entities shown: phantom, protocol, sinogram, Simulation workflow run, PET-simulation-compatible-model, Parameter-set, Simulation, PET-Simulation, Simulated-data, PET-Sinogram. Properties shown: rdf:type, is-a, derives-from-parameter-set, derives-from-model, is-a-result-of-at.

Results interpretation
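The split between a short-term and a long-term repository can be sketched in a few lines. This is a minimal illustration in plain Python, not the VIP implementation: triples are simple tuples, the rule is hand-written, and every identifier (`run1`, `sino1`, `params1`, the `opm:` and `vip:` prefixes, `infer_summary`) is an assumption made for the example. Only the class and property names PET-Sinogram, Parameter-set and derives-from-parameter-set come from the slide.

```python
# Illustrative only: triples are plain (subject, predicate, object) tuples.
# All instance names and prefixes below are hypothetical.

def infer_summary(short_term):
    """Apply one hand-written rule to OPM-like statements.

    Assumed rule: if a process used a parameter set and generated an
    artifact typed as a PET sinogram, state that the artifact derives
    from that parameter set.
    """
    used = {(s, o) for (s, p, o) in short_term if p == "opm:used"}
    generated = {(s, o) for (s, p, o) in short_term if p == "opm:wasGeneratedBy"}
    types = {(s, o) for (s, p, o) in short_term if p == "rdf:type"}

    summary = set()
    for artifact, process in generated:
        if (artifact, "vip:PET-Sinogram") not in types:
            continue  # technical by-product: not part of the summary
        for proc, param in used:
            if proc == process and (param, "vip:Parameter-set") in types:
                summary.add((artifact, "vip:derives-from-parameter-set", param))
    return summary


# Short-term repository: fine-grained provenance of one workflow run.
short_term = {
    ("run1", "opm:used", "params1"),
    ("sino1", "opm:wasGeneratedBy", "run1"),
    ("tmp1", "opm:wasGeneratedBy", "run1"),
    ("sino1", "rdf:type", "vip:PET-Sinogram"),
    ("params1", "rdf:type", "vip:Parameter-set"),
}

# Long-term repository: only the inferred domain-level statements are kept.
long_term = set()
long_term |= infer_summary(short_term)
short_term.clear()  # the provenance statements were only needed as rule input

print(long_term)
```

Five provenance statements go in and one domain-level statement is kept, which mirrors, at toy scale, the contrast between the large OPM graph and the handful of summary statements.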

63 Inferring VIP experiment summaries: results

Distributed linked provenance data & inference rules: Grid'5000 infrastructure (3 OPM data sources) + KGRAM-DQP

Reusable inference rules:
they adapt to simulator component evolutions;
they do not adapt to workflow structure evolutions.

Figure 8.12: Updated Sorteo workflow involving a refined Lmf2RawSino service (thesis section 8.4, "A real-life medical imaging simulation workflow: semantic mash-up to infer meaningful experiment summaries"). Workflow elements shown: phantom, protocol, compileprotocole, parsetextprotocol, generatejobs, sorteosingles, sorteoemission, Lmf2RawSino, sinogram. The refined services Lmf2RawSino_v2 and Lmf2RawSino_v3 are attached through subsumedby links.
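One way to read the claim that rules adapt to component evolutions: a rule written against a generic service still fires for any refined version that is subsumed by it. The sketch below illustrates that idea in plain Python. The `SUBSUMED_BY` table, `is_a` and `rule_applies` are hypothetical names, and the assumption that both refined versions point directly to Lmf2RawSino is mine; the transitive walk gives the same result if v3 is instead subsumed by v2.

```python
# Hypothetical subsumption table: refined service -> more generic service.
SUBSUMED_BY = {
    "Lmf2RawSino_v2": "Lmf2RawSino",
    "Lmf2RawSino_v3": "Lmf2RawSino",
}


def is_a(service, target):
    """True if `service` is `target` or is transitively subsumed by it."""
    seen = set()
    while service is not None and service not in seen:
        if service == target:
            return True
        seen.add(service)  # guards against accidental cycles in the table
        service = SUBSUMED_BY.get(service)
    return False


def rule_applies(invoked_service):
    # The rule is written once, against the generic service only.
    return is_a(invoked_service, "Lmf2RawSino")


print(rule_applies("Lmf2RawSino"))     # original component
print(rule_applies("Lmf2RawSino_v3"))  # refined component, same rule
print(rule_applies("sorteosingles"))   # unrelated service
```

The limitation noted on the slide also shows here: the rule depends on which service produced the data, so it survives a component swap but not a change in how the workflow is wired.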

64 Inferring VIP experiment summaries: results

1 week of VIP operation, 18 possible inference rules:
118 simulations (15K triples each): 1.7 M triples
118 experiment summaries: 2656 triples
Chart labels: US simulations, MR simulations, CT simulations
Scalability
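The orders of magnitude on this slide can be checked with a short calculation. The figures are those given on the slide; "15K triples each" is a rounded value, so the product is approximate, and the variable names are only illustrative.

```python
simulations = 118
triples_per_simulation = 15_000  # "15K triples each", a rounded figure
summary_triples = 2656           # total over the 118 experiment summaries

provenance_triples = simulations * triples_per_simulation
reduction_factor = provenance_triples / summary_triples
avg_summary_size = summary_triples / simulations

print(provenance_triples)          # 1770000, reported as 1.7 M on the slide
print(round(reduction_factor))     # 666
print(round(avg_summary_size, 1))  # 22.5
```

Under these rounded inputs, the stored summaries are several hundred times smaller than the raw provenance, at roughly 22 statements per experiment on average.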

65 Highlights & short-term perspectives

Highlights:
Clear delineation between Role and Natural concepts
Domain ontology at workflow design-time and run-time
Scalable annotation of analyzed data through semantic experiment summaries
Reusable inference rules

Short-term perspectives:
Integration of neuro-imaging roles in a sound domain ontology
From OPM ontology to PROV-O
Publishing experiment summaries as Linked Open Data

66 Summary

Enhance e-science platforms with Knowledge Engineering (and Semantic Web technologies)
Scalable and expressive Knowledge Sharing approach through distributed query processing techniques and abstract knowledge graphs
Smart Knowledge Production: "few but meaningful data"
Deployment into real-life platforms: two software tools, NeuSemStore and KGRAM-DQP, in production in two ANR projects, NeuroLOG and VIP

67 Future directions

1. Towards high performance federated semantic querying: triple pattern grouping & query planning; "elastic" SPARQL endpoint for massive knowledge graphs
2. Towards highly expressive federated semantic querying: FedBench extensions with more expressive queries; towards distributed reasoning (optimal plan for inferences? materialization?)
3. Towards versatile and reliable knowledge base federations: R2RML-based mediation of SQL databases; generalized provenance, from processed data to the originating data sources (explanation)
4. Towards reduced information overload in e-science: semantic experiment summaries & (goal-driven) conceptual workflows [Cerezo et al., 2011]; eased inference rules design by relying on WF goals; annotated data to help in WF design
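As an illustration of what triple pattern grouping could look like, the sketch below groups the patterns of a query by the only source able to answer them, so that each group can be shipped as a single subquery instead of one request per pattern. This is a generic technique from the federated querying literature, shown in plain Python under stated assumptions. It is not KGRAM-DQP code, and the source names, predicates and the `group_patterns` helper are all hypothetical.

```python
from collections import defaultdict


def group_patterns(patterns, sources_by_predicate):
    """Group triple patterns by the single source able to answer them.

    patterns: list of (subject, predicate, object) strings.
    sources_by_predicate: predicate -> set of source names, assumed to be
    known from a prior source-selection step.
    Returns (groups, remaining): `groups` maps a source to the patterns that
    only it can answer; `remaining` lists the patterns that several sources,
    or none, can answer and that must be handled one by one.
    """
    groups = defaultdict(list)
    remaining = []
    for pattern in patterns:
        sources = sources_by_predicate.get(pattern[1], set())
        if len(sources) == 1:
            (only_source,) = sources
            groups[only_source].append(pattern)
        else:
            remaining.append(pattern)
    return dict(groups), remaining


patterns = [
    ("?d", "ex:hasSubject", "?s"),
    ("?s", "ex:hasAge", "?age"),
    ("?d", "rdf:type", "ex:Dataset"),
]
sources_by_predicate = {
    "ex:hasSubject": {"siteA"},
    "ex:hasAge": {"siteA"},
    "rdf:type": {"siteA", "siteB"},
}

groups, remaining = group_patterns(patterns, sources_by_predicate)
print(groups)     # two patterns sent together to siteA
print(remaining)  # the rdf:type pattern still goes to both sites
```

A query planner would then decide in which order to evaluate the groups and the remaining patterns, which this sketch leaves out.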

68 Thank you!

O. Corby, A. Gaignard, C. Faron Zucker, J. Montagnat. KGRAM versatile data graphs querying and inference engine, WI'12 (International Conference on Web Intelligence), Macao.

A. Gaignard, J. Montagnat, B. Wali, B. Gibaud. Characterizing semantic service parameters with Role concepts to infer domain-specific knowledge at runtime, KEOD'11 (International Conference on Knowledge Engineering and Ontology Development), Paris.

A. Gaignard, J. Montagnat, C. Faron Zucker, O. Corby. Semantic Federation of Distributed Neurodata, MICCAI-DCICTAI workshop (Data- and Compute-Intensive Clinical and Translational Imaging Applications), Nice.

A. Gaignard, J. Montagnat, C. Faron Zucker, O. Corby. Fédération multi-sources en neurosciences : intégration de données relationnelles et sémantiques, IC'12 (Ingénierie des Connaissances), workshop "Ingénierie des connaissances pour l'inter-opérabilité sémantique en e-santé", Paris.

T. Glatard, C. Lartizien, B. Gibaud, R. Ferreira da Silva, G. Forestier, F. Cervenansky, M. Alessandrini, H. Benoit-Cattin, O. Bernard, S. Camarasu-Pop, N. Cerezo, P. Clarysse, A. Gaignard, P. Hugonnard, H. Liebgott, S. Marache, A. Marion, J. Montagnat, J. Tabary and D. Friboulet. A Virtual Imaging Platform for multi-modality medical image simulation, IEEE Transactions on Medical Imaging (TMI), 32 (1), 2013.

Fédération et analyse de données distribuées en imagerie biomédicale

Fédération et analyse de données distribuées en imagerie biomédicale Software technologies for integration of processes and data in neurosciences ConnaissancEs Distribuées en Imagerie BiomédicaLE Fédération et analyse de données distribuées en imagerie biomédicale Johan

More information

Disributed Query Processing KGRAM - Search Engine TOP 10

Disributed Query Processing KGRAM - Search Engine TOP 10 fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Data fusion, semantic alignment, distributed queries Johan Montagnat CNRS, I3S lab, Modalis team on behalf of the CrEDIBLE

More information

fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Interrogation d'entrepôts distribués et hétérogènes

fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Interrogation d'entrepôts distribués et hétérogènes fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Interrogation d'entrepôts distribués et hétérogènes Johan Montagnat Alban Gaignard http://credible.i3s.unice.fr MI CNRS appel

More information

fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Data fusion, semantic alignment, distributed queries

fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Data fusion, semantic alignment, distributed queries fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Data fusion, semantic alignment, distributed queries Johan Montagnat CNRS, I3S lab, Modalis team on behalf of the CrEDIBLE

More information

Semantic Interoperability

Semantic Interoperability Ivan Herman Semantic Interoperability Olle Olsson Swedish W3C Office Swedish Institute of Computer Science (SICS) Stockholm Apr 27 2011 (2) Background Stockholm Apr 27, 2011 (2) Trends: from

More information

MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management Zhan Liu, Fabian Cretton, Anne Le Calvé, Nicole Glassey, Alexandre Cotting, Fabrice Chapuis

More information

Taming Big Data Variety with Semantic Graph Databases. Evren Sirin CTO Complexible

Taming Big Data Variety with Semantic Graph Databases. Evren Sirin CTO Complexible Taming Big Data Variety with Semantic Graph Databases Evren Sirin CTO Complexible About Complexible Semantic Tech leader since 2006 (née Clark & Parsia) software, consulting W3C leadership Offices in DC

More information

DataBridges: data integration for digital cities

DataBridges: data integration for digital cities DataBridges: data integration for digital cities Thematic action line «Digital Cities» Ioana Manolescu Oak team INRIA Saclay and Univ. Paris Sud-XI Plan 1. DataBridges short history and overview 2. RDF

More information

Performance Analysis, Data Sharing, Tools Integration: New Approach based on Ontology

Performance Analysis, Data Sharing, Tools Integration: New Approach based on Ontology Performance Analysis, Data Sharing, Tools Integration: New Approach based on Ontology Hong-Linh Truong Institute for Software Science, University of Vienna, Austria truong@par.univie.ac.at Thomas Fahringer

More information

Lightweight Data Integration using the WebComposition Data Grid Service

Lightweight Data Integration using the WebComposition Data Grid Service Lightweight Data Integration using the WebComposition Data Grid Service Ralph Sommermeier 1, Andreas Heil 2, Martin Gaedke 1 1 Chemnitz University of Technology, Faculty of Computer Science, Distributed

More information

TopBraid Insight for Life Sciences

TopBraid Insight for Life Sciences TopBraid Insight for Life Sciences In the Life Sciences industries, making critical business decisions depends on having relevant information. However, queries often have to span multiple sources of information.

More information

Design and Implementation of a Semantic Web Solution for Real-time Reservoir Management

Design and Implementation of a Semantic Web Solution for Real-time Reservoir Management Design and Implementation of a Semantic Web Solution for Real-time Reservoir Management Ram Soma 2, Amol Bakshi 1, Kanwal Gupta 3, Will Da Sie 2, Viktor Prasanna 1 1 University of Southern California,

More information

Graph Database Performance: An Oracle Perspective

Graph Database Performance: An Oracle Perspective Graph Database Performance: An Oracle Perspective Xavier Lopez, Ph.D. Senior Director, Product Management 1 Copyright 2012, Oracle and/or its affiliates. All rights reserved. Program Agenda Broad Perspective

More information

FIPA agent based network distributed control system

FIPA agent based network distributed control system FIPA agent based network distributed control system V.Gyurjyan, D. Abbott, G. Heyes, E. Jastrzembski, C. Timmer, E. Wolin TJNAF, Newport News, VA 23606, USA A control system with the capabilities to combine

More information

Additional mechanisms for rewriting on-the-fly SPARQL queries proxy

Additional mechanisms for rewriting on-the-fly SPARQL queries proxy Additional mechanisms for rewriting on-the-fly SPARQL queries proxy Arthur Vaisse-Lesteven, Bruno Grilhères To cite this version: Arthur Vaisse-Lesteven, Bruno Grilhères. Additional mechanisms for rewriting

More information

Semantic Stored Procedures Programming Environment and performance analysis

Semantic Stored Procedures Programming Environment and performance analysis Semantic Stored Procedures Programming Environment and performance analysis Marjan Efremov 1, Vladimir Zdraveski 2, Petar Ristoski 2, Dimitar Trajanov 2 1 Open Mind Solutions Skopje, bul. Kliment Ohridski

More information

LDIF - Linked Data Integration Framework

LDIF - Linked Data Integration Framework LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany a.schultz@fu-berlin.de,

More information

TopBraid Life Sciences Insight

TopBraid Life Sciences Insight TopBraid Life Sciences Insight In the Life Sciences industries, making critical business decisions depends on having relevant information. However, queries often have to span multiple sources of information.

More information

The Ontological Approach for SIEM Data Repository

The Ontological Approach for SIEM Data Repository The Ontological Approach for SIEM Data Repository Igor Kotenko, Olga Polubelova, and Igor Saenko Laboratory of Computer Science Problems, Saint-Petersburg Institute for Information and Automation of Russian

More information

Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens

Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens 1 Optique: Improving the competitiveness of European industry For many

More information

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. ravirajesh.j.2013.mecse@rajalakshmi.edu.in Mrs.

More information

LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model

LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model 22 October 2014 Tony Hammond Michele Pasin Background About Macmillan

More information

Vertical Integration of Enterprise Industrial Systems Utilizing Web Services

Vertical Integration of Enterprise Industrial Systems Utilizing Web Services Vertical Integration of Enterprise Industrial Systems Utilizing Web Services A.P. Kalogeras 1, J. Gialelis 2, C. Alexakos 1, M. Georgoudakis 2, and S. Koubias 2 1 Industrial Systems Institute, Building

More information

Integrating Open Sources and Relational Data with SPARQL

Integrating Open Sources and Relational Data with SPARQL Integrating Open Sources and Relational Data with SPARQL Orri Erling and Ivan Mikhailov OpenLink Software, 10 Burlington Mall Road Suite 265 Burlington, MA 01803 U.S.A, {oerling,imikhailov}@openlinksw.com,

More information

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and

More information

HadoopSPARQL : A Hadoop-based Engine for Multiple SPARQL Query Answering

HadoopSPARQL : A Hadoop-based Engine for Multiple SPARQL Query Answering HadoopSPARQL : A Hadoop-based Engine for Multiple SPARQL Query Answering Chang Liu 1 Jun Qu 1 Guilin Qi 2 Haofen Wang 1 Yong Yu 1 1 Shanghai Jiaotong University, China {liuchang,qujun51319, whfcarter,yyu}@apex.sjtu.edu.cn

More information

excellent graph matching capabilities with global graph analytic operations, via an interface that researchers can use to plug in their own

excellent graph matching capabilities with global graph analytic operations, via an interface that researchers can use to plug in their own Steve Reinhardt 2 The urika developers are extending SPARQL s excellent graph matching capabilities with global graph analytic operations, via an interface that researchers can use to plug in their own

More information

Mining the Web of Linked Data with RapidMiner

Mining the Web of Linked Data with RapidMiner Mining the Web of Linked Data with RapidMiner Petar Ristoski, Christian Bizer, and Heiko Paulheim University of Mannheim, Germany Data and Web Science Group {petar.ristoski,heiko,chris}@informatik.uni-mannheim.de

More information

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova Using the Grid for the interactive workflow management in biomedicine Andrea Schenone BIOLAB DIST University of Genova overview background requirements solution case study results background A multilevel

More information

We have big data, but we need big knowledge

We have big data, but we need big knowledge We have big data, but we need big knowledge Weaving surveys into the semantic web ASC Big Data Conference September 26 th 2014 So much knowledge, so little time 1 3 takeaways What are linked data and the

More information

LinkZoo: A linked data platform for collaborative management of heterogeneous resources

LinkZoo: A linked data platform for collaborative management of heterogeneous resources LinkZoo: A linked data platform for collaborative management of heterogeneous resources Marios Meimaris, George Alexiou, George Papastefanatos Institute for the Management of Information Systems, Research

More information

Semantic Workflows and the Wings Workflow System

Semantic Workflows and the Wings Workflow System To Appear in AAAI Fall Symposium on Proactive Assistant Agents, Arlington, VA, November 2010. Assisting Scientists with Complex Data Analysis Tasks through Semantic Workflows Yolanda Gil, Varun Ratnakar,

More information

Addressing Self-Management in Cloud Platforms: a Semantic Sensor Web Approach

Addressing Self-Management in Cloud Platforms: a Semantic Sensor Web Approach Addressing Self-Management in Cloud Platforms: a Semantic Sensor Web Approach Rustem Dautov Iraklis Paraskakis Dimitrios Kourtesis South-East European Research Centre International Faculty, The University

More information

An industry perspective on deployed semantic interoperability solutions

An industry perspective on deployed semantic interoperability solutions An industry perspective on deployed semantic interoperability solutions Ralph Hodgson, CTO, TopQuadrant SEMIC Conference, Athens, April 9, 2014 https://joinup.ec.europa.eu/community/semic/event/se mic-2014-semantic-interoperability-conference

More information

Ontological Identification of Patterns for Choreographing Business Workflow

Ontological Identification of Patterns for Choreographing Business Workflow University of Aizu, Graduation Thesis. March, 2010 s1140042 1 Ontological Identification of Patterns for Choreographing Business Workflow Seiji Ota s1140042 Supervised by Incheon Paik Abstract Business

More information

D5.3.2b Automatic Rigorous Testing Components

D5.3.2b Automatic Rigorous Testing Components ICT Seventh Framework Programme (ICT FP7) Grant Agreement No: 318497 Data Intensive Techniques to Boost the Real Time Performance of Global Agricultural Data Infrastructures D5.3.2b Automatic Rigorous

More information

Big Data Provenance: Challenges and Implications for Benchmarking

Big Data Provenance: Challenges and Implications for Benchmarking Big Data Provenance: Challenges and Implications for Benchmarking Boris Glavic Illinois Institute of Technology 10 W 31st Street, Chicago, IL 60615, USA glavic@iit.edu Abstract. Data Provenance is information

More information

CRM dig : A generic digital provenance model for scientific observation

CRM dig : A generic digital provenance model for scientific observation CRM dig : A generic digital provenance model for scientific observation Martin Doerr, Maria Theodoridou Institute of Computer Science, FORTH-ICS, Crete, Greece Abstract The systematic large-scale production

More information

SCIENTIFIC workflows have recently emerged as a new

SCIENTIFIC workflows have recently emerged as a new IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 2, NO. 1, JANUARY-MARCH 2009 79 A Reference Architecture for Scientific Workflow Management Systems and the VIEW SOA Solution Cui Lin, Student Member, IEEE,

More information

Intelligent interoperable application for employment exchange system using ontology

Intelligent interoperable application for employment exchange system using ontology 1 Webology, Volume 10, Number 2, December, 2013 Home Table of Contents Titles & Subject Index Authors Index Intelligent interoperable application for employment exchange system using ontology Kavidha Ayechetty

More information

Supporting Change-Aware Semantic Web Services

Supporting Change-Aware Semantic Web Services Supporting Change-Aware Semantic Web Services Annika Hinze Department of Computer Science, University of Waikato, New Zealand a.hinze@cs.waikato.ac.nz Abstract. The Semantic Web is not only evolving into

More information

UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications

UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications Gaël de Chalendar CEA LIST F-92265 Fontenay aux Roses Gael.de-Chalendar@cea.fr 1 Introduction The main data sources

More information

Business rules and science

Business rules and science Business rules and science Science is a distributed, heterogeneous, rapidly evolving complex of activities, like an enterprise Business processes in science are largely ad hoc and undocumented, like very

More information

Application of ontologies for the integration of network monitoring platforms

Application of ontologies for the integration of network monitoring platforms Application of ontologies for the integration of network monitoring platforms Jorge E. López de Vergara, Javier Aracil, Jesús Martínez, Alfredo Salvador, José Alberto Hernández Networking Research Group,

More information

Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study

Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study Amar-Djalil Mezaour 1, Julien Law-To 1, Robert Isele 3, Thomas Schandl 2, and Gerd Zechmeister

More information

RDF y SPARQL: Dos componentes básicos para la Web de datos

RDF y SPARQL: Dos componentes básicos para la Web de datos RDF y SPARQL: Dos componentes básicos para la Web de datos Marcelo Arenas PUC Chile & University of Oxford M. Arenas RDF y SPARQL: Dos componentes básicos para la Web de datos Valladolid 2013 1 / 61 Semantic

More information

A science-gateway workload archive application to the self-healing of workflow incidents

A science-gateway workload archive application to the self-healing of workflow incidents A science-gateway workload archive application to the self-healing of workflow incidents Rafael FERREIRA DA SILVA, Tristan GLATARD University of Lyon, CNRS, INSERM, CREATIS Villeurbanne, France Frédéric

More information

Service-Oriented Architecture and its Implications for Software Life Cycle Activities

Service-Oriented Architecture and its Implications for Software Life Cycle Activities Service-Oriented Architecture and its Implications for Software Life Cycle Activities Grace A. Lewis Software Engineering Institute Integration of Software-Intensive Systems (ISIS) Initiative Agenda SOA:

More information

Evaluating Semantic Web Service Tools using the SEALS platform

Evaluating Semantic Web Service Tools using the SEALS platform Evaluating Semantic Web Service Tools using the SEALS platform Liliana Cabral 1, Ioan Toma 2 1 Knowledge Media Institute, The Open University, Milton Keynes, UK 2 STI Innsbruck, University of Innsbruck,

More information

Semantic Web Technologies and Data Management

Semantic Web Technologies and Data Management Semantic Web Technologies and Data Management Li Ma, Jing Mei, Yue Pan Krishna Kulkarni Achille Fokoue, Anand Ranganathan IBM China Research Laboratory IBM Software Group IBM Watson Research Center Bei

More information

Semantic Search in Portals using Ontologies

Semantic Search in Portals using Ontologies Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

More information

Techniques to Produce Good Web Service Compositions in The Semantic Grid

Techniques to Produce Good Web Service Compositions in The Semantic Grid Techniques to Produce Good Web Service Compositions in The Semantic Grid Eduardo Blanco Universidad Simón Bolívar, Departamento de Computación y Tecnología de la Información, Apartado 89000, Caracas 1080-A,

More information

Context Capture in Software Development

Context Capture in Software Development Context Capture in Software Development Bruno Antunes, Francisco Correia and Paulo Gomes Knowledge and Intelligent Systems Laboratory Cognitive and Media Systems Group Centre for Informatics and Systems

More information

IBM WebSphere ILOG Rules for.net

IBM WebSphere ILOG Rules for.net Automate business decisions and accelerate time-to-market IBM WebSphere ILOG Rules for.net Business rule management for Microsoft.NET and SOA environments Highlights Complete BRMS for.net Integration with

More information

Tool Support for Inspecting the Code Quality of HPC Applications

Tool Support for Inspecting the Code Quality of HPC Applications Tool Support for Inspecting the Code Quality of HPC Applications Thomas Panas Dan Quinlan Richard Vuduc Center for Applied Scientific Computing Lawrence Livermore National Laboratory P.O. Box 808, L-550

More information

OAK Database optimizations and architectures for complex large data Ioana MANOLESCU-GOUJOT

OAK Database optimizations and architectures for complex large data Ioana MANOLESCU-GOUJOT OAK Database optimizations and architectures for complex large data Ioana MANOLESCU-GOUJOT INRIA Saclay Île-de-France Université Paris Sud LRI UMR CNRS 8623 Plan 1. The team 2. Oak research at a glance

More information

Training Management System for Aircraft Engineering: indexing and retrieval of Corporate Learning Object

Training Management System for Aircraft Engineering: indexing and retrieval of Corporate Learning Object Training Management System for Aircraft Engineering: indexing and retrieval of Corporate Learning Object Anne Monceaux 1, Joanna Guss 1 1 EADS-CCR, Centreda 1, 4 Avenue Didier Daurat 31700 Blagnac France

More information

Managing enterprise applications as dynamic resources in corporate semantic webs an application scenario for semantic web services.

Managing enterprise applications as dynamic resources in corporate semantic webs an application scenario for semantic web services. Managing enterprise applications as dynamic resources in corporate semantic webs an application scenario for semantic web services. Fabien Gandon, Moussa Lo, Olivier Corby, Rose Dieng-Kuntz ACACIA in short

More information

Models and Architecture for Smart Data Management

Models and Architecture for Smart Data Management 1 Models and Architecture for Smart Data Management Pierre De Vettor, Michaël Mrissa and Djamal Benslimane Université de Lyon, CNRS LIRIS, UMR5205, F-69622, France E-mail: firstname.surname@liris.cnrs.fr

More information

Big Data Management Assessed Coursework Two Big Data vs Semantic Web F21BD

Big Data Management Assessed Coursework Two Big Data vs Semantic Web F21BD Big Data Management Assessed Coursework Two Big Data vs Semantic Web F21BD Boris Mocialov (H00180016) MSc Software Engineering Heriot-Watt University, Edinburgh April 5, 2015 1 1 Introduction The purpose

More information

Data Services @neurist and beyond

Data Services @neurist and beyond s @neurist and beyond Siegfried Benkner Department of Scientific Computing Faculty of Computer Science University of Vienna http://www.par.univie.ac.at Department of Scientific Computing Parallel Computing

More information

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation

More information

Application of OASIS Integrated Collaboration Object Model (ICOM) with Oracle Database 11g Semantic Technologies

Application of OASIS Integrated Collaboration Object Model (ICOM) with Oracle Database 11g Semantic Technologies Application of OASIS Integrated Collaboration Object Model (ICOM) with Oracle Database 11g Semantic Technologies Zhe Wu Ramesh Vasudevan Eric S. Chan Oracle Deirdre Lee, Laura Dragan DERI A Presentation

More information

Linked Statistical Data Analysis

Linked Statistical Data Analysis Linked Statistical Data Analysis Sarven Capadisli 1, Sören Auer 2, Reinhard Riedl 3 1 Universität Leipzig, Institut für Informatik, AKSW, Leipzig, Germany, 2 University of Bonn and Fraunhofer IAIS, Bonn,

More information

Digital libraries of the future and the role of libraries

Digital libraries of the future and the role of libraries Digital libraries of the future and the role of libraries Donatella Castelli ISTI-CNR, Pisa, Italy Abstract Purpose: To introduce the digital libraries of the future, their enabling technologies and their

More information

Cray: Enabling Real-Time Discovery in Big Data

Cray: Enabling Real-Time Discovery in Big Data Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects

More information

Secure Semantic Web Service Using SAML

Secure Semantic Web Service Using SAML Secure Semantic Web Service Using SAML JOO-YOUNG LEE and KI-YOUNG MOON Information Security Department Electronics and Telecommunications Research Institute 161 Gajeong-dong, Yuseong-gu, Daejeon KOREA

More information

Position Paper: Validation of Distributed Enterprise Data is Necessary, and RIF can Help

Position Paper: Validation of Distributed Enterprise Data is Necessary, and RIF can Help Position Paper: Validation of Distributed Enterprise Data is Necessary, and RIF can Help David Schaengold Director of Business Solutions Revelytix, Inc Sept 19, 2011, Revised Oct 17, 2011 Overview Revelytix

More information

Big Data Provenance - Challenges and Implications for Benchmarking. Boris Glavic. IIT DBGroup. December 17, 2012

Big Data Provenance - Challenges and Implications for Benchmarking. Boris Glavic. IIT DBGroup. December 17, 2012 Big Data Provenance - Challenges and Implications for Benchmarking Boris Glavic IIT DBGroup December 17, 2012 Outline 1 Provenance 2 Big Data Provenance - Big Provenance 3 Implications for Benchmarking

More information

An Ontological Approach to Oracle BPM

An Ontological Approach to Oracle BPM An Ontological Approach to Oracle BPM Jean Prater, Ralf Mueller, Bill Beauregard Oracle Corporation, 500 Oracle Parkway, Redwood City, CA 94065, USA jean.prater@oracle.com, ralf.mueller@oracle.com, william.beauregard@oracle.com

More information

Big Data and Semantic Web in Manufacturing. Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India

Big Data and Semantic Web in Manufacturing. Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India Big Data and Semantic Web in Manufacturing Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India Outline Big data in Manufacturing Big data Analytics Semantic web technologies Case

More information

Composing Data-Providing Web Services

Composing Data-Providing Web Services Composing Data-Providing Web Services Mahmoud Barhamgi Supervised by Djamal Benslimane LIRIS Laboratory Claude Bernard University Lyon 1 mahmoud.barhamgi@liris.cnrs.fr djamal.benslimane@liris.cnrs.fr Abstract

More information

Open Data Integration Using SPARQL and SPIN

Open Data Integration Using SPARQL and SPIN Open Data Integration Using SPARQL and SPIN A Case Study for the Tourism Domain Antonino Lo Bue, Alberto Machi ICAR-CNR Sezione di Palermo, Italy Research funded by Italian PON SmartCities Dicet-InMoto-Orchestra

More information

urika! Unlocking the Power of Big Data at PSC

urika! Unlocking the Power of Big Data at PSC urika! Unlocking the Power of Big Data at PSC Nick Nystrom Director, Strategic Applications Pittsburgh Supercomputing Center February 1, 2013 nystrom@psc.edu 2013 Pittsburgh Supercomputing Center Big Data

More information

Core Enterprise Services, SOA, and Semantic Technologies: Supporting Semantic Interoperability

Core Enterprise Services, SOA, and Semantic Technologies: Supporting Semantic Interoperability Core Enterprise, SOA, and Semantic Technologies: Supporting Semantic Interoperability in a Network-Enabled Environment 2011 SOA & Semantic Technology Symposium 13-14 July 2011 Sven E. Kuehne sven.kuehne@nc3a.nato.int

BUSINESS VALUE OF SEMANTIC TECHNOLOGY

Preliminary Findings. Industry Advisory Council, Emerging Technology (ET) SIG, Information Sharing & Collaboration Committee. July 15, 2005. Mills Davis, Managing Director

Benchmarking the Performance of Storage Systems that expose SPARQL Endpoints

Christian Bizer and Andreas Schultz. Freie Universität Berlin, Web-based Systems Group, Garystr. 21, 14195 Berlin, Germany

Ontology and automatic code generation on modeling and simulation

Youcef Gheraibia, Computing Department, University Md Messadia, Souk Ahras, 41000, Algeria, youcef.gheraibia@gmail.com. Abdelhabib Bourouis

Distributed Computing and Big Data: Hadoop and MapReduce

Bill Keenan, Director; Terry Heinze, Architect. Thomson Reuters Research & Development. Agenda: R&D Overview, Hadoop and MapReduce Overview, Use Case:

Alejandro Vaisman, Esteban Zimanyi. Data Warehouse Systems: Design and Implementation. Springer

Contents. Part I Fundamental Concepts. 1 Introduction. 1.1 A Historical Overview of Data Warehousing. 1.2 Spatial

How To Make Sense Of Data With Altilia

HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY'S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to

K@ A collaborative platform for knowledge management

White Paper. Quinary SpA, www.quinary.com, via Pietrasanta 14, 20141 Milano, Italia. t +39 02 3090 1500, f +39 02 3090 1501. Copyright 2004 Quinary SpA. Index

Creating an RDF Graph from a Relational Database Using SPARQL

Ayoub Oudani, Mohamed Bahaj*, Ilias Cherti. Department of Mathematics and Informatics, University Hassan I, FSTS, Settat, Morocco. * Corresponding

ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY

Yu. A. Zagorulko, O. I. Borovikova, S. V. Bulgakov, E. A. Sidorova. A.P. Ershov's Institute

AN AI PLANNING APPROACH FOR GENERATING BIG DATA WORKFLOWS

Wesley Deneke¹, Wing-Ning Li², and Craig Thompson². ¹ Computer Science and Industrial Technology Department, Southeastern Louisiana University,

DISCOVERING RESUME INFORMATION USING LINKED DATA

Ujjal Marjit¹, Kumar Sharma² and Utpal Biswas³. ¹ C.I.R.M, University Kalyani, Kalyani (West Bengal) India, sic@klyuniv.ac.in. ² Department of Computer

A Survey of Data Provenance Techniques

Yogesh L. Simmhan, Beth Plale, Dennis Gannon. Computer Science Department, Indiana University, Bloomington IN 47405. {ysimmhan, plale, gannon}@cs.indiana.edu. Technical

PONTE Presentation CETIC. EU Open Day, Cambridge, 31/01/2012. Philippe Massonet

PONTE Description: Efficient Patient Recruitment for Innovative Clinical Trials of Existing Drugs to other Indications. Start

Federated Data Management and Query Optimization for Linked Open Data

Chapter 5. Olaf Görlitz and Steffen Staab. Institute for Web Science and Technologies, University of Koblenz-Landau, Germany. {goerlitz,staab}@uni-koblenz.de

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015

E6893 Big Data Analytics, Lecture 8: Spark Streams and Graph Computing (I)

Improving EHR Semantic Interoperability Future Vision and Challenges

Catalina MARTÍNEZ-COSTA ᵃ, Dipak KALRA ᵇ, Stefan SCHULZ ᵃ. ᵃ IMI, Medical University of Graz, Austria. ᵇ CHIME, University College London,

City-Wide Smart Healthcare Appointment Systems Based on Cloud Data Virtualization PaaS

pp. 371-382, http://dx.doi.org/10.14257/ijmue.2015.10.2.34. Ping He¹, Penghai Wang², Jiechun Gao³ and Bingyong

Data Quality in Information Integration and Business Intelligence

Leopoldo Bertossi, Carleton University, School of Computer Science, Ottawa, Canada. Faculty Fellow of the IBM Center for Advanced Studies

Linked Open Data A Way to Extract Knowledge from Global Datastores

Bebo White, SLAC National Accelerator Laboratory. HKU Expert Address, 18 September 2014. Developments in science and information processing

ONTODESIGN; A DOMAIN ONTOLOGY FOR BUILDING AND EXPLOITING PROJECT MEMORIES IN PRODUCT DESIGN PROJECTS

DAVY MONTICOLO, Zurfluh-Feller Company, 25150 Belfort, France. VINCENT HILAIRE, SeT Laboratory, University

Semantically Enhanced Web Personalization Approaches and Techniques

Dario Vuljani, Lidia Rovan, Mirta Baranovi. Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000 Zagreb,

RDF graph Model and Data Retrival

Distributed RDF Graph Keyword Search. 2 Linked Data, Non-relational Databases and Cloud Computing. 2.1 Linked Data. The World Wide Web has allowed an unprecedented amount of information to be published

Publishing Linked Data Requires More than Just Using a Tool

G. Atemezing¹, F. Gandon², G. Kepeklian³, F. Scharffe⁴, R. Troncy¹, B. Vatant⁵, S. Villata². ¹ EURECOM, ² Inria, ³ Atos Origin, ⁴ LIRMM,

Ontology-Based Discovery of Workflow Activity Patterns

Diogo R. Ferreira¹, Susana Alves¹, Lucinéia H. Thom². ¹ IST Technical University of Lisbon, Portugal, {diogo.ferreira,susana.alves}@ist.utl.pt. ²

Big Workflow: More than Just Intelligent Workload Management for Big Data

Michael Feldman. White Paper, February 2014. EXECUTIVE SUMMARY: Big data applications represent a fast-growing category of high-value
