A Grid Data Integration Service (OGSA-DQP)
Paul Watson, University of Newcastle-upon-Tyne
Based on the work of:
  Norman Paton, Tasos Gounaris, Alvaro Fernandes, Rizos Sakellariou (University of Manchester)
  Jim Smith, Arijit Mukherjee, Paul Watson (University of Newcastle-upon-Tyne)
www.neresc.ac.uk
The Problem
- Many grid applications would benefit from access to distributed data
- Data sources are scattered and autonomous
- Integration is often done by a tedious manual process or (recently) by hand-coded workflows
- We are interested in how to simplify the process of querying distributed data
  - Focussing initially on information held in (relational) databases
Distributed Query Processing
- Queries are expressed in OQL
  - allows computations to be included in the query
- A single query may reference data at multiple sites
  - the data locations may be transparent to the query author

    select p.proteinid, Blast(p.sequence)
    from protein p, proteinterm t
    where t.termid = S92 and p.proteinid = t.proteinid
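As a rough illustration of what the example query asks for, the following Python sketch evaluates the same join and function call over two small in-memory tables. The sample rows, the treatment of S92 as a string, and the `blast` stand-in are assumptions made for illustration; in OGSA-DQP the tables are held in remote databases and Blast is an external analysis call.

```python
# Hypothetical sample data; the real tables live in remote databases.
protein = [
    {"proteinid": "P1", "sequence": "MKV"},
    {"proteinid": "P2", "sequence": "GAT"},
    {"proteinid": "P3", "sequence": "TTC"},
]
proteinterm = [
    {"proteinid": "P1", "termid": "S92"},
    {"proteinid": "P3", "termid": "S92"},
    {"proteinid": "P2", "termid": "S17"},
]


def blast(sequence):
    """Stand-in for the Blast call; it only tags the input sequence."""
    return "blast(" + sequence + ")"


def run_query(protein, proteinterm, termid):
    """Nested-loop evaluation of:
    select p.proteinid, Blast(p.sequence)
    from protein p, proteinterm t
    where t.termid = termid and p.proteinid = t.proteinid
    """
    return [
        (p["proteinid"], blast(p["sequence"]))
        for p in protein
        for t in proteinterm
        if t["termid"] == termid and p["proteinid"] == t["proteinid"]
    ]


result = run_query(protein, proteinterm, "S92")
print(result)
```

The query author names only tables and attributes; where each table is stored, and where Blast runs, is left to the system.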
Query Compiler
- OGSA-DQP automatically compiles and executes the query on a set of Grid nodes, in parallel where possible
- Compiler stages shown in the diagram:
  - Single-node optimiser: OQL Parser, Logical Optimiser, Physical Optimiser
  - Multi-node optimiser: Partitioner, Scheduler
  - Evaluator
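To make the role of the partitioner and scheduler more concrete, here is a minimal sketch. It assumes a plan is a simple operator tree that is cut into partitions wherever an `exchange` operator appears, and that partitions are assigned to nodes round-robin. The `Op` class, the `partition` and `schedule` functions, the plan shape and the node names are all hypothetical; the actual OGSA-DQP optimiser is not reproduced here, and it may place one partition on several nodes, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Op:
    """One operator in a (hypothetical) plan tree."""
    name: str
    children: list = field(default_factory=list)


def partition(root):
    """Cut the tree at every exchange; return partitions as lists of operator names."""
    partitions = []

    def walk(op, current):
        if op.name == "exchange":
            # An exchange is a boundary: each input below it starts a new partition.
            for child in op.children:
                new = []
                partitions.append(new)
                walk(child, new)
        else:
            current.append(op.name)
            for child in op.children:
                walk(child, current)

    top = []
    partitions.append(top)
    walk(root, top)
    return [p for p in partitions if p]


def schedule(partitions, nodes):
    """Assign partition indices to nodes round-robin (a stand-in for real scheduling)."""
    return {i: node for i, node in zip(range(len(partitions)), cycle(nodes))}


# One plausible shape for the example query's plan (an assumption).
plan = Op("reduce", [Op("op_call(Blast)", [Op("exchange", [
    Op("hash_join(proteinid)", [
        Op("exchange", [Op("reduce", [Op("table_scan(protein)")])]),
        Op("exchange", [Op("table_scan(proteinterm)")]),
    ])])])])

parts = partition(plan)
placement = schedule(parts, ["N1", "N2", "N3"])
```

Here `parts` holds four partitions (the Blast call, the join, and one per table scan), and `placement` maps each partition index to a node name.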
Execution Plan

    select p.proteinid, Blast(p.sequence)
    from protein p, proteinterm t
    where t.termid = S92 and p.proteinid = t.proteinid

- The plan is split into a set of partitions
- Grid resources are acquired to execute the partitions in parallel where possible, required and affordable
[Figure: partitioned execution plan. Operators shown: table_scan (protein); table_scan (proteinterm, termid=s92); reduce; exchange; hash_join (proteinid); op_call (Blast). Partition labels shown: 1; 2; 3-8; 9,10.]
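The operators named in the plan can be sketched as Python generators that pull rows from their inputs. This is an illustration only: it assumes `reduce` acts as a projection (which the attribute lists attached to it elsewhere in the slides suggest), it omits the `exchange` operators because everything runs in one process, and the sample rows and the `blast` stand-in are hypothetical.

```python
def table_scan(rows, predicate=lambda row: True):
    """Yield the rows of a table that satisfy an optional predicate."""
    for row in rows:
        if predicate(row):
            yield row


def reduce_op(rows, attrs):
    """Projection: keep only the named attributes of each row."""
    for row in rows:
        yield {a: row[a] for a in attrs}


def hash_join(build, probe, key):
    """Build a hash table on one input, then probe it with the other."""
    table = {}
    for b in build:
        table.setdefault(b[key], []).append(b)
    for p in probe:
        for b in table.get(p[key], []):
            yield {**b, **p}


def op_call(rows, func, arg, out):
    """Call func on one attribute of each row and store the result under `out`."""
    for row in rows:
        yield {**row, out: func(row[arg])}


def blast(sequence):
    """Stand-in for the Blast call."""
    return "blast(" + sequence + ")"


# Hypothetical sample data.
protein = [{"proteinid": "P1", "sequence": "MKV"},
           {"proteinid": "P2", "sequence": "GAT"},
           {"proteinid": "P3", "sequence": "TTC"}]
proteinterm = [{"proteinid": "P1", "termid": "S92"},
               {"proteinid": "P3", "termid": "S92"},
               {"proteinid": "P2", "termid": "S17"}]

terms = reduce_op(table_scan(proteinterm, lambda r: r["termid"] == "S92"), ["proteinid"])
proteins = reduce_op(table_scan(protein), ["proteinid", "sequence"])
joined = hash_join(terms, proteins, "proteinid")
called = op_call(joined, blast, "sequence", "blast")
result = list(reduce_op(called, ["proteinid", "blast"]))
```

In a partitioned plan, each group of operators between exchanges would run on its own node, with the exchanges carrying rows between them.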
Evaluation on the Grid
- OGSA-DQP builds on OGSA-DAI
  - accesses relational databases wrapped by OGSA-DAI
    - Oracle, DB2, MySQL
- Data streams between nodes
  - flow control
- All services are OGSI-compliant
  - built on GT3
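The slide does not say how flow control is implemented. As one common approach, offered purely as an assumption, a bounded buffer between a producing and a consuming operator makes the producer wait whenever the consumer falls behind. The sketch below uses Python's standard `queue` and `threading` modules; the buffer size and the row values are arbitrary.

```python
import queue
import threading

END = object()  # sentinel marking the end of the stream


def producer(rows, channel):
    """Send rows downstream; put() blocks while the buffer is full."""
    for row in rows:
        channel.put(row)
    channel.put(END)


def consumer(channel, out):
    """Receive rows until the end-of-stream sentinel arrives."""
    while True:
        row = channel.get()
        if row is END:
            break
        out.append(row)


channel = queue.Queue(maxsize=2)  # at most two rows in flight
received = []
sender = threading.Thread(target=producer, args=(range(10), channel))
sender.start()
consumer(channel, received)
sender.join()
```

Because the queue holds at most two rows, a slow consumer throttles the producer instead of letting unread data pile up.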
Execution on the Grid
[Figure: deployment of the example query across Grid nodes N0-N4. Recoverable labels: a Client calls perform(query) on the GDQS at N0; GQESF factories receive createservice and create GQES instances (GQES1, GQES2, GQES3); each GQES receives perform(querysubplan); results are returned. Service boxes carry the labels GDS, GDT and G. Operators shown on the nodes: sequential_scan (term=8372) with reduce (proteinid); sequential_scan with reduce (proteinid, sequence); hash_join (p.proteinid=t.proteinid); operation_call blast(p.sequence), linked to Web Services (BLAST); reduce (p.proteinid, blast). Step numbers 1-4 annotate the calls.]
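The call sequence labelled in the diagram (perform(query), createservice, perform(querysubplan), results) can be imitated in a single process as follows. This is only a sketch: the class and method names echo the diagram labels, but the signatures, the node names and the pre-compiled `subplans` dictionary are assumptions, and no real Grid service calls are made.

```python
class GQES:
    """Stand-in for a query evaluation service instance on one node."""

    def __init__(self, node):
        self.node = node

    def perform(self, subplan):
        # A real evaluator would run the operators; this one only reports what it received.
        return (self.node, subplan)


class GQESFactory:
    """Stand-in for the factory that creates evaluators on its node."""

    def __init__(self, node):
        self.node = node

    def create_service(self):
        return GQES(self.node)


class GDQS:
    """Stand-in coordinator: creates evaluators, hands out subplans, gathers results."""

    def __init__(self, factories):
        self.factories = factories  # node name -> factory

    def perform(self, subplans):
        """subplans: node name -> subplan, assumed already produced by the compiler."""
        # Create one evaluator per node that has work to do.
        evaluators = {node: self.factories[node].create_service() for node in subplans}
        # Send each subplan to its evaluator and collect what comes back.
        return [evaluators[node].perform(plan) for node, plan in subplans.items()]


factories = {name: GQESFactory(name) for name in ["N1", "N2", "N3"]}
coordinator = GDQS(factories)
results = coordinator.perform({
    "N1": "scan proteinterm",
    "N2": "scan protein",
    "N3": "join and call blast",
})
```

The client sees only the coordinator; evaluator creation and subplan distribution happen behind it.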
Mutual Benefit
- The Grid needs DQP:
  - Declarative, high-level resource integration with implicit parallelism
- DQP needs the Grid:
  - Systematic access to remote data and computational resources
  - Cost-based optimisation
  - Dynamic resource discovery and allocation
Summary
- DQP is a potentially important technology for the Grid
- OGSA-DQP supports:
  - declarative expression of queries
  - location transparency
  - access to both data and computational resources
  - dynamic deployment on Grid resources
  - implicit parallelism
- First release made in September 2003
  - available for download
- Dynamic adaptation now being investigated
  - fault-tolerance, performance, cost
Experiences and Issues
- Remote service deployment is not yet available for Grids, but some work is in progress
- PhD Project at Newcastle (Chris Fowler)
  - dynamically deploy individual services remotely
  - initial prototype by end of November 2003
  - working on security issues
  - WS only
- GridShed project (Newcastle + BT)
  - design of hosting environments for Grids
  - install execution images on nodes as required
Experiences & Issues
- DQP vs Workflow?
  - for what space of problems is each better
- DQP advantages?
  - declarative expression of intent
  - cost-based choice of execution plans
  - implicit parallelisation
- Investigating with Bioinformatics applications in the myGrid project
  - DQP with workflows & workflows with DQP
Projects/Sponsors
- Projects: OGSA-DAI, Polar, Polar*, myGrid
- Sponsors: [sponsor logos not recoverable from the slide]