CORBA and Life Sciences Ulf Leser 4. December 2002
Table of Contents
CORBA in a nutshell
The Life Science Research Domain Task Force
The Genome Maps Standard
The CORBA approach to data integration
Ulf Leser: CORBA and Life Sciences, December 2002
CORBA
Common Object Request Broker Architecture
- A reference architecture, not an implementation
- Developed in a community process through OMG
- Object-oriented middleware
Target
- Easier, more flexible RPCs
- Interoperability of applications over networks
- Language- and platform-independent
- Free specification of interfaces
Main elements
- Interfaces, interfaces, interfaces
Similar techniques
RPC, RFC
- Calling a procedure/function on a remote machine
- Very old
Enterprise Java Beans
- Also interface-centric
- Also object-oriented
- Pure Java
DCOM
- Also interface-centric
- Somewhat object-oriented
- Pure Microsoft
.NET
Object Management Group (OMG)
Over 800 member organizations
- World's largest software consortium
Founded April 1989
Small staff (30 full time); no internal development
Dedicated to creating and popularizing object-oriented standards for application integration based on existing technology
Source: OMG
Object Management Architecture
[Figure: Object Management Architecture. Labels: Application Objects, Domain Objects, Object Request Broker, Object Services, Common Facilities, CORBA Object, Legacy Application, Wrapper]
Forget all that for today
Implementation with CORBA
Interface Definition
Server side
- Server skeleton generation
- Implementation of methods
- Bind stubs to implementation
- Bind ORB
- Start ORB
Client side
- Client stub generation
- Bind stubs to (existing) application
- Bind ORB to application
Code generation
[Figure: OMG IDL specification -> IDL compiler (language mapping) -> stub code and skeleton code. Client code + stub code + ORB library -> client ready to request. Server code + skeleton code + ORB library -> server ready to serve. Legend: your parts.]
Source: EBI industry group, CORBA tutorials
RPCs in CORBA
Client:
- Obtain CORBA reference
- Transparent method invocation
ORB:
- Server localisation
- Request propagation
Server:
- Manage CORBA objects
- Receive and execute RPCs
[Figure: Application -> Stub -> ORB -> IIOP (request, result) -> ORB -> Server Stub -> Server -> Database]
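The request path on this slide can be imitated in a few lines of plain Python. This is only a sketch under stated assumptions: no real ORB is involved, JSON stands in for the IIOP wire format, and the names `MiniOrb`, `Stub`, `Skeleton` and `HelloServant` are hypothetical and not part of any CORBA product.

```python
import json


class HelloServant:
    """Server-side implementation object (hypothetical example)."""

    def get_text(self):
        return "Hello World"


class Skeleton:
    """Unmarshals a request and dispatches it to the servant."""

    def __init__(self, servant):
        self._servant = servant

    def dispatch(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))
        method = getattr(self._servant, request["operation"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class MiniOrb:
    """Toy broker: finds the skeleton for an object key and forwards bytes."""

    def __init__(self):
        self._skeletons = {}

    def register(self, object_key, servant):
        self._skeletons[object_key] = Skeleton(servant)
        return object_key  # plays the role of an object reference

    def invoke(self, object_key, request_bytes):
        return self._skeletons[object_key].dispatch(request_bytes)


class Stub:
    """Client-side proxy: marshals each call and hands it to the broker."""

    def __init__(self, orb, object_key):
        self._orb = orb
        self._key = object_key

    def __getattr__(self, operation):
        # Only reached for names that are not real attributes of the stub,
        # i.e. for the remote operations.
        def remote_call(*args):
            request = json.dumps({"operation": operation, "args": list(args)})
            reply = self._orb.invoke(self._key, request.encode("utf-8"))
            return json.loads(reply.decode("utf-8"))["result"]

        return remote_call


orb = MiniOrb()
reference = orb.register("hello-1", HelloServant())
hello = Stub(orb, reference)
print(hello.get_text())
```

The client code only sees `hello.get_text()`; marshalling, server localisation and dispatch stay hidden, which is what "transparent method invocation" refers to. In a real CORBA system the stub and skeleton are generated from IDL rather than written by hand.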
Interface Definition Language
Defining an interface, not an implementation
Object-oriented language
- Strongly typed
- Multiple inheritance
- Structs
Language independent
Mapped to programming languages:
- Java
- C++
- Approx. 20 more
Result: skeletons and stubs
Example: HelloWorld.idl
    module Tutorial1 {
        interface HelloWorld {
            string get_text();
        };
    };
- module defines a naming context: names from this module used outside this module must be referenced as (for example) Tutorial1::HelloWorld
- interface defines a CORBA object (class): these objects are available to clients and are implemented by a server
- get_text() is an available method: methods are called by clients to fulfill their requests
Source: EBI industry group, CORBA tutorials
CORBA Services
CORBA Services support development and deployment:
- Naming service: object localization by name
- Trading service: service localization by properties
- Transaction service: 2-phase commit protocol management
- Query service
- Event service
- Relationship service
- ...
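To make "object localization by name" concrete, here is a minimal in-memory sketch of a name-to-reference registry. It is loosely modelled on the bind/resolve idea of a naming service; the class `NamingContext` below, the tuple-shaped names and the string that stands in for an object reference are all assumptions made for illustration, not the real service API.

```python
class NotFound(KeyError):
    """Raised when a name has no binding."""


class AlreadyBound(Exception):
    """Raised when bind() is called for a name that is already in use."""


class NamingContext:
    """Toy registry that maps compound names to object references."""

    def __init__(self):
        self._bindings = {}

    def bind(self, name, reference):
        # Refuse to overwrite an existing binding.
        if name in self._bindings:
            raise AlreadyBound(name)
        self._bindings[name] = reference

    def rebind(self, name, reference):
        # Overwrite silently, whether or not the name was bound before.
        self._bindings[name] = reference

    def resolve(self, name):
        try:
            return self._bindings[name]
        except KeyError:
            raise NotFound(name) from None


root = NamingContext()
# The name components and the reference string are made up for this example.
root.bind(("LSR", "Maps", "IXDB"), "ref-to-ixdb-map-server")
print(root.resolve(("LSR", "Maps", "IXDB")))
```

A client that knows only the agreed name can obtain a reference without knowing on which host the server runs.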
The Life Science Research Domain Task Force
Domain Services: LSR
OMG is organized into domain services
- Financial
- Automotive
- Life Science
- Etc.
Life Science Research Task Force since 1997 (www.omg.org/homepages/lsr)
Task forces have working groups
Task forces supervise definition and adoption of specifications (i.e., documents)
Regular meetings
LSR working groups
Biomolecular Sequence Analysis (adopted)
Genomic Maps (adopted)
Bibliographic Query Service (adopted)
Macromolecular Structure
Laboratory Equipment Control Interfaces
Gene Expression
Chemical Structure Access and Representation
Biomolecular Sequence Analysis Entities
Standardisation Process
Time-consuming process
OMG Architecture Board committed to:
- Orthogonality
- Cutting-edge technology
Participation: relatively open
Submitters must commit themselves to provide implementations
From first idea to standard: 2-3 years
[Figure: RfI (Request for Information) -> RfP (Request for Proposals) -> LoI (Letter of Intent) -> Proposals -> Standard]
Example 1: Genome Maps
Genomic Maps
Differences:
- Co-ordinate system
- Ordering
- Object types
Different species, chromosomes, regions
Scope of Genome Maps
Maps: no sequences
Access: no calculation or comparison
Retrieval: no writing
An interface, not a data model:
- Easy to implement for providers
- Powerful enough for clients
- Covering most types of maps
Potential basis for higher-level services
First Proposal
[Figure: UML class diagram of the first proposal.
Classes: Mappable (species, chromosome, type; getmaps()), MapObject (database, name, id), Point, Segment (length, unit), Marker, Clone, Map (getnrofelements(), getallelements(), getrangebetweenobjects(), getelementsinsegment()), LinearMap (maxcoordinate, mincoordinate; getscalarrange(), getaround()), Bin, CytogeneticElement (rank; getsuperelement(), getsubelements(), getsiblings()), MapElement (positionprecision).
Position types: IntervalPosition, PointPosition, RangePosition, OrderedPosition, VaguePosition; attributes shown: leftend, rightend, rank, frameworkelement, position, leftflankingobj, rightflankingobj.
Associations: crossreferences (0..*), mappedobj (1..1), onmap (1..1); other multiplicities shown: 1..1, 1..*.]
Mappable Objects
[Figure: class diagram excerpt. Mappable (species, chromosome, type; getmaps()), MapObject (database, name, id), Point, Segment (length, unit)]
Mappables are all objects that can be placed on a map
Cross-linked to equivalent objects in other databases
Segments have extent: clones, bands, maps, ...
Points are points: markers, ESTs, STSs, ...
Maps
Maps are segments
Maps can be placed on maps
Two types:
- Linear maps have a co-ordinate system: physical maps, genetic maps
- Bin maps have only ranges: radiation-hybrid maps
[Figure: class diagram excerpt. Segment (length, unit); Map (getnrofelements(), getallelements(), getrangebetweenobjects(), getelementsinsegment()); LinearMap (maxcoordinate, mincoordinate; getscalarrange(), getaround()); Bin]
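As an illustration of what a linear map with a co-ordinate system could offer, the sketch below keeps elements sorted by coordinate and answers a range query similar in spirit to getelementsinsegment(). The Python class, its method names and the sample markers are assumptions for this example; they are not taken from the MapIDL specification.

```python
from bisect import bisect_left, bisect_right


class LinearMap:
    """Toy linear map: elements are placed at scalar coordinates."""

    def __init__(self, name, unit):
        self.name = name
        self.unit = unit
        self._positions = []  # coordinates, kept sorted
        self._names = []      # element names, in the same order

    def place(self, element_name, coordinate):
        # Insert while keeping both lists ordered by coordinate.
        index = bisect_right(self._positions, coordinate)
        self._positions.insert(index, coordinate)
        self._names.insert(index, element_name)

    def get_nr_of_elements(self):
        return len(self._positions)

    @property
    def min_coordinate(self):
        return self._positions[0]

    @property
    def max_coordinate(self):
        return self._positions[-1]

    def get_elements_in_segment(self, left, right):
        # All elements whose coordinate lies in the closed range [left, right].
        low = bisect_left(self._positions, left)
        high = bisect_right(self._positions, right)
        return self._names[low:high]


# Made-up sample data, not real markers.
genetic = LinearMap("example genetic map", "cM")
genetic.place("marker_A", 12.5)
genetic.place("marker_B", 3.0)
genetic.place("marker_C", 47.1)
print(genetic.get_elements_in_segment(0.0, 20.0))
```

A bin map could not support this kind of query in the same way, because its elements are only assigned to ranges and have no scalar coordinate.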
MapElement
MapElement is the assignment of a Mappable to a Map with a Position in a co-ordinate system
n:m relationship between Map and Mappable
Map, MapElement and Mappable could be on different servers
[Figure: class diagram excerpt. MapElement (positionprecision) with position types IntervalPosition, PointPosition, RangePosition, OrderedPosition and VaguePosition; attributes shown: leftend, rightend, frameworkelement, position, leftflankingobj, rightflankingobj, rank]
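The n:m relationship can be sketched with a few Python dataclasses. This is a simplified, hypothetical rendering: only two of the position types are modelled, Map is not derived from Segment as it is in the proposal, and all field names and sample values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Mappable:
    name: str
    species: str
    chromosome: str


@dataclass(frozen=True)
class Map:
    name: str
    unit: str


@dataclass(frozen=True)
class PointPosition:
    position: float


@dataclass(frozen=True)
class IntervalPosition:
    left_end: float
    right_end: float


@dataclass(frozen=True)
class MapElement:
    """Assignment of one Mappable to one Map at a given position."""
    mapped_obj: Mappable
    on_map: Map
    position: Union[PointPosition, IntervalPosition]


def get_maps(mappable, elements):
    """Names of all maps on which the given mappable has been placed."""
    return sorted({e.on_map.name for e in elements if e.mapped_obj == mappable})


# Made-up sample data: one marker placed on two different maps.
marker = Mappable("marker_X", "human", "X")
physical = Map("example physical map", "kb")
genetic = Map("example genetic map", "cM")
elements = [
    MapElement(marker, physical, IntervalPosition(1200.0, 1250.0)),
    MapElement(marker, genetic, PointPosition(17.3)),
]
print(get_maps(marker, elements))
```

Because the position lives on the MapElement and not on the Mappable, the same marker can carry a different kind of position on every map it appears on.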
First Proposal
[Figure: UML class diagram of the first proposal, shown again as an overview of Mappable, MapObject, Point, Segment, Marker, Clone, Map, LinearMap, Bin, CytogeneticElement, MapElement and the position types]
Implementation: Wrapping IXDB
Integrated database: > 30 data sources
Many different maps available
Experiences - Semantic
Different semantics:
- Relational database, object-oriented MapIDL
- IXDB.Locus does not exist in MapIDL
- Genes with or without extent
- Cardinalities: IXDB stores many values
- Synonyms
Not all information in IXDB is representable in MapIDL
Information loss
Experiences - Technical
Transient versus persistent references
Consistency
- Between client and server
- Between CORBA server and database
Memory management
- Releasing objects
- Multi-copy objects
Multi-threaded programming
First shot easy, but good implementation difficult
CORBA and Data Integration
Interoperability
Maps are stored in many data sources ...
- GDB
- RHdb
- CEPH
- Hugemap
- IXDB
- XACE
- ...
Difficult to get an integrated view on all available data
Current Approach
Integration by Standard
If Standards were used...
[Figure: the user chooses a source (GDB, MGD, RHdb, CEPH) in a map comparison application; Application.getMaps( X ) becomes ORB.getMaps( X ), which reaches the GDB, MGD, RHdb and CEPH servers]
Two Approaches to Interoperability
Someone builds an integrating system:
- Typically laborious
- Requires understanding of source data
- Schema and interface evolution
[Figure: a mediator accesses the data sources through heterogeneous interfaces labelled ORB/IDL, JDBC and HTML]
Sources provide a standard access method:
- Fixed structure and semantics
- Most problems are shifted from mediator to providers
[Figure: a mediator accesses all data sources through an ORB using the same interface, IDL 1]
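The second approach can be sketched as follows: every source implements the same interface, so the integrating client needs no source-specific code. The abstract class `MapSource`, the in-memory stand-in sources and the map names are hypothetical; real sources would sit behind CORBA servers that implement a shared IDL.

```python
from abc import ABC, abstractmethod


class MapSource(ABC):
    """Shared interface that every data source agrees to implement."""

    @abstractmethod
    def get_maps(self, chromosome):
        """Return the names of all maps for the given chromosome."""


class InMemorySource(MapSource):
    """Stand-in for a remote server; holds made-up data in a dict."""

    def __init__(self, source_name, maps_by_chromosome):
        self.source_name = source_name
        self._maps = maps_by_chromosome

    def get_maps(self, chromosome):
        names = self._maps.get(chromosome, [])
        return [f"{self.source_name}:{name}" for name in names]


def get_all_maps(sources, chromosome):
    """Mediator logic: the same call works for every source."""
    result = []
    for source in sources:
        result.extend(source.get_maps(chromosome))
    return result


# Source names come from the slides; the map names are invented.
sources = [
    InMemorySource("GDB", {"X": ["map_1", "map_2"]}),
    InMemorySource("RHdb", {"X": ["map_3"], "7": ["map_4"]}),
]
print(get_all_maps(sources, "X"))
```

The mediator shrinks to a loop, but the work has moved to the providers, who must each map their own schema onto the shared interface. The merged list can still contain the same map twice under different names, since a common interface alone does not settle object identification.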
Integration Obstacles Removed?
Semantic & structure
- Documentation, MapIDL
Data model
- CORBA (IDL -> language mapping)
Access mechanism
- CORBA (IIOP)
Query capabilities
- Methods prescribed
Obstacles Removed, cont'd
Data conflicts
- Not resolved
Data source autonomy
- Source implements and maintains server
Fuzzy concepts
- Documentation
Object identification
- Not resolved
Conclusions and Open Questions
General Design Problem
Clients:
- Typed access: no impedance mismatch, no parsing
- Homogeneous structure and semantics
- Standard canned queries
Server:
- Install CORBA (ORB ...)
- Adopt standard semantics
- Implement interface
Make it powerful!
Make it simple!
Questions
Designing a good interface is non-trivial
- Performance
  - Objects versus structs
  - Navigation versus queries
- Complexity
  - Do we need 5 different position types?
  - Hierarchies?
- What are the specific needs of potential applications?
  - Map comparison
  - Map integration
  - Map visualisation
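The performance side of "objects versus structs" comes down to the number of remote round trips. The sketch below simulates this with a counter and no network; `CountingServer`, its methods and the sample data are assumptions made only to show the effect.

```python
from collections import namedtuple

# Plain data record, comparable to an IDL struct passed by value.
ElementStruct = namedtuple("ElementStruct", ["name", "position"])


class CountingServer:
    """Pretend server that counts how often it is called."""

    def __init__(self, elements):
        self._elements = list(elements)
        self.remote_calls = 0

    # Object style: the client navigates from element to element.
    def get_element_ids(self):
        self.remote_calls += 1
        return list(range(len(self._elements)))

    def get_name(self, element_id):
        self.remote_calls += 1
        return self._elements[element_id][0]

    def get_position(self, element_id):
        self.remote_calls += 1
        return self._elements[element_id][1]

    # Struct style: everything is shipped in a single reply.
    def get_all_elements(self):
        self.remote_calls += 1
        return [ElementStruct(name, pos) for name, pos in self._elements]


data = [("marker_A", 12.5), ("marker_B", 3.0), ("marker_C", 47.1)]

by_navigation = CountingServer(data)
listing = [
    (by_navigation.get_name(i), by_navigation.get_position(i))
    for i in by_navigation.get_element_ids()
]

by_struct = CountingServer(data)
structs = by_struct.get_all_elements()

print(by_navigation.remote_calls, by_struct.remote_calls)
```

With n elements and two attributes each, navigation costs 1 + 2n calls while the struct variant costs one. The price is that the structs are copies and can become stale once the server-side data changes.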
Questions cont'd
Using CORBA services
- Availability? For all clients at low cost?
- Maturity? Object-by-Value, MOF, POA?
Personal opinion
- Naming service: useful
- Query service: useless
- Collection service: too expensive
- Relationship service: too expensive
- Trading service: unclear
- Object-by-Value: wonderful
- POA: essential
- Other LSR standards: very important, once they exist
Questions cont'd
Ad-hoc queries? Against what schema ...
- the IDL?
  - Not possible: IDL is not a data model and has no query language
- the schema of the source? execquery( in string query)
  - schema is possibly unknown
  - varies from source to source
  - sources might not have a schema at all
  - sources may change schema
What is the result?
- Must be a programming language construct described in IDL
Possibility: use of class-restricted queries
Conclusions
Trade-off: comprehensiveness versus ease
- Standard as least common denominator?
- Sufficient power for all applications?
Trade-off: performance, comfort, usability
- Sufficient performance requires caching and structs / OBV
- Caching affects consistency
- Structs are less elegant
- OBV not yet commonly implemented
Success?
- Hype has gone: few implementations available
- Performance!
Literature
- L. Wang, P. Rodriguez-Tome, N. Redaschi, P. McNeil, A. Robinson and P. Lijnzaad. Accessing and distributing EMBL data using CORBA (common object request broker architecture). Genome Biology, 1(5), 2000.
- G. Vossen. The CORBA Specification for Cooperation in Heterogeneous Information Systems. 1st Workshop on Cooperative Information Systems; LNCS 1202, Kiel, Germany, 1997.
- S. Baker. CORBA and Databases - Do you really need both? Object Expert, May 1996.
- E. Barillot, U. Leser, P. Lijnzaad, C. Cussat-Blanc, K. Jungfer, F. Guyon, G. Vaysseix, C. Helgesen and P. Rodriguez-Tome. A Proposal for a Standard CORBA Interface for Genome Maps. Bioinformatics, 15(2), pp. 157-169.
- http://www.omg.org/lsr/
- http://corba.ebi.ac.uk/
Questions?