Cancer Biomedical Informatics Grid (caBIG). V. Juggy Jagannathan, Adjunct Professor of Computer Science, West Virginia University; VP of Research, MedQuist
caBIG: An initiative by the National Cancer Institute (NCI). The goal is to accelerate research discoveries by linking researchers, physicians, and patients in the cancer community.
caBIG community: 50 cancer centers; 30 federal, academic, and not-for-profit organizations; currently involves 900 researchers. The community is organized into workspaces.
caGrid: The Cancer Biomedical Informatics Grid for caBIG. The computing infrastructure that the caBIG program will rely on. It is based on a service-oriented architecture (SOA) and implements grid technologies. caGrid 1.1 released in September.
caGrid Logical View [diagram; components shown: Registry, Security, GRID Infrastructure, Service/Data Suppliers, Service/Data Consumers]
Technologies & Components used by caGrid: Globus Toolkit [Grid infrastructure]; Mobius GME; Cancer Common Ontology Representation Environment (caCORE); ActiveBPEL; GridGrouper, part of Grid Authentication and Authorization with Reliably Distributed Services (GAARDS)
GRID Basics: Fundamentally, grid computing is an approach to tapping the unused processing power and storage capacity of millions of computers across the globe and putting it to real scientific use, such as finding a cure for cancer. It relies on the following basic themes: resource sharing on a global scale; secure access and managing trust; efficient use of resources over the internet; reliance on open and interoperable standards. Example application: SETI@home, the search for signs of extraterrestrial intelligence, the mother of all @home applications. Source: http://gridcafe.web.cern.ch/gridcafe/index.html
Globus Toolkit: A collection of components that help achieve the vision of GRID computing. A practical approach to dealing with heterogeneity in computing. Standards used in the plumbing: SSL/TLS v1, LDAP v3, X.509 Proxy Certificates, SOAP, HTTP, GridFTP, OGSI [Open Grid Services Infrastructure]. http://www.globus.org/
Mobius GME*: Three core services: GME (Global Model Exchange), a DNS-like data definition registry service; Mako, a service that helps to create, integrate, and validate various database instances and supports federated queries; DTS, a data translation service that translates data from one form to another based on knowledge of the respective models. * http://projectmobius.osu.edu/overview.php
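The "DNS-like" registry idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Mobius API: the class name `ModelRegistry`, the `gme://` namespace strings, and the peer-forwarding rule are all hypothetical. Each registry is authoritative for one namespace prefix and forwards lookups for other prefixes to a peer that owns them, much as a DNS server delegates a zone.

```python
class ModelRegistry:
    """Toy registry: authoritative for one namespace prefix, forwards others to peers."""

    def __init__(self, authority):
        self.authority = authority  # hypothetical namespace prefix this node owns
        self.schemas = {}           # namespace -> schema text
        self.peers = []             # other registries this node can ask

    def publish(self, namespace, schema):
        # A node may only register data definitions inside its own authority.
        if not namespace.startswith(self.authority):
            raise ValueError(f"{namespace} is outside {self.authority}")
        self.schemas[namespace] = schema

    def resolve(self, namespace):
        # Answer locally when authoritative, otherwise delegate to the owning peer.
        if namespace.startswith(self.authority):
            return self.schemas.get(namespace)
        for peer in self.peers:
            if namespace.startswith(peer.authority):
                return peer.resolve(namespace)
        return None


# Made-up namespaces, for illustration only.
nci = ModelRegistry("gme://nci.example/")
osu = ModelRegistry("gme://osu.example/")
osu.peers.append(nci)

nci.publish("gme://nci.example/gene/1.0", "<xs:schema/>")
found = osu.resolve("gme://nci.example/gene/1.0")  # forwarded to nci
missing = osu.resolve("gme://unknown.example/x")   # no registry owns this prefix
```

A client only needs to know one registry; the delegation step is what lets data definitions stay under the control of the organization that authored them.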
caCORE (Cancer Common Ontology Representation Environment) consists of: Cancer Bioinformatics Infrastructure Objects (caBIO); Cancer Data Standards Repository (caDSR); Enterprise Vocabulary Services (EVS); Common Security Module (CSM); Common Logging Module (CLM)
caCORE Infrastructure General Approach: use of Model Driven Architecture; adoption of an n-tier architecture; use of controlled vocabularies; registration of metadata (for interoperability). Source: http://ncicb.nci.nih.gov/ncicb/training/cadsr_training/courseoffering Course# 1000
caBIO: Cancer Bioinformatics Infrastructure Objects (caBIO), a set of JavaBeans with an open API (web services API) that provide access to data
Cancer Data Standards Repository (caDSR): A metadata registry used to register the descriptive information needed to understand, interoperate with, and reuse cancer data. An implementation of the ISO/IEC 11179 metadata standard.
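To make the metadata-registry idea concrete, here is a minimal sketch of the core ISO/IEC 11179 notion that a data element pairs a data element concept (an object class plus a property) with a value domain. The class names, field names, and the "Patient Gender" example are illustrative assumptions, not caDSR's actual schema or content.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str   # what is being described, e.g. "Patient"
    property_name: str  # which characteristic, e.g. "Gender"


@dataclass(frozen=True)
class ValueDomain:
    name: str
    permissible_values: FrozenSet[str]


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    domain: ValueDomain

    def accepts(self, value: str) -> bool:
        # A value is valid only if the registered value domain permits it.
        return value in self.domain.permissible_values


# Hypothetical registry entry, for illustration only.
patient_gender = DataElement(
    concept=DataElementConcept("Patient", "Gender"),
    domain=ValueDomain("Gender Code", frozenset({"Male", "Female", "Unknown"})),
)

ok = patient_gender.accepts("Female")
rejected = patient_gender.accepts("F")
```

Two systems that both reference the same registered data element can exchange values without guessing at each other's encodings, which is the interoperability benefit the slide refers to.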
caDSR implementation of ISO/IEC 11179 [figure]. Source: http://ncicb.nci.nih.gov/ncicb/training/cadsr_training/courseoffering Course# 1010
Enterprise Vocabulary Services (EVS): Vocabularies used: NCI Thesaurus, GO (Gene Ontology), LOINC, MGED (Microarray Gene Expression Data), MedDRA (Medical Dictionary for Regulatory Activities), SNOMED. For more info: http://ncicb.nci.nih.gov/ncicb/training/cadsr_training/courseoffering Course# 1030
caGrid Data Description Infrastructure [diagram; core services shown: Cancer Data Standards Repository, Enterprise Vocabulary Services, Global Model Exchange (GME); elements shown: Object Definitions, WSDL Service Definition, XSD Data Type Definitions, Client, Service, Client API, Service API, Grid Client Objects, Grid Service Objects, XML; relationships shown: Registered In, Semantically Described In, Validates Against, Serialize To, Uses]. Source: http://www.cagrid.org/mwiki/index.php?title=cagrid:tutorials
ActiveBPEL: Open source engine implementing the BPEL standard in Java. BPEL (Business Process Execution Language) is an XML-based language for defining workflows. BPEL workflows use web services to carry out the tasks and activities defined in the business process. See this web site for a simple tutorial on the concepts: http://www.active-endpoints.com/open-source-tuto Source: http://www.active endpoints.com/active bpel engine overview.htm
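The workflow idea can be sketched without any BPEL tooling. The Python below is an analogy only: real BPEL processes are written in XML and executed by an engine such as ActiveBPEL, and the two "services" here are plain local functions with made-up data. It shows the basic pattern of a sequence of service invocations that read from and write to shared process variables.

```python
def run_sequence(steps, variables):
    """Invoke each (name, service) pair in order, merging outputs into shared state."""
    state = dict(variables)  # process variables, copied so the input is not mutated
    trace = []               # names of the activities that ran, in order
    for name, service in steps:
        state.update(service(state))
        trace.append(name)
    return state, trace


def fetch_expression(state):
    # Stand-in for a data service call; the values are invented for illustration.
    return {"values": [2.0, 4.0, 6.0]}


def normalize(state):
    # Stand-in for an analytical service: scale values by the largest one.
    peak = max(state["values"])
    return {"normalized": [v / peak for v in state["values"]]}


result, trace = run_sequence(
    [("fetch", fetch_expression), ("normalize", normalize)],
    {"gene": "TP53"},
)
```

The point of a workflow engine is that the sequence itself is data: the same services can be recombined into a different process without changing the services.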
Security Infrastructure
Grid Authentication and Authorization with Reliably Distributed Services (GAARDS): Developed on top of Globus Toolkit and based on the Grid Security Infrastructure. Provides for: grid user management; identity federation; trust management; group management; access control and policy management; extension of local security domains to grid security domains. Source: http://www.cagrid.org/mwiki/index.php?title=gridgrouper:technical_resources
GAARDS Components: DORIAN (user account management); Grid Trust Service, GTS (creating the trust infrastructure in the federated environment); GridGrouper (group management); Authentication Service (federated service); Common Security Module, CSM (access control policy enforcement); Security Metadata (metadata that communicates the security requirements to support interoperability)
GAARDS Architecture [figure]. Source: http://www.cagrid.org/mwiki/index.php?title=image:gaards.png
GridGrouper: Based on the Grouper open source toolkit developed under the Internet2 framework [see: GROUPER.INTERNET2.EDU]. Supports: group management by distributed authorities; subgroups, composite groups, and custom groups.
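Subgroups and composite groups can be illustrated with a small Python model. This is a sketch, not the Grouper or GridGrouper API: the class names are hypothetical, it assumes a composite group is defined by a set operation (union, intersection, or complement) over two other groups, and it does not guard against cyclic subgroup definitions.

```python
class Group:
    """A group with direct members plus members inherited from subgroups."""

    def __init__(self, name, members=(), subgroups=()):
        self.name = name
        self.members = set(members)
        self.subgroups = list(subgroups)

    def effective_members(self):
        result = set(self.members)
        for sub in self.subgroups:
            result |= sub.effective_members()
        return result


class CompositeGroup:
    """A group whose membership is computed from two other groups."""

    OPS = {
        "union": set.union,
        "intersection": set.intersection,
        "complement": set.difference,  # in left but not in right
    }

    def __init__(self, name, kind, left, right):
        self.name = name
        self.kind = kind
        self.left = left
        self.right = right

    def effective_members(self):
        op = self.OPS[self.kind]
        return op(self.left.effective_members(), self.right.effective_members())


# Made-up groups and users, for illustration only.
oncologists = Group("oncologists", {"alice", "bob"})
fellows = Group("fellows", {"carol"})
clinicians = Group("clinicians", subgroups=[oncologists, fellows])
trained = Group("trained", {"alice", "carol", "dave"})

approved = CompositeGroup("approved", "intersection", clinicians, trained)
untrained = CompositeGroup("untrained", "complement", clinicians, trained)
```

Because membership is computed rather than copied, each authority can maintain only the groups it owns, and access policies written against `approved` stay current as the underlying groups change.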
Grouper Case Study: Duke case study of the use of Grouper: http://www.internet2.edu/pubs/middleware-cs-grp_duke.pdf Some quotes from this study: After assessing the current Identity Management infrastructure, with the assistance of key campus stakeholders, Duke's IT staff determined that the Grouper Groups Management Toolkit, developed by the Internet2 Middleware Initiative, would best meet their need for unified group management. Grouper has helped Duke implement a comprehensive group management system and scale their infrastructure to support over 100,000 course, dynamic and local groups. The system supports over 1.1 million group membership entries used to control access to and enhance interaction with various applications, such as a lecture capture software called Lectopia and online personal and shared storage using WebFiles. Individuals can also view their memberships and enjoy
GridGrouper Architecture [figure]. Source: http://www.cagrid.org/mwiki/index.php?title=image:gridgrouper.jpg
caGrid Concluding Comments: Flexible, scalable infrastructure built on top of GRID technology. Use of Model Driven Architecture (MDA) to develop a collection of services that can interoperate. The scientific community is already building a whole host of applications that leverage this infrastructure in the fight against cancer. For a sample set of applications go to: http://www.cagrid.org/mwiki/index.php?title=main_