Realization of Inventory Databases and Object-Relational Mapping for the Common Information Model


Realization of Inventory Databases and Object-Relational Mapping for the Common Information Model Department of Physics and Technology, University of Bergen. November 8, 2011 Systems and Virtualization Management: Standards and the Cloud

Outline 1 Introduction Background High Level Trigger 2 3

Introduction

ALICE detector system Figure: Illustration of the ALICE detector system from the ALICE web pages.

LHC/CERN Figure: Illustration of LHC from CERN document server.

High Level Trigger. What is triggering? Triggering tells the data acquisition systems when something interesting happens so that the event can be read out. It is done in several levels: the lower levels run in the front-end electronics, while the high level trigger, or event filtering, is typically done in software on high-performance compute clusters.

Motivation. ALICE can produce as much as ~25 GB/s, but the bandwidth to storage is limited to ~1.25 GB/s. Much of the data contains no new physics and can therefore be discarded. How is it done? Selection of interesting events: event filtering or triggering. Selection of interesting parts of events: region of interest. Compression of events: reconstruction and lossless compression.
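The two rates above fix the overall data reduction the HLT has to deliver. A minimal arithmetic sketch, using only the approximate figures quoted on the slide (the variable names are illustrative):

```python
# Approximate figures quoted on the slide.
detector_rate_gb_s = 25.0   # peak ALICE output
storage_rate_gb_s = 1.25    # bandwidth to storage

# Factor by which the data volume must be reduced before storage.
reduction_factor = detector_rate_gb_s / storage_rate_gb_s
# Fraction of the raw data volume that can be written out.
kept_fraction = storage_rate_gb_s / detector_rate_gb_s

print(f"reduction factor: {reduction_factor:.0f}x, kept: {kept_fraction:.0%}")
```

Event selection, region of interest and compression together have to reach this factor at peak rate.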

ALICE HLT cluster. Online system: The HLT is an online system, not a batch farm. Processing cannot be rerun when something goes wrong, so any fault has a direct impact on the physics results. Process placement: Processing is hierarchical and distributed, so process placement is an important aspect. Challenge: Operation must be reliable and stable, but commodity hardware combined with a complex software configuration means that resilience to failure has to be built in.

Cluster Figure: Picture of cluster.

Usefulness of Inventory Databases. Inventory database: To know which part to order when something breaks. To keep track of systems when reorganizing. To have relevant and up-to-date information available when debugging. Former solution: A typical LAMP stack with a manually maintained database. Keeping the information up to date was too much work for the available manpower.

Advantages of automatic updating Automatic process placement of the distributed physics application. Implementing fault-tolerance with fail-over procedures. Increased consistency in inventory data. No room for human mistakes. Reduced cost as the cluster administrator is relieved from a tedious and laborious task. System management and monitoring.

Requirements. Functional: Provide up-to-date information about the state of the cluster for the physics application. System management and monitoring applications need inventory information. Non-functional: Support the programming languages that are in use in HLT: Python, Java, C++. Scalability: HLT was designed for a maximum of around 1000 nodes; currently there are around 200. Reliability: Need to ensure availability of inventory information through use of redundancy or clustering.

Choosing an approach. Options: Use existing CIM implementations (sblim, openpegasus, OpenWBEM). Use own model with ORM. Use CIM model with own implementation in ORM. Considerations: Needed support for multiple programming languages. Wanted tight integration with existing software. Needed only information gathering, not complete instrumentation. An RDBS in the back-end would make it easier to implement high availability and scalability.

Object-relational mapping. Object-relational mapping is a programming technique for persisting data from object-oriented programming languages to relational databases. Software packages that implement this conversion in a generic way, so that it can be used for any object hierarchy, are called object-relational mappers. Examples: Hibernate for Java was one of the first such mappers and is perhaps also the best known. SQLAlchemy for Python is an ORM that includes a declarative way of defining objects, further simplifying the process of mapping objects to databases.
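To make the idea concrete, here is a minimal hand-written sketch of what a mapper does, using only the Python standard library. It is not Hibernate or SQLAlchemy; the class Node, its fields and the helper functions are hypothetical and only illustrate the object-to-table conversion.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Node:
    """Hypothetical inventory object (not a CIM class)."""
    name: str
    cpu_count: int


# Tiny, illustrative subset of a Python-to-SQL type mapping.
SQL_TYPES = {str: "TEXT", int: "INTEGER"}


def create_table(conn, cls):
    """Derive a table definition from the class's fields."""
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    conn.execute(f"CREATE TABLE {cls.__name__} ({cols})")


def save(conn, obj):
    """Persist one object as one row."""
    marks = ", ".join("?" for _ in fields(obj))
    conn.execute(f"INSERT INTO {type(obj).__name__} VALUES ({marks})", astuple(obj))


def load_all(conn, cls):
    """Turn every row back into an object."""
    names = ", ".join(f.name for f in fields(cls))
    return [cls(*row) for row in conn.execute(f"SELECT {names} FROM {cls.__name__}")]


conn = sqlite3.connect(":memory:")
create_table(conn, Node)
save(conn, Node("node01", 8))
nodes = load_all(conn, Node)
print(nodes)
```

A generic mapper performs the same steps for any object hierarchy and additionally handles relations, inheritance and change tracking.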

Advantages of ORM Relational Database Systems: transactionality and enhanced data integrity. Object-Oriented programming: Reuse and Modularity.

Model hierarchies and level of mapping. Two options: Mapping at the level of the CIM meta schema: flexible, but no intuitive mapping between CIM classes and programming language classes, and a lack of constraints. Mapping at the level of the CIM schema: less flexible, but a more natural mapping between CIM classes and OOP classes. Decision: map at the level of the CIM schema and counter the reduced flexibility by automatic generation, which makes it as flexible as mapping at the meta level.
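The decision above can be sketched as follows: one programming language class is generated per schema class, so the code keeps a natural class-to-class mapping while a schema change only requires regeneration. The SCHEMA dictionary is a hand-written, shortened stand-in for a parsed schema; its class and property names are assumptions for illustration, and this is not the output of the tools described in this talk.

```python
# Hand-written stand-in for a parsed schema; hierarchy and properties are
# shortened for illustration (a real generator would read them from MOF).
SCHEMA = {
    "ManagedElement": {"super": None, "properties": ["ElementName"]},
    "ComputerSystem": {"super": "ManagedElement", "properties": ["Name"]},
}


def generate_classes(schema):
    """Create one Python class per schema class, preserving inheritance."""
    classes = {}

    def build(name):
        if name in classes:
            return classes[name]
        spec = schema[name]
        base = build(spec["super"]) if spec["super"] else object
        # Inherited properties first, then the class's own.
        props = list(getattr(base, "cim_properties", [])) + spec["properties"]

        def init(self, **values):
            unknown = set(values) - set(self.cim_properties)
            if unknown:
                raise TypeError(f"unknown properties: {sorted(unknown)}")
            for prop in self.cim_properties:
                setattr(self, prop, values.get(prop))

        classes[name] = type(name, (base,), {"cim_properties": props, "__init__": init})
        return classes[name]

    for name in schema:
        build(name)
    return classes


classes = generate_classes(SCHEMA)
node = classes["ComputerSystem"](Name="node01", ElementName="Front-end node")
print(type(node).__mro__[1].__name__, node.Name)
```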

Applying ORM to an existing model. The typical mappings one has to consider when applying an ORM to a model are: Model to relational database schema. Model to class definitions in the target programming language. The actual ORM mapping between the relational database schema and the class definitions in the programming language.

Mapping details. Three aspects to consider for ORM: Mapping of structure. Classes and properties: normally a straightforward mapping from the CIM model. Data types: relatively easy to find corresponding data types in both programming languages and relational databases. Inheritance: this is where it gets harder; one has to decide on a common type of mapping. Mapping of constraints. Qualifiers: for instance maximum values for integers or length of strings. Mapping of behavior. Methods: not considered here, as they are not needed in our environment.
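As an illustration of the inheritance and constraint aspects, the sketch below uses single-table inheritance with a discriminator column, which is one common strategy and not necessarily the one chosen in this work, and expresses length and maximum-value qualifiers as SQL CHECK constraints. Table, column names and limits are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table for a hypothetical hierarchy LogicalDevice <- Processor.
# 'class_name' is the discriminator; subclass-only columns are nullable.
conn.execute("""
    CREATE TABLE logical_device (
        device_id  TEXT PRIMARY KEY CHECK (length(device_id) <= 64),  -- string length qualifier
        class_name TEXT NOT NULL,
        core_count INTEGER CHECK (core_count <= 1024)                 -- maximum value qualifier
    )
""")
conn.execute("INSERT INTO logical_device VALUES ('cpu0', 'Processor', 8)")
conn.execute("INSERT INTO logical_device VALUES ('dev0', 'LogicalDevice', NULL)")

# A value that violates the length qualifier is rejected by the database.
try:
    conn.execute("INSERT INTO logical_device VALUES (?, 'Processor', 8)", ("x" * 65,))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Selecting one class of the hierarchy is a filter on the discriminator.
processors = conn.execute(
    "SELECT device_id FROM logical_device WHERE class_name = 'Processor'"
).fetchall()
print(rejected, processors)
```

Other strategies, such as one table per class joined on a shared key, trade nullable columns for more joins.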

Major tasks. The effort of designing the inventory solution can be divided into two major tasks: 1 Creating a programming language representation of a CIM model instance as well as the corresponding object-relational mapping. 2 Automatically keeping the data in the CIM model instance up to date.
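The second task can be sketched as a gather-and-upsert loop. This is a minimal illustration with the Python standard library, not the inventory gathering systems of this talk; the table layout and the functions gather and refresh are hypothetical.

```python
import platform
import sqlite3
import time


def gather():
    """Collect a few facts about the local machine (illustrative subset)."""
    return {
        "hostname": platform.node() or "unknown",
        "machine": platform.machine(),
        "system": platform.system(),
    }


def refresh(conn, facts):
    """Insert the node or overwrite its existing row, in one transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO node (hostname, machine, system, updated) "
            "VALUES (?, ?, ?, ?)",
            (facts["hostname"], facts["machine"], facts["system"], time.time()),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (hostname TEXT PRIMARY KEY, machine TEXT, system TEXT, updated REAL)"
)

# Running the gatherer repeatedly updates the row instead of duplicating it.
refresh(conn, gather())
refresh(conn, gather())
row_count = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
print(row_count)
```

In a cluster, each node would run the gatherer periodically or on change events and write to a shared database rather than an in-memory one.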

Two implementations. Two implementations have been developed, in Python and Java, for two reasons: To support more than one programming language. To be able to test interoperability between solutions in the future.

Mapper generation tools 1/2 Figure: Overview of the tools developed.

Mapper generation tools 2/2. Mapper generation tools: For Python, the mapper generation tool is implemented as a simple back-end to the SBLIM MOF compiler that outputs ORM-enabled Python code. For Java, this is a flexible framework with a lot of functionality (Chainreaction). For more information, see the references of the paper. Inventory gathering systems: The Python inventory solution is a stand-alone inventory gathering system for compute clusters. The Java solution is a module that is integrated in SysMES, where the framework takes care of the client/server

Inventory gathering system Figure: The two inventory gathering systems.

Data access 1/2 Figure: Four ways to use the CIM. Figure taken from Common Information Model (CIM) Infrastructure - DSP0004.

Data access 2/2. Exchange of inventory information at both SQL and ORM level. Provide XML-RPC (or similar) bindings created by introspection of the generated CIM instance to extend compatibility to languages without ORM. The client/server transport of the inventory gathering solution is a fourth option for data access. The ways of accessing inventory information are summarized in the table on the next slide:
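The XML-RPC idea can be sketched with the Python standard library, which finds the public methods of a registered object by introspection. The Inventory class and its methods are hypothetical stand-ins for the generated CIM instance. To keep the sketch free of network setup, requests are fed directly into the dispatcher through its internal _marshaled_dispatch method; a deployment would instead serve the same object through xmlrpc.server.SimpleXMLRPCServer.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher


class Inventory:
    """Hypothetical in-memory stand-in for the generated CIM instance."""

    def __init__(self):
        self._nodes = {}

    def create_node(self, name, cpu_count):
        self._nodes[name] = {"name": name, "cpu_count": cpu_count}
        return True

    def read_node(self, name):
        return self._nodes[name]

    def delete_node(self, name):
        return self._nodes.pop(name, None) is not None


dispatcher = SimpleXMLRPCDispatcher(allow_none=True)
dispatcher.register_instance(Inventory())      # public methods are found by introspection
dispatcher.register_introspection_functions()  # adds system.listMethods and friends


def call(method, *params):
    """Encode a request as XML, dispatch it, and decode the XML response."""
    request = xmlrpc.client.dumps(params, methodname=method)
    response = dispatcher._marshaled_dispatch(request.encode("utf-8"))
    (result,), _ = xmlrpc.client.loads(response)
    return result


call("create_node", "node01", 8)
node = call("read_node", "node01")
methods = call("system.listMethods")
print(node, methods)
```

Because the payload is plain XML over HTTP, any language with an XML-RPC client can use these bindings, which is the point made on the slide.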

Summary of access methods
Table: Comparison of the different access methods to the inventory data.
Access method | Data operations | Interoperability | OO-like client side
ORM | CRUD and complex queries | Limited to languages with ORM solutions | Yes
Inventory Gathering System | CRUD | Depends on implementation | hwdiscover: Yes; SysMES: No
XML-RPC | CRUD | All common languages | No
SQL | CRUD and complex queries | All common languages | No
Web service | CRUD | All common languages | No

Scalability and reliability. Typically with ORMs, one can choose which RDBS should be used for persisting objects. This means that high availability and scalability can be achieved by careful selection and configuration of the database solution, mostly independently of the rest of the system.

Python implementation. Two differences: It is used together with the PerspectiveBroker of the Twisted framework to implement a Remote Persistent Object. It uses the declarative module of SQLAlchemy, so only the class definitions need to be provided and the declarative mapper does the rest. Future: Make use of inotify for updating information. Extend the inventory gathering system with resource management capabilities.

Conclusion. Normal relational databases can be used for storing an instance of the CIM model. Transactionality is handled by the database/ORM layer, so it does not need to be handled in application code. This makes it easier to write concurrent, distributed applications (with a central data storage). Features already present in relational database products can be used. Most important for us: High reliability: a database with clustering support can be chosen. Scalability: one can go from convenient unit testing in SQLite to MySQL/PostgreSQL/Oracle for deployment. Native and intuitive language bindings due to automatic creation of ORM-enabled code.
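The point about transactionality can be illustrated with a small sketch, here using plain SQLite from the Python standard library rather than an ORM. The table and the function move_nodes are hypothetical; the application code only marks the transaction boundary and the database guarantees all-or-nothing behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (hostname TEXT PRIMARY KEY, rack TEXT NOT NULL)")
with conn:
    conn.execute("INSERT INTO node VALUES ('node01', 'rackA')")


def move_nodes(conn, moves):
    """Apply all rack changes or none of them."""
    with conn:  # commits on success, rolls back if an exception escapes
        for hostname, rack in moves:
            conn.execute("UPDATE node SET rack = ? WHERE hostname = ?", (rack, hostname))


# The second update violates NOT NULL, so the first one is rolled back too.
try:
    move_nodes(conn, [("node01", "rackB"), ("node01", None)])
except sqlite3.IntegrityError:
    pass

rack = conn.execute("SELECT rack FROM node WHERE hostname = 'node01'").fetchone()[0]
print(rack)
```

The same code works against a clustered database server once the connection is pointed elsewhere, which is the separation of concerns the conclusion refers to.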

Results. Only the functionality of the Python implementation has been tested on the production cluster. It works as expected: information is gathered and the database is filled. No performance or interoperability tests have been done so far on the production cluster. Limited performance testing has been done on smaller setups: SQLite in memory is twice as fast as on disk. The Java implementation has been tested in a test cluster in Heidelberg.
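A comparison of in-memory and on-disk SQLite can be set up along the following lines. This is only a sketch of such a measurement, not the test that produced the result above; the workload (one committed insert per row), the row count and the function time_inserts are assumptions, and the measured ratio depends on hardware and file system.

```python
import os
import sqlite3
import tempfile
import time


def time_inserts(database, rows=100):
    """Time `rows` single-row transactions against the given database."""
    conn = sqlite3.connect(database)
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, hostname TEXT)")
    start = time.perf_counter()
    for i in range(rows):
        with conn:  # one commit per row stresses disk synchronisation
            conn.execute("INSERT INTO node (hostname) VALUES (?)", (f"node{i:03d}",))
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
    conn.close()
    return elapsed, count


with tempfile.TemporaryDirectory() as tmp:
    disk_time, disk_rows = time_inserts(os.path.join(tmp, "inventory.db"))
memory_time, memory_rows = time_inserts(":memory:")

print(f"disk: {disk_time:.4f} s, memory: {memory_time:.4f} s")
```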

Outlook. The tasks for the near future will be to extend the infrastructure that can be used to work collaboratively on increasing the interoperability of the CIM+ORM frameworks through unification of the mapping and its configuration. This includes facilities that can be used to automate conformance testing, interoperability testing, unit testing, package building and more. If the goal of supporting many programming languages working on the same database at the same time (given that generators are provided) is reached, it will dramatically improve interoperability on the data persistence layer.