Continuous Integration for XML and RDF Data



Similar documents
Continuous Integration for XML and RDF Data

From monolithic XML for print/web to lean XML for data: realising linked data for dictionaries

Building a Continuous Integration Pipeline with Docker

Pipeline Orchestration for Test Automation using Extended Buildbot Architecture

UTILIZING THE PORTABILITY OF DOCKER TO DESIGN A PORTABLE AND SCALABLE CONTINUOUS INTEGRATION STACK

Unit Testing webmethods Integrations using JUnit Practicing TDD for EAI projects

Intro to Docker and Containers

Fast Feedback: Jenkins + Functional and Non-Functional Mobile App Testing Without Pulling Your Hair

SOFTWARE DEVELOPMENT BASICS SED

DevOps. Building a Continuous Delivery Pipeline

Jenkins World Tour 2015 Santa Clara, CA, September 2-3

Platform as a Service and Container Clouds

Mobile App Development: The CD Recipe Jenkins + Functional and Non-functional Testing + Real Devices. Carlo Cadet, Director, Technical Evangelists

From Traditional Functional Testing to Enabling Continuous Quality in Mobile App Development

Linstantiation of applications. Docker accelerate

Ikasan ESB Reference Architecture Review

Jenkins Slave Cloud with Apache Mesos. Klaus Azesberger Reinhard Kiesswetter Infonova GmbH

Effective feedback from quality tools during development

DevOps Course Content

SAS in clinical trials A relook at project management,

Building Value with Continuous Integration

A central continuous integration platform

Making Cloud Portability a Practical CTO & Founder GigaSppaces natishalom.typaped.com

Continuous Integration (CI) for Mobile Applications

LSKA 2010 Survey Report Job Scheduler

CI for FPGA D&V. Continuous Integration for FPGA Design and Verification Verification Futures Alan Fitch, Ericsson TV Ltd

Getting Started Using Project Photon on VMware Fusion/Workstation

Bioinformatics for programmers

Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising

What Is Microsoft Private Cloud Fast Track?

Building Value with Continuous Integration

State-of-the-Art ENTERPRISE JAVA APPLICATIONS WITH SPRING BOOT

Beginners guide to continuous integration. Gilles QUERRET Riverside Software

Automating Security Testing. Mark Fallon Senior Release Manager Oracle

Intro to Docker for CMS

How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning

Part 2: The Neuron ESB

Openshift for Continuous Integration

Continuous Keylane

"Build and Test in the Cloud "

Automation using Selenium

80% 50x. 30x. CASE STUDY: How WaveMaker Got Faster, Better, More Agile with Docker. Lower Costs. Better Performance. Greater App Density

The CIO s Dream: A Cloud Platform With Lower Cost, More Agility and Better Performance. A publication by:

Development Testing for Agile Environments

Big Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013

Cross-Platform ASP.NET 5 For the Cloud. Anthony Sneed

In this training module, you learn how to configure and deploy a machine with a monitoring agent through Tivoli Service Automation Manager V7.2.2.

Automation & Open Source. How to tame the Cloud?

Do DevOps on VMware vcloud Air Your Way, Without the Rework! Ashok Aletty, vcloud Air Solution Architect

SUCCESFUL TESTING THE CONTINUOUS DELIVERY PROCESS

Practicing Continuous Delivery using Hudson. Winston Prakash Oracle Corporation

Chris Rosen, Technical Product Manager for IBM Containers, Lin Sun, Senior Software Engineer for IBM Containers,

INCREASE YOUR WEBMETHODS ROI WITH AUTOMATED TESTING. Copyright 2015 CloudGen, LLC

Continuous Integration

Cloud3DView: Gamifying Data Center Management

The Purview Solution Integration With Splunk

FlexPod from Cisco and NetApp:

When Does Colocation Become Competitive With The Public Cloud? WHITE PAPER SEPTEMBER 2014

When Does Colocation Become Competitive With The Public Cloud?

PHP in RPM distribution

CONTINUOUS INTEGRATION. Introduction

Application Release Automation (ARA) Vs. Continuous Delivery

Enabling IT Agility with an Open Hybrid Cloud

RED HAT CLOUD SUITE FOR APPLICATIONS

Parasoft and Skytap Deliver 24/7 Access to Complete Test Environments

Assignment # 1 (Cloud Computing Security)

Benchmarking the SDN controller!

This presentation introduces you to the Decision Governance Framework that is new in IBM Operational Decision Manager version 8.5 Decision Center.

Automating Big Data Benchmarking for Different Architectures with ALOJA

WHITE PAPER Redefining Monitoring for Today s Modern IT Infrastructures

BuildBot. S.Cozzini/A.Messina/G.Giuliani. And Continuous Integration. RegCM4 experiences. Warning: Some slides/ideas.

Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix

CI Pipeline with Docker

ArcGIS for Server in the Cloud

DevOps with Containers. for Microservices

Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS

"Cloud Computing: Powering the Future of Testing"

Continuous Integration and Automatic Testing for the FLUKA release using Jenkins (and Docker)

Solving the dev-ops dilemma: Deploying cloud workloads in minutes

How to Create a Flexible CRM Solution Based on SugarCRM in a vcloud Environment. A VMware Cloud Evaluation Reference Document

The Virtualization Practice

Continuous Integration For Fusion Middleware

Massively! Continuous Integration! A case study for Jenkins at cloud-scale

The full setup includes the server itself, the server control panel, Firebird Database Server, and three sample applications with source code.

Build Automation for Mobile. or How to Deliver Quality Apps Continuously. Angelo Rüggeberg

DRUPAL CONTINUOUS INTEGRATION. Part I - Introduction

Rudder. Sharing IT automation benefits in a team with Rudder. Benoît Peccatte bpe@normation.com. Normation Tous droits réservés normation.

High Speed Transfers Using the Aspera Node API! Aspera Live Webinars November 6, 2012!

Organise Your Business

More about Continuous Integration:

Hardware/Software Guidelines

Continuous Integration and Bamboo. Ryan Cutter CSCI Spring Semester

Software Construction

Only Athena provides complete command over these common enterprise mobility needs.

Introduction to Arvados. A Curoverse White Paper

ISLET: Jon Schipp, Ohio Linux Fest An Attempt to Improve Linux-based Software Training

View Point. Overcoming Challenges associated with SaaS Testing. Abstract. - Vijayanathan Naganathan, Sreesankar Sankarayya

CS380 Final Project Evaluating the Scalability of Hadoop in a Real and Virtual Environment

Network Virtualization Solutions - A Practical Solution

Transcription:

Continuous Integration for XML and RDF Data Sandro Cirulli Language Technologist Oxford University Press (OUP) 6 June 2015

Table of contents 1. Context 2. Continuous Integration with Jenkins 3. Automatic Deployment with Docker 4. Future Work

Oxford University Press Context 3/19 Oxford University Press (OUP) is a world-renowned dictionary publisher OUP launched the Oxford Global Languages (OGL) initiative to digitize under-represented languages Language data is converted into XML and RDF

Where we started from Challenges 4/19 OUP dictionary data was originally developed for print products OUP acquired dictionaries from other publishers in various formats Data conversions were performed by freelancers using various programming languages, tools, and development environments No testing, no code reuse

Our aim 5/19 Produce lean, machine-interpretable XML and RDF Leverage Semantic Web technologies for linking and inference Convert tens of language resources in a scalable, maintainable, and cost-effective manner

Continuous Integration What it is Continuous Integration (CI) is a software development practice where a development team commits their work frequently and each commit is integrated by an automated build tool detecting integration errors CI requires a build server to monitor changes in the code, run tests, build, and notify developers We use Jenkins as it is the most popular open-source CI server

Continuous Integration Workflow and components 7/19

Continuous Integration Nightly Builds 8/19 Nightly builds are automated builds scheduled on a nightly basis We currently builds XML and RDF for 7 datasets Nightly builds currently take on average 5 hours on a multi-core Linux machine with 132 GB RAM Builds are parallelized using 8 cores

Continuous Integration Unit Testing 9/19 XSpec for XSLT code RDFUnit for RDF data XProcspec for XProc pipeline Test results are converted into JUnit reports via XSLT Unit tests are run shortly after a developer commits the code

Continuous Integration Monitor View

Continuous Integration Benefits of CI 11/19 Code reuse: on average, 70-80% of the code could be reused for new XML/RDF conversions Code quality: regression bugs are avoided Bug fixes: bugs are spotted quickly and fixed more rapidly Automation: no manual steps, faster and less error-prone build process Integration: reduced risks, time, and costs for integration with other systems

Continuous Integration Jenkins Demo 12/19

13/19 Automatic Deployment with Docker Docker Docker is an open source platform for deploying distributed applications running inside containers Docker provides development and operational teams with a shared, consistent environment for development, testing, and release Docker avoids the classic but it worked on my machine issue Docker allows applications and their dependencies to be moved portably across development and production environments

Docker Containers 14/19

Automatic Deployment with Docker Dockerfile 15/19 FROM platform_base MAINTAINER Sandro Cirulli <sandro.cirulli@oup.com> # exist-db version ENV EXISTDB_VERSION 2.2 # install exist WORKDIR /tmp RUN curl -LO http://downloads.sourceforge.net/exist/ Stable/${EXISTDB_VERSION}/eXist-db-setup-${ EXISTDB_VERSION}.jar ADD exist-setup.cmd /tmp/exist-setup.cmd # run command line configuration RUN expect -f exist-setup.cmd

Automatic Deployment with Docker Dockerfile (cont.) RUN rm exist-db-setup-${existdb_version}.jar existsetup.cmd # set persistent volume VOLUME /data/existdb WORKDIR /opt/exist # change default port to 8008 RUN sed -i s/default="8080"/default="8008"/g tools/ jetty/etc/jetty.xml EXPOSE 8008 8443 ENV EXISTDB_HOME /opt/exist CMD bin/startup.sh 16/19

Future Work 17/19 Scalability: cloud instances to run compute-intensive processes, distribute builds across slave machines Availability: Circuit Breaker Design Pattern Code coverage: lack of code coverage tools for XSLT (XSpec and Cakupan are the best we could find) Deployment orchestration: docker-compose to orchestrate Docker containers

Acknowledgements 18/19 The work described here was carried out by a developers team at OUP: Khalil Ahmed Nick Cross Matt Kohl and myself

Thank you for your attention! Any questions? Slides available at: www.sandrocirulli.net/xml-london-2015 Contact me at: sandro.cirulli@oup.com