DATA ARCHIVING: MAKING THE MOST OF NEW TECHNOLOGIES AND STANDARDS




Reducing the cost of archiving while improving the management of the information

Abstract

Future-proofed archiving to commodity-priced on-premise hardware, or to a private or public cloud.

Contents

- Management Overview
- Data Archiving
- Attributes of Data to be archived
- Benefits
- Additional Issues
- New technologies and their impact
- New standards and their impact
- Reference Architecture to achieve this
- Ostia's implementation of this architecture
- How can Ostia help?

Management Overview

The term "Big Data" has been coined relatively recently, implying that managing a large amount of data is something new. This could not be further from the truth. Traditional IT-centric organizations such as banks, insurers and government bodies have been dealing with very large data sets for many years. These organizations also face an added complication: in the vast majority of cases, data governance legislation mandates that much of this information be retained for a specified number of years, depending on the data. In the past this has been managed by using proprietary technologies to archive and maintain these data sets, which has locked organizations into those technologies. The costs associated with this model increase every year as more and more data must be archived in this way. In addition, the proprietary nature of these solutions makes them hard to integrate with the open, cost-effective technologies that are widely available today and that can help to simplify this process.

New technologies and standards can significantly reduce the cost of archiving transactional data while making it fully accessible in real time through various open source solutions.

Data Archiving

Some organizations have an active policy of archiving data and have been doing this for many years. However, all organizations archive data in some form, often without thinking through the consequences. In some cases, archiving is achieved by storing the data in a data warehouse created for some other purpose, which by default provides an archived copy. Other organizations simply let their databases continue to grow and so archive the data in their transactional databases.

In the first case, where organizations are actively archiving data, they already know the benefits and are archiving specifically to gain them. Where data warehouses are used as archives, many of the benefits are partly missed. The last case, where databases are allowed to grow continually, misses all of the benefits of data archiving and is simply not sustainable. In the last two cases at least, the cost of archiving data in this way is far higher than it has to be.

Attributes of Data to be archived

Consider the attributes of data that is generally archived:

- It is a record of something that has happened, so the record itself will never change.
- It is unlikely to be needed for day-to-day operational requirements and in most cases is retained because it must be available if someone requests to see it in the future.

- It does not have to be available in real time.

Benefits

Transactional systems must be robust, resilient and scalable, and so they use bulletproof database and hardware technologies. These systems must be available 24/7 and their costs understandably reflect that. For this reason, explicit policies to archive data out of these systems benefit an organization in several ways:

- Maintaining read-only data in an active transactional database is a very expensive way to store data that could be archived.
- This read-only data increases index sizes and can degrade the performance of queries that use those indexes.
- This read-only data is backed up every time the database is backed up, leading to longer backup times, larger backups and longer batch windows, at a time when batch windows are shrinking and many organizations wish to eliminate them completely.
- The archived data can potentially be used for analytic processing without impacting the transactional systems or any data warehousing activity.

Additional Issues

The physical archiving of data is perhaps only half the battle. Further issues around the archived data must also be dealt with:

- Locating information in the archive can itself be a challenge, because of the length of time that may have elapsed since the data was archived and the sheer amount of archived data.
- The application that made sense of the data may have been retired, leaving only the raw data, so the data may be difficult to interpret a number of years after it was archived.
- The structure of the data can change over time, for example through the addition of new columns. Sensibly showing different versions of the archived data is problematic even if the application that displays the data is still available, because that application is likely to understand only the newest structure.

New technologies and their impact

The cost of storage has been dropping exponentially in recent years, yet many archiving solutions, either by design or in the way that they have been implemented, depend on expensive proprietary hardware and software. Often, these platforms are designed to be as robust, resilient and scalable as the transactional data platform, but such service levels are not required for data that is being archived.

As the archived data is read only, there is no need for best-of-breed technologies to manage it. Low-cost commodity hardware and freeware offer a cost-effective platform on which to store the archived data. Once written to the platform, the data can immediately be backed up, so that if the hardware fails it can easily be retrieved from the backup copy.

There are further benefits for organizations wishing to use public or private clouds to store the data. This is again a cost-effective option, because cloud-based servers can be created, left idle most of the time and paid for only when they need to be started up for use. Where data residency is an issue, a private cloud-based solution can address this concern. In addition, many of the larger cloud providers offer guarantees on data residency as part of their service level agreements (SLAs).

These new technologies can significantly reduce the cost of storing the archived data.

New standards and their impact

The Internet has changed the world since its introduction, and standards bodies such as the World Wide Web Consortium (W3C) have delivered related standards that can be used in archiving solutions:

- XML offers the ability to represent data from multiple platforms and technologies in a single, standard way.
- XSLT makes it easy to create representations of data that will still make sense in the future, whether or not the original application has been retired.
- REST and SOAP based services allow data to be accessed and processed from virtually any processing technology, whether open source or proprietary.
- SSL/TLS together with LDAP access rules ensures that data is visible only to authorised individuals or groups.
- Standard encryption algorithms allow data to be fully encrypted on the wire or before it is written to disk.
- Web crawling engines such as that used by Google can search through and catalogue all of the data in the archive, enabling private Google-like searches that make data easier to locate when it must be retrieved.
- Linked Open Data offers the potential to connect data from multiple disparate sources automatically to form a full picture.

Using these standards offers a flexibility in dealing with archived data that was never envisaged with proprietary implementations.

Reference Architecture to achieve this

The first element in implementing such an architecture is the normalization of the data feeds from the various back office databases. Normalization in this sense means the creation of XML feeds that are directly consumable by the archiving engine regardless of the host code page (e.g. ASCII or EBCDIC), platform (e.g. Windows, z/OS or Solaris) or database (e.g. MS SQL Server, DB2, VSAM or Oracle). Creating an XML feed also allows the creation of a stand-alone style sheet (XSLT) to represent the data in a meaningful way. Because every source arrives as an XML feed, the archiving engine can use a standards-based approach to define archiving policies on the data regardless of its original source.
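As a rough illustration of this normalization step, the Python sketch below decodes the same fixed-width record from an EBCDIC source and an ASCII source and emits an identical XML structure for both. The record layout, the field names, the `record_to_xml` helper and the `source` attribute are all assumptions made for this example; they do not describe any particular product's feed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width layout: (field name, start offset, end offset).
LAYOUT = [("account", 0, 8), ("amount", 8, 16)]


def record_to_xml(raw: bytes, codepage: str, source: str) -> str:
    """Decode one fixed-width record and return it as an XML string."""
    # "cp037" is one of Python's built-in EBCDIC codecs; "ascii" is plain ASCII.
    text = raw.decode(codepage)
    record = ET.Element("record", {"source": source})
    for name, start, end in LAYOUT:
        ET.SubElement(record, name).text = text[start:end].strip()
    return ET.tostring(record, encoding="unicode")


# The same logical record as it might arrive from two different hosts.
mainframe_bytes = "AC001234  150.00".encode("cp037")
windows_bytes = "AC001234  150.00".encode("ascii")

xml_a = record_to_xml(mainframe_bytes, "cp037", "vsam")
xml_b = record_to_xml(windows_bytes, "ascii", "mssql")
print(xml_a)
print(xml_b)
```

Apart from the `source` attribute, the two XML documents are identical even though the underlying bytes differ, which is what lets a single archiving policy apply to both feeds.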

Once policies have been defined, the implementation engine can archive the data to commodity hardware and software platforms, either on premise or in a private or public cloud, depending on data governance requirements. This is a cost-effective mechanism for storing read-only data. The style sheet that is active for the data at the time is archived with it, and is therefore always compatible with the data it accompanies.

Once archived, and again because of the standards in place, all of the data in the archive is available as REST-accessible XML objects. An internal Google appliance or other crawler can then build an index over the archive, enabling Google-like searches so that data can be found easily. Once found, the data can be displayed using the style sheet archived with it.
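The flow described above can be sketched in a few lines of Python. This is a minimal in-memory model, not an implementation of any real archiving engine: the age-based policy, the `Archive` class, the seven-year threshold and the placeholder style sheet strings are all hypothetical, and a simple inverted index stands in for the crawler.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedObject:
    object_id: str
    xml: str         # normalized record, as produced by an XML feed
    stylesheet: str  # style sheet text that was active at archive time


def is_archivable(record_date, today, min_age_days):
    """Hypothetical policy: archive records at least min_age_days old."""
    return (today - record_date).days >= min_age_days


class Archive:
    """In-memory stand-in for the archive store plus a crawler-built index."""

    def __init__(self):
        self.objects = {}  # object_id -> ArchivedObject
        self.index = {}    # token -> set of object_ids

    def store(self, obj):
        self.objects[obj.object_id] = obj
        # Index the text content only, with the XML tags stripped out.
        text = re.sub(r"<[^>]+>", " ", obj.xml).lower()
        for token in re.findall(r"\w+", text):
            self.index.setdefault(token, set()).add(obj.object_id)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


STYLE_V1 = "placeholder for the XSLT text of layout version 1"
STYLE_V2 = "placeholder for the XSLT text of layout version 2"

today = date(2015, 1, 1)
candidates = [
    ("r1", date(2005, 3, 1), "<record><name>Smith</name></record>", STYLE_V1),
    ("r2", date(2014, 12, 1), "<record><name>Jones</name></record>", STYLE_V2),
]

archive = Archive()
for object_id, record_date, xml, stylesheet in candidates:
    if is_archivable(record_date, today, min_age_days=7 * 365):
        archive.store(ArchivedObject(object_id, xml, stylesheet))

hits = archive.search("smith")
print(hits, archive.objects[hits[0]].stylesheet)
```

Because each object carries the style sheet that was current when it was archived, a record written under an older layout can still be rendered later without the original application.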

Ostia's implementation of this architecture

While this model could potentially be built by hand, Ostia's Portus platform provides a simple, configuration-based approach to enabling these feeds. In conjunction with Ostia's policy definition and archiving components, it offers a configuration-based (i.e. no coding required) end-to-end implementation of this architecture. This solution offers the following benefits:

- Configuration-based solution for archiving your transactional data.
- Common solution regardless of platform or database technology.
- All data archived to a single common target system in a standard, common format.
- The representation of the data is archived with the data, so the data remains viewable whatever the version of its structure and even if the original application that interpreted it has been retired.
- Archived data is stored on cost-effective commodity hardware and software components.
- The standards-based approach means that any tool can be used to access the archive, whether to locate data or to create visualizations of it.

How can Ostia help?

Ostia have many years of experience with core IT systems and understand their strengths and weaknesses. We have also had much experience with traditional integration stacks and the models that have built up around them, and we now offer adaptive integration solutions in the timeframe required for agile projects. Ostia are able to do this because of the Portus data integration technology they have developed over the last 10 years. It has taken 10 years for Ostia to get to the front of today's technology. We can help you stay there.