Open Source Long Term Preservation Archives. Richard Matthews Sun Microsystems, Inc.



Similar documents
Web 2.0 and Preservation Archiving Services Infrastructure. Keith Rajecki Solutions Architect Sun Microsystems, Inc.

How To Build A Cloud Storage System

2. TEMPLATE TITLE SLIDE WITH PHOTO

Reference Architectures for Repositories and Preservation Archiving

<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures

<Insert Picture Here> Solution Direction for Long-Term Archive

EMC arhiviranje. Lilijana Pelko Primož Golob. Sarajevo, Copyright 2008 EMC Corporation. All rights reserved.

Overview of Storage and Data Management Industry Trends in Long Term Information Retention and Preservation

ETERNUS CS High End Unified Data Protection

Discovering Sun in the Digital Libraries and Repositories Market. October 7, Art Pasquinelli Education Market Strategist Sun Microsystems, Inc.

巨 量 資 料 分 層 儲 存 解 決 方 案

Object storage in Cloud Computing and Embedded Processing

Energy Efficient Storage - Multi- Tier Strategies For Retaining Data

EaseTag Cloud Storage Solution

TABLE OF CONTENTS THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY FOR SHAREPOINT DATA. Introduction. Examining Third-Party Replication Models

Sun Open Archive Framework and Fedora Repository Solutions

Growth of Unstructured Data & Object Storage. Marcel Laforce Sr. Director, Object Storage

Geospatial Imaging Cloud Storage Capturing the World at Scale with WOS TM. ddn.com. DDN Whitepaper DataDirect Networks. All Rights Reserved.

Sun Constellation System: The Open Petascale Computing Architecture

THE SUN STORAGE AND ARCHIVE SOLUTION FOR HPC

Technical Update 2008

Building Storage Clouds for Online Applications A Case for Optimized Object Storage

Moving Data and Distributing Data

Oracle Database 12c Plug In. Switch On. Get SMART.

ZFS Backup Platform. ZFS Backup Platform. Senior Systems Analyst TalkTalk Group. Robert Milkowski.

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Oracle Maximum Availability Architecture with Exadata Database Machine. Morana Kobal Butković Principal Sales Consultant Oracle Hrvatska

Data Center Op+miza+on

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

A Big Data Storage Architecture for the Second Wave David Sunny Sundstrom Principle Product Director, Storage Oracle

High-Level Storage and Data Management Trends & Observations

Optimizing Storage for Better TCO in Oracle Environments. Part 1: Management INFOSTOR. Executive Brief

WOS Cloud. ddn.com. Personal Storage for the Enterprise. DDN Solution Brief

IBM Infrastructure for Long Term Digital Archiving

Project OpenSolaris TM Dynamic Service Containers featuring Nimsoft SLM

OPEN STORAGE REVOLUTIONIZING HOW YOU STORE INFORMATION

Archive Data Retention & Compliance. Solutions Integrated Storage Appliances. Management Optimized Storage & Migration

Archiving On-Premise and in the Cloud. March 2015

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing

Trends in Cloud Computing and Data Intensive Networks. PASIG Malta - 25 June 2009

SMART SCALE YOUR STORAGE - Object "Forever Live" Storage - Roberto Castelli EVP Sales & Marketing BCLOUD

Information Migration

Introduction to NetApp Infinite Volume

How To Retire A Legacy System From Healthcare With A Flatirons Eas Application Retirement Solution

How To Understand The Strategic Importance Of Archive Solutions

<Insert Picture Here> Oracle In-Memory Database Cache Overview

Will Object Stores Dominate Storage of Unstructured Data?

In ediscovery and Litigation Support Repositories MPeterson, June 2009

CASE STUDY: DIGITAL PRESERVATION AT THE NATIONAL LIBRARY OF NEW ZEALAND

Quantum DXi6500 Family of Network-Attached Disk Backup Appliances with Deduplication

The Future of Data Management

Big data management with IBM General Parallel File System

CROSS PLATFORM AUTOMATIC FILE REPLICATION AND SERVER TO SERVER FILE SYNCHRONIZATION

How to Manage Critical Data Stored in Microsoft Exchange Server By Hitachi Data Systems

Cloud Computing Where ISR Data Will Go for Exploitation

SUN HARDWARE FROM ORACLE: PRICING FOR EDUCATION

<Insert Picture Here> Oracle VM and Cloud Computing

SharePoint is Not an ECM System. Jason Lamon

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Turnkey Deduplication Solution for the Enterprise

ENVIRONMENTAL PRESSURES DRIVING AN EVOLUTION IN FILE STORAGE

Accelerate Data Loading for Big Data Analytics Attunity Click-2-Load for HP Vertica

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution

Exadata Database Machine

In Memory Accelerator for MongoDB

Building Storage-as-a-Service Businesses

Zadara Storage Cloud A

The safer, easier way to help you pass any IT exams. Exam : Storage Sales V2. Title : Version : Demo 1 / 5

Real-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software

James Serra Sr BI Architect

Elastic Application Platform for Market Data Real-Time Analytics. for E-Commerce

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

ILM: Tiered Services & The Need For Classification

Hadoop. Sunday, November 25, 12

ARCHITECTING COST-EFFECTIVE, SCALABLE ORACLE DATA WAREHOUSES

Solution Guide Parallels Virtualization for Linux

Keys to Successfully Architecting your DSI9000 Virtual Tape Library. By Chris Johnson Dynamic Solutions International

Cost Effective Backup with Deduplication. Copyright 2009 EMC Corporation. All rights reserved.

Digital Assets Repository 3.0. PASIG User Group Conference Noha Adly Bibliotheca Alexandrina

Ex Libris Rosetta: A Digital Preservation System Product Description

Utilizing the SDSC Cloud Storage Service

#MMTM15 #INFOARCHIVE #EMCWORLD 1

An Oracle White Paper July Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide

Data Integrity Means and Practices

Cloudy Middleware MARK LITTLE TOBIAS KUNZE

VERITAS Business Solutions. for DB2

Complete Unstructured Information Management Validated Solutions for Scalable and Secure Content Management

Archiving Information Storage and Its Advantages

Long term retention and archiving the challenges and the solution

Scale out NAS on the outside, Object storage on the inside

Solution Brief: Creating Avid Project Archives

Automated and Scalable Data Management System for Genome Sequencing Data

Enable Data Collaboration Through Powerful Storage Efficiencies

Data Storage At the Heart of any Information System. Ken Claffey, VP/GM - June 2015

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression

OPTIMIZING PRIMARY STORAGE WHITE PAPER FILE ARCHIVING SOLUTIONS FROM QSTAR AND CLOUDIAN

The syslog-ng Store Box 3 F2

Transcription:

Open Source Long Term Preservation Archives Richard Matthews Sun Microsystems, Inc. 1 1

Presentation Prepared by: > Keith Rajecki > Industry Solutions Architect > Global Education & Research Presented by: > Rick Matthews > Sr. Staff Engineer > Solaris Software, Archive Products Group 2

Agenda Sun, Open Source, and Communities Preservation Archiving Trends Sun Archiving Storage Solutions References 3

Sun Microsystems Today Fortune 211 Company Annual Revenues $13+ Billion Java Devices 6 Billion Java Developers 5 Million Annual R&D ~$2 Billion Annual Storage Petabytes Shipped 410 Worldwide Employees 35,000 SPARC Embedded Processors 44+ Million Sun Confidential: Internal Only U.S. Patents 5,000+ Solaris 10 Licenses 7 Million Cash $4.8 Billion Business Presence 100 Countries 4

Sun's Open Source Strategy Developer Preference More core developers More deploying developers More partners User Preference Free to use More platform choice More suppliers Larger user community 5 Value Proposition Business Deployment Sun's target market Binary distribution Pay for value

Sun and Open Source Software OSS has been part of Sun's DNA for awhile Every software asset we produce is open source. If it isn't today, it will be pretty damn quickly. Jonathan Schwartz CEO, Sun Microsystems January, 2007 Sun's commitment to OSS Communities > This includes Open Repository Communities > More than just a Storage perspective > Dedicated organization to support OSS Communities 6

Sun's Open Stack Flexible and Heterogeneous with Zero Barrier to Exit Database Platform Application Infrastructure Virtualization Java Enterprise System Composite Application Platform Sun xvm Operating System Partners Architecture 7

OpenSPARC The Most Open Platform on the Planet 4300 downloads to date 14 million lines of source code Community interest: 1 to 1000 core systems First derivative design: SimplyRISC S1 core OpenSolaris + OpenSPARC = Only Truly Open Platform www.opensparc.net Sun Confidential: Internal Only 8

OpenSolaris 90,000 Members 64 Community Projects, BrandZ, DTrace, Solaris ZFS, Zones 53 User Groups Worldwide 260 Code Contributions 2006 Codie Best Open Source Solution 2005 Open Source World Editor's Choice 2005 InfoWorld Innovators Award 2005 MIT Young Innovator Innovation Happens Everywhere www.opensolaris.org 9 Bryan Cantrill

Sun is Committed to Developer Communities Building Open and Free Java Solaris SPARC Building a Vibrant Ecosystem: Sun is the Largest Commercial Contributor to Open Source Communities Community Infrastructure Communities java.net The Source for Java Technology Collaboration 10 Ecosystem

Sun Preservation Archiving Community PASIG Meeting May 27-30 San Francisco www.sun-pasig.org Comparison of high-level OAIS architectures, services-oriented architecture, and use cases Sharing of best practices and software code Cooperation on standard, open, in-a-box solutions around repository technologies Review of preservation and archiving storage architectures and eresearch data set management Discussion of the uses of commercial third-party and community-developed solutions 11

Trends: Growth and Preservation Humans created 161 exabytes of data in 2006, approximately 3 million times the information in all the books ever written, according to IDC. Source: "An Inconvenient IT Truth," By Michael Vizard, eweek, 06/22/07 80% of all movies made before 1940 are gone. Source: CLIR Report The Future of the Past, 1999 12

Repository Archiving Projects Compliance Book and Image Digitization and Sharing National Heritage Content Newspapers Replicated, Tiered Repositories for Archived Materials Research Data, Applications, and Systems Science, Technology, and Medical Journals Born Digital Materials 13

What's The Buzz? Problem: > Exponential growth of digital content, now and in the future > Powerful, flexible infrastructure required to archive: Store unstructured, fixed content Search that content Preserve that content for the long-term One Proposed Solution: > Fedora plus Sun StorageTek 5800 Honeycomb 14

SAM-QFS Infinite Archive System SAM-QFS: World s best policy based multi-tiered archive manager > > > > Application Transparent dynamic data movement Four tiers, local and remote Continuous Archive = CDP WORM & Retention management Infinite Archive System > > > > Scalable multi-tiered SAM-QFS platform base 10-256TB systems. Data-In-Place upgrade 15

Sun StorageTek 5800 Honeycomb Content-aware Open Storage What It Is > 'Smart', network-attached, clustered, racked storage system What It Provides > 64TB (raw) per rack: data objects + metadata > WORM Objects > Metadata awareness built into the design > Reliability, persistency, currency assurances Open System, Open Source Software 16

Sun StorageTek 5800 Honeycomb Content-aware Open Storage RAIN architecture > Symmetric cluster CPU, memory, SATA Disks Each node > > > > > Opteron-based SunFire server Solaris 10 3 GB RAM Dual Gig-E 4 x 500GB SATA L2 load-spreading switches Service processor 17

Sun StorageTek 5800 Honeycomb Content-aware Open Storage 'Half Cell' 8 servers 16 TB (raw) 'Full Cell' 16 servers 32 TB (raw) 'Rack' (2 Cells) 16 servers 32 TB (raw) 'Hot Scaling' 18 'Hive' (N Cells) [Future]

Why Honeycomb? Architecture optimized to store and retrieve unstructured fixed content Object storage, metadata aware Extreme data protection via RAID6, data selfhealing, bit-rot detection > Mean Time To Data Loss > 2M years A commitment to standards > Dublin Core metadata > Web DAV > Future: XAM (metadata + query model) 19

Why Honeycomb? Open Source strategy fits with majority of repository/archive software efforts Standard Java and C APIs in SDK Horizontal Scaling as storage needs grow Dublin Core is only the beginning Platform-agnostic [Near Future] On-board local data services available ( Storage Beans ) 20

Why Fedora + Honeycomb? Answer: Archival Storage In order to seriously address current and future issues of scalability, persistence and flexibility, this will likely not meet the requirements, while this clearly will... Custom Monolithic Digital Library Application Digital Library Fedora Archive Server Sun STK5800 Systems 21

Why Fedora + Honeycomb? Application Clients epublishing Fedora Archive Server Sun STK5800 Systems eresearch HPC Digital Library Content-Rich Applications Designed with the proper intelligence in the proper places Metadata integral to storage World-class reliability, scalability and persistency1 End-to-end Open Source Software solution Automated wide-area backup option 1 22 Yes, this really is a word

Storage Beans Discrete Services inside Honeycomb What will they do? That's up to you! Asynchronous (Background) > Transformations > Periodic Data Scrubbing > Duplicate Consolidation Synchronous (Real-time) > Audit logs > Watermarking > Encryption 23

Sun/Fedora Efforts Fedora runs now on Solaris/Open Solaris > Server + Storage reference configurations > Inclusion of Fedora 3 in the Open Solaris 'Indiana' Repository > Fedora on Solaris How does it perform? How does it scale? What are the advantages to running on Solaris? Best Practices on Sun 24

Proof Point: Fedora/Sun/JHU Fedora Web Service APIs (SOAP and REST) User Applications Preservation Archive E-Research Publishing Abstract Data Mgmt. Layer Fedora Commons Framework Pluggable Core Modules Manage API Access API Registry Search RDF Search Ingest Manage Access Validate Policy CMABind Registry RDF Index Store Content Storage File system (Objects) RDBMS (Registry) Triplestore (Index) A Fedora/Sun Solution for Creation, Management, and Exchange of Durable, Digital e-research Content at Johns Hopkins University Fast Disk Honeycomb Tape 25

Honeycomb Virtual Views Example Photo Application Photo demo application on top of StorageTek 5800 or ST5800 Emulator Leverages metadata to organize and present the content in logical views Photo app extracts, stores, and displays embedded EXIF jpeg metadata 26

Digital Repository References New York U. Digital Media Management Stanford U. OAIS Digital Repository Johns Hopkins U. eresearch (Fedora) Purdue U. eresearch (Fedora) Oxford University Google Project (Fedora/VITAL) National Library of New Zealand Digital Preservation (Ex Libris) California Digital Library Large Scale Digitization Swedish Archive for Sound/Recording Digital Media Management Southampton U. EPrints Repository San Diego Supercomputer Large Dataset Storage U. Michigan D-Space Repository The Alberta Library OAIS Digital Repository 27

For More Information Storage Archive Manager http://www.sun.com/storagetek/management_software/data_management/sam/index.xml/ Honeycomb http://www.sun.com/storagetek/disk_systems/enterprise/5800/index.xml Honeycomb Architecture Document: http://www.sun.com/storagetek/disk_systems/enterprise/5800/5800-arch-final-lr.pdf Sun Preservation and Archiving Community http://www.sun-pasig.org Open Source Honeycomb Software http://www.opensolaris.org/os/project/honeycomb/ 28

Thank You Q&A 29