iRODS for Big Data Management in Research-Driven Organizations
Charles Schmitt, CTO & Director of Informatics, RENCI



2 Acknowledgements
Presented work funded in part by grants from NIH, NSF, NARA, and DHS, as well as funding from UNC.
Teams involved include:
- DICE team at UNC and UCSD
- Networking team at RENCI and Duke
- Data sciences team at RENCI
- UNC Dept. of Genetics, Research Computing, Lineberger Comprehensive Cancer Center, NC TraCS Institute, Center for Bioinformatics, Institute for Pharmacogenetics and Personalized Treatment
- UNC HealthCare
- Multiple members of the iRODS community

3 RENCI researches, develops, and deploys cyberinfrastructure
RENCI is a joint venture between UNC, Duke, NCSU, and the State of North Carolina.
[Diagram labels: Tools, Networks, Virtual Organizations, Visualization, High Performance Computing, Data Science of Cyberinfrastructure, Software, Analytics; Projects, Collaborators, Funding, Scholarship, Innovation, Engagement; evaluate, improve.]

4 RENCI Key Initiatives
- Environmental/Coastal Sciences: E1: Storm Surge Modeling; E2: NSF SSI (HydroShare); E3: PIRE
- Biomedical and Health Sciences: H1: CTSA; H2: Sequencing; H3: Secure Medical Workspace; H4: Decision Support
- C1: S2I2; C2: REACH NC; C3: E-iRODS; C4: DataNet; C5: CIBER (NARA); C6: ORCA/BEN (NSF GENI)
- Areas: HPC, Virtual Organizations, Visualization, Data Science of Cyberinfrastructure, Networks, Software, Analytics
- Tools: E-iRODS, GeoViz, SRW

5 Use Case: Informatics for Next-Generation Genomics
- Whole-genome and exome sequences generated; ~10,000 sequences stored
- RENCI Next-Gen Sequencing Informatics: computational workflows, high performance computing, distributed data, security; informatics and cyberinfrastructure R&D
- Clinical Practice: identifying genomic variants relevant to clinical care; exploring ethical/legal issues around reporting genomic findings
- Clinical Research: determining relationships between genomic variants and disease
- Basic Research: finding new ways to understand the relationship between genes and disease/behavior
In collaboration with UNC Research Computing, UNC Dept. of Medical Genetics, Lineberger Comprehensive Cancer Center, Institute for Pharmacogenetics and Personalized Treatment, UNC High Throughput Sequencing Core, and UNC Center for Bioinformatics.

6 Managing Research Data: Genomics
[Diagram of the data/information flow: Sequencers; Tape Archive; Initial Pipeline, QC; Alignment Pipeline, QC; Archives (NIH, Library?) and Replication; Variant Detection; Analysis (Phasing, Imputation, IBD, Phenotype Correlation); R&D New Methods; Variant Database, Hadoop; Clinical Validation; Clinical Review; Clinical Binning; Clinical Decision Support & Presentation; External Data Feeds (RefSeq, OMIM, PolyPhen, …).]
The flow is managed by:
1) Multiple custom workflow management systems
2) Multiple custom Laboratory Information Management Systems (LIMS)
3) E-iRODS

7 The research data ecosystem: challenges
[Diagram: UNC Storage and RENCI Storage (tape, drives), Genomics Storage, Lab Machines, IT Machines, UNC HPC, RENCI HPC, RENCI Hadoop, Genomics HPC, Genomics Hadoop, Open Science Grid, TeraGrid, Clouds, and External Partner Resources, used by analysts, developers, students, data providers, external partners, IT staff, compliance, and automated processes, ranging from "Wild West" (analysts) to "Controlled" (automated processes).]
Data management challenges:
- Tracking data and metadata
- Data movement and migration
- Enforcing policies, compliance, and security
- Encouraging managed automation
- Cost, disk, and IT time
- Failures
All while not disrupting access to data, and while goals, processes, users, and software change.

8 Big data and new stressors
- More data munging, more tools and processes involved, more hardware, more people (esp. IT and CS people)
- More security and compliance concerns
- More QC and QA concerns: data too big for review, and more people and processes mean more mistakes
- More infrastructure breakages: storage systems, software tools
- Time slows down and mistakes are more costly: moving data is a planned IT event, and analyses take days to months

9 What's needed?
- A multitude of technologies that play well together: LIMS, analysis workflow engines, HPC queues, RDBMS, archival and library systems, web reporting/submission sites, …
- Middleware that:
  - Ties together the technologies
  - Automates data-related chores
  - Virtualizes the IT data infrastructure
  - Securely manages the data at scale
  - Works within a dynamically changing research environment

10 Integrated Rule-Oriented Data System (iRODS)
- Proven in production use: NASA, NOAA, National Archives, Max Planck Society, Broad Institute, Wellcome Trust Sanger Institute, Lineberger Comprehensive Cancer Center, Beijing Genomics Institute, Dow Chemical, Merck, International Neuroinformatics Coordinating Facility, …
- Proven at scale: iPlant - 10k users; French National Institute of Nuclear Physics and Particle Physics - 6 PB; Australian Research Collaboration Service storage resources; NASA Center for Climate Simulation - … million attributes; CineGrid sites across Japan, the US, and Europe
- Solid foundation:
  - SRB: initial product (developed by the DICE Group, owned by General Atomics) in 1997
  - iRODS: rewrite of SRB by the DICE Group in 2006; currently on version 3.3
  - Enterprise iRODS: mission-critical distribution co-developed by RENCI and DICE in 2012
- Support:
  - Community of developers from groups worldwide
  - Independent groups offering consulting, support, and development
  - iRODS Consortium offering formal support, training, involvement, and development help

11 iRODS - high-level view
[Diagram: a research community (Research Group A … Research Group N) reaches, through iRODS data services, Institution A and Institution B repositories, archives, PI data sets, and community data collections.]
- Unified logical interface to data and metadata resources (single namespace)
- Policy-based management of access
- Policy-driven management of data (replication, deletion, …)

12 iRODS Key Features
- Unified and consistent namespace for digital objects
- Centralized metadata system: tagging and queries, used for process and security controls
- Manages digital objects stored in a variety of systems (NFS, HDFS, S3, DDN WOS, HPSS, …) or instantiated via a web service call, REST call, SQL query, Hadoop job, …
- Multiple clients and APIs
- Policy-enforcing distributed rule engine
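To make the namespace-plus-catalog idea concrete, here is a minimal Python sketch, assuming an in-memory catalog. `Catalog`, `register`, `resolve`, and `query`, as well as the resource names and paths, are hypothetical and are not the iRODS API; a real client would go through iCommands or a library such as python-irodsclient.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    logical_path: str
    replicas: list = field(default_factory=list)   # (resource, physical_path) pairs
    metadata: dict = field(default_factory=dict)   # attribute -> value tags


class Catalog:
    """Hypothetical central catalog: one logical namespace over many storage systems."""

    def __init__(self):
        self._objects = {}

    def register(self, logical_path, resource, physical_path, **metadata):
        # Record a replica of a logical object and merge any metadata tags.
        obj = self._objects.setdefault(logical_path, DataObject(logical_path))
        obj.replicas.append((resource, physical_path))
        obj.metadata.update(metadata)
        return obj

    def resolve(self, logical_path):
        # Users address the logical path; the catalog knows where the bytes live.
        return list(self._objects[logical_path].replicas)

    def query(self, **conditions):
        # Metadata query: logical paths whose tags match every attribute=value pair.
        return sorted(
            path for path, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in conditions.items())
        )


# Usage: one logical object with replicas on disk at UNC and on tape at RENCI.
cat = Catalog()
cat.register("/zone/projA/s1.bam", "unc_disk", "/isilon/a/s1.bam", study="lung", stage="aligned")
cat.register("/zone/projA/s1.bam", "renci_tape", "/tape/0042/s1.bam")
cat.register("/zone/projA/s2.fastq", "renci_disk", "/ddn/b/s2.fastq", study="lung", stage="raw")
print(cat.resolve("/zone/projA/s1.bam"))
print(cat.query(study="lung"))
```

The design point is that users and rules work with logical paths and metadata, so a replica can be added on tape or moved between systems without changing how the data is found.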

13 Principles of policy-driven data management
- Relational model (late 60s/early 70s): foundation for SQL and RDBMS systems
- Policy model: foundation for policy-based data management systems

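14 Policy-based Data Management
[Diagram, source: Reagan Moore] Purpose defines Collection; Collection defines Property; Property defines Policy; Policy controls Procedure; Procedure updates Persistent State Information, which is checked against Periodic Assessment Criteria.
Examples of persistent state information: QC check run; file integrity validated.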

15 Policy-based Data Management - Collection
[Diagram, source: Reagan Moore] The same chain (Purpose, Collection, Property, Policy, Procedure, Persistent State Information, Periodic Assessment Criteria), now showing that a Collection holds Digital Objects, whose Attributes are updated by procedures.

16 Policy-based Data Management - Collection Properties
[Diagram, source: Reagan Moore] Property subtypes (features): Integrity, Authenticity, Access control, Completeness, Correctness, Consensus, Consistency.

17 Policy-based Data Management - Collection Policies
[Diagram, source: Reagan Moore] Example policies: Replication, Checksum, Quota, Data Type.

18 Policy-based Data Management - Collection Procedures
[Diagram, source: Reagan Moore] Procedure subtypes: Workflow Chains, Functions, Operations. Examples: GetUserACL, SetDataType, SetQuota, DataObjRepl, SysChksumDataObj.

19 Policy-based Data Management - Persistent State
[Diagram, source: Reagan Moore] Example persistent state attributes: DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM.

20 Policy-based Data Management - Enforcement
[Diagram, source: Reagan Moore] A Client Action invokes a Policy Enforcement Point, which applies the corresponding policy and its procedures.

21 Policy-based Data Management - Implementation in iRODS
[Diagram, source: Reagan Moore]
- Purpose (5 main types): Archive, Data grid, Collection, Digital Library, Processing Pipeline
- Properties (features): Integrity, Authenticity, Access control, Completeness, Correctness, Consensus, Consistency
- Policies (11 default): e.g., Replication, Checksum, Quota, Data Type
- Policy Enforcement Points (70), invoked by Clients (50)
- Procedures: workflow chains of micro-services (317), e.g., msiGetUserACL, msiSetDataType, msiSetQuota, msiDataObjRepl, msiSysChksumDataObj
- Persistent State Information (338): e.g., DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM
- Periodic Assessment Criteria

22 Recap: Policy-Based Data Management
- Purpose: reason a collection is assembled
- Properties: attributes needed to ensure the purpose
- Policies: enforce and maintain collection properties
- Procedures: functions that implement the policies
- Persistent state information: results of applying procedures
- Property assessment criteria: validation that state information conforms to the desired purpose
- Federation: controlled sharing of logical namespaces
These are the necessary elements for collection management.
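The recap can be read as a small pipeline. The Python sketch below, assuming an in-memory collection, wires one property (integrity) through a policy, two procedures, recorded state, and an assessment check. The two-replica threshold, SHA-256, and the `replica_count`/`checksum` state keys are illustrative assumptions, not iRODS's internal attributes.

```python
import hashlib

# Hypothetical collection: its purpose (preservation) implies an integrity
# property, modeled as "at least REQUIRED_REPLICAS copies, all matching the
# recorded checksum". The threshold is an assumption for illustration.
REQUIRED_REPLICAS = 2


def checksum(data: bytes) -> str:
    """Procedure: compute a checksum (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(data).hexdigest()


def replicate(obj: dict) -> None:
    """Procedure: add one more copy (in-memory stand-in for a storage copy)."""
    obj["replicas"].append(bytes(obj["replicas"][0]))


def integrity_policy(obj: dict) -> None:
    """Policy: invoke procedures until the property holds, then record state."""
    while len(obj["replicas"]) < REQUIRED_REPLICAS:
        replicate(obj)
    # Persistent state information: the results of applying the procedures.
    obj["state"] = {
        "replica_count": len(obj["replicas"]),
        "checksum": checksum(obj["replicas"][0]),
    }


def assess(obj: dict) -> bool:
    """Assessment criteria: do the recorded state and the data still satisfy the property?"""
    state = obj.get("state")
    if state is None:
        return False
    return state["replica_count"] >= REQUIRED_REPLICAS and all(
        checksum(r) == state["checksum"] for r in obj["replicas"]
    )


obj = {"replicas": [b"ACGTACGT"]}
print(assess(obj))                  # False: the policy has not run yet
integrity_policy(obj)
print(assess(obj))                  # True
obj["replicas"][1] = b"ACGTACGA"    # simulate silent corruption of one copy
print(assess(obj))                  # False
```

Periodic assessment is what catches the corruption: the policy ran once and succeeded, but only re-checking the state against the data reveals that the property no longer holds.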


24 Default Policies in iRODS Data Grid
1. Set up a collection and trash directory for each account
2. Set up membership in public account
3. Manage deletion of account
4. Manage renaming of the data grid
5. Manage path permission checking
6. Manage resource quota
7. Manage use of parallel I/O streams for large files
8. Manage selection of default storage location
9. Manage selection of storage location for replication
10. Manage selection of number of processes to use when multitasking
11. Manage selection of physical path name
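Default policy 7 is a good example of the kind of decision such a policy encodes. A minimal sketch, assuming a purely size-based rule: `choose_num_streams` is a hypothetical function, and the 32 MiB-per-stream size and 16-stream cap are illustrative values, not necessarily the iRODS defaults. In iRODS this decision is made by a server-side rule, not by client Python code.

```python
MiB = 1024 * 1024


def choose_num_streams(file_size: int,
                       size_per_stream: int = 32 * MiB,
                       max_streams: int = 16) -> int:
    """Hypothetical policy: one stream per size_per_stream bytes, capped at max_streams."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    wanted = -(-file_size // size_per_stream)   # integer ceiling division
    return max(1, min(max_streams, wanted))     # at least one stream, never above the cap


print(choose_num_streams(10 * MiB))            # 1
print(choose_num_streams(100 * MiB))           # 4
print(choose_num_streams(10 * 1024 * MiB))     # 16
```

Keeping the thresholds as parameters mirrors the point of a default policy: an administrator can change the behavior for the whole grid without touching client code.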

25 iRODS Rules: defining the policies
- Server-side workflows: action | condition | workflow chain | recovery chain
- Condition: test on any attribute, e.g., collection, file name, storage system, file type, user group, elapsed time, IRB approval flag, descriptive metadata
- Workflow chain: micro-services/rules that are executed at the storage system
- Recovery chain: micro-services/rules that are used to recover from errors
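The action/condition/workflow structure can be sketched as a tiny rule dispatcher. This is a hypothetical Python model, not the iRODS rule engine: `Rule`, `RuleBase`, the `post_put` action name, and the first-match dispatch order are assumptions (in the iRODS 3.x rule language a comparable hook is a named rule such as `acPostProcForPut`). Recovery chains are left out to keep the example focused on conditions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Context = Dict[str, Any]  # attributes of the triggering event (hypothetical keys)


@dataclass
class Rule:
    action: str                                  # event that triggers the rule
    condition: Callable[[Context], bool]         # test on any attribute
    workflow: List[Callable[[Context], str]]     # ordered workflow chain


class RuleBase:
    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def fire(self, action: str, ctx: Context) -> Optional[List[str]]:
        # Assumed dispatch: rules for an action are tried in order, and the
        # first one whose condition holds runs its workflow chain.
        for rule in self.rules:
            if rule.action == action and rule.condition(ctx):
                return [step(ctx) for step in rule.workflow]
        return None  # no rule matched this action


rules = RuleBase()
# Sequence reads from IRB-approved studies are checksummed and replicated...
rules.add(Rule(
    "post_put",
    lambda c: c["file_type"] == "fastq" and c["irb_approved"],
    [lambda c: f"checksum {c['path']}", lambda c: f"replicate {c['path']}"],
))
# ...everything else is only checksummed.
rules.add(Rule("post_put", lambda c: True, [lambda c: f"checksum {c['path']}"]))

print(rules.fire("post_put", {"path": "/zone/a.fastq", "file_type": "fastq", "irb_approved": True}))
print(rules.fire("post_put", {"path": "/zone/b.txt", "file_type": "txt", "irb_approved": False}))
```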

26 iRODS Micro-Services
- Function snippets that wrap a well-defined process, e.g., compute checksum, replicate file, integrity check, zoom image, get TIFF image cutout, search PubMed
- Written in C or Python
- Recovery micro-services handle failures
- Web services and external applications can be wrapped as micro-services
- Can be chained to perform complex tasks
- Micro-services are invoked by the rule engine
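A sketch of chaining micro-services with recovery, assuming each step is paired with a recovery function and that, on failure, the recoveries of the already-completed steps run in reverse order. `run_chain` and `make_step` are hypothetical helpers written for illustration; they model the idea rather than reproduce iRODS's C micro-service interface.

```python
def run_chain(steps, ctx):
    """Run (micro_service, recovery) pairs in order.

    If a step raises, the recovery functions of the steps that already
    completed run in reverse order, and the chain reports failure.
    """
    completed = []
    for action, recover in steps:
        try:
            action(ctx)
        except Exception:
            for undo in reversed(completed):
                undo(ctx)
            return False
        completed.append(recover)
    return True


def make_step(name, log, fail=False):
    """Build a hypothetical micro-service and its recovery counterpart."""
    def action(ctx):
        if fail:
            raise RuntimeError(f"{name} failed")
        log.append(f"do {name}")

    def recover(ctx):
        log.append(f"undo {name}")

    return action, recover


log = []
ok = run_chain(
    [make_step("checksum", log), make_step("replicate", log),
     make_step("register", log, fail=True)],
    ctx={},
)
print(ok, log)
# False ['do checksum', 'do replicate', 'undo replicate', 'undo checksum']
```

Undoing in reverse order matters when later steps depend on earlier ones, for example removing a replica before discarding the checksum that validated it.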

27 iRODS Micro-Services
- Over 300 published micro-services
- Pluggable: write, publish, re-use

28 Example: unified view of data
iDrop web client view of data spread across: 1) disk storage at UNC, 2) disk storage at RENCI, 3) tape storage at RENCI

29 Example: unified view of data

30 Example: data replication policy
[Diagram: UNC Data Center (Isilon, E-iRODS iCAT server) and RENCI Data Center (E-iRODS server, DDN9900, StorNext appliance, tape library).]
- Two working copies kept, for data recovery and to allow analysis at both sites
- "Copy me" and "Data copied" metadata control the copy process
- Applied only to certain files (fastq, finished BAM files)
- An iRODS rule run nightly does the copy: it performs the copy, verifies that the copy succeeded, and resets the "Copy me" attribute
- Versioning allows for re-runs of patient samples
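The nightly copy rule can be sketched in Python under stated assumptions: objects and the second data center are in-memory dicts, the "Copy me" and "Data copied" metadata are modeled as hypothetical `copy_me`/`data_copied` attributes, and verification compares SHA-256 digests. Versioning of re-runs is not modeled. In production this logic would live in an iRODS rule scheduled for periodic execution.

```python
import hashlib

# Only certain files are copied (per the slide: fastq and finished BAM files).
ELIGIBLE_SUFFIXES = (".fastq", ".bam")


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def nightly_replication(objects: dict, remote_site: dict) -> list:
    """Copy flagged objects to the second site, verify, then reset the flag.

    objects:     logical path -> {"data": bytes, "meta": dict}  (hypothetical layout)
    remote_site: logical path -> bytes, a stand-in for the other data center
    """
    copied = []
    for path in sorted(objects):
        obj = objects[path]
        meta = obj["meta"]
        if meta.get("copy_me") != "yes" or not path.endswith(ELIGIBLE_SUFFIXES):
            continue
        remote_site[path] = bytes(obj["data"])                 # perform the copy
        if sha256(remote_site[path]) != sha256(obj["data"]):   # verify the copy
            continue  # leave the flag set so the next run retries
        meta["copy_me"] = "no"                                 # reset "copy me"
        meta["data_copied"] = "yes"                            # record success
        copied.append(path)
    return copied


objects = {
    "/zone/s1.fastq": {"data": b"@r1\nACGT\n+\nIIII\n", "meta": {"copy_me": "yes"}},
    "/zone/s1.bam": {"data": b"BAM\x01", "meta": {"copy_me": "yes"}},
    "/zone/s1.log": {"data": b"pipeline log", "meta": {"copy_me": "yes"}},
    "/zone/s2.bam": {"data": b"BAM\x02", "meta": {}},
}
remote = {}
print(nightly_replication(objects, remote))   # ['/zone/s1.bam', '/zone/s1.fastq']
print(nightly_replication(objects, remote))   # [] because the flags were reset
```

Because the flag is reset only after verification, a failed or interrupted copy is retried automatically on the next run.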

31 Example: data access policy
Challenge:
- Millions of files across different projects, growing daily
- Hundreds of users across different labs, changing frequently
- How to control access? UNIX ACLs became too unwieldy, and moving data means reproducing permission and group settings
Policy: access is given if the user and the data belong to the same group
- Tag data with group metadata (e.g., "Lab X lung tumor study")
- Access rule: the user's group must match the data group (e.g., user Y is a member of "Lab X lung tumor study")
Advantages:
- Data group tags are generated automatically as part of workflows
- Data can be moved without breaking the permission model
- The user-data linkage is not based on directory and file names
Thanks to Sai Balu at LCCC.
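A minimal sketch of the group-tag access policy, assuming group tags stored as metadata on each object and a user-to-groups membership table. `can_access` and `move` are hypothetical helpers; the example shows why moving data does not break permissions: the check reads tags, not paths.

```python
def can_access(user: str, data_groups: set, memberships: dict) -> bool:
    """Policy: access is granted if the user shares at least one group with the data."""
    return bool(set(data_groups) & memberships.get(user, set()))


def move(catalog: dict, old_path: str, new_path: str) -> None:
    """Moving data changes its path but carries its group tags along."""
    catalog[new_path] = catalog.pop(old_path)


memberships = {"y": {"Lab X lung tumor study"}, "z": {"Lab Z"}}
catalog = {"/zone/labx/run1.bam": {"groups": {"Lab X lung tumor study"}}}

print(can_access("y", catalog["/zone/labx/run1.bam"]["groups"], memberships))  # True
print(can_access("z", catalog["/zone/labx/run1.bam"]["groups"], memberships))  # False
move(catalog, "/zone/labx/run1.bam", "/zone/archive/2014/run1.bam")
print(can_access("y", catalog["/zone/archive/2014/run1.bam"]["groups"], memberships))  # True
```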

32 The Data Life Cycle - Collections
Each data life cycle stage increases the value and usability of the original collection:
- Project Collection (private; local): Jeff gets data from a sensor
- Data Grid (shared; distribution): Jeff shares data with colleagues
- Data Processing Pipeline (analyzed; service): together with colleagues, Jeff analyzes the data and produces results
- Digital Library (published; description): results are peer-reviewed and published
- Reference Collection (preserved; representation): Jeff et al. hit the jackpot, and the collection is accepted as a reference collection for decades
- Federation (sustained; re-purposing): the hydrology data grid grows in value to ecology and biology and is federated

33 Lifecycles in an R&D data-driven ecosystem
[Diagram: UNC and RENCI storage (tape, drives), UNC HPC, RENCI HPC, RENCI Hadoop, Genomics Storage, Genomics HPC, Genomics Hadoop, Lab Machines, and IT Machines, used by analysts, data providers, automated processes, external partners, developers, and IT staff.]
As processes mature, move from the "Wild West" toward policies that give as much control as needed over:
- Data movement and replication
- Metadata standards
- Archival, deletion, and retention
iRODS provides integration with workflows, Hadoop, and databases; hides complexities; and automates, all policy-driven, while ad hoc practices transition to production processes.

34 iRODS Clients
- APIs: Java, C, C++, Fortran, PHP, Python
- General interfaces: iCommands (UNIX and Windows command-line interface); iDrop (GUI interface); iDrop Web (web version of iDrop); Windows browser; WebDAV; FUSE; Parrot
- Domain-specific clients: grid tools (GridFTP, SAGA); portals (EngineFrame); web services (VOSpace, irods-rest); workflows (Kepler, Taverna, NCSA Cyberintegrator); digital libraries (DSpace, Fedora)

35 Storage Resources
UNIX file system; iRODS POSIX driver; local cache; Universal Mass Storage System; HPSS; Tivoli Storage Manager; Windows file system; DBO; SQL RDBMS; micro-service objects; SRB; DDN WOS; Z39.50; HTTP; FTP; Amazon S3; THREDDS; HDFS/Hadoop

36 Pluggable Storage Resources
[Diagram: an iRODS smart pluggable resource tree with Resource 1 (e.g., high-performance drive), Resource 2 (e.g., NFS drive), and Resource 3 (e.g., archive), where Resource 3 has children Resource 3a (cheap array of disks) and Resource 3b (tape).]
- Tree-based approach allows for extending horizontally and vertically
- Greater range of customized solutions: hierarchical storage management, load balancing, high availability, tailored interfaces with high-performance storage environments, …
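The resource tree can be modeled as a recursive structure in which inner nodes decide where writes go. This Python sketch is hypothetical: the `Resource` class and its `leaf`/`replicate`/`round_robin` modes are illustrative, and it inserts an extra `working` node under the root that is not on the slide. Later iRODS 4.x releases provide coordinating resources of this kind (for example replication and round robin).

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Node in a hypothetical resource tree.

    mode: "leaf" stores data itself; "replicate" writes to every child;
    "round_robin" sends each new object to the next child in turn.
    """
    name: str
    children: list = field(default_factory=list)
    mode: str = "leaf"
    _next: int = 0

    def write(self, obj_name: str) -> list:
        """Return the names of the leaf resources that end up holding obj_name."""
        if self.mode == "leaf":
            return [self.name]
        if self.mode == "replicate":
            return [leaf for child in self.children for leaf in child.write(obj_name)]
        if self.mode == "round_robin":
            child = self.children[self._next % len(self.children)]
            self._next += 1
            return child.write(obj_name)
        raise ValueError(f"unknown mode: {self.mode}")


# Resource 3 (archive) replicates to a cheap disk array and to tape; Resources 1
# and 2 share the load for working copies; the root keeps one of each.
archive = Resource("resource3", [Resource("resource3a_disk_array"), Resource("resource3b_tape")], mode="replicate")
working = Resource("working", [Resource("resource1_fast_disk"), Resource("resource2_nfs")], mode="round_robin")
root = Resource("root", [working, archive], mode="replicate")

print(root.write("a.bam"))  # ['resource1_fast_disk', 'resource3a_disk_array', 'resource3b_tape']
print(root.write("b.bam"))  # ['resource2_nfs', 'resource3a_disk_array', 'resource3b_tape']
```

Extending "horizontally" is adding a child to a node; extending "vertically" is replacing a leaf with a coordinating node, without changing how clients write.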

37 Policy-Managed Pluggable Resources
[Diagram: an E-iRODS resource wrapper around a pluggable resource (local or remote drive) exposes PEPs to the iRODS rules engine, which applies resource-specific rules.]
- User develops a pluggable resource
- Code inspection allows for autogenerated policy enforcement points (PEPs)
- Grid admin can then develop standard iRODS policy-enforcing rules specific to the resource
Use case example: the pluggable resource by default replicates to ensure high availability; an iRODS rule tells the resource to turn off high availability for ingested files tagged with Protected Health Information metadata.
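A sketch of the PHI use case, assuming a resource plugin that keeps a standby replica by default and exposes a pre-write hook standing in for an auto-generated PEP. `PluggableResource`, `pre_write_peps`, `no_ha_for_phi`, and the `phi` metadata attribute are hypothetical names.

```python
class PluggableResource:
    """Hypothetical resource plugin that keeps a standby replica by default."""

    def __init__(self):
        self.primary = {}
        self.standby = {}
        self.high_availability = True   # the resource's built-in default
        self.pre_write_peps = []        # stand-ins for auto-generated PEPs

    def write(self, path, data, meta):
        # Policy enforcement point: each registered rule may override the
        # high-availability decision for this particular file.
        ha = self.high_availability
        for rule in self.pre_write_peps:
            ha = rule(path, meta, ha)
        self.primary[path] = data
        if ha:
            self.standby[path] = data


def no_ha_for_phi(path, meta, ha):
    """Grid-admin rule: no extra replica for files tagged as Protected Health Information."""
    return False if meta.get("phi") == "true" else ha


res = PluggableResource()
res.pre_write_peps.append(no_ha_for_phi)
res.write("/zone/ncgenes/p1.vcf", b"...", {"phi": "true"})
res.write("/zone/ref/hg19.fa", b"...", {})
print(sorted(res.primary))   # ['/zone/ncgenes/p1.vcf', '/zone/ref/hg19.fa']
print(sorted(res.standby))   # ['/zone/ref/hg19.fa']
```

The resource author ships the default behavior; the grid admin changes it per file through a rule, without modifying the plugin.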

38 Lifecycles in an R&D ecosystem
[Diagram: UNC and RENCI storage (tape, drives), Genomics Storage, Lab Machines, UNC HPC, RENCI HPC, RENCI Hadoop, Genomics HPC, Genomics Hadoop, and IT Machines, used by analysts, data providers, automated processes, external partners, and IT staff, moving from the "Wild West" toward control as processes mature.]
Layers: pluggable storage (NFS, Hadoop, DDN WOS, RDBMS); iRODS policy control; access via programmatic APIs, iRODS clients, data services, data workflows, and web services.

39 iRODS in clinical and translational research

40 Secure Medical Workspace
Combines virtualization, endpoint Data Leakage Protection (DLP), and standard security such as VPNs, network sniffing, antivirus, group policies, …

41 Secure Access to Data on the Clinical Side
[Diagram: clinical systems (NCGenes EMR; Secure Medical Workspace with iRODS-enabled samtools) and research systems (E-iRODS portal; sequence data sets), used by clinicians and researchers in clinical studies.]
1) Clinician requests sequence reads on patient X
2) Patient ID lookup to obtain subject ID
3) Subject ID lookup in E-iRODS
4) Data sets packaged in a zip file and retrieved
5) Data unzipped and displayed within the secure workspace
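Steps 2 through 5 can be sketched end to end in Python, assuming in-memory stand-ins for the patient-to-subject mapping and for catalog entries tagged with a `subject_id` attribute. `PATIENT_TO_SUBJECT`, `CATALOG`, `fetch_for_patient`, and `open_in_workspace` are hypothetical; only the zip packaging mirrors a concrete detail from the slide.

```python
import io
import zipfile

# Hypothetical stand-ins: the patient -> study-subject mapping (step 2) and
# E-iRODS catalog entries tagged with a subject_id attribute (step 3).
PATIENT_TO_SUBJECT = {"patientX": "SUBJ-001"}
CATALOG = {
    "/zone/ncgenes/SUBJ-001/reads_chr1.sam": ({"subject_id": "SUBJ-001"}, b"read1\tACGT\n"),
    "/zone/ncgenes/SUBJ-002/reads_chr1.sam": ({"subject_id": "SUBJ-002"}, b"read2\tTTGA\n"),
}


def fetch_for_patient(patient_id: str) -> bytes:
    """Steps 2-4: resolve the subject id, find its data sets, package them as a zip."""
    subject_id = PATIENT_TO_SUBJECT[patient_id]            # step 2 (KeyError if unknown)
    paths = sorted(p for p, (meta, _) in CATALOG.items()
                   if meta["subject_id"] == subject_id)    # step 3: metadata lookup
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:   # step 4: package
        for p in paths:
            zf.writestr(p.lstrip("/"), CATALOG[p][1])
    return buf.getvalue()


def open_in_workspace(archive: bytes) -> dict:
    """Step 5: unzip inside the secure workspace and return name -> contents."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


files = open_in_workspace(fetch_for_patient("patientX"))
print(list(files))   # ['zone/ncgenes/SUBJ-001/reads_chr1.sam']
```

Keeping the patient-to-subject lookup separate from the catalog query means the research-side catalog never needs to store clinical identifiers.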

42 Loosely coupled distributed Secure Medical Workspaces (SMWs)
[Diagram: several sites, each pairing a clinical data source (DEDUCE, i2b2, DW) with a Research Workspace (one with SAS), an iRODS client, and an iRODS data server, linked through an iRODS data catalogue.]
Researchers can access data via the local clinical information system (CIS) or as shared files. Sharing between sites is managed by a combination of CIS federation and data grid middleware (more flexible, less CIS lock-in).

43 Questions?


Practical Solutions for Big Data Analytics

Practical Solutions for Big Data Analytics Practical Solutions for Big Data Analytics Ravi Madduri Computation Institute (madduri@anl.gov) Paul Dave (pdave@uchicago.edu) Dinanath Sulakhe (sulakhe@uchicago.edu) Alex Rodriguez (arodri7@uchicago.edu)

More information

ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013

ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013 ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE October 2013 Introduction As sequencing technologies continue to evolve and genomic data makes its way into clinical use and

More information

Scalable Services for Digital Preservation

Scalable Services for Digital Preservation Scalable Services for Digital Preservation A Perspective on Cloud Computing Rainer Schmidt, Christian Sadilek, and Ross King Digital Preservation (DP) Providing long-term access to growing collections

More information

Diagram 1: Islands of storage across a digital broadcast workflow

Diagram 1: Islands of storage across a digital broadcast workflow XOR MEDIA CLOUD AQUA Big Data and Traditional Storage The era of big data imposes new challenges on the storage technology industry. As companies accumulate massive amounts of data from video, sound, database,

More information

Scheduling in SAS 9.4 Second Edition

Scheduling in SAS 9.4 Second Edition Scheduling in SAS 9.4 Second Edition SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. Scheduling in SAS 9.4, Second Edition. Cary, NC: SAS Institute

More information

Personalized Medicine and IT

Personalized Medicine and IT Personalized Medicine and IT Data-driven Medicine in the Age of Genomics www.intel.com/healthcare/bigdata Ketan Paranjape General Manager, Life Sciences Intel Corp. @Portlandketan 1 The Central Dogma of

More information

EMC IRODS RESOURCE DRIVERS

EMC IRODS RESOURCE DRIVERS EMC IRODS RESOURCE DRIVERS PATRICK COMBES: PRINCIPAL SOLUTION ARCHITECT, LIFE SCIENCES 1 QUICK AGENDA Intro to Isilon (~2 hours) Isilon resource driver Intro to ECS (~1.5 hours) ECS Resource driver Possibilities

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

globus online Cloud-based services for (reproducible) science Ian Foster Computation Institute University of Chicago and Argonne National Laboratory

globus online Cloud-based services for (reproducible) science Ian Foster Computation Institute University of Chicago and Argonne National Laboratory globus online Cloud-based services for (reproducible) science Ian Foster Computation Institute University of Chicago and Argonne National Laboratory Computation Institute (CI) Apply to challenging problems

More information

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations Beyond Lambda - how to get from logical to physical Artur Borycki, Director International Technology & Innovations Simplification & Efficiency Teradata believe in the principles of self-service, automation

More information

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Sean Lee Solution Architect, SDI, IBM Systems SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Agenda Converging Technology Forces New Generation Applications Data Management Challenges

More information

September 2009 Cloud Storage for Cloud Computing

September 2009 Cloud Storage for Cloud Computing September 2009 Cloud Storage for Cloud Computing This paper is a joint production of the Storage Networking Industry Association and the Open Grid Forum. Copyright 2009 Open Grid Forum, Copyright 2009

More information

Migration Scenario: Migrating Backend Processing Pipeline to the AWS Cloud

Migration Scenario: Migrating Backend Processing Pipeline to the AWS Cloud Migration Scenario: Migrating Backend Processing Pipeline to the AWS Cloud Use case Figure 1: Company C Architecture (Before Migration) Company C is an automobile insurance claim processing company with

More information

WOS for Research. ddn.com. DDN Whitepaper. Utilizing irods to manage collaborative research. 2012 DataDirect Networks. All Rights Reserved.

WOS for Research. ddn.com. DDN Whitepaper. Utilizing irods to manage collaborative research. 2012 DataDirect Networks. All Rights Reserved. DDN Whitepaper WOS for Research Utilizing irods to manage collaborative research. 2012 DataDirect Networks. All Rights Reserved. irods and the DDN Web Object Scalar (WOS) Integration irods, an open source

More information

Data Grid Landscape And Searching

Data Grid Landscape And Searching Or What is SRB Matrix? Data Grid Automation Arun Jagatheesan et al., University of California, San Diego VLDB Workshop on Data Management in Grids Trondheim, Norway, 2-3 September 2005 SDSC Storage Resource

More information

2011 FileTek, Inc. All rights reserved. 1 QUESTION

2011 FileTek, Inc. All rights reserved. 1 QUESTION 2011 FileTek, Inc. All rights reserved. 1 QUESTION 2011 FileTek, Inc. All rights reserved. 2 HSM - ILM - >>> 2011 FileTek, Inc. All rights reserved. 3 W.O.R.S.E. HOW MANY YEARS 2011 FileTek, Inc. All rights

More information

Data grid storage for digital libraries and archives using irods

Data grid storage for digital libraries and archives using irods Data grid storage for digital libraries and archives using irods Mark Hedges, Centre for e-research, King s College London eresearch Australasia, Melbourne, 30 th Sept. 2008 Background: Project History

More information

CommVault Simpana Archive 8.0 Integration Guide

CommVault Simpana Archive 8.0 Integration Guide CommVault Simpana Archive 8.0 Integration Guide Data Domain, Inc. 2421 Mission College Boulevard, Santa Clara, CA 95054 866-WE-DDUPE; 408-980-4800 Version 1.0, Revision B September 2, 2009 Copyright 2009

More information

IBM Smart Business Storage Cloud

IBM Smart Business Storage Cloud GTS Systems Services IBM Smart Business Storage Cloud Reduce costs and improve performance with a scalable storage virtualization solution SoNAS Gerardo Kató Cloud Computing Solutions 2010 IBM Corporation

More information

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik Leading Genomics Diagnostic harma Discove Collab Shanghai Cambridge, MA Reykjavik Global leadership for using the genome to create better medicine WuXi NextCODE provides a uniquely proven and integrated

More information

Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace

Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace Beth Plale Indiana University plale@cs.indiana.edu LEAD TR 001, V3.0 V3.0 dated January 24, 2007 V2.0 dated August

More information

Building Bioinformatics Capacity in Africa. Nicky Mulder CBIO Group, UCT

Building Bioinformatics Capacity in Africa. Nicky Mulder CBIO Group, UCT Building Bioinformatics Capacity in Africa Nicky Mulder CBIO Group, UCT Outline What is bioinformatics? Why do we need IT infrastructure? What e-infrastructure does it require? How we are developing this

More information

Implementing Network Attached Storage. Ken Fallon Bill Bullers Impactdata

Implementing Network Attached Storage. Ken Fallon Bill Bullers Impactdata Implementing Network Attached Storage Ken Fallon Bill Bullers Impactdata Abstract The Network Peripheral Adapter (NPA) is an intelligent controller and optimized file server that enables network-attached

More information

The THREDDS Data Repository: for Long Term Data Storage and Access

The THREDDS Data Repository: for Long Term Data Storage and Access 8B.7 The THREDDS Data Repository: for Long Term Data Storage and Access Anne Wilson, Thomas Baltzer, John Caron Unidata Program Center, UCAR, Boulder, CO 1 INTRODUCTION In order to better manage ever increasing

More information

Data Services for Campus Researchers

Data Services for Campus Researchers Data Services for Campus Researchers Research Data Management Implementations Workshop March 13, 2013 Richard Moore SDSC Deputy Director & UCSD RCI Project Manager rlm@sdsc.edu SDSC Cloud: A Storage Paradigm

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

Reagan Moore, PI Mary Whitton, Project Manager. National Science Foundation Cooperative Agreement: OCI 0940841

Reagan Moore, PI Mary Whitton, Project Manager. National Science Foundation Cooperative Agreement: OCI 0940841 Reagan Moore, PI Mary Whitton, Project Manager National Science Foundation Cooperative Agreement: OCI 0940841 DFC to Support Hydrologic Modeling Jon Goodall and Bakinam Essawy University of Virginia DFC

More information

Scheduling in SAS 9.3

Scheduling in SAS 9.3 Scheduling in SAS 9.3 SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc 2011. Scheduling in SAS 9.3. Cary, NC: SAS Institute Inc. Scheduling in SAS 9.3

More information

#jenkinsconf. Jenkins as a Scientific Data and Image Processing Platform. Jenkins User Conference Boston #jenkinsconf

#jenkinsconf. Jenkins as a Scientific Data and Image Processing Platform. Jenkins User Conference Boston #jenkinsconf Jenkins as a Scientific Data and Image Processing Platform Ioannis K. Moutsatsos, Ph.D., M.SE. Novartis Institutes for Biomedical Research www.novartis.com June 18, 2014 #jenkinsconf Life Sciences are

More information

Collaborative SRB Data Federations

Collaborative SRB Data Federations WHITE PAPER Collaborative SRB Data Federations A Unified View for Heterogeneous High-Performance Computing INTRODUCTION This paper describes Storage Resource Broker (SRB): its architecture and capabilities

More information

Initializing SAS Environment Manager Service Architecture Framework for SAS 9.4M2. Last revised September 26, 2014

Initializing SAS Environment Manager Service Architecture Framework for SAS 9.4M2. Last revised September 26, 2014 Initializing SAS Environment Manager Service Architecture Framework for SAS 9.4M2 Last revised September 26, 2014 i Copyright Notice All rights reserved. Printed in the United States of America. No part

More information

Wrangler: A New Generation of Data-intensive Supercomputing. Christopher Jordan, Siva Kulasekaran, Niall Gaffney

Wrangler: A New Generation of Data-intensive Supercomputing. Christopher Jordan, Siva Kulasekaran, Niall Gaffney Wrangler: A New Generation of Data-intensive Supercomputing Christopher Jordan, Siva Kulasekaran, Niall Gaffney Project Partners Academic partners: TACC Primary system design, deployment, and operations

More information

Assessment of RLG Trusted Digital Repository Requirements

Assessment of RLG Trusted Digital Repository Requirements Assessment of RLG Trusted Digital Repository Requirements Reagan W. Moore San Diego Supercomputer Center 9500 Gilman Drive La Jolla, CA 92093-0505 01 858 534 5073 moore@sdsc.edu ABSTRACT The RLG/NARA trusted

More information

Managing Microsoft Office SharePoint Server Content with Hitachi Data Discovery for Microsoft SharePoint and the Hitachi NAS Platform

Managing Microsoft Office SharePoint Server Content with Hitachi Data Discovery for Microsoft SharePoint and the Hitachi NAS Platform Managing Microsoft Office SharePoint Server Content with Hitachi Data Discovery for Microsoft SharePoint and the Hitachi NAS Platform Implementation Guide By Art LaMountain and Ken Ewers February 2010

More information

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

More information

Big Data and the Data Lake. February 2015

Big Data and the Data Lake. February 2015 Big Data and the Data Lake February 2015 My Vision: Our Mission Data Intelligence is a broad term that describes the real, meaningful insights that can be extracted from your data truths that you can act

More information

API Architecture. for the Data Interoperability at OSU initiative

API Architecture. for the Data Interoperability at OSU initiative API Architecture for the Data Interoperability at OSU initiative Introduction Principles and Standards OSU s current approach to data interoperability consists of low level access and custom data models

More information

Putting Genomes in the Cloud with WOS TM. ddn.com. DDN Whitepaper. Making data sharing faster, easier and more scalable

Putting Genomes in the Cloud with WOS TM. ddn.com. DDN Whitepaper. Making data sharing faster, easier and more scalable DDN Whitepaper Putting Genomes in the Cloud with WOS TM Making data sharing faster, easier and more scalable Table of Contents Cloud Computing 3 Build vs. Rent 4 Why WOS Fits the Cloud 4 Storing Sequences

More information

European Data Infrastructure - EUDAT Data Services & Tools

European Data Infrastructure - EUDAT Data Services & Tools European Data Infrastructure - EUDAT Data Services & Tools Dr. Ing. Morris Riedel Research Group Leader, Juelich Supercomputing Centre Adjunct Associated Professor, University of iceland BDEC2015, 2015-01-28

More information

SOA, case Google. Faculty of technology management 07.12.2009 Information Technology Service Oriented Communications CT30A8901.

SOA, case Google. Faculty of technology management 07.12.2009 Information Technology Service Oriented Communications CT30A8901. Faculty of technology management 07.12.2009 Information Technology Service Oriented Communications CT30A8901 SOA, case Google Written by: Sampo Syrjäläinen, 0337918 Jukka Hilvonen, 0337840 1 Contents 1.

More information

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons The NIH Commons Summary The Commons is a shared virtual space where scientists can work with the digital objects of biomedical research, i.e. it is a system that will allow investigators to find, manage,

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Enhanced Research Data Management and Publication with Globus

Enhanced Research Data Management and Publication with Globus Enhanced Research Data Management and Publication with Globus Vas Vasiliadis Jim Pruyne Presented at OR2015 June 8, 2015 Presentations and other useful information available at globus.org/events/or2015/tutorial

More information

UNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure

UNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure UNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure Authors: A O Jaunsen, G S Dahiya, H A Eide, E Midttun Date: Dec 15, 2015 Summary Uninett Sigma2 provides High

More information

Long term retention and archiving the challenges and the solution

Long term retention and archiving the challenges and the solution Long term retention and archiving the challenges and the solution NAME: Yoel Ben-Ari TITLE: VP Business Development, GH Israel 1 Archive Before Backup EMC recommended practice 2 1 Backup/recovery process

More information

Identity and Access Management Integration with PowerBroker. Providing Complete Visibility and Auditing of Identities

Identity and Access Management Integration with PowerBroker. Providing Complete Visibility and Auditing of Identities Identity and Access Management Integration with PowerBroker Providing Complete Visibility and Auditing of Identities Table of Contents Executive Summary... 3 Identity and Access Management... 4 BeyondTrust

More information

iplant + irods: Enabling data driven collaborations Nirav Merchant iplant Collaborative/Univ. of Arizona nirav@email.arizona.edu VAMP 2012 Utrecht

iplant + irods: Enabling data driven collaborations Nirav Merchant iplant Collaborative/Univ. of Arizona nirav@email.arizona.edu VAMP 2012 Utrecht iplant + irods: Enabling data driven collaborations Nirav Merchant iplant Collaborative/Univ. of Arizona nirav@email.arizona.edu VAMP 2012 Utrecht Topic Coverage About iplant 4 th Paradigm Technology challenges

More information

Digital Preservation Lifecycle Management

Digital Preservation Lifecycle Management Digital Preservation Lifecycle Management Building a demonstration prototype for the preservation of large-scale multi-media collections Arcot Rajasekar San Diego Supercomputer Center, University of California,

More information

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence Service Oriented Architecture SOA and Web Services John O Brien President and Executive Architect Zukeran Technologies

More information

Fedora Distributed data management (SI1)

Fedora Distributed data management (SI1) Fedora Distributed data management (SI1) Mohamed Rafi DART UQ Outline of Work Package To enable Fedora to natively handle large datasets. Explore SRB integration at the storage level of the repository

More information

Powerful Management of Financial Big Data

Powerful Management of Financial Big Data Powerful Management of Financial Big Data TickSmith s solutions are the first to apply the processing power, speed, and capacity of cutting-edge Big Data technology to financial data. We combine open source

More information

EMC BACKUP MEETS BIG DATA

EMC BACKUP MEETS BIG DATA EMC BACKUP MEETS BIG DATA Strategies To Protect Greenplum, Isilon And Teradata Systems 1 Agenda Big Data: Overview, Backup and Recovery EMC Big Data Backup Strategy EMC Backup and Recovery Solutions for

More information

IBM Tivoli Storage Manager Version 7.1.4. Introduction to Data Protection Solutions IBM

IBM Tivoli Storage Manager Version 7.1.4. Introduction to Data Protection Solutions IBM IBM Tivoli Storage Manager Version 7.1.4 Introduction to Data Protection Solutions IBM IBM Tivoli Storage Manager Version 7.1.4 Introduction to Data Protection Solutions IBM Note: Before you use this

More information

Introduction to Arvados. A Curoverse White Paper

Introduction to Arvados. A Curoverse White Paper Introduction to Arvados A Curoverse White Paper Contents Arvados in a Nutshell... 4 Why Teams Choose Arvados... 4 The Technical Architecture... 6 System Capabilities... 7 Commitment to Open Source... 12

More information

SAS 9.4 Intelligence Platform

SAS 9.4 Intelligence Platform SAS 9.4 Intelligence Platform Application Server Administration Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2013. SAS 9.4 Intelligence Platform:

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information