Introduction to LSST Data Management. Jeffrey Kantor Data Management Project Manager

1 Introduction to LSST Data Management Jeffrey Kantor Data Management Project Manager

2 LSST Data Management Principal Responsibilities
Archive Raw Data: Receive the incoming stream of images generated by the Camera system and archive the raw images.
Process to Data Products: Detect and alert on transient events within one minute of visit acquisition. Approximately once per year, create and archive a Data Release: a static, self-consistent collection of data products generated from all survey data taken from the start of the survey to the cutoff date for that Data Release.
Publish: Make all LSST data available through an interface that uses community-accepted standards, and facilitate user data analysis and production of user-defined data products at Data Access Centers (DACs) and external sites.

3 LSST From the User's Perspective
Level 1
    A stream of ~10 million time-domain events per night, detected and transmitted to event distribution networks within 60 seconds of observation.
    A catalog of orbits for ~6 million bodies in the Solar System.
Level 2
    A catalog of ~37 billion objects (20B galaxies, 17B stars), ~7 trillion observations ("sources"), and ~30 trillion measurements ("forced sources"), produced annually, accessible through online databases.
    Deep co-added images.
Level 3
    Services and computing resources at the Data Access Centers to enable user-specified custom processing and analysis.
    Software and APIs enabling development of analysis codes.
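The catalog sizes quoted on this slide imply rough per-object averages. The short Python sketch below only divides the slide's own approximate numbers; the constant and function names are illustrative and do not come from any LSST software.

```python
# Approximate counts as quoted on the slide.
GALAXIES = 20e9          # ~20 billion galaxies
STARS = 17e9             # ~17 billion stars
OBJECTS = 37e9           # ~37 billion objects in total
SOURCES = 7e12           # ~7 trillion observations ("sources")
FORCED_SOURCES = 30e12   # ~30 trillion measurements ("forced sources")


def per_object(total, objects=OBJECTS):
    """Average number of catalog rows per object (illustrative helper)."""
    return total / objects


if __name__ == "__main__":
    # The galaxy and star counts add up to the quoted object total.
    assert GALAXIES + STARS == OBJECTS
    print(f"sources per object:        {per_object(SOURCES):.0f}")
    print(f"forced sources per object: {per_object(FORCED_SOURCES):.0f}")
```

With the quoted figures this comes to roughly 190 sources and roughly 810 forced sources per object, averaged over the whole catalog.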

4 Data Management System Architecture
Application Layer (LDM-151)
    Scientific Layer
    Pipelines constructed from reusable, standard parts, i.e. Application Framework
    Data Products representations standardized
    Metadata extendable without schema change
    Object-oriented, Python, C++ Custom Software
Middleware Layer (LDM-152)
    Portability to clusters, grid, other
    Provide standard services so applications behave consistently (e.g. provenance)
    Preserve performance (<1% overhead)
    Custom Software on top of Open Source, Off-the-shelf Software
Infrastructure Layer (LDM-129)
    Distributed Platform
    Different sites specialized for real-time alerting, data release production, petascale data access
    Off-the-shelf, Commercial Hardware & Software, Custom Integration
WBS elements
    02C.05 Science User Interface and Analysis Tools
    02C Science Data Archive (Images, Alerts, Catalogs)
    02C Data Access Services
    02C.03.05, 02C Application Framework
    02C SDQA and Science Pipeline Toolkits
    02C , 02C , 02C.03, 02C.04 Alert, SDQA, Calibration, Data Release Productions/Pipelines
    02C.07.01, 02C Processing Middleware
    02C Infrastructure Services (System Administration, Operations, Security)
    02C Archive Site
    02C Base Site
    Physical Plant (included in above)
    02C Long-Haul Communications
Data Management System Design (LDM-148)

5 Mapping Data Products into Pipelines
[Diagram: pipelines grouped by data product level (Level 1, Level 2, L3)]
    02C /02. Data Quality Assessment Pipelines
    02C Calibration Products Production Pipelines
    02C Instrumental Signature Removal Pipeline
    02C Single-Frame Processing Pipeline
    02C Image Differencing Pipeline
    02C Alert Generation Pipeline
    02C Moving Object Pipeline
    02C Coaddition Pipeline
    02C.04.04/.05 Association and Detection Pipelines
    02C Object Characterization Pipeline
    02C PSF Estimation
    02C Science Pipeline Toolkit
    02C.03.05/04.07 Common Application Framework
Data Management Applications Design (LDM-151)

6 Infrastructure: Petascale Computing, Gbps Networks
The computing cluster at the LSST Archive at NCSA will run the processing pipelines.
    Single-user, single-application data center
    Commodity computing clusters
    Distributed file system for scaling and hierarchical storage
    Local-attached, shared-nothing storage when high bandwidth needed
Archive Site and U.S. Data Access Center: NCSA, Champaign, IL
Long Haul Networks to transport data from Chile to the U.S.
    2x100 Gbps from Summit to La Serena (new fiber)
    2x40 Gbps for La Serena to Champaign, IL (path diverse, existing fiber)
Base Site and Chilean Data Access Center: La Serena, Chile
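To get a feel for the link capacities on this slide, the Python sketch below computes an idealized transfer time for a payload over each hop. The 1 TB payload is a hypothetical figure chosen for illustration, not a number from the talk, and the calculation assumes both links of each pair carry traffic in parallel with no protocol overhead. The slide does not say whether the second link is used for capacity or kept for redundancy.

```python
def transfer_seconds(size_bytes, link_gbps, links=1, efficiency=1.0):
    """Ideal time in seconds to move size_bytes over parallel links.

    link_gbps is the capacity of one link in gigabits per second;
    efficiency (0..1] scales the usable fraction of that capacity.
    """
    bits = size_bytes * 8
    usable_bits_per_second = link_gbps * 1e9 * links * efficiency
    return bits / usable_bits_per_second


ONE_TB = 1e12  # hypothetical payload size in bytes (assumption)

# Summit to La Serena: 2 x 100 Gbps
summit_to_base = transfer_seconds(ONE_TB, link_gbps=100, links=2)
# La Serena to Champaign, IL: 2 x 40 Gbps
base_to_archive = transfer_seconds(ONE_TB, link_gbps=40, links=2)

if __name__ == "__main__":
    print(f"Summit to La Serena:    {summit_to_base:.0f} s per TB")
    print(f"La Serena to Champaign: {base_to_archive:.0f} s per TB")
```

Under these assumptions a terabyte takes about 40 s on the first hop and about 100 s on the second. Setting `links=1` models the case where the second path is held in reserve, which doubles both times.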

7 Middleware Layer: Isolating Hardware, Orchestrating Software
Enabling execution of science pipelines on hundreds of thousands of cores.
    Frameworks to construct pipelines out of basic algorithmic components
    Orchestration of execution on thousands of cores
    Control and monitoring of the whole DM System
Isolating the science pipelines from details of underlying hardware
    Services used by applications to access/produce data and communicate
    "Common denominator" interfaces handle changing underlying technologies
Data Management Middleware Design (LDM-152)

8 Database and Science UI: Delivering to Users
Massively parallel, distributed, fault-tolerant relational database.
    To be built on existing, robust, well-understood technologies (MySQL and xrootd)
    Commodity hardware, open source
    Advanced prototype in existence (Qserv)
Science User Interface to enable access to and analysis of LSST data
    Web and machine interfaces to LSST databases
    Visualization and analysis capabilities
More: Talks by Becla, Van Dyk
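The general idea behind a massively parallel, shared-nothing relational database can be sketched as a scatter-gather query: the same SQL runs on every partition and the partial results are combined. The Python example below is a toy illustration of that pattern only, not the Qserv design. It uses in-memory sqlite3 databases as stand-ins for MySQL worker nodes, a sequential loop in place of parallel dispatch, and a made-up `object` table with hypothetical columns.

```python
import sqlite3


def make_chunk(rows):
    """Create one in-memory partition holding a slice of the catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE object ("
        "id INTEGER PRIMARY KEY, ra REAL, decl REAL, is_star INTEGER)"
    )
    conn.executemany("INSERT INTO object VALUES (?, ?, ?, ?)", rows)
    return conn


def scatter_gather_count(chunks, where_sql, params=()):
    """Run the same COUNT(*) on every chunk and sum the partial counts."""
    total = 0
    for conn in chunks:  # a real system would dispatch these in parallel
        query = f"SELECT COUNT(*) FROM object WHERE {where_sql}"
        (partial,) = conn.execute(query, params).fetchone()
        total += partial
    return total


# Two toy partitions, split by sky position (values are invented).
chunks = [
    make_chunk([(1, 10.0, -5.0, 1), (2, 11.0, -6.0, 0)]),
    make_chunk([(3, 200.0, -30.0, 1), (4, 201.0, -31.0, 1)]),
]

n_stars = scatter_gather_count(chunks, "is_star = ?", (1,))

if __name__ == "__main__":
    print(n_stars)  # 3 stars across both partitions
```

Counts combine by simple addition. Aggregates such as averages need each partition to return a sum and a count so that the coordinator can combine them correctly.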

9 Critical Prototypes: Algorithms and Technologies
Algorithm Design
    Approximately 60% of the software functional capability has been prototyped
    Over 350,000 lines of C++ and Python coded, unit tested, integrated, and run in production mode
    Have released three terabyte-scale datasets, including single frame measurements, point source and galaxy photometry
    Precursors leveraged: Pan-STARRS, SDSS, HSC
Petascale Computing Design
    Executed in parallel on up to 10k cores (TeraGrid/XSEDE and NCSA Blue Waters hardware) with scalable results
Petascale Database Design
    Conducted parallel database tests up to 300 nodes, 100 TB of data, 100% of scale for operations year 1
Gigascale Network Design
    Currently testing at up to 1 Gbps
    Agreements in principle are in hand with key infrastructure providers (NCSA, FIU/AmPath, REUNA, IN2P3)

10 Data Management Scope is Defined and Requirements are Established
    Data Product requirements have been vetted with Science Collaborations multiple times and have successfully passed review (Jul 13)
    Data quality and algorithmic assessments are far advanced and we understand the risks, successfully passed review (Sep 13)
    Hardware sizing has been refreshed based on latest scientific and engineering requirements, system design, technology trends, software performance profiles, acquisition strategy
    Interfaces are defined to Phase 2 level
    Requirements and Final Design have been baselined (Data Management Technical Control Team)
    Traceability from OSS to DMSR has been verified
    All WBS elements have been estimated and scheduled in PMCS with scope and basis of estimate documented

11 Data Management ICDs needed for Construction start are at Phase 2 Level
Legend: under formal change control; in progress (Phase 1)
ICDs on Confluence:
Docushare:

12 Going Where the Talent is: Distributed Team
    Mgmt, I&T, and Science QA
    User Interfaces
    Database
    Science Pipelines
    Middleware
    Infrastructure

13 Data Management Organization
DM Lead institutions are integrated into one project and are performing in their construction roles/responsibilities
LSST DM Leadership
    Project Manager: J. Kantor
    Project Scientist: M. Juric
Survey Science Group: SSG Lead Scientist TBD, F. Economou (LSST)
System Architecture: K-T. Lim, G. Dubois-Felsmann (SLAC)
International Comms/Base Site: R. Lambert (NOAO)
Processing Services & Site Infrastructure: D. Petravick (NCSA)
Science Database & Data Acc Services: J. Becla (SLAC)
Alert Production: A. Connolly (UW/OPEN)
Data Release Production: R. Lupton, J. Swinbank (Princeton)
Science User Interface & Tools: X. Wu, D. Ciardi (IPAC)
Data Management Organization document-139

14 Leveraging national and international investments
NSF/OCI Funded
    Formal relationships continue with the IRNC-funded AmLight project, which is the lead entity in securing Chile - US network capacity for LSST
    We have leveraged significant XSEDE and Blue Waters Service Unit and storage allocations for critical R&D phase prototypes and productions
    Our LSST Archive Center and US Data Access Center will be hosted in the National Petascale Computing Facility at NCSA
    A strong relationship has been established with the Condor Group at the University of Wisconsin, and HTCondor is now in our processing middleware baseline
    We have reused a wide range of open source software libraries and tools, many of which received seed funding from the NSF
Other National/International Funded
    We have participated in joint development of astronomical software with Pan-STARRS and HSC
    We have fostered collaborative development of scientific database technology via the Extremely Large Data Base (XLDB) conferences and collaborations with database developers (e.g. SciDB, MySQL, MonetDB)
We have a deep process of community engagement to deliver products that are needed, and an architecture to allow the community to deliver their own tools

15 Data Management is Construction Ready
The Data Management System is scoped and credibly estimated
    Requirements have been baselined and are achievable (LSE-61)
    Final Design baselined (LDM-148, -151, -152, -129, -135)
    Approximately 60% of the software functional capability has been prototyped
    Data and algorithmic assessments are far advanced and we understand the risks
    Hardware sizing has been done based on scientific and engineering requirements, system design, technology trends, software performance profiles, acquisition strategy
    All lowest level WBS elements have been estimated and scheduled in PMCS with scope and basis of estimate documented
All lead institutions are demonstrably integrated into one project and are performing in their construction roles/responsibilities
    Core lead technical personnel are on board at all institutions
    Agreements in principle are in hand with key technology and center providers (NCSA, NOAO, FIU/AmPath, REUNA)
The software development process has been exercised fully
    Have successfully executed eight software and data releases
    Standard/formal processes, tools, environment exercised repeatedly and refined
    Automated build, test environment is configured and exercised nightly/weekly
Data Management PMCS plans current and complete


More information

Engineering the Data Processing Pipeline

Engineering the Data Processing Pipeline Engineering the Data Processing Pipeline Mark Stalzer Center for Advanced Computing Research California Institute of Technology stalzer@caltech.edu October 29, 2009 A systems engineering view of computational

More information

Data-Intensive Science and Scientific Data Infrastructure

Data-Intensive Science and Scientific Data Infrastructure Data-Intensive Science and Scientific Data Infrastructure Russ Rew, UCAR Unidata ICTP Advanced School on High Performance and Grid Computing 13 April 2011 Overview Data-intensive science Publishing scientific

More information

Managing large clusters resources

Managing large clusters resources Managing large clusters resources ID2210 Gautier Berthou (SICS) Big Processing with No Locality Job( /crawler/bot/jd.io/1 ) submi t Workflow Manager Compute Grid Node Job This doesn t scale. Bandwidth

More information

STUDY AND SIMULATION OF A DISTRIBUTED REAL-TIME FAULT-TOLERANCE WEB MONITORING SYSTEM

STUDY AND SIMULATION OF A DISTRIBUTED REAL-TIME FAULT-TOLERANCE WEB MONITORING SYSTEM STUDY AND SIMULATION OF A DISTRIBUTED REAL-TIME FAULT-TOLERANCE WEB MONITORING SYSTEM Albert M. K. Cheng, Shaohong Fang Department of Computer Science University of Houston Houston, TX, 77204, USA http://www.cs.uh.edu

More information

Cloud Computing @ JPL Science Data Systems

Cloud Computing @ JPL Science Data Systems Cloud Computing @ JPL Science Data Systems Emily Law, GSAW 2011 Outline Science Data Systems (SDS) Space & Earth SDSs SDS Common Architecture Components Key Components using Cloud Computing Use Case 1:

More information

The PACS Software System. (A high level overview) Prepared by : E. Wieprecht, J.Schreiber, U.Klaas November,5 2007 Issue 1.

The PACS Software System. (A high level overview) Prepared by : E. Wieprecht, J.Schreiber, U.Klaas November,5 2007 Issue 1. The PACS Software System (A high level overview) Prepared by : E. Wieprecht, J.Schreiber, U.Klaas November,5 2007 Issue 1.0 PICC-ME-DS-003 1. Introduction The PCSS, the PACS ICC Software System, is the

More information

How To Teach Data Science

How To Teach Data Science The Past, Present, and Future of Data Science Education Kirk Borne @KirkDBorne http://kirkborne.net George Mason University School of Physics, Astronomy, & Computational Sciences Outline Research and Application

More information

Taming Big Data Storage with Crossroads Systems StrongBox

Taming Big Data Storage with Crossroads Systems StrongBox BRAD JOHNS CONSULTING L.L.C Taming Big Data Storage with Crossroads Systems StrongBox Sponsored by Crossroads Systems 2013 Brad Johns Consulting L.L.C Table of Contents Taming Big Data Storage with Crossroads

More information

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect A very short talk about Apache Kylin Business Intelligence meets Big Data Fabian Wilckens EMEA Solutions Architect 1 The challenge today 2 Very quickly: OLAP Online Analytical Processing How many beers

More information

IBM Deep Computing Visualization Offering

IBM Deep Computing Visualization Offering P - 271 IBM Deep Computing Visualization Offering Parijat Sharma, Infrastructure Solution Architect, IBM India Pvt Ltd. email: parijatsharma@in.ibm.com Summary Deep Computing Visualization in Oil & Gas

More information

EMA Radar for Workload Automation (WLA): Q2 2012

EMA Radar for Workload Automation (WLA): Q2 2012 EMA Radar for Workload Automation (WLA): Q2 2012 By Torsten Volk, Senior Analyst Enterprise Management Associates (EMA) June 2012 Introduction Founded in 2000 in Las Vegas, Nevada, Flux offers a lightweight,

More information

NUIT Tech Talk: Trends in Research Data Mobility

NUIT Tech Talk: Trends in Research Data Mobility NUIT Tech Talk: Trends in Research Data Mobility Pascal Paschos NUIT Academic & Research Technologies, Research Computing Services Matt Wilson NUIT Cyberinfrastructure, Telecommunication and Network Services

More information

Optimizing Storage for Better TCO in Oracle Environments. Part 1: Management INFOSTOR. Executive Brief

Optimizing Storage for Better TCO in Oracle Environments. Part 1: Management INFOSTOR. Executive Brief Optimizing Storage for Better TCO in Oracle Environments INFOSTOR Executive Brief a QuinStreet Excutive Brief. 2012 To the casual observer, and even to business decision makers who don t work in information

More information

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and

More information

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007 Data Management in an International Data Grid Project Timur Chabuk 04/09/2007 Intro LHC opened in 2005 several Petabytes of data per year data created at CERN distributed to Regional Centers all over the

More information

Data Aggregation and Cloud Computing

Data Aggregation and Cloud Computing Data Intensive Scalable Computing Harnessing the Power of Cloud Computing Randal E. Bryant February, 2009 Our world is awash in data. Millions of devices generate digital data, an estimated one zettabyte

More information

MANAGING SCIENTIFIC DATA WITH NDN

MANAGING SCIENTIFIC DATA WITH NDN MANAGING SCIENTIFIC DATA WITH NDN Chengyu Fan, Susmit Shannigrahi, Steve DiBenedetto, Catherine Olschanowsky, Christos Papadopoulos NDNcomm 2015 Sept 28, 2015 Los Angeles, CA Supported by NSF #13410999

More information

T a c k l i ng Big Data w i th High-Performance

T a c k l i ng Big Data w i th High-Performance Worldwide Headquarters: 211 North Union Street, Suite 105, Alexandria, VA 22314, USA P.571.296.8060 F.508.988.7881 www.idc-gi.com T a c k l i ng Big Data w i th High-Performance Computing W H I T E P A

More information

Global Scientific Data Infrastructures: The Big Data Challenges. Capri, 12 13 May, 2011

Global Scientific Data Infrastructures: The Big Data Challenges. Capri, 12 13 May, 2011 Global Scientific Data Infrastructures: The Big Data Challenges Capri, 12 13 May, 2011 Data-Intensive Science Science is, currently, facing from a hundred to a thousand-fold increase in volumes of data

More information

Quality Assurance Subsystem Design Document

Quality Assurance Subsystem Design Document Quality Assurance Subsystem Design Document Contents 1 Signatures 2 Revision history 3 Document number 4 Introduction 4.1 Description 4.2 Supporting Documentation 4.3 Requirements 4.4 Jargon 5 Institutional

More information

Copyright 2011 Sentry Data Systems, Inc. All Rights Reserved. No Unauthorized Reproduction.

Copyright 2011 Sentry Data Systems, Inc. All Rights Reserved. No Unauthorized Reproduction. The Datanex Platform is a healthcare focused cloud computing platform that allows solution providers to construct rich healthcare business intelligence applications that leverage the world s fastest and

More information

Enabling Cloud Architecture for Globally Distributed Applications

Enabling Cloud Architecture for Globally Distributed Applications The increasingly on demand nature of enterprise and consumer services is driving more companies to execute business processes in real-time and give users information in a more realtime, self-service manner.

More information

salsadpi: a dynamic provisioning interface for IaaS cloud

salsadpi: a dynamic provisioning interface for IaaS cloud salsadpi: a dynamic provisioning interface for IaaS cloud Tak-Lon (Stephen) Wu Computer Science, School of Informatics and Computing Indiana University, Bloomington, IN taklwu@indiana.edu Abstract On-demand

More information

How To Use Hp Vertica Ondemand

How To Use Hp Vertica Ondemand Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater

More information

Data Lab Operations Concepts

Data Lab Operations Concepts Data Lab Operations Concepts 1 Introduction This talk will provide an overview of Data Lab components to be implemented Core infrastructure User applications Science Capabilities User Interfaces The scope

More information

Five Steps to Integrate SalesForce.com with 3 rd -Party Systems and Avoid Most Common Mistakes

Five Steps to Integrate SalesForce.com with 3 rd -Party Systems and Avoid Most Common Mistakes Five Steps to Integrate SalesForce.com with 3 rd -Party Systems and Avoid Most Common Mistakes This white paper will help you learn how to integrate your SalesForce.com data with 3 rd -party on-demand,

More information

globus online Cloud-based services for (reproducible) science Ian Foster Computation Institute University of Chicago and Argonne National Laboratory

globus online Cloud-based services for (reproducible) science Ian Foster Computation Institute University of Chicago and Argonne National Laboratory globus online Cloud-based services for (reproducible) science Ian Foster Computation Institute University of Chicago and Argonne National Laboratory Computation Institute (CI) Apply to challenging problems

More information

UPS battery remote monitoring system in cloud computing

UPS battery remote monitoring system in cloud computing , pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology

More information

Oracle Data Integrator 11g New Features & OBIEE Integration. Presented by: Arun K. Chaturvedi Business Intelligence Consultant/Architect

Oracle Data Integrator 11g New Features & OBIEE Integration. Presented by: Arun K. Chaturvedi Business Intelligence Consultant/Architect Oracle Data Integrator 11g New Features & OBIEE Integration Presented by: Arun K. Chaturvedi Business Intelligence Consultant/Architect Agenda 01. Overview & The Architecture 02. New Features Productivity,

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace

Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace Beth Plale Indiana University plale@cs.indiana.edu LEAD TR 001, V3.0 V3.0 dated January 24, 2007 V2.0 dated August

More information

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP Your business is swimming in data, and your business analysts want to use it to answer the questions of today and tomorrow. YOU LOOK TO

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information