Big Data Challenges in Simulation-based Science

Manish Parashar
Rutgers Discovery Informatics Institute (RDI2)
NSF Center for Cloud and Autonomic Computing (CAC)
*Hoang Bui, Tong Jin, Qian Sun, Fan Zhang and other ex-students and collaborators (ORNL, GT, UU, NCSU, SNL, LBNL, LLNL,
Manish Parashar, Rutgers University

Outline
- Data Grand Challenges in Simulation-based Science
- Rethinking the simulations -> insights pipeline
- The DataSpaces Project
- Conclusion

6/28/14

Modern Science & Society Transformed by Compute & Data
- New paradigms and practices in science and engineering
  - Data-driven, data- and compute-intensive
  - Inherently multi-disciplinary
  - Collaborative (university, national, global)
- Many challenges: computing, data, software, people

Many Challenges
- Computing
  - Multicore; large and increasing core counts, deep memory hierarchies (TH-2: 54.9 PF, 3.12 M cores, 1.4 PB, 25 MW)
  - New programming models and concerns (fault tolerance, energy, etc.)
  - New models and technologies: clouds, grids, hybrid manycore, accelerators, deep storage hierarchies
- Data
  - Generating more data than in all of human history: preserve, mine, share?
  - How do we create data scientists/engineers?
- Software
  - Complex applications on coupled compute-data-networked environments; tools needed
  - Modern apps: 10^6+ lines, many groups contribute, take decades to develop, very long lifetimes
- People
  - Multidisciplinary expertise essential!
  - Appropriate academic programs, career tracks

The Era of eScience and Big Data
- Clearly, modern scientific networks, instruments and experiments are producing Big Data!
  - Square Kilometer Array: the SKA project will generate 1 EB of data per day in 2020!
  - Climate, environment: climate data (> 36TB, > 2PB) keeps increasing!
  - Genomics: genome sequencing output is doubling every 9 months!
  - LHC will produce roughly 15PB of data per year!
  - LSST: 100PB of data by the end of this decade!
  - Many smaller datasets
- But what about HPC?
[Figure: bytes per day (gigabytes to exabytes) produced by science projects, 2012 to 2020: genomics, climate/environment, LHC, LSST, TeraGrid/XSEDE, Blue Waters, Square Kilometer Array. Credit: R. Pennington/A. Blatecky]

Scientific Discovery through Simulations (I)
- ACI providing increasing computational scales, capabilities and capacities
- Scientific simulations running on high-end computing systems generate huge amounts of data!
  - If a single core produces 2MB/minute on average, one of these machines could generate ~170TB of simulation data per hour -> ~4PB per day -> ~1.4EB per year
- Successful scientific discovery depends on a comprehensive understanding of this enormous simulation data
- How do we enable computational scientists to efficiently manage and explore extreme-scale data: find the needles in the haystack?
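The per-core arithmetic above can be checked with a short calculation. This is a minimal sketch, not the author's own derivation: the function name `output_volume` is hypothetical, decimal units are assumed (1 TB = 10^6 MB), and the core count of 1.4 million is an assumption chosen because it reproduces the quoted ~170TB per hour at 2MB per core per minute.

```python
# Back-of-the-envelope estimate of simulation output volume.
# Decimal units are assumed (1 TB = 10**6 MB); the core count used in the
# example is a hypothetical value chosen to match the ~170 TB/hour figure.

MB_PER_TB = 10**6
TB_PER_PB = 10**3
PB_PER_EB = 10**3


def output_volume(cores, mb_per_core_per_minute=2.0):
    """Return (TB per hour, PB per day, EB per year) for one machine."""
    tb_per_hour = cores * mb_per_core_per_minute * 60 / MB_PER_TB
    pb_per_day = tb_per_hour * 24 / TB_PER_PB
    eb_per_year = pb_per_day * 365 / PB_PER_EB
    return tb_per_hour, pb_per_day, eb_per_year


if __name__ == "__main__":
    hourly, daily, yearly = output_volume(cores=1_400_000)
    print(f"{hourly:.0f} TB/hour, {daily:.2f} PB/day, {yearly:.2f} EB/year")
```

Under these assumptions the machine produces about 168 TB per hour, which corresponds to roughly 4 PB per day and about 1.5 EB per year.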

Scientific Discovery through Simulations (II)
- Complex workflows integrating coupled models, data management/processing, analytics
  - Tight/loose coupling, data driven, ensembles
  - Advanced numerical methods (e.g., Adaptive Mesh Refinement)
  - Integrated (online) uncertainty quantification, analytics
- Complex, heterogeneous components
- Large data volumes and data rates
- Complex couplings, data redistribution, data transformations
- Dynamic data exchange patterns
- Strict performance, energy constraints

Traditional Simulation -> Insight Pipelines Break Down
Figure. Traditional data analysis pipeline
Traditional simulation -> insight pipeline:
- Run large-scale simulation workflows on large supercomputers
- Dump data on parallel disk systems
- Export data to archives
- Move data to users' sites, usually selected subsets
- Perform data manipulations and analysis on mid-size clusters
- Collect experimental/observational data
- Move to analysis sites
- Perform comparison of experimental/observational data to validate simulation data

Challenges Faced by Traditional HPC Data Pipelines
Figure. Traditional data analysis pipeline
- Scalable data analytics challenge
  - Can current data mining, manipulation and visualization algorithms still work effectively on extreme-scale machines?
- I/O and storage challenge
  - Increasing performance gap: disks are outpaced by computing speed
- Data movement challenge
  - Lots of data movement between simulation and analysis, and between coupled multi-physics simulation components => longer latencies
  - Improving data locality is critical: do work where the data resides!
- Energy challenge
  - Future extreme systems are designed to have low-power chips; however, much greater power consumption will be due to memory and data movement!
- The costs of data movement are increasing and dominating!

The Cost of Data Movement
- Moving data between node memory and persistent storage is slow! (performance gap)
- The energy cost of moving data is a significant concern:
  Energy_move_data = bitrate * length^2 / cross_section_area_of_wire
From K. Yelick, "Software and Algorithms for Exascale: Ten Ways to Waste an Exascale Computer"
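To make the energy relation concrete, the sketch below reads it as energy proportional to bitrate times wire length squared, divided by the wire's cross-sectional area. The function name, the arbitrary units, and the example values are all assumptions for illustration; only ratios between two calls are meaningful.

```python
def relative_move_energy(bitrate, length, cross_section_area):
    """Energy to move data over a wire, up to a constant factor.

    Reads the quoted relation as
        energy ~ bitrate * length**2 / cross_section_area
    Units are arbitrary; only ratios between calls are meaningful.
    """
    if cross_section_area <= 0:
        raise ValueError("cross_section_area must be positive")
    return bitrate * length**2 / cross_section_area


if __name__ == "__main__":
    # Hypothetical wires that differ only in length.
    short_wire = relative_move_energy(bitrate=1.0, length=1.0, cross_section_area=1.0)
    long_wire = relative_move_energy(bitrate=1.0, length=10.0, cross_section_area=1.0)
    print(long_wire / short_wire)
```

Because length enters quadratically, a wire ten times longer costs about a hundred times the energy at the same bitrate and cross-section, which is the argument for doing work where the data resides.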

Challenges Faced by Traditional HPC Data Pipelines
Figure. Traditional data analysis pipeline
- The costs of data movement (power and performance) are increasing and dominating!
- We need to rethink the data management pipeline!
  - Reduce data movement
  - Move computation/analytics closer to the data
  - Add value to simulation data along the I/O path

Rethinking the Data Management Pipeline
- Hybrid staging + in-situ & in-transit execution
- Issues/challenges
  - Programming abstractions/systems
  - Mapping and scheduling
  - Control and data flow
  - Autonomic runtime

Design space of possible workflow architectures
- Location of the compute resources
  - Same cores as the simulation (in situ)
  - Some (dedicated) cores on the same nodes
  - Some dedicated nodes on the same machine
  - Dedicated nodes on an external resource
- Data access, placement, and persistence
  - Direct access to simulation data structures
  - Shared memory access via hand-off / copy
  - Shared memory access via non-volatile near-node storage (NVRAM)
  - Data transfer to dedicated nodes or external resources
- Synchronization and scheduling
  - Execute synchronously with the simulation every n-th simulation time step
  - Execute asynchronously
[Figure: analysis and visualization tasks either share cores with the simulation, use distinct cores on the same node, or process data on remote staging nodes reached through network communication; staging options 1 to 3 span the node memory and storage hierarchy (DRAM, NVRAM, SSD, hard disk)]

DataSpaces In-situ/In-transit Data Management & Analytics
ADIOS/DataSpaces, dataspaces.org
- Virtual shared-space programming abstraction
- Simple API for coordination, interaction and messaging
- Distributed, associative, in-memory object store
- Online data indexing, flexible querying
- Adaptive cross-layer runtime management
- Hybrid in-situ/in-transit execution
- Efficient, high-throughput/low-latency asynchronous data transport
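The synchronization options in the design space above (analyze synchronously every n-th step, or hand data off and analyze asynchronously) can be illustrated with a toy loop. This is a sketch under stated assumptions, not DataSpaces code: `run_workflow`, the stand-in "physics" update, and the minimum-value analysis kernel are all hypothetical, and a Python thread with a queue stands in for a staging node.

```python
import queue
import threading


def run_workflow(num_steps, analyze_every, in_transit):
    """Toy simulation loop that analyzes every `analyze_every`-th step.

    in_transit=False: analysis runs synchronously on the simulation's own
    thread (in situ). in_transit=True: a copy of the data is handed to a
    staging worker thread and analyzed asynchronously.
    """
    results = {}
    staging = queue.Queue()

    def analyze(step, data):
        results[step] = min(data)  # stand-in analysis kernel

    def staging_worker():
        while True:
            item = staging.get()
            if item is None:  # sentinel: the simulation has finished
                break
            analyze(*item)

    worker = threading.Thread(target=staging_worker)
    if in_transit:
        worker.start()

    state = [float(i) for i in range(8)]
    for step in range(1, num_steps + 1):
        state = [x * 0.5 + step for x in state]  # stand-in "physics"
        if step % analyze_every == 0:
            if in_transit:
                staging.put((step, list(state)))  # hand off a copy
            else:
                analyze(step, state)

    if in_transit:
        staging.put(None)
        worker.join()  # wait until all staged data has been analyzed
    return results


if __name__ == "__main__":
    print(run_workflow(30, 10, in_transit=False))
    print(run_workflow(30, 10, in_transit=True))
```

Both modes produce the same analysis results; the difference is that the in-transit path only pays for the copy on the simulation's critical path, while the in-situ path blocks the simulation for the full analysis.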

DataSpaces: A Scalable Shared Space Abstraction for Hybrid Data Staging [HPDC10, JCC12]
- Virtual shared-space abstraction
- Simple API for coordination, interaction and messaging
- Provides a global-view programming abstraction consistent with PGAS
- Distributed, associative, in-deep-memory object store
- Online data indexing, flexible querying
- Adaptive cross-layer runtime management
- Hybrid in-situ/in-transit execution
- Data-centric mappings
- High-throughput/low-latency memory-to-memory asynchronous data transport
- Dynamic coordination and interaction patterns between the coupled applications
- Transparent data redistribution
- Complex geometry-based queries
- In-space (online) data transformation and manipulations

DataSpaces Abstraction: Indexing + DHT
- Dynamically constructed structured overlay using hybrid staging cores
- Index constructed online using space-filling curve (SFC) mappings and application attributes
  - E.g., application domain, data, field values of interest, etc.
- Distributed hash table (DHT) used to maintain metadata information
  - E.g., geometric descriptors for the shared data, FastBit indices, etc.
- Data objects load-balanced separately across staging cores
- The SFC maps the global domain to a set of intervals
  - Intervals are non-contiguous and can lead to load imbalance
  - A second mapping compacts the intervals into a contiguous virtual interval
  - The contiguous interval is split equally among the DataSpaces nodes
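The three indexing steps above (SFC mapping, compaction, equal split) can be sketched as follows. The slides do not say which space-filling curve is used, so a 2D Morton (Z-order) curve is swapped in here purely because it is short to write; all function names are hypothetical and this is not the DataSpaces implementation.

```python
def morton2d(x, y, bits):
    """Interleave the bits of (x, y) into a Z-order (Morton) index."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)
        code |= ((y >> b) & 1) << (2 * b + 1)
    return code


def sfc_intervals(x_range, y_range, bits):
    """Map a rectangular sub-domain to sorted, merged SFC intervals."""
    codes = sorted(
        morton2d(x, y, bits)
        for x in range(*x_range)
        for y in range(*y_range)
    )
    intervals = []
    for c in codes:
        if intervals and c == intervals[-1][1] + 1:
            intervals[-1][1] = c      # extend the current run
        else:
            intervals.append([c, c])  # start a new run
    return [tuple(iv) for iv in intervals]


def compact(intervals):
    """Second mapping: lay the intervals end to end in a virtual interval.

    Returns (start, end, virtual_start) triples and the total length.
    """
    mapped, offset = [], 0
    for start, end in intervals:
        mapped.append((start, end, offset))
        offset += end - start + 1
    return mapped, offset


def owner(code, mapped, total, num_nodes):
    """Staging node that indexes SFC index `code` under an equal split."""
    for start, end, vstart in mapped:
        if start <= code <= end:
            virtual = vstart + (code - start)
            return virtual * num_nodes // total
    raise KeyError(f"SFC index {code} is outside the indexed domain")


if __name__ == "__main__":
    # 4x4 sub-domain inside an 8x8 global domain (3 bits per dimension).
    intervals = sfc_intervals((2, 6), (2, 6), bits=3)
    mapped, total = compact(intervals)
    print(intervals)
    print([owner(c, mapped, total, num_nodes=4) for c in (12, 24, 39, 51)])
```

In this example the sub-domain maps to four non-contiguous intervals of the curve; after compaction they form one virtual interval of 16 indices, so each of the four hypothetical staging nodes is responsible for exactly four of them.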

DataSpaces: Scalability on ORNL Titan
[Figure: data redistribution time (s) and aggregate data redistribution throughput (GB/s) for data sizes from 2GB to 256GB, with writer/reader core counts from 512/32 up to 64K/4K]
- Weak scaling with increasing number of processors
- Applications redistribute data using DataSpaces
  - Application 1 runs on M processors and inserts data into the space
  - Application 2 runs on N processors and retrieves data from the space
- Result: with a 128-fold increase in application size, from 512 to 64K writers, the total data size exchanged per step increases from 2GB to 256GB

DataSpaces: Enabling Coupled Scientific Workflows at Extreme Scales
- Multiphysics Code Coupling at Extreme Scales [CCGrid10]
- Data-centric Mappings for In-Situ Workflows [IPDPS12]
- PGAS Extensions for Code Coupling [CCPE13]
- Dynamic Code Deployment In-Staging [IPDPS11]
[Figure: dynamic code deployment. data_kernels.c is compiled with gcc -c into data_kernels.o and linked with Applications.o into the application executable, which aprun launches on the compute nodes; rexec() and return() connect the compute nodes to the runtime execution system (Rexec), which aprun launches on the staging nodes and which loads kernels such as data_kernels.lua]
Kernel pseudocode (data_kernels.c):
    kernel_min {
      for i = 1, n
        for j = 1, m
          for k = 1, p
            if (min > A(i, j, k)) min = A(i, j, k)
    }
Lua kernel (data_kernels.lua):
    for i = 0, ni-1 do
      for j = 0, nj-1 do
        for k = 0, nk-1 do
          val = input:get_val(i,j,k)
          if min > val then min = val end
        end
      end
    end
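The M-writer / N-reader exchange in the Titan experiment above implies a redistribution step: each reader's portion of the domain has to be assembled from the overlapping portions held by writers. The sketch below computes such a transfer plan for a simple 1D block decomposition. The decomposition, the function names, and the example sizes are assumptions for illustration; the slides do not describe how DataSpaces computes this internally.

```python
def block_bounds(global_size, num_ranks, rank):
    """Half-open [lo, hi) slice owned by `rank` in a 1D block decomposition."""
    base, extra = divmod(global_size, num_ranks)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def redistribution_plan(global_size, num_writers, num_readers):
    """List (writer, reader, lo, hi) transfers needed to move data from an
    M-writer decomposition to an N-reader decomposition."""
    plan = []
    for r in range(num_readers):
        r_lo, r_hi = block_bounds(global_size, num_readers, r)
        for w in range(num_writers):
            w_lo, w_hi = block_bounds(global_size, num_writers, w)
            lo, hi = max(r_lo, w_lo), min(r_hi, w_hi)
            if lo < hi:  # the writer's and reader's blocks overlap
                plan.append((w, r, lo, hi))
    return plan


if __name__ == "__main__":
    # Hypothetical small case: 16 elements, 4 writers, 2 readers.
    for transfer in redistribution_plan(16, num_writers=4, num_readers=2):
        print(transfer)
```

The nested loop is quadratic in the number of ranks and is only meant to show what "transparent data redistribution" has to compute; a shared-space abstraction hides this plan from both applications, which only describe the region they insert or retrieve.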

Applications Enabled
- Combustion: S3D uses a chemical model to simulate the combustion process in order to understand the ignition characteristics of bio-fuels for automotive usage [SC 2012]
- Fusion: the EPSi simulation workflow provides insight into edge plasma physics in magnetic fusion devices by simulating the movement of millions of particles [CCGrid 2010]
- Subsurface modeling: IPARS uses multi-physics, multi-model formulations and coupled multi-block grids to support realistic, high-resolution oil reservoir simulation studies with a million or more grids [CCGrid 2011]
- Computational mechanics: FEM coupled with AMR is often used to solve heat transfer or fluid dynamics problems; AMR performs refinements only on important regions of the mesh [SC 2014 (in progress)]
- Materials science: the SNS workflow enables integration of data, analysis, and modeling tools for materials science experiments to accelerate scientific discoveries, which requires beam line data query capability
- Materials science: QMCPack uses quantum Monte Carlo (QMC) for studies of heterogeneous catalysis of transition metal nanoparticles, phase transitions, properties of materials under pressure, and strongly correlated materials

Workflows beyond simulations
- Content-based image retrieval (CBIR) on digitized peripheral blood smear specimens using low-level morphological features (MICAII 2012, CAC 2013, MICAII 2013)
- Asynchronous replica exchange: monitor the progress of replicas using secondary structure prediction methods (J. Comput. Chem. 2008)
- Control fluid flow in micro-devices by placing a sequence of pillar obstacles (MTAGS 2013, CISE 2013)
- Monte Carlo numerical strategies to unravel how nucleosomes shape DNA into chromatin (J. Biophysical 2014)

In-Situ Feature Extraction and Tracking using Decentralized Online Clustering (DISC 12, ICAC 10)
- DOC workers executed in situ on simulation machines
[Figure: DOC overlay across the simulation compute nodes; within one compute node, processor cores run either the simulation or a DOC worker]
- Benefits of runtime feature extraction and tracking
  (1) Scientists can follow the events of interest (or data of interest)
  (2) Scientists can do real-time monitoring of the running simulations

In-situ viz. and monitoring with staging
[Figure: Pixie3D (1024 cores), Pixplot (8 cores), ParaView Server (4 cores) and Pixmon (1 core, login node) exchanging record.bp and pixie3d.bp through DataSpaces]

AMR-as-a-Service using DataSpaces
[Figure: FEM (TalyFEM) and AMR exchange Grid, GridField, pgrid and pgridfield objects through DataSpaces]
- Finite Element Model (FEM): models using uniform 3D meshes for near-realistic engineering problems such as heat transfer, fluid flow and phase transformation
- AMR service: refines localized areas of interest, e.g., those with high local error
- DataSpaces-as-a-Service: persistent staging servers run as a service on the HEC platform
  - Applications can connect, exchange data objects, and disconnect independently

Overview of Recent DataSpaces Research
- Programming abstractions / systems
  - DataSpaces: interaction, coordination and messaging abstractions for coupled scientific workflows [HPDC10, CCGrid10, HiPC12, JCC12]
  - XpressSpaces: PGAS extensions for coupling using DataSpaces [CCGrid11, CCPE13]
  - ActiveSpaces: dynamic code deployment for in-staging data processing [IPDPS11]
- Runtime mechanisms
  - Data-centric task mapping: reduce data movement and increase intra-node data sharing [IPDPS12, DISC12]
  - In-situ & in-transit data analytics: simulation-time analysis of large data volumes by combining in-situ and in-transit execution [SC12, DISC12]
  - Cross-layer adaptation: adaptive cross-layer approach for dynamic data management in large-scale simulation-analysis workflows [SC13]
  - Value-based data indexing and querying: use FastBit to build in-situ, in-memory value-based indexing and query support in the staging area [In review]
  - Power/performance tradeoffs: characterizing power/performance tradeoffs for data-intensive simulation workflows [SC13]
  - Data staging over deep memory hierarchy: build a distributed associative object store over hierarchical memory/storage, e.g., DRAM/NVRAM/SSD [HiPC13]
- High-throughput/low-latency asynchronous data transport
  - DART: network-independent transport library for high-speed asynchronous data extraction and transfer [HPDC08]

Integrating In-Situ and In-Transit Analytics (SC 12)
- S3D: first-principles direct numerical simulation
- The simulation resolves features on the order of 10 simulation time steps
- Currently, on the order of every 400th time step can be written to disk
- Temporal fidelity is compromised when analysis is done as a post-process
[Figure: recent data sets generated by S3D, developed at the Combustion Research Facility, Sandia National Laboratories]

In-situ Topological Analysis as Part of S3D*
- Statistics
- Volume rendering
- Topology: identify features of interest
*J. C. Bennett et al., Combining In-Situ and In-Transit Processing to Enable Extreme-Scale Scientific Analysis, SC 12, Salt Lake City, Utah, November,

Integrating In-Situ and In-Transit Analytics (SC 12)
- Primary resources execute the main simulation and in situ computations
- Secondary resources provide a staging area whose cores act as containers for in transit computations
- 4896 cores total (4480 simulation/in situ; 256 in transit; 160 task scheduling/data movement)
- Simulation size: 1600x1372x430
- All measurements are per simulation time step

Simulation case study with S3D: Timing results for 4896 cores and analysis every 10th simulation time step
[Figure: bar chart of time in seconds (axis 0 to 160) with in-situ, data movement and in-transit portions for in-situ statistics, hybrid statistics, in-situ visualization, hybrid visualization and hybrid topology. Bar annotations as extracted, in order: 1.0%, 1.0%, .43%, .004%, 1.61%; simulation: 168.5 seconds]

Simulation case study with S3D: Timing results for 4896 cores and analysis every 100th simulation time step
[Figure: bar chart of time in seconds (axis 0 to 1600) with in-situ, data movement and in-transit portions for in-situ statistics, hybrid statistics, in-situ visualization, hybrid visualization and hybrid topology. Bar annotations as extracted, in order: .1%, .1%, .04%, .004%, .16%; simulation: 1685 seconds]

Challenges & Opportunities Ahead: The Vision of the Exascale Computing Initiative
- Achieve order 10^18 operations per second and order 10^18 bytes of storage
- Address the next generation of scientific, engineering, and large data problems
- Enable extreme-scale computing: 1,000X the capabilities of today's computers with a similar size and power footprint
- 20 pJ per average operation => ~40x improvement over today's efficiency
- Billion-way concurrency
- Productive system based on marketable technologies
- Ecosystem for application development and execution, system management
- Deployed in early 2020s
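The 20 pJ per operation target can be related to a system power budget with one multiplication. This is a rough sketch: the function name is hypothetical, the rate of 10^18 operations per second is assumed from the usual definition of exascale, and the "today" figure is simply the target multiplied by the quoted factor of about 40.

```python
def system_power_megawatts(ops_per_second, picojoules_per_op):
    """Power drawn by computation alone: operation rate * energy per operation."""
    watts = ops_per_second * picojoules_per_op * 1e-12
    return watts / 1e6


if __name__ == "__main__":
    exa = 1e18  # assumed exascale rate: 10**18 operations per second
    target = system_power_megawatts(exa, 20)
    baseline = system_power_megawatts(exa, 20 * 40)  # ~40x less efficient
    print(f"target: {target:.0f} MW, at ~40x lower efficiency: {baseline:.0f} MW")
```

At 20 pJ per operation an exascale machine would draw about 20 MW for computation, which is in the same range as the 25 MW quoted earlier for TH-2; without the roughly 40x efficiency gain the same rate would need on the order of 800 MW.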

Challenges & Opportunities Ahead: Validation Workflows
[Figure: validation workflow linking a first-principles fusion code (XGC, core-edge turbulence with UQ), an ensemble of models with UQ, a knowledge database, synthetic diagnostics, KSTAR tokamak diagnostics, an experiment mock-up, and analysis, comparative analytics, visual analytics and visualization]
- Use first-principles calculations on HPC to generate more accurate models
- Automate the process of these workflows and move work/data to satisfy metrics specified by users, and track provenance
- Generate a DB of knowledge by codes, to use for predictive capability during/after each experiment

Challenges & Opportunities Ahead: Pervasive Computational Ecosystems

Summary & Conclusions
- Complex applications running on high-end systems generate extreme amounts of data that must be managed and analyzed to get insights
  - Data costs (performance, latency, energy) are quickly dominating
  - Traditional data management/analytics pipelines are breaking down
- Hybrid data staging, in-situ workflow execution, etc. can address these challenges
  - Users can efficiently intertwine applications, libraries and middleware for complex analytics
- Many challenges: programming, mapping and scheduling, control and data flow, autonomic runtime management
- The DataSpaces project explores solutions at various levels
- However, many challenges and opportunities lie ahead!

Thank You!
EPSi: Edge Physics Simulation
Manish Parashar, Ph.D.
Rutgers Discovery Informatics Institute (RDI2)
Cloud & Autonomic Computing Center (CAC)
Rutgers, The State University of New Jersey
WWW: rdi2.rutgers.edu
WWW: dataspaces.org


More information
