Data-centric Renovation of Scientific Workflow in the Age of Big Data
Transcription
1 Data-centric Renovation of Scientific Workflow in the Age of Big Data
Ryong Lee, Ph.D.
Dept. of Scientific Big Data Research
Korea Institute of Science and Technology Information, Korea
2 Outline
- Overview: Scientific Data Projects in KISTI
- On-going Work: Data-centric Renovation of Scientific Workflow for Oceanographers
- Towards Further Scientific Data Customization Services
- Conclusions and Future Work
3 Scientific Data Projects in KISTI
Towards an Advanced Foundation for Data-Intensive Scientific Research
- National Scientific Data Governance
- Scientific Data Mgt. and Sharing
- Scientific Data Analytic Platforms
- Design and Development of Scientific Data Governance Systems
- Establishment of International Co-operative Network
- Development and Distribution of Scientific Data Management and Sharing Platforms
- R&D of Advanced Technologies for Big Data based Scientific Work
Infrastructure: Supercomputer, Storage, HPC Network
4 A Novel Data Service Specialized in Scientific Big Data Customization
- Motivated by oceanographers' exploration and analyses with high-resolution, long-term remote sensing data from satellites
- Collaborating with KIOST (Korea Institute of Ocean Science & Technology) and KOPRI (Korea Polar Research Institute) in research projects on climate change and red-tide analysis/detection
- Target applications: Climate Change Analysis; Red-tide Analysis/Detection
Supporting Data-intensive Scientific Analysis Tasks
- Remote sensing data taken by satellites are overwhelming the scientists' processing capabilities
- Customizing big data for analytics is a strong and growing demand
5 Case Study 1: Exploring the Effects of Climate Change (Korea Polar Research Institute)
- Non-linear relationship between biology and environmental changes
- For better estimation, multiple remote sensing data sets need to be compared
- Array-based remote sensing data from satellites, at an unprecedented scale (hundreds of TBs), must be handled efficiently
6 Case Study 2: Red-tide Analysis/Prediction (Korea Institute of Ocean Science & Technology)
[Figure: Red tides on the south coast of Korea, 2002~2013; data digitized from manual estimates; axis: Damage (KRW)]
- Red tides cause serious economic damage, often unexpectedly
- Long-term, high-resolution remote sensing data should be examined intensively
7 Enhancing Remote Data Handling Capability
A Novel Challenge: Overcoming the practical limits of handling global, high-resolution remote sensing big data
[Figure: resolution (Low Res. 1000m, 500m, High Res. 250m) plotted against range (Local to Global). The "High Res. & Global Range" corner is labeled "Untouchable Domain". A boundary labeled "Scientists' Practice" marks the computational limit in most small-scale science labs.]
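To make the "untouchable domain" concrete, the following is a minimal sketch of how the number of grid cells grows as resolution gets finer. It assumes a simple square-cell tiling and an approximate Earth surface area of 510 million km²; the function names are hypothetical and are not taken from the slides.

```python
# Rough cell counts for a global grid at the three resolutions named on the slide.
# Assumption: square cells tiling an approximate Earth surface area.
EARTH_SURFACE_KM2 = 510_000_000  # approximate value, used for illustration only


def cells_for_resolution(resolution_m: float, area_km2: float = EARTH_SURFACE_KM2) -> int:
    """Number of square cells with the given edge length needed to tile the area."""
    cell_area_km2 = (resolution_m / 1000.0) ** 2
    return round(area_km2 / cell_area_km2)


def growth_factor(coarse_m: float, fine_m: float) -> float:
    """How many times more cells the finer grid needs than the coarser one."""
    return (coarse_m / fine_m) ** 2


if __name__ == "__main__":
    for res in (1000, 500, 250):
        print(f"{res:>5} m: {cells_for_resolution(res):,} cells "
              f"({growth_factor(1000, res):.0f}x the 1000 m grid)")
```

Halving the cell edge quadruples the cell count, so moving from 1000m to 250m multiplies the data per time step by 16 before any time dimension is added.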
8 Realizing Oceanographers' Dream of a Better Working Environment
Renovation of the Remote Sensing Big-Data Customization Process

Goal: Global Climate Change Analysis (Global Area, 1km)
- As-Is (slow): Requirements → MODIS Data Transform (L1A → ... → L3 SMI) → File-based Data Analysis (L3 Level)
- To-Be (quick and smart): Customized Data Transform → Array-based Data Management → Array-based Data Analysis

Goal: Red-tide Analysis/Detection (Local Area, 1/0.5/0.25km)
- As-Is (slow): MODIS Data Transform (L1A → ... → L2) → File-based Data Analysis (L2 Level)
- To-Be (quick and smart): Customized Data Transform → Array-based Data Management → Array-based Data Analysis

Platform Design (KISTI)
- Data levels: MODIS L1A, MODIS L1B, MODIS L2, MODIS L3 BIN, MODIS L3 SMI
- Selective Loading into SciDB based Data Mgt.: Base Array (L3 SMI in SciDB) and Derived Arrays (A1, A2, A3)
- UDF Extension: f(a1, A2), f(a2, A3)
- Array Manipulation & Computation; Complex Analysis in SciDB
- Other diagram labels: Bottleneck, Parallel Computing

User Environment for Scientific Data Analytics
- UI for Customizing Data Transform Functions: Transform Customization, Monitoring of Transform, Loading into Array DBMS
- UI for Array-based Analysis Functions: SciDB Viewer, Array Data Manipulation, Array Data Provenance
- Data Visualization Functions: SciDB Array Visualization, Display of Satellite Image, Big Data Summarization
9 A-DISC (Array-based Data-Intensive Scientific Computing Platform)
- What We Pursue: To support a variety of scientists' discoveries with big data in a scientist-friendly manner
- What We Enable: Array-based Data-Intensive Scientific Computing Platform (A-DISC): Scalable, Versatile, Scientists-friendly
- How We Approach: Big Data Preprocessing on Parallelism; Scientific Data Customization; Interactive Scientific Data Workspace
- What We Conduct: HPC-based MODIS Satellite Remote Sensing Data Translation; Array DBMS based Scientific Data Analysis
- Support Scientific Understanding: Global Ecology Analysis; Red-tide Analysis and Detection
- Components: BigSat-Converter; Scientific Data Workspace
10 Renovate the Workflow: HPC* and Array-based Data Processing
Data source: NASA MODIS Aqua, HDF files (L1A, L1B, L2, L3) in the Big Data Store
- Customization: users set the customizing options
- Transforming on Parallelism: Customized Data Transform with Parallel and Distributed Computing (SGE); Fast Transform
- Loading into Array DBMS: HDF-to-Array Transform & Load; user-customized data, ready for analysis, kept in the Big Data Store (Transformed Data Mgt.)
- Array-based Data Analytics: Web-based Interactive Big Data Analytics; Array Data Manipulation & Analysis of Scientific Data on Array DBMS
* HPC (High-Performance Computing)
11 BigSat-Converter
- A server-client system for customized transformation of remote sensing big data
- Currently converts MODIS Aqua/Terra L1A, L2, RGB, and L3 data on an HPC cluster
- Planned: support for various other remote sensing data customizations
12 Enhancing the Power of Data Transformation with HPC
- User: Customization: Specification of Region/Period/Product/etc.
- Administrator: Administration: Managing the utilization of Computing and Storage Resources
- Flow: Customizing Options → Transforming on Parallelism → User-customized Data, ready for Analysis
- Input: HDF-formatted Data in the Big Data Store (L1A, L1B, L2, L3)
- Output: Big Data Store (Transformed Data Mgt.)
- Cluster: Master & 9 Worker Nodes; Parallel Computing (Sun Grid Engine); 10G Infiniband
- Node: CPU 2260 MHz, 8 Core; Memory 18 GB; Storage 250GB
- NAS Storage: 15 TB, Format NTFS
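The user-side customization (region, period, product) followed by a parallel transform can be sketched as below. This is an illustrative stand-in only: a local thread pool replaces the Sun Grid Engine cluster named on the slide, the `transform` function is a placeholder for the real HDF conversion, and every class and function name here is hypothetical rather than part of BigSat-Converter.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Granule:
    """One input file, described by product level, date, and center point."""
    product: str  # e.g. "L1A", "L1B", "L2", "L3"
    day: date
    lon: float
    lat: float


@dataclass(frozen=True)
class Customization:
    """A user request: product, period, and bounding box (all hypothetical fields)."""
    product: str
    start: date
    end: date
    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float

    def matches(self, g: Granule) -> bool:
        return (g.product == self.product
                and self.start <= g.day <= self.end
                and self.lon_min <= g.lon <= self.lon_max
                and self.lat_min <= g.lat <= self.lat_max)


def transform(g: Granule) -> str:
    """Placeholder for the real per-granule conversion step."""
    return f"{g.product}-{g.day.isoformat()}-transformed"


def run_transform(granules, spec: Customization, workers: int = 9):
    """Select the granules matching the request, then convert them in parallel."""
    selected = [g for g in granules if spec.matches(g)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the selected granules
        return list(pool.map(transform, selected))
```

The default of nine workers simply mirrors the nine worker nodes on the slide. Filtering before dispatch means only the requested region, period, and product consume compute time.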
13 Significant Benefits of HPC in Remote Sensing Data Transform
Test data: satellite images of the Korean peninsula

                   L1B        L2         Total
Single             60m 15s    74m 12s    134m 27s
BigSat-Converter   7m 18s     10m 25s    17m 43s
Speed-Up           8.25X      7.12X      7.59X

- Improvement of process and of data model: legacy code vs. re-designed code
- Performance improvement: 7.59 times faster (1 month → 4 days)
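The speed-up figures on this slide follow directly from the reported timings. A minimal check, using only the numbers on the slide (the function names are illustrative):

```python
# Timings from the slide as (minutes, seconds): (single machine, BigSat-Converter).
TIMINGS = {
    "L1B": ((60, 15), (7, 18)),
    "L2": ((74, 12), (10, 25)),
    "Total": ((134, 27), (17, 43)),
}


def to_seconds(minutes: int, seconds: int) -> int:
    return minutes * 60 + seconds


def speedup(single, parallel) -> float:
    """Ratio of single-machine time to cluster time."""
    return to_seconds(*single) / to_seconds(*parallel)


def projected_days(baseline_days: float, factor: float) -> float:
    """Duration of a job after applying a speed-up factor."""
    return baseline_days / factor


if __name__ == "__main__":
    for stage, (single, parallel) in TIMINGS.items():
        print(f"{stage}: {speedup(single, parallel):.2f}x")
    total = speedup(*TIMINGS["Total"])
    print(f"30-day job: {projected_days(30, total):.2f} days")
```

Taking "1 month" as 30 days, the overall factor brings the job to just under 4 days, which matches the slide's summary.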
14 Array-based Remote Sensing Data Management and Manipulation
L1B RGB Array
- Dimension: longitude x latitude x time ( x x 3 days )
- Attributes: <Red, Green, Blue>
Swath
- 1 swath (in RGB), every 5 mins
- Image size: 136 x
- 1 day: 288 swaths
L3 SMI in SciDB (Jan. 1-31st, 2014)
- Dimension: (lon. x lat. x time) 4320 x 2160 x 31 (days)
- Attribute: <chlorophyll>
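The array dimensions on this slide imply a simple coordinate-to-cell mapping. The sketch below assumes an equirectangular global grid whose origin is the north-west corner (180°W, 90°N), with rows running north to south; that layout is an assumption for illustration and is not stated on the slide. All names are hypothetical.

```python
LON_CELLS, LAT_CELLS, DAYS = 4320, 2160, 31  # L3 SMI array dimensions from the slide


def cell_index(lon: float, lat: float) -> tuple[int, int]:
    """Map a longitude/latitude in degrees to (column, row) in the assumed grid."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinates out of range")
    # Multiply before dividing so whole-degree inputs stay exact in floating point.
    col = min(int((lon + 180.0) * LON_CELLS / 360.0), LON_CELLS - 1)
    row = min(int((90.0 - lat) * LAT_CELLS / 180.0), LAT_CELLS - 1)
    return col, row


def total_cells() -> int:
    """Number of chlorophyll values in one month of the array."""
    return LON_CELLS * LAT_CELLS * DAYS


def swaths_per_day(interval_minutes: int = 5) -> int:
    """One swath every `interval_minutes` gives this many swaths per day."""
    return 24 * 60 // interval_minutes


if __name__ == "__main__":
    print(cell_index(127.0, 37.5))  # a point on the Korean peninsula
    print(f"{total_cells():,} cells, {swaths_per_day()} swaths per day")
```

Under this layout each cell spans 1/12 of a degree in both directions, and a 5-minute swath interval reproduces the 288 swaths per day given on the slide.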
15 Scientific Data Workspace
- An integrated workspace for scientific data manipulation and analysis
- Array data on clusters are easily accessible through a graphical user interface
- R-based analytical programming is supported on an HPC cluster
16 Applied to the Field for Red-Tide Analysis
[Figure: A domestic news article about our achievement with KIOST on the red-tide analysis issue]
[Figure: A screenshot of our system application]
17 Summary: How We Enhanced Scientists' Capabilities for Big Data
Customization of Remote Sensing Big Data
- Public access to huge volumes of scientific data is increasing
- Scientists' data requests vary and are not satisfied by the forms in which data publishers provide the data
- Customization services will become more and more important
Array-based Data Manipulation and Analysis
- Scientific data are often stored in formatted files (HDF, etc.)
- Adopting Array DBMS technology can boost processing
- Data are loaded into the Array DBMS (for R-based analysis)
- Ultimately, this lets scientists work much more easily with an Array DBMS
18 Towards Further Scientific Data Customization Services
- Customization platforms for utilizing scientific big data are in growing demand, driven by public requests to resolve many natural and social problems
- Enhancing and boosting scientific big data processes is interdisciplinary work: it requires understanding domain knowledge in order to develop unprecedented technology and systems
- It's not a simple combination of various IT technologies, but rather a well-crafted artwork that should work as part of our society, forming an elementary foundation for the organic integration of various systems
19 Conclusions and Future Work
- At the moment, we are conducting R&D on a Scientific Data Customization Service, focusing on remote sensing data transformation for practical requests
- We will continue to enhance the system under development to cover various scientific data as well as remote sensing data, working towards generic scientific data customization services
- We always welcome international cooperation to share further R&D issues and valuable experience on common interests
Contact: Ryong Lee, Ph.D. (ryonglee@kisti.re.kr)
20 Thank you very much for your kind attention!!
Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad
More informationHadoop & SAS Data Loader for Hadoop
Turning Data into Value Hadoop & SAS Data Loader for Hadoop Sebastiaan Schaap Frederik Vandenberghe Agenda What s Hadoop SAS Data management: Traditional In-Database In-Memory The Hadoop analytics lifecycle
More informationbwgrid Treff MA/HD Sabine Richling, Heinz Kredel Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim 24.
bwgrid Treff MA/HD Sabine Richling, Heinz Kredel Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim 24. November 2010 Richling/Kredel (URZ/RUM) bwgrid Treff WS 2010/2011 1 / 17 Course
More informationXSEDE Data Analytics Use Cases
XSEDE Data Analytics Use Cases 14th Jun 2013 Version 0.3 XSEDE Data Analytics Use Cases Page 1 Table of Contents A. Document History B. Document Scope C. Data Analytics Use Cases XSEDE Data Analytics Use
More informationData Management/Visualization on the Grid at PPPL. Scott A. Klasky Stephane Ethier Ravi Samtaney
Data Management/Visualization on the Grid at PPPL Scott A. Klasky Stephane Ethier Ravi Samtaney The Problem Simulations at NERSC generate GB s TB s of data. The transfer time for practical visualization
More informationA Design of Resource Fault Handling Mechanism using Dynamic Resource Reallocation for the Resource and Job Management System
A Design of Resource Fault Handling Mechanism using Dynamic Resource Reallocation for the Resource and Job Management System Young-Ho Kim, Eun-Ji Lim, Gyu-Il Cha, Seung-Jo Bae Electronics and Telecommunications
More informationData Analytics at NERSC. Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services
Data Analytics at NERSC Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services NERSC User Meeting August, 2015 Data analytics at NERSC Science Applications Climate, Cosmology, Kbase, Materials,
More informationSURFsara HPC Cloud Workshop
SURFsara HPC Cloud Workshop doc.hpccloud.surfsara.nl UvA workshop 2016-01-25 UvA HPC Course Jan 2016 Anatoli Danezi, Markus van Dijk cloud-support@surfsara.nl Agenda Introduction and Overview (current
More informationMississippi State University High Performance Computing Collaboratory Brief Overview. Trey Breckenridge Director, HPC
Mississippi State University High Performance Computing Collaboratory Brief Overview Trey Breckenridge Director, HPC Mississippi State University Public university (Land Grant) founded in 1878 Traditional
More informationHow To Use Hadoop For Gis
2013 Esri International User Conference July 8 12, 2013 San Diego, California Technical Workshop Big Data: Using ArcGIS with Apache Hadoop David Kaiser Erik Hoel Offering 1330 Esri UC2013. Technical Workshop.
More informationTemporal variation in snow cover over sea ice in Antarctica using AMSR-E data product
Temporal variation in snow cover over sea ice in Antarctica using AMSR-E data product Michael J. Lewis Ph.D. Student, Department of Earth and Environmental Science University of Texas at San Antonio ABSTRACT
More informationNITRD and Big Data. George O. Strawn NITRD
NITRD and Big Data George O. Strawn NITRD Caveat auditor The opinions expressed in this talk are those of the speaker, not the U.S. government Outline What is Big Data? Who is NITRD? NITRD's Big Data Research
More informationMapReduce and Hadoop Distributed File System
MapReduce and Hadoop Distributed File System 1 B. RAMAMURTHY Contact: Dr. Bina Ramamurthy CSE Department University at Buffalo (SUNY) bina@buffalo.edu http://www.cse.buffalo.edu/faculty/bina Partially
More information