Open Cirrus: A Global Testbed for Cloud Computing Research
1 Open Cirrus: A Global Testbed for Cloud Computing Research. David O'Hallaron, Director, Intel Labs Pittsburgh; Carnegie Mellon University
2 Open Cirrus Testbed. Sponsored by HP, Intel, and Yahoo! (with additional support from NSF). 9 sites worldwide, with a target of around 20 in the next two years. Each site cores. Shared hardware infrastructure (~15K cores), services, research, apps. Dave O'Hallaron, DIC Workshop, 2009
3 Open Cirrus Context. Goals: (1) foster new systems and services research around cloud computing; (2) catalyze an open-source stack and APIs for the cloud. Motivation: enable more tier-2 and tier-3 public and private cloud providers. How are we different? Support for systems research and applications research; access to bare metal; integrated virtual-physical migration; federation of heterogeneous datacenters; global sign-on, monitoring, and storage services.
4 Intel BigData Cluster. Open Cirrus site hosted by Intel Labs Pittsburgh. Operational since Jan nodes, 1440 cores, 1416 GB DRAM, 500 TB disk. Supporting 50 users and 20 projects from CMU, Pitt, Intel, and GaTech. Cluster management, location- and power-aware scheduling, physical-virtual migration (Tashi), cache-savvy algorithms (Hi-Spade), real-time streaming frameworks (SLIPstream), optical datacenter interconnects (CloudConnect), log-based architectures (LBA). Machine translation, speech recognition, programmable matter simulation, ground model generation, online education, real-time brain activity decoding, real-time gesture and object recognition, federated perception, automated food recognition. Idea for a research project on Open Cirrus? Send a short abstract to Mike Kozuch, Intel Labs Pittsburgh, michael.a.kozuch@intel.com
5 Open Cirrus Stack. Physical Resource Set (PRS): management and control subsystem; compute + network + storage resources; power + cooling. Credit: John Wilkes (HP)
6 Open Cirrus Stack. PRS clients, each with their own physical data center. Diagram labels: Research, Tashi, NFS storage, HDFS storage; PRS.
7 Open Cirrus Stack. Virtual clusters (e.g., Tashi). Diagram labels: Virtual cluster, Virtual cluster; Research, Tashi, NFS storage, HDFS storage; PRS.
8 Open Cirrus Stack. 1. Application running 2. On Hadoop 3. On Tashi virtual cluster 4. On a PRS 5. On real hardware. Diagram labels: BigData App, Hadoop; Virtual cluster, Virtual cluster; Research, Tashi, NFS storage, HDFS storage; PRS.
9 Open Cirrus Stack. Experiment save/restore. Diagram labels: BigData App, Hadoop; Virtual cluster, Virtual cluster; Research, Tashi, NFS storage, HDFS storage; PRS.
10 Open Cirrus Stack. Platform services; experiment save/restore. Diagram labels: BigData App, Hadoop; Virtual cluster, Virtual cluster; Research, Tashi, NFS storage, HDFS storage; PRS.
11 Open Cirrus Stack. Platform services; user services; experiment save/restore. Diagram labels: BigData App, Hadoop; Virtual cluster, Virtual cluster; Research, Tashi, NFS storage, HDFS storage; PRS.
13 System Organization. Compute nodes are divided into dynamically allocated, VLAN-isolated PRS subdomains. Open research; Tashi development; apps running in a VM management infrastructure (e.g., Tashi); apps switch back and forth between virtual and physical; production storage; proprietary research; open workload monitoring and trace collection.
14 Open Cirrus Stack - PRS. PRS goals: provide mini-datacenters to researchers; isolate experiments from each other; provide a stable base for other research. PRS approach: allocate sets of physical co-located nodes, isolated inside VLANs. PRS code from HP Labs is being merged into the Apache Tashi project. Credit: Kevin Lai (HP), Richard Gass, Michael Ryan, Michael Kozuch, and David O'Hallaron (Intel)
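The PRS approach on slide 14 (sets of co-located physical nodes, each set isolated inside its own VLAN) can be illustrated with a toy allocator. This is only a sketch under stated assumptions: the class names, the "one rack equals co-located" rule, and the VLAN numbering are hypothetical and are not taken from the HP Labs PRS code or from Tashi.

```python
from dataclasses import dataclass


@dataclass
class PhysicalResourceSet:
    """Hypothetical PRS record: co-located nodes inside one VLAN."""
    name: str
    vlan_id: int
    nodes: list


class PrsAllocator:
    """Toy allocator; the policy here is an illustrative assumption."""

    def __init__(self, racks, first_vlan=100):
        # racks maps a rack name to the list of free node names in it.
        self.free = {rack: list(nodes) for rack, nodes in racks.items()}
        self.next_vlan = first_vlan

    def allocate(self, name, count):
        # Co-location: take every node from a single rack with enough free nodes.
        for rack in sorted(self.free):
            if len(self.free[rack]) >= count:
                nodes = self.free[rack][:count]
                self.free[rack] = self.free[rack][count:]
                prs = PhysicalResourceSet(name, self.next_vlan, nodes)
                self.next_vlan += 1  # Isolation: one VLAN per PRS.
                return prs
        raise RuntimeError(f"no rack has {count} free nodes")


allocator = PrsAllocator({
    "rack-a": ["a1", "a2", "a3"],
    "rack-b": ["b1", "b2", "b3", "b4"],
})
tashi = allocator.allocate("tashi", 2)
research = allocator.allocate("research", 4)
print(tashi)
print(research)
```

Each experiment receives its own node set and VLAN, so two allocations never share a node, which is the isolation property the slide lists as a goal.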
15 Open Cirrus Stack - Tashi. An open-source Apache Software Foundation project sponsored by Intel, CMU, and HP. Research infrastructure for cloud computing on Big Data. Implements AWS interface. In daily production use on the Intel cluster for 6 months; manages a pool of 80 physical nodes; ~20 projects and 40 users from CMU, Pitt, and Intel. Research focus: location-aware co-scheduling of VMs, storage, and power; integrated physical/virtual migration (using PRS). Credit: Mike Kozuch, Michael Ryan, Richard Gass, Dave O'Hallaron (Intel), Greg Ganger, Mor Harchol-Balter, Julio Lopez, Jim Cipar, Elie Krevat, Anshul Gandhi, Michael Stroucken (CMU)
16 Tashi High-Level Design. Most decisions happen in the scheduler, which manages compute, storage, and power in concert. The Cluster Manager (CM) maintains databases and routes messages; its decision logic is limited. Services are instantiated through virtual machines. Data location and power information is exposed to the scheduler and services. The storage service aggregates the capacity of the commodity nodes to house Big Data repositories. Cluster nodes are assumed to be commodity machines. Diagram labels: Scheduler, Cluster Manager, Virtualization Service, Storage Service, Node.
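Slide 16 says data location is exposed to the scheduler. A minimal sketch of what a location-aware placement decision could look like is given below. The function name, its arguments, and the node-local, rack-local, remote preference order are assumptions made for illustration; they are not Tashi's actual scheduler, and power information is left out entirely.

```python
def place_task(block_replicas, node_rack, free_nodes):
    """Pick a node for a task that reads one data block.

    block_replicas: set of nodes holding a copy of the block.
    node_rack: mapping from node name to rack name.
    free_nodes: set of nodes with spare capacity.
    Preference order (an assumption): node-local, rack-local, remote.
    """
    free = sorted(free_nodes)
    # Best case: run on a node that already stores the block.
    for node in free:
        if node in block_replicas:
            return node, "node-local"
    # Next best: a free node in the same rack as some replica.
    replica_racks = {node_rack[n] for n in block_replicas}
    for node in free:
        if node_rack[node] in replica_racks:
            return node, "rack-local"
    # Fallback: any free node, reading the block across racks.
    if free:
        return free[0], "remote"
    raise RuntimeError("no free nodes")


node_rack = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
print(place_task({"n1", "n3"}, node_rack, {"n1", "n2"}))
print(place_task({"n1", "n3"}, node_rack, {"n2"}))
print(place_task({"n1"}, node_rack, {"n4"}))
```

A random-placement baseline would ignore `block_replicas` and pick any free node, which is the comparison the "Location Matters" slides draw.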
17 Location Matters (calculated). Calculated for 40 racks * 30 nodes * 2 disks. Chart: throughput per disk (MB/s), random placement vs. location-aware placement, for Disk-1G, SSD-1G, Disk-10G, and SSD-10G; speedup labels 3.6X, 3.5X, 11X, 9.2X.
18 Location Matters (measured). Measured on 2 racks * 14 nodes * 6 disks. Chart: throughput per disk (MB/s), random placement vs. location-aware placement, for ssh and xinetd; speedup labels 2.9X, 4.7X.
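The two "Location Matters" slides describe their configurations as products of racks, nodes per rack, and disks per node. The short calculation below simply multiplies those figures out to show how different the two scales are; the helper name is hypothetical and nothing here reproduces the throughput results.

```python
def cluster_size(racks, nodes_per_rack, disks_per_node):
    """Return (total nodes, total disks) for a uniform cluster layout."""
    nodes = racks * nodes_per_rack
    return nodes, nodes * disks_per_node


calculated = cluster_size(40, 30, 2)  # configuration on slide 17
measured = cluster_size(2, 14, 6)     # configuration on slide 18
print(calculated)  # (1200, 2400)
print(measured)    # (28, 168)
```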
19 Open Cirrus Stack - Hadoop. An open-source Apache Software Foundation project sponsored by Yahoo! Provides a parallel programming model (MapReduce), a distributed file system (HDFS), and a parallel database.
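The MapReduce programming model mentioned on slide 19 can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps using word counting as the example; it does not use the Hadoop API, and all function names are made up for the sketch.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["open cirrus testbed", "open source cloud"])
print(counts)
```

In a real deployment the map and reduce calls run in parallel on many nodes and the shuffle moves data between them; the sketch only shows the data flow.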
20 Typical Web Service. Diagram labels: Data center, db, Application server, HTTP server, Query, Result, External client. Characteristics: small queries and results; little client computation; moderate server computation; moderate data accessed per query. Examples: web sites serving dynamic content.
21 Big Data Service. Diagram labels: Data-intensive computing system (e.g., Hadoop); External data sources; Parallel data server; Parallel compute server; Parallel query server; Query; Result; External client; Parallel file system (e.g., GFS, HDFS); Source dataset; Derived datasets (d1, d2, d3). Characteristics: small queries and results; massive data and computation performed on server. Examples: search, photo scene completion, log processing, science analytics.
22 Streaming Data Service. Diagram labels: External data sources; Parallel data server; Parallel compute server; Parallel query server; Continuous query stream; Continuous query results; External client and sensors; Source dataset; Derived datasets (d1, d2, d3). Characteristics: application lives on client; client uses cloud as an accelerator; data transferred with query; variable, latency-sensitive HPC on server; often combines with Big Data. Examples: perceptual computing on high data-rate sensors, such as real-time brain activity detection, object recognition, and gesture recognition.
23 Streaming Data Service: Gestris, Interactive Gesture Recognition. Two-player Gestris (gesture-Tetris) implementation. 2 video sources. Uses a simplified volumetric event detection algorithm. 10 cores at 3 GHz each: 1 for camera input and scaling, 1 for game + display, 8 for volumetric matching (4 for each video stream). Achieves full 15 fps rate. Arm gesture selects action. Credit: Lily Mummert, Babu Pillai, Rahul Sukthankar (Intel), Martial Hebert, Pyry Matikainen (CMU)
24 Streaming Data Meets Big Data: Real-time Brain Activity Decoding. Magnetoencephalography (MEG) measures the magnetic fields associated with brain activity. Its temporal and spatial resolution offers unprecedented insights into brain dynamics. Image labels: MEG, ECoG. Credit: Dean Pomerleau (Intel), Tom Mitchell, Gus Sudre and Mark Palatucci (CMU), Wei Wang, Doug Weber and Anto Bagic (UPitt)
25 Localizing Sources of Magnetic Activity. Goal: determine the spatiotemporal pattern of brain activity most likely to have caused the measured magnetic field. Diagram labels: Magnetic Field Measurements; Estimated Brain Activity. An ill-posed problem that applies to both MEG and EEG. Very computationally expensive. Important for better mapping to fMRI results, further neuroscience understanding of brain processes, and (maybe) improved decoding.
26 Big Data Background Processing: source localization pipeline. Inputs: MRI data; MEG or EEG field data. Stages: reconstruct brain (~40 hr/subject); create co-registered boundary model (~1 hr/subject); pre-processing and filtering (~1 hr/session); model of electromagnetic field from sources to sensors (~5 min/session); brain activity estimates (movies, time series) (~15 min/session). Diagram labels: Brain Structural Information; Electromagnetic Field Measurements.
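The per-stage costs on slide 26 can be added up to estimate the background processing time for one subject. The sketch below assumes the stages run strictly one after another on a single worker and that the approximate figures are exact; both are simplifying assumptions, and the function name is hypothetical.

```python
# Approximate stage costs from slide 26, in minutes.
PER_SUBJECT_MIN = 40 * 60 + 1 * 60   # reconstruct brain + boundary model
PER_SESSION_MIN = 1 * 60 + 5 + 15    # filtering + field model + activity estimates


def pipeline_hours(sessions):
    """Sequential processing time in hours for one subject."""
    total_minutes = PER_SUBJECT_MIN + sessions * PER_SESSION_MIN
    return total_minutes / 60


print(pipeline_hours(0))  # 41.0
print(pipeline_hours(3))  # 45.0
```

The once-per-subject stages dominate: three sessions add only 4 hours to the 41-hour fixed cost, which is why the slide treats this as background processing.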
27 Streaming/Big Data Service: Real-Time MEG/EEG Decoding. Real-time decoding of brain activity. Diagram labels: Stimulus; MEG/EEG Imaging; Electromagnetic field data; Preprocess and filter; Source localization; Brain activity estimates; Brain activity decoding; Decoded results (hand, foot, celery, airplane); Cloud cluster; Off-line source modeling (once); Off-line decoder training (once).
28 Summary and Lessons. Using the cloud as an accelerator for interactive streaming/big data apps is an important usage model. Location-aware and power-aware workload scheduling are still open problems. Need integrated physical/virtual allocations to combat cluster squatting. Storage models are still a problem: GFS-style storage systems are not mature, and the impact of SSDs is unknown. We need open-source architecture and reference implementations: access model, local and global services, application frameworks. Need to investigate new application frameworks: Map-reduce/Hadoop is not always appropriate.
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
More informationCloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
More informationDell Reference Configuration for DataStax Enterprise powered by Apache Cassandra
Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra A Quick Reference Configuration Guide Kris Applegate kris_applegate@dell.com Solution Architect Dell Solution Centers Dave
More informationFPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab
FPGA Accelerator Virtualization in an OpenPOWER cloud Fei Chen, Yonghua Lin IBM China Research Lab Trend of Acceleration Technology Acceleration in Cloud is Taking Off Used FPGA to accelerate Bing search
More informationTHE HADOOP DISTRIBUTED FILE SYSTEM
THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,
More informationSCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS
Sean Lee Solution Architect, SDI, IBM Systems SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Agenda Converging Technology Forces New Generation Applications Data Management Challenges
More informationCloud Computing Paradigm
Cloud Computing Paradigm Julio Guijarro Automated Infrastructure Lab HP Labs Bristol, UK 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
More informationWhat s Happening to the Mainframe? Mobile? Social? Cloud? Big Data?
December, 2014 What s Happening to the Mainframe? Mobile? Social? Cloud? Big Data? Glenn Anderson IBM Lab Services and Training Today s mainframe is a hybrid system z/os Linux on Sys z DB2 Analytics Accelerator
More informationHDFS Space Consolidation
HDFS Space Consolidation Aastha Mehta*,1,2, Deepti Banka*,1,2, Kartheek Muthyala*,1,2, Priya Sehgal 1, Ajay Bakre 1 *Student Authors 1 Advanced Technology Group, NetApp Inc., Bangalore, India 2 Birla Institute
More informationDell Reference Configuration for Hortonworks Data Platform
Dell Reference Configuration for Hortonworks Data Platform A Quick Reference Configuration Guide Armando Acosta Hadoop Product Manager Dell Revolutionary Cloud and Big Data Group Kris Applegate Solution
More informationEinsatzfelder von IBM PureData Systems und Ihre Vorteile.
Einsatzfelder von IBM PureData Systems und Ihre Vorteile demirkaya@de.ibm.com Agenda Information technology challenges PureSystems and PureData introduction PureData for Transactions PureData for Analytics
More informationAn Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov
An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research
More informationBig Data and Analytics: Challenges and Opportunities
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
More informationDesign and Evolution of the Apache Hadoop File System(HDFS)
Design and Evolution of the Apache Hadoop File System(HDFS) Dhruba Borthakur Engineer@Facebook Committer@Apache HDFS SDC, Sept 19 2011 Outline Introduction Yet another file-system, why? Goals of Hadoop
More informationCloud Optimize Your IT
Cloud Optimize Your IT Windows Server 2012 The information contained in this presentation relates to a pre-release product which may be substantially modified before it is commercially released. This pre-release
More informationCloud Computing through Virtualization and HPC technologies
Cloud Computing through Virtualization and HPC technologies William Lu, Ph.D. 1 Agenda Cloud Computing & HPC A Case of HPC Implementation Application Performance in VM Summary 2 Cloud Computing & HPC HPC
More informationCloud Computing Where ISR Data Will Go for Exploitation
Cloud Computing Where ISR Data Will Go for Exploitation 22 September 2009 Albert Reuther, Jeremy Kepner, Peter Michaleas, William Smith This work is sponsored by the Department of the Air Force under Air
More informationCOMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
More information