Bringing Compute to the Data: Alternatives to Moving Data. Part of EUDAT's Training in the Fundamentals of Data Infrastructures
- Norman Nash
- 8 years ago
Transcription
1 Bringing Compute to the Data: Alternatives to Moving Data. Part of EUDAT's Training in the Fundamentals of Data Infrastructures
2 Introduction. Why consider alternatives? The traditional approach. Alternative approaches: Distributed Computing; Workflows; Bringing the Compute to the Data.
3 Why should alternative approaches be considered? Moving data is still hard, even when you're using the right tools. Data volumes are expected to continue to increase, and to do so more rapidly than transfer speeds. Alternatives require thinking about things differently, so it may be wise to start thinking about them before current techniques break down.
4 Traditional Approach. Input data is stored at location A. The compute resource is at location B. Output data is required at location C. (1) Move data from A to B. (2) Perform computation at B. (3) Move data from B to C. (A and C are often the same place.) [Diagram: A, B, C]
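The three steps of the traditional approach can be turned into a rough cost estimate. The sketch below is only illustrative: the function names, the data sizes, the link speed and the compute time are all assumptions chosen for the example, and the model ignores latency, protocol overhead and contention.

```python
def transfer_seconds(size_bytes, bandwidth_bytes_per_s):
    """Idealised time to move size_bytes over a link (no latency or overhead)."""
    return size_bytes / bandwidth_bytes_per_s


def traditional_total_seconds(input_bytes, output_bytes, bw_a_to_b, bw_b_to_c, compute_s):
    """Step 1: move A to B. Step 2: compute at B. Step 3: move B to C."""
    move_in = transfer_seconds(input_bytes, bw_a_to_b)
    move_out = transfer_seconds(output_bytes, bw_b_to_c)
    return move_in + compute_s + move_out


# Hypothetical figures, not taken from the slides:
TB = 10**12
GB = 10**9
LINK = 125e6  # 1 Gbit/s expressed in bytes per second

move_in_s = transfer_seconds(10 * TB, LINK)          # 80000 s, roughly 22 hours
total_s = traditional_total_seconds(10 * TB, 1 * GB, LINK, LINK, compute_s=2 * 3600)

print(f"input transfer: {move_in_s / 3600:.1f} h, total: {total_s / 3600:.1f} h")
```

With these made-up numbers the input transfer dominates the two hours of computation, which is the situation in which the alternatives discussed in the following slides become interesting.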
5 Traditional approach Data Compute
6 Alternative Approaches: A Disclaimer. None of the following approaches provides a silver bullet! Not all approaches will be useful for all problems and, in some cases, using these approaches can make things worse. These should complement existing approaches and be used where appropriate.
7 Distributed Computing. Here, the idea is that you might not need to do all of the compute at B. In general, this approach could make things worse, depending on your data transfer pattern. It will not be suitable for all kinds of problem. Many of the considerations here are traditional parallel computing concepts.
8 Distributed Computing as Parallel Computing. Is the problem trivially parallel? Is it possible to solve parts of the problem using only part of the input data, and simply recombine the output at the end of a run? If all processors have access to all the data at the start, is it then possible for them to proceed with little or no communication during the runs? If there is the need to communicate during a run, how intensive are these communications? Do you have all-to-alls?
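A trivially parallel problem, in the sense of the questions above, is one where each site works on its own part of the input and the partial outputs are simply recombined at the end. The following minimal sketch assumes a sum of squares as the computation and uses threads as stand-ins for the remote sites B1 to B4; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Work done at one site, using only that site's part of the input."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data, n_sites=4):
    """No communication between sites during the run; recombine at the end."""
    chunks = split(data, n_sites)
    with ThreadPoolExecutor(max_workers=n_sites) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


result = distributed_sum_of_squares(list(range(1000)))
print(result)
```

Only one number per site travels back for recombination. A computation that needed all-to-all exchanges between the chunks during the run would not fit this pattern.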
9 When might Distributed Computing be a good alternative? When input data starts off distributed (fairly common with large-scale experimental data: sensors, detectors, etc.). When input data is already mirrored. When you've had to move the data before anyway and you could have moved it to multiple places instead of just one. When the computation is trivially parallelisable or requires only limited communication.
10 [Diagram: A; B1, B2, B3, B4; C]
11 [Diagram: A1, A2, A3, A4; B1, B2, B3, B4; C]
12 [Diagram: A1, A2, A3, A4; B1, B2, B3, B4; C]
13 [Diagram: A1; B1, B2; C]
14 [Diagram: A1, A2; B1, B2, B3; C]
15 Is this Grid Computing? There are definite overlaps between these ideas of distributed computing and the grid computing that promised so much in the last decade. Grid is not such a cool topic anymore, but many of the ideas could be reused in different contexts (possibly hidden from an end-user). This way of computing may still come into its own for certain kinds of big data problems.
16 Scientific Computation in the cloud? Likely to be a while before this can get close to existing approaches in terms of efficiency, but it is being used in some places e.g. Amazon has Cluster Compute and Cluster GPU instances (see Some data sets are already in the cloud, e.g. Annotated Human Genome Data provided by ENSEMBL Various US Census Databases from The US Census Bureau UniGene provided by the National Center for Biotechnology Information Freebase Data Dump from Freebase.com
17 Big Input Data. Likely to become more common as more and more data is stored and available for re-use. Projects like EUDAT will make it easier to access stored data. This will be the case for much data-intensive science. Here, "data-intensive science" is used in the sense of the "fourth paradigm": computers as datascopes.
18 Workflows. Related to distributed computing. Sometimes referred to as "programming in the large". Again, this potentially requires more data movement. The idea is to break the computation down so that some of it can be done at A, some of it can be done at B, and some of it can be done at C. Also, instead of doing everything at B, this could instead be done at B1, B2, B3, B4,
19 Simple Motivating Example Big Input Data A B Small Output Data C
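The motivating example (big input data at A, small output data at C) can be sketched as a three-stage pipeline in which the first stage runs next to the input and reduces it before anything is moved. Everything below is an assumption made for illustration: the stage functions, the threshold filter and the record counts are hypothetical, and counting records stands in for counting bytes.

```python
def stage_at_a(records, threshold):
    """Runs next to the input data: keep only the records of interest."""
    return [r for r in records if r >= threshold]


def stage_at_b(filtered):
    """Main computation, performed on the already reduced data set."""
    return {"count": len(filtered), "mean": sum(filtered) / len(filtered)}


def stage_at_c(summary):
    """Final formatting where the output is needed."""
    return f"{summary['count']} records, mean {summary['mean']:.1f}"


records = list(range(100_000))            # big input data held at A
filtered = stage_at_a(records, threshold=99_000)
report = stage_at_c(stage_at_b(filtered))

moved_traditional = len(records)          # everything shipped from A to B
moved_workflow = len(filtered)            # only the filtered subset shipped

print(report)
print(f"records moved: {moved_traditional} versus {moved_workflow}")
```

The saving depends entirely on how much the stage at A can reduce the data. If the filter keeps most records, the extra stage adds complexity without reducing movement.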
20 A A B1 B B2 C C
21 or a more realistic case? Image Source:
22 Difficulties with this approach. A change to the computation algorithm is likely: a trade-off, but it might only need to be done once. Orchestration: coordinating computation at multiple sites. Workflows can help with this. They can help to address the added complexities of multiple jurisdictions / access policies, job scheduling, and automation.
23 Approaches to orchestration. Local: each compute service works independently. Data can be pushed or pulled between services (or some combination). The route that the data should take can be passed with the data, predetermined at the service, or communicated manually to the service for each run. Orchestrated: the usual workflow approach. A workflow engine communicates with services or processing elements to control data flow.
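The orchestrated approach can be illustrated with a toy workflow engine that knows the dependencies between processing elements and decides the order in which to call them. This is a minimal sketch and not how any particular engine works: `run_workflow`, the step names and the data are hypothetical, and the standard-library `graphlib` module (Python 3.9 or later) is used only to order the steps.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, dependencies):
    """Run each step after its upstream steps, passing their results to it.

    steps:        name -> function taking a dict {upstream name: result}
    dependencies: name -> set of upstream step names
    """
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        inputs = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = steps[name](inputs)
    return results


# Hypothetical processing elements at sites A, B1, B2 and C.
steps = {
    "load_at_A": lambda inputs: [1, 2, 3, 4],
    "square_at_B1": lambda inputs: [x * x for x in inputs["load_at_A"]],
    "double_at_B2": lambda inputs: [2 * x for x in inputs["load_at_A"]],
    "combine_at_C": lambda inputs: sum(inputs["square_at_B1"]) + sum(inputs["double_at_B2"]),
}
dependencies = {
    "load_at_A": set(),
    "square_at_B1": {"load_at_A"},
    "double_at_B2": {"load_at_A"},
    "combine_at_C": {"square_at_B1", "double_at_B2"},
}

results = run_workflow(steps, dependencies)
print(results["combine_at_C"])
```

In the local approach there would be no central `run_workflow`: each service would need to know for itself where its output goes next.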
24 An aside: Push & Pull. Push: Service 1 completes processing. Service 1 makes a call to service 2 and sends the data to service 2. The arrival of data triggers service 2 to run. Pull: Service 1 runs and stores its output locally. Service 2 runs (triggered manually). Service 2 initiates data transfer from service 1. [Diagram: Service 1, Service 2]
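The difference between push and pull is only in which service initiates the transfer. The sketch below models both with in-process objects. The class names and transforms are hypothetical, and a plain method call stands in for what would be a network call between real services.

```python
class Service:
    """A processing service that keeps its latest output locally."""

    def __init__(self, name, transform):
        self.name = name
        self.transform = transform
        self.output = None

    def run(self, data):
        self.output = self.transform(data)
        return self.output


class PushService(Service):
    """Push: on completion, sends its output downstream, which triggers that service."""

    def __init__(self, name, transform, downstream):
        super().__init__(name, transform)
        self.downstream = downstream

    def run(self, data):
        result = super().run(data)
        self.downstream.run(result)  # arrival of data triggers service 2
        return result


class PullService(Service):
    """Pull: when triggered, fetches the stored output of the upstream service."""

    def __init__(self, name, transform, upstream):
        super().__init__(name, transform)
        self.upstream = upstream

    def run_from_upstream(self):
        data = self.upstream.output  # service 2 initiates the transfer
        if data is None:
            raise RuntimeError(f"{self.upstream.name} has not produced output yet")
        return self.run(data)


def double_all(values):
    return [2 * v for v in values]


# Push: running service 1 is enough, service 2 runs as a consequence.
push_s2 = Service("service 2", sum)
push_s1 = PushService("service 1", double_all, downstream=push_s2)
push_s1.run([1, 2, 3])

# Pull: service 1 runs and stores its output; service 2 is triggered separately.
pull_s1 = Service("service 1", double_all)
pull_s1.run([1, 2, 3])
pull_s2 = PullService("service 2", sum, upstream=pull_s1)
pulled = pull_s2.run_from_upstream()

print(push_s2.output, pulled)
```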
25 Workflow Engines. Scientific workflows: Kepler, Taverna, Triana, Pegasus (Condor), VisTrails, Unicore, OGSA-DAI (for database-oriented flows). General purpose / business orientated: Service Oriented Architecture solutions; BPEL engines, e.g., Oracle BPEL Process Manager; SAP Exchange Infrastructure; WebSphere Process Server. Many of these are based on web services. Datacentre orientated: Hadoop (MapReduce), Storm (stream processing).
26 Moving the Compute to the Data. A more general idea which is related to both the previous approaches. This approach relies to some extent on having an infrastructure that supports it. Can work particularly well where A and C are the same place.
27 Computing Close To The Data. Relational database systems: send a query as SQL. Virtual machines: send a VM image to a virtualisation environment on a machine which can directly mount the data. Allow a user to submit a script or executable on a machine close to the data. SPARQL endpoints on RDF triple stores. Data services (e.g. as web services) with some API beyond file transfer: prefiltering / transformation / subsetting. Application as a service.
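Sending a query as SQL is perhaps the most familiar way of computing close to the data. The sketch below contrasts fetching every row and aggregating on the client with letting the database aggregate and return only the result. It uses an in-memory SQLite database as a stand-in for a remote database server; the table, column names and values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a database at the data centre
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 1.0), ("s1", 3.0), ("s2", 10.0), ("s2", 20.0)],
)

# Moving the data: every row crosses the network, the client does the work.
all_rows = conn.execute("SELECT sensor, value FROM readings").fetchall()
grouped = {}
for sensor, value in all_rows:
    grouped.setdefault(sensor, []).append(value)
client_means = {sensor: sum(vals) / len(vals) for sensor, vals in grouped.items()}

# Moving the compute: only the query goes in and one row per sensor comes back.
server_rows = conn.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor"
).fetchall()
server_means = dict(server_rows)

print(len(all_rows), "rows moved versus", len(server_rows))
print(server_means)
conn.close()
```

Both routes give the same answer. The difference is how many rows leave the database, which grows with the size of the table in the first case and with the number of sensors in the second.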
28 Implications for Data Centres. These approaches rely on data centres to provide computational resources and services. Cons: an interface is required to accept the query or compute job; compute/processing resources are required. Pros: less strain on the network.
29 Conclusions. Data movement will always be required. Moving large amounts of data is never likely to be easy. There is not one single solution, but considering alternative approaches to big data problems may help you to solve problems and answer questions that would otherwise have been impossible.
30 Acknowledgements These slides were produced by Adam Carter (EPCC, The University of Edinburgh) as part of the EUDAT project ( The University of Edinburgh You are welcome to re-use these slides under the terms of CC BY 4.0 (
GSAW 2010 Working Group Session 11D Eucalyptus-Based Event Correlation Nehal Desai Member of the Tech. Staff, CSD/CSTS/CSRD, The Aerospace Corporation Dr. Craig A. Lee, lee@aero.org Senior Scientist, CSD/CSTS/CSRD,
More informationScheduling in the Cloud
Scheduling in the Cloud Jon Weissman Distributed Computing Systems Group Department of CS&E University of Minnesota Introduction Cloud Context fertile platform for scheduling research re-think old problems
More informationReal Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
More informationOptimal Deployment of Geographically Distributed Workflow Engines on the Cloud
Optimal Deployment of Geographically Distributed Workflow Engines on the Cloud Long Thai, Adam Barker, Blesson Varghese, Ozgur Akgun and Ian Miguel School of Computer Science, University of St Andrews,
More informationAmazon Web Services. Luca Clementi clem@sdsc.edu Sriram Krishnan sriram@sdsc.edu. NBCR Summer Institute, August 2009
Amazon Web Services Luca Clementi clem@sdsc.edu Sriram Krishnan sriram@sdsc.edu NBCR Summer Institute, August 2009 Introduction Outline Different type of hosting Cloud offering Amazon Web Service Offering
More informationService Orchestration: The Key to the Evolution of the Virtual Data Center
Service Orchestration: The Key to the Evolution of the Virtual Data Center By Jim Metzler, Cofounder, Webtorials Editorial/Analyst Division Introduction Data center managers are faced with an array of
More informationHadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
More information