THE STATE OF GEO BIG DATA IN OPEN SOURCE. Rob Emanuele
1 THE STATE OF GEO BIG DATA IN OPEN SOURCE Rob Emanuele
2 Who am I? Open source geospatial developer working with big geo data. Developer at Azavea in Philadelphia, US. Maintainer of the GeoTrellis project.
3 GEOBIGDATA FROM A DEVELOPER'S PERSPECTIVE
4 Frank Warmerdam PHOTO CREDIT: IAN TURTON
5 Frank Warmerdam: inventor of GDAL; founding director of OSGeo; worked at Google on geospatial systems using MapReduce.
6 PlanetLabs
11 PlanetLabs: processes over 100,000 scenes per day at 3-5 meter resolution.
12 PlanetLabs - Pipeline: position spatially, apply geometric corrections, apply radiometric corrections, apply cloud masking, ortho-rectify.
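The pipeline above is a fixed sequence of stages applied to every scene. A minimal Python sketch, assuming each stage is a function from one scene record to the next; the `Scene` record and the stage names are hypothetical placeholders, not Planet Labs' code.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Scene:
    """Hypothetical scene record; a real one would carry pixels and metadata."""
    scene_id: str
    steps: Tuple[str, ...] = ()


def stage(name: str) -> Callable[[Scene], Scene]:
    """Build a placeholder stage that only records that it ran."""
    def run(scene: Scene) -> Scene:
        # A real stage would call GDAL, GRASS, OSSIM or OpenCV here.
        return replace(scene, steps=scene.steps + (name,))
    return run


# Stage order follows the slide; the names are illustrative.
PIPELINE: List[Callable[[Scene], Scene]] = [
    stage("position_spatially"),
    stage("geometric_correction"),
    stage("radiometric_correction"),
    stage("cloud_mask"),
    stage("orthorectify"),
]


def process(scene: Scene) -> Scene:
    """Apply every stage in order, feeding each output into the next stage."""
    for step in PIPELINE:
        scene = step(scene)
    return scene


result = process(Scene("scene-001"))
```

Keeping each stage a separate unit is what allows it to be scheduled as its own task on a different machine, which is where the orchestration questions in the following slides come from.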
13 PlanetLabs - Pipeline tools: GDAL, GRASS, OSSIM, OpenCV.
14 BIG DATA IS ABOUT ORCHESTRATION
15 MapReduce (Hadoop): inflexible, since everything must be a MapReduce job; running locally is painful; debugging is painful.
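To see why the model feels inflexible, here is the whole MapReduce contract as a single-process Python toy: a job is nothing but a mapper and a reducer, and any multi-step workflow must be chained as a series of such jobs. This illustrates the programming model only; it is not Hadoop's API, and the scene records are made up.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[Any],
    mapper: Callable[[Any], Iterable[Tuple[Any, Any]]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    groups: Dict[Any, List[Any]] = defaultdict(list)
    # Map phase: every record becomes zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: group all values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


# Example: count scenes per acquisition day (illustrative data).
scenes = [("s1", "2015-06-01"), ("s2", "2015-06-01"), ("s3", "2015-06-02")]
counts = map_reduce(
    scenes,
    mapper=lambda rec: [(rec[1], 1)],
    reducer=lambda day, ones: sum(ones),
)
```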
16 PlanetLabs - JobServer: a PostgreSQL database for job management; PostGIS for storing indexed imagery metadata. Tasks are orchestrated by machines receiving imagery along with the next stage in the pipeline to run.
17 PlanetLabs - JobServer: allows pipeline operations to be written with C++/Python tooling; running in batch is very similar to running locally; easier to debug.
18 PlanetLabs - JobServer: workers hitting the database cause slowness. Postgres/PostGIS is amazing, robust, and very fast, but it has its limits. It has sharding capabilities for horizontal scalability, but I haven't seen them used in geospatial (is anyone using this?)
19 Horizontal vs Vertical Scalability
20 MIXING HORIZONTAL AND VERTICAL SCALING IS GOING TO CAUSE PAIN.
21 PlanetLabs - JobServer: managing resource allocation is difficult; fault tolerance is hard; advanced orchestration needs, like complex prioritization and task specification, are tough, non-geo problems to solve.
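Even a toy version of two of these non-geo problems, prioritization and fault tolerance, takes some care. A minimal sketch using a priority queue with bounded retries; the priorities, retry limit, and task names are hypothetical, and a production orchestrator would also need timeouts, persistence, and resource accounting.

```python
import heapq
from typing import Callable, List, Tuple


def run_tasks(
    tasks: List[Tuple[int, str, Callable[[], None]]], max_attempts: int = 3
) -> Tuple[List[str], List[str]]:
    """Run tasks lowest priority number first; a failing task is re-queued
    until it has been attempted max_attempts times."""
    # The sequence number breaks ties so callables are never compared.
    heap = [(prio, seq, name, fn, 1) for seq, (prio, name, fn) in enumerate(tasks)]
    heapq.heapify(heap)
    seq = len(heap)
    completed, failed = [], []
    while heap:
        prio, _, name, fn, attempt = heapq.heappop(heap)
        try:
            fn()
            completed.append(name)
        except Exception:
            if attempt < max_attempts:
                heapq.heappush(heap, (prio, seq, name, fn, attempt + 1))
                seq += 1
            else:
                failed.append(name)
    return completed, failed


calls = {"flaky": 0}


def flaky() -> None:
    # Fails on its first call only, like a transient network error.
    calls["flaky"] += 1
    if calls["flaky"] < 2:
        raise RuntimeError("transient")


def broken() -> None:
    raise RuntimeError("permanent")


done, gave_up = run_tasks([(2, "low", lambda: None), (0, "urgent", flaky), (1, "bad", broken)])
```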
22 ORCHESTRATION IS HARD
24 BIG DATA IS ABOUT DEPLOYMENT
25 DEPLOYMENT IS HARD
26 DEPLOYMENT IS HARD; CLOUD PROVIDERS HELP
27 Cloud Providers: Amazon Web Services (AWS), Google Cloud Platform, OpenStack (e.g. Rackspace)
29 AWS: a set of services for running software in the cloud. Many services: SQS, CloudFormation, ECS, EFS, EBS, SWF, Elastic Beanstalk, DynamoDB, Redshift.
30 AWS - EC2: virtual machines with a variety of hardware specs and operating systems. Spot Instances are cheap! Open source tooling for DevOps.
31 AWS - S3: an object store with high availability and distributed access; data can be shared publicly or restricted by authentication.
32 Landsat 8 on AWS: Landsat 8 images are published to a public S3 bucket. Over 85 TB worth of imagery. ts/landsat/
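Working with a public archive like this mostly means mapping scene identifiers to object keys. The sketch below parses the 21-character pre-collection Landsat scene ID format (LXSPPPRRRYYYYDDDGSIVV: sensor, satellite, WRS path and row, year, day of year, ground station, version). The bucket URL pattern in `band_url` is an assumption based on how the public `landsat-pds` bucket was laid out at the time; that layout has since changed, so treat the URL as illustrative only. The example ID is used solely to show the format.

```python
from datetime import date, timedelta
from typing import NamedTuple


class SceneId(NamedTuple):
    sensor: str
    satellite: int
    path: int
    row: int
    acquired: date
    station: str
    version: int


def parse_scene_id(scene_id: str) -> SceneId:
    """Split a pre-collection ID such as LC81390452014295LGN00 into fields."""
    if len(scene_id) != 21 or not scene_id.startswith("L"):
        raise ValueError(f"not a pre-collection Landsat scene ID: {scene_id!r}")
    year, day_of_year = int(scene_id[9:13]), int(scene_id[13:16])
    return SceneId(
        sensor=scene_id[1],
        satellite=int(scene_id[2]),
        path=int(scene_id[3:6]),
        row=int(scene_id[6:9]),
        acquired=date(year, 1, 1) + timedelta(days=day_of_year - 1),
        station=scene_id[16:19],
        version=int(scene_id[19:21]),
    )


def band_url(scene_id: str, band: int) -> str:
    # ASSUMED layout: L8/<path>/<row>/<scene id>/<scene id>_B<band>.TIF
    s = parse_scene_id(scene_id)
    return (
        f"https://landsat-pds.s3.amazonaws.com/L8/{s.path:03d}/{s.row:03d}/"
        f"{scene_id}/{scene_id}_B{band}.TIF"
    )


scene = parse_scene_id("LC81390452014295LGN00")
url = band_url("LC81390452014295LGN00", 4)
```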
33 NASA NEX on AWS: Downscaled Climate Projections (NEX-DCP30), Global Daily Downscaled Projections (NEX-GDDP), MOD13Q1 (Vegetation Indices 16-Day L3 Global 250m), Landsat GLS (Global Land Survey).
35 Downscaled Climate Projections: monthly temperature and precipitation data over the contiguous US; historical runs from the models plus 4 RCP scenarios; read from NetCDF files; over 5 TB of data.
36 Local Climate Impact Assessment Modeling: funded by the US Department of Energy; Azavea in cooperation with The Nature Conservancy; the goal is to make climate model data useful to local and regional planners.
39 Hadoop
40 Matei Zaharia
42 Apache Spark: open sourced in 2010 under a BSD license; formerly maintained by UC Berkeley's AMPLab; donated to the Apache Software Foundation in 2013 and relicensed as Apache 2.0; graduated to a top-level Apache project in 2014.
43 Apache Spark: a distributed computation engine. An API that lets you work with distributed data as a collection. Language bindings for Java, Python, and R.
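"Work with distributed data as a collection" means chaining collection-style operations that the engine runs partition by partition. The toy class below imitates that idea in plain Python so it runs without a cluster: transformations are recorded lazily, and only the `reduce` action executes them, per partition, before combining partial results. It is a hypothetical illustration, not Spark's implementation; in PySpark the equivalent chain would be written with `sc.parallelize(...).map(...).filter(...).reduce(...)`.

```python
from functools import reduce
from typing import Any, Callable, List, Tuple


class PartitionedCollection:
    """Toy stand-in for a distributed dataset: a list of partitions plus a
    lazily recorded list of transformations."""

    def __init__(self, partitions: List[List[Any]], ops: Tuple = ()):
        self.partitions = partitions
        self.ops = ops

    def map(self, f: Callable[[Any], Any]) -> "PartitionedCollection":
        return PartitionedCollection(self.partitions, self.ops + (("map", f),))

    def filter(self, p: Callable[[Any], bool]) -> "PartitionedCollection":
        return PartitionedCollection(self.partitions, self.ops + (("filter", p),))

    def _run_partition(self, part: List[Any]) -> List[Any]:
        for kind, f in self.ops:
            part = [f(x) for x in part] if kind == "map" else [x for x in part if f(x)]
        return part

    def reduce(self, f: Callable[[Any, Any], Any]) -> Any:
        # Action: reduce each non-empty partition independently (on a cluster
        # this happens on the workers), then combine the partial results.
        # Raises TypeError if every partition ends up empty.
        partials = [reduce(f, p) for p in map(self._run_partition, self.partitions) if p]
        return reduce(f, partials)


data = PartitionedCollection([[1, 2, 3], [4, 5], [6]])
total = data.map(lambda x: x * x).filter(lambda x: x % 2 == 0).reduce(lambda a, b: a + b)
```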
48 GeoTrellis: a Scala library for geospatial data types and operations. Enables Spark with raster capabilities. Storage and bounded retrievals from HDFS, Accumulo, and S3.
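A bounded retrieval starts by working out which tiles of a layer intersect a query box. The sketch assumes a regular grid of tiles keyed by (col, row) with row 0 at the top edge of the layout extent, which mirrors the way GeoTrellis keys tiles with `SpatialKey(col, row)`; the function itself is a hypothetical illustration, not GeoTrellis code. Queries that end exactly on a tile boundary pick up one extra tile, which is harmless over-inclusion.

```python
import math
from typing import List, Tuple

Extent = Tuple[float, float, float, float]  # xmin, ymin, xmax, ymax


def tile_keys_for_bbox(
    layout_extent: Extent, layout_cols: int, layout_rows: int, query: Extent
) -> List[Tuple[int, int]]:
    """Return the (col, row) keys of all tiles that intersect the query box."""
    xmin, ymin, xmax, ymax = layout_extent
    tile_w = (xmax - xmin) / layout_cols
    tile_h = (ymax - ymin) / layout_rows
    qxmin, qymin, qxmax, qymax = query
    # Columns grow eastward from xmin; rows grow southward from ymax.
    col_min = max(0, math.floor((qxmin - xmin) / tile_w))
    col_max = min(layout_cols - 1, math.floor((qxmax - xmin) / tile_w))
    row_min = max(0, math.floor((ymax - qymax) / tile_h))
    row_max = min(layout_rows - 1, math.floor((ymax - qymin) / tile_h))
    # Empty ranges (query outside the layout) yield no keys.
    return [
        (c, r)
        for r in range(row_min, row_max + 1)
        for c in range(col_min, col_max + 1)
    ]


# A 100 x 100 layout split into 4 x 4 tiles of 25 units each.
keys = tile_keys_for_bbox((0, 0, 100, 100), 4, 4, (30, 60, 55, 80))
```

Only the tiles for these keys need to be fetched from the backing store, rather than the whole layer.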
50 Accumulo: a BigTable clone (columnar database). Records stored on HDFS. Lexicographically sorted table index.
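A lexicographically sorted index makes a key range cheap to read: two binary searches find the boundaries, and the rows in between are contiguous. The toy table below shows that idea with `bisect`; it is not Accumulo's API. It also shows why key encoding matters: fixed-width big-endian bytes sort in numeric order, while decimal strings do not.

```python
import bisect
from typing import List, Tuple


def encode(index: int) -> bytes:
    # Fixed-width big-endian bytes sort lexicographically in numeric order.
    # Decimal strings would not: sorted(["5", "12", "300", "7"]) gives
    # ["12", "300", "5", "7"].
    return index.to_bytes(8, "big")


class SortedTable:
    """Toy table whose rows are kept in lexicographic key order."""

    def __init__(self, rows: List[Tuple[bytes, str]]):
        rows = sorted(rows)
        self.keys = [k for k, _ in rows]
        self.values = [v for _, v in rows]

    def scan(self, start: bytes, stop: bytes) -> List[Tuple[bytes, str]]:
        """Return rows with start <= key < stop as one contiguous slice."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))


table = SortedTable([(encode(i), f"tile-{i}") for i in (300, 5, 12, 7)])
hits = table.scan(encode(6), encode(13))
```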
54 Space-Filling Curves
55 Space-Filling Curves: github.com/locationtech/sfcurve
56 Other projects using SFCurve: GeoMesa, GeoWave
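Space-filling curves are how 2D locations become the 1D sorted keys a store like Accumulo needs. A minimal Z-order (Morton) sketch in Python, assuming non-negative integer grid coordinates of at most 16 bits; it illustrates the technique and is not the sfcurve library's API. Because the Z index never decreases when x or y grows, every point in a query box has a key between the keys of its lower-left and upper-right corners, so one range scan covers the box.

```python
from typing import Tuple


def interleave(v: int, bits: int) -> int:
    """Spread the low `bits` bits of v onto the even bit positions."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (2 * i)
    return out


def z_index(x: int, y: int, bits: int = 16) -> int:
    # x bits land on even positions, y bits on odd positions.
    return interleave(x, bits) | (interleave(y, bits) << 1)


def z_range(xmin: int, ymin: int, xmax: int, ymax: int, bits: int = 16) -> Tuple[int, int]:
    """Single key range guaranteed to contain every cell of the box."""
    return z_index(xmin, ymin, bits), z_index(xmax, ymax, bits)


lo, hi = z_range(1, 1, 2, 3)
inside = [z_index(x, y) for x in range(1, 3) for y in range(1, 4)]
```

Here the range [3, 14] holds 12 keys while the box has only 6 cells, so a single range brings back false positives; real indexes typically split a query box into several tighter ranges.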
57 Zonal Summaries
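A zonal summary aggregates raster values that fall within a zone, such as a planner's area of interest. The sketch below computes a zonal mean in plain Python, assuming a cell counts as inside when its center passes an even-odd ray-casting point-in-polygon test; other inclusion rules exist, and this is not GeoTrellis's implementation. The grid is row-major with row 0 at the top, anchored at its (xmin, ymax) corner.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x where this edge crosses the horizontal line through y.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_mean(
    grid: List[List[float]], origin: Point, cell_size: float, zone: Sequence[Point]
) -> float:
    """Mean of the cells whose centers fall inside the zone (NaN if none)."""
    xmin, ymax = origin
    total, count = 0.0, 0
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            cx = xmin + (c + 0.5) * cell_size
            cy = ymax - (r + 0.5) * cell_size
            if point_in_polygon(cx, cy, zone):
                total += value
                count += 1
    return total / count if count else math.nan


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
zone = [(0, 1), (2, 1), (2, 3), (0, 3)]  # covers the top-left 2 x 2 cells
mean = zonal_mean(grid, origin=(0, 3), cell_size=1, zone=zone)
```

On a cluster the same loop would run per tile; combining tiles correctly means summing the (total, count) pairs and dividing once, not averaging the per-tile means.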
60 Benchmark Results: Yearly Average, 2006 to 2100; Single Layer, GB uncompressed; 40 m3.xlarge instances (estimated $2.00 USD per hour on spot market)
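The benchmark span 2006 to 2100 covers 95 years, so a monthly layer holds 95 × 12 = 1,140 time steps that collapse into 95 yearly grids. The sketch assumes "yearly average" means the unweighted mean of the twelve monthly values for each cell (not weighted by days per month) and that the first grid is January of the start year; it illustrates the computation, not the benchmarked code.

```python
from typing import Dict, List, Sequence

Grid = List[List[float]]


def yearly_average_grids(monthly: Sequence[Grid], start_year: int) -> Dict[int, Grid]:
    """Average each run of twelve monthly grids into one grid per year."""
    if len(monthly) % 12:
        raise ValueError("expected whole years of monthly grids")
    out: Dict[int, Grid] = {}
    for i in range(len(monthly) // 12):
        months = monthly[12 * i : 12 * i + 12]
        rows, cols = len(months[0]), len(months[0][0])
        # Per-cell mean over the twelve months of this year.
        out[start_year + i] = [
            [sum(m[r][c] for m in months) / 12 for c in range(cols)]
            for r in range(rows)
        ]
    return out


# Two years of illustrative 1 x 2 grids: one cell counts months, one is constant.
monthly = [[[float(m), 10.0]] for m in range(24)]
yearly = yearly_average_grids(monthly, 2006)
```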
61 Summary: Big data is about orchestration. Big data is about deployment. The state of geo big data is the state of big data, with work towards enabling geospatial data types. Use Apache Spark! Spatial indexing of distributed data is a hot topic.
62 LET'S DEVELOP AND USE THE BEST TOOLS POSSIBLE
63 THANK YOU: gitter.im/geotrellis/geotrellis, github.com/geotrellis/geotrellis