How To Use Hadoop For GIS
- Katherine Anthony
Transcription
1 2013 Esri International User Conference July 8-12, 2013 San Diego, California Technical Workshop Big Data: Using ArcGIS with Apache Hadoop David Kaiser Erik Hoel Offering 1330 Esri UC2013. Technical Workshop.
2 In this technical workshop This presentation is for anyone who uses ArcGIS and is interested in analyzing large amounts of data We will cover: - Big Data overview - The Hadoop platform - How Esri's GIS Tools for Hadoop enables developers to process spatial data on Hadoop - How ArcGIS can leverage these custom Hadoop applications for GIS analysis
3 Big Data Overview
4 Big Data Within ArcGIS, Geoprocessing was enhanced at 10.1 SP1 to support 64-bit address spaces - This is sufficient to handle traditional large GIS datasets However, this solution may run into problems when confronted with datasets of a size colloquially referred to as Big Data - Internet-scale datasets
5 Age of Data Ubiquity Data is now central to our existence both for corporations and individuals Nimble, thin, data-centric apps exploiting massive data sets generated by both enterprises and consumers Hardware era: years Software era: years Data era:?
6 Age of Data Ubiquity "The Internet has caused a Cambrian explosion of new life forms, new applications, new data-centric APIs, literally thousands and thousands of new APIs are born every month." - Mike Hoskins
7 Sensor data challenge Sensors are continuously observing our world - With increases in networking technology, we are moving away from pure remote sensing and moving towards direct sensing, where every sensor reports its own data One researcher says: "We are instrumenting the universe", referring to what is often called The Internet of Things
8 Internet of Things IoT - uniquely identifiable objects and their virtual representations in an Internet-like structure; the term was proposed by Kevin Ashton (MIT) in 1999 Your old refrigerator was a dumb refrigerator Your new refrigerator is a smart refrigerator - It's a digital asset aware of itself - It records time, temperatures, vibrations - It records the electricity it's consuming - It's connected to the Internet as an IP device More realistically, commercial jets or valuable equipment By continuously observing things, we obtain a massive amount of valuable information
9 Volume of data: the rise of data anarchy If all 7 billion people on Earth joined Twitter and continually tweeted for one century, they would generate one zettabyte of data (one billion terabytes) Almost double that amount, 1.8 zettabytes (1.8 x 10^21 bytes), was generated globally in 2011 (and 2.8 ZB in 2012) Rising 40-50% per year Estimated over 40 ZB in 2020
10 Volume of data: the rise of data anarchy If all 7 billion people on Earth joined Twitter and continually tweeted for one century, they would generate one zettabyte of data (Hadhazy, 2010) Almost double that amount, 1.8 zettabytes (1.8 x 10^21 bytes), was generated globally in 2011 (and 2.8 ZB in 2012) Rising 40-50% per year Estimated over 40 ZB in 2020
11 Example For every hour that it runs, a commercial jet aircraft can create 20 gigabytes of operations information For a single journey across the Atlantic Ocean, a two-engine jet can create over 125 GB of data Multiply that by the more than 100,000 flights flown each day, and you get an understanding of the enormous amount of data that exists - E.g., 5-10 petabytes/day; 2-4 exabytes/year
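The order-of-magnitude figures on this slide can be checked with a few lines of arithmetic. The sketch below uses decimal units and assumes an average of 50 to 100 GB per flight; that per-flight range is not stated on the slide and is chosen here only because it reproduces the quoted 5-10 petabytes/day.

```python
# Rough check of the slide's data-volume estimate (decimal units).
GB = 10**9
PB = 10**15
EB = 10**18

FLIGHTS_PER_DAY = 100_000  # from the slide


def daily_and_yearly(per_flight_gb):
    """Return (petabytes per day, exabytes per year) for a per-flight volume in GB."""
    bytes_per_day = per_flight_gb * GB * FLIGHTS_PER_DAY
    bytes_per_year = bytes_per_day * 365
    return bytes_per_day / PB, bytes_per_year / EB


# Assumed per-flight averages (hypothetical, not from the slide).
low = daily_and_yearly(50)    # about 5 PB/day and 1.8 EB/year
high = daily_and_yearly(100)  # about 10 PB/day and 3.7 EB/year
```

Under these assumptions the yearly totals land at roughly 2 to 4 exabytes, in line with the slide. Using the transatlantic figure of 125 GB for every flight would give 12.5 PB/day, so the slide's range implies that the typical flight produces less than that.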
12 All this data has business value Smart Sensors - Electrical meters GPS Telemetry - Vehicle tracking, smartphone data collectors Internet Services - E-Commerce transactions, social media Monitoring Sensors - Heavy equipment, aircraft, etc.
13 Value when analyzing data at mass scale As observations increase in frequency - Each individual observation is worth less - The set of all observations becomes more valuable One single metric from the jet aircraft is much less useful than the analysis of that metric against the same metric from every known flight of that aircraft over time Big Data is the accumulation of this data and the analytical processes that use it for business value
14 What is Hadoop? Racks of Hadoop Servers inside the Facebook.com Data Center
15 The need for distributed systems As data volumes have grown, individual computing systems have not scaled equivalently, e.g., - Single-node databases - Single computers Big data has led to distributed systems: - Data is stored across a number of servers - Processing the data takes place in the server where the data is stored
16 Legacy system architecture Processors Disks Network Bottleneck
19 Distributed system architecture Processing Elements
21 Until Now Google implemented their enterprise on a distributed network of many nodes, fusing storage and processing into each node Hadoop is an open source implementation of the framework that Google has built their business around for many years
23 Hadoop Open-source data processing framework Provides scalable, distributed system for both storage and computation of data - It supports the running of applications on large clusters of commodity hardware - Hadoop was derived from Google's MapReduce and Google File System (GFS) scientific papers Platform consists of the kernel, HDFS, and MapReduce - Other applications have been built on Hadoop; this includes Hive, HBase, etc.
24 MapReduce A programming model for processing large data sets with a parallel distributed algorithm on a cluster A MapReduce program is composed of: - A Map() procedure that performs filtering and comparing, and - A Reduce() procedure that performs a summary operation The MapReduce System marshals the distributed servers, runs the various tasks in parallel, manages all communications and data transfers, and provides for redundancy and failures MapReduce libraries have been written in many languages; Hadoop is a popular open source implementation
25 High Level MapReduce Walk-Through An instance of a MapReduce program is a job The job accepts arguments for data input and outputs The job combines: - Functions from the MapReduce framework: splitting large inputs into smaller pieces; reading inputs and writing outputs - Functions that are written by the application developer: a map function, which maps input records to key/value pairs; a reduce function, which reduces the many values for each key to one result
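The division of labor described on this slide can be sketched in plain Python. This is a single-process illustration, not Hadoop's API: `split_input` and `run_job` stand in for the framework, while `map_words` and `sum_values` play the role of the developer-written functions. All four names are hypothetical.

```python
from collections import defaultdict


def split_input(records, n_splits):
    """Framework step: cut the input into roughly equal pieces."""
    size = max(1, -(-len(records) // n_splits))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_job(records, map_fn, reduce_fn, n_splits=3):
    """Framework step: split, map each split, group by key, then reduce."""
    groups = defaultdict(list)
    for split in split_input(records, n_splits):
        for record in split:
            # map_fn returns zero or more (key, value) pairs per record.
            for key, value in map_fn(record):
                groups[key].append(value)
    # Keys are visited in sorted order, as after a shuffle and sort.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


def map_words(line):
    """Developer-written map: emit (word, 1) for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def sum_values(key, values):
    """Developer-written reduce: collapse all values for one key."""
    return sum(values)


counts = run_job(["big data", "big clusters", "data data"], map_words, sum_values)
```

In a real cluster the splits are processed on different nodes and the grouped values arrive over the network; here everything runs in one loop so that the roles of the two developer functions are easy to see.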
26 High Level MapReduce Walk-Through An instance of a MapReduce program is a job (Diagram: a MapReduce Job box)
27 High Level MapReduce Walk-Through The job accepts arguments for data input and outputs (Diagram: the job reads data from hdfs://path/to/input and writes to hdfs://path/to/output)
28 High Level MapReduce Walk-Through The MapReduce framework splits the data space (Diagram adds: Split)
29 High Level MapReduce Walk-Through Many map() functions are run in parallel (Diagram adds: four map() tasks)
30 High Level MapReduce Walk-Through The framework runs combine to join map() outputs (Diagram adds: Combine)
31 High Level MapReduce Walk-Through The framework performs a shuffle and sort on all data (Diagram adds: Shuffle / Sort)
32 High Level MapReduce Walk-Through The reduce() functions work against the sorted data (Diagram adds: two reduce() tasks)
33 High Level MapReduce Walk-Through Reducers each write a part of the results to a file (Diagram adds: part 1 and part 2 under hdfs://path/to/output)
34 High Level MapReduce Walk-Through When the job has finished, ArcGIS can retrieve the results (Diagram: the completed pipeline from hdfs://path/to/input to hdfs://path/to/output)
35 MapReduce Polygon Count Example
Input splits (one state code per record):
- Split 1: AK AK OR AK WA
- Split 2: WA OR AK WA
- Split 3: WA OR OR OR
Map output (one key/value pair per record):
- Split 1: AK 1, AK 1, OR 1, AK 1, WA 1
- Split 2: WA 1, OR 1, AK 1, WA 1
- Split 3: WA 1, OR 1, OR 1, OR 1
Shuffle / Sort (pairs grouped by key):
- WA: 1 1 1 1
- AK: 1 1 1 1
- OR: 1 1 1 1 1
Reduce output: Washington WA 4, Alaska AK 4, Oregon OR 5
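The example can be traced stage by stage in plain Python. The sketch below is an illustration, not Hadoop code; the function names are made up, and the combine stage comes from the walk-through slides rather than from this example slide. Counting the thirteen input records exactly as listed in the three splits gives AK 4, OR 5 and WA 4.

```python
from collections import Counter, defaultdict

# Input records as listed on the slide, one state code per record.
SPLITS = [
    ["AK", "AK", "OR", "AK", "WA"],
    ["WA", "OR", "AK", "WA"],
    ["WA", "OR", "OR", "OR"],
]


def map_record(state):
    """Map: emit one (state, 1) pair per input record."""
    return (state, 1)


def combine(pairs):
    """Combine: pre-sum the pairs of a single split before the shuffle."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())


def shuffle(pairs_per_split):
    """Shuffle / sort: gather all values for each key, keys in sorted order."""
    groups = defaultdict(list)
    for pairs in pairs_per_split:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(key, values):
    """Reduce: sum the partial counts for one key."""
    return (key, sum(values))


mapped = [[map_record(state) for state in split] for split in SPLITS]
combined = [combine(pairs) for pairs in mapped]
grouped = shuffle(combined)
totals = dict(reduce_counts(key, values) for key, values in grouped.items())
```

The combine step does not change the totals; it only shrinks what has to cross the network. Split 1 emits five pairs from its mappers but sends only three after combining.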
36 Yahoo! Hadoop Cluster in 2011 ~10,000 servers in a number of clusters Largest cluster is 1,600 nodes Nearly 1 Petabyte of user data Yahoo! runs nearly 10,000 research jobs per month
37 We didn't bring 10,000 servers to the user conference but we do have a small Hadoop cluster we can use for demos
38 GIS Tools for Hadoop Credit: Fuqing Zhang and Yonghui Weng, Pennsylvania State University; Frank Marks, NOAA; Gregory P. Johnson, Romy Schneider, John Cazes, Karl Schulz, Bill Barth, The University of Texas at Austin
39 GIS Tools for Hadoop Hadoop is a data processing system that is designed to store and process large amounts of data The most common Hadoop data processing task is to reduce a large amount of data to a smaller, more manageable amount of data The GIS Tools for Hadoop provide query functions and API methods that enable Hadoop application developers to perform this data reduction process on spatial data
41 Data Reduction Patterns Need to reduce large volumes of data into manageable datasets that can be processed in the ArcGIS Platform - Filtering - Grouping - Simple binning against grid cells - Aggregation into known spatial areas - More complex patterns if desired
42 GIS Tools for Hadoop Developer API to support data reduction workflows - Released in March 2013 at Esri Developer Summit Spatial reduction with full geometry capabilities - Relational operators (touch, disjoint, ...) - Topological operators (buffer, union, intersection, ...) - Accessible via SQL expressions and Java Small set of GP Tools to migrate data and invoke jobs
43 GIS Tools for Hadoop (component diagram)
- GIS Tools for Hadoop (tools, samples): Tools and samples using the open source resources that solve specific problems
- Spatial Framework for Hadoop: hive (spatial-sdk-hive.jar), spatial type functions for Hive, a SQL engine on Hadoop; json (spatial-sdk-json.jar), JSON helper utilities
- Geoprocessing Tools for Hadoop (HadoopTools.pyt): Geoprocessing tools that copy to/from Hadoop, convert to/from JSON, and invoke Hadoop jobs
- Geometry API Java (esri-geometry-api.jar): Java geometry library for spatial data processing
44 DEMONSTRATION MapReduce Demos
45 Earthquake Measurement Data Data format is CSV, already stored in a file in HDFS
1964/03/28 06:42:57.00,57.98,-151.6,33.0,5.2,ML,0
1964/03/28 06:43:54.40,58.26, ,4.0,6.1,ML,0
1964/03/28 06:50:52.00,56.9,-151.7,33.0,5.1,ML,0
1964/03/28 06:53:35.90,58.79, ,20.0,5.7,ML,0
1964/03/28 07:09:07.60,59.79,-148.1,13.0,5.3,ML,0
1964/03/28 07:10:22.00,58.83, ,17.0,6.2,ML,0
1964/03/28 07:16:16.70,58.0,-150.7,33.0,5.3,ML,0
1964/03/28 07:24:24.60,59.62,-148.7,20.0,5.1,ML,0
Callouts on the slide label the coordinate, depth, and magnitude fields
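A mapper working on this file first has to turn each line into typed fields. The sketch below shows one way to do that in Python. The column order (timestamp, latitude, longitude, depth, magnitude, magnitude type) is an assumption inferred from the slide's callouts and from the values themselves; the record type and function names are hypothetical, and the trailing column is ignored because the slide does not explain it.

```python
import csv
from datetime import datetime
from typing import NamedTuple, Optional


class Quake(NamedTuple):
    time: datetime
    lat: Optional[float]
    lon: Optional[float]
    depth: Optional[float]
    magnitude: Optional[float]
    mag_type: str


def _to_float(text):
    """Convert a CSV field to float, returning None when the field is blank."""
    text = text.strip()
    return float(text) if text else None


def parse_quake(line):
    """Parse one CSV line into a Quake record (assumed column order)."""
    row = next(csv.reader([line]))
    return Quake(
        time=datetime.strptime(row[0].strip(), "%Y/%m/%d %H:%M:%S.%f"),
        lat=_to_float(row[1]),
        lon=_to_float(row[2]),
        depth=_to_float(row[3]),
        magnitude=_to_float(row[4]),
        mag_type=row[5].strip(),
    )


quake = parse_quake("1964/03/28 06:42:57.00,57.98,-151.6,33.0,5.2,ML,0")
```

Blank fields are mapped to `None` rather than raising an error, so a job can filter out incomplete records instead of failing on them.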
46 DEMONSTRATION Grouping by Polygons In this demo, we use ArcGIS to provide the polygonal boundaries of the U.S. States, and the Hadoop application aggregates earthquake data by the state boundaries
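The aggregation in this demo amounts to a point-in-polygon test followed by a count per polygon. The demo itself relies on ArcGIS state boundaries and the GIS Tools for Hadoop geometry functions; the sketch below swaps in a plain ray-casting test and two made-up square regions, so every name and coordinate in it is hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_polygon(points, polygons):
    """Count points per named polygon; points outside every polygon are skipped."""
    counts = Counter()
    for x, y in points:
        for name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break
    return counts


# Two hypothetical square regions and four sample points.
REGIONS = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
result = count_by_polygon([(1, 1), (5, 5), (15, 5), (30, 30)], REGIONS)
```

In a MapReduce job the point-in-polygon lookup would run in the map step, emitting the polygon name as the key, and the per-polygon sum would run in the reduce step.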
47 DEMONSTRATION Grouping into Grid Cells In this demo, we compute arbitrary grid cells such as a 1 km grid, and aggregate earthquakes by these cells
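Binning into grid cells needs no polygon data at all: each point's cell follows directly from its coordinates. The sketch below uses cells measured in degrees to avoid a map projection, which is a simplifying assumption; a true 1 km grid would first project the coordinates into a metric coordinate system. The function names are hypothetical.

```python
import math
from collections import Counter


def cell_id(lon, lat, cell_size):
    """Return the (column, row) index of the grid cell containing the point."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def bin_points(points, cell_size):
    """Count points per grid cell."""
    return Counter(cell_id(lon, lat, cell_size) for lon, lat in points)


# (longitude, latitude) pairs taken from the complete sample records.
POINTS = [
    (-151.6, 57.98),
    (-151.7, 56.9),
    (-148.1, 59.79),
    (-150.7, 58.0),
    (-148.7, 59.62),
]
cells = bin_points(POINTS, cell_size=1.0)
```

With one-degree cells, the two events near 148 degrees west fall into the same cell, `(-149, 59)`, and the other three each occupy a cell of their own. As a MapReduce job, `cell_id` would be the map key and the count would be the reduce step.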
48 The Future GIS Tools for Hadoop is a developer story - Our initial foray into Big Data Esri will continue forward in this domain - User stories and technology will be the focus Swing by the island to chat further Please fill out the evaluations: offering 1330
49 Questions?
50
Big Data and Analytics: A Conceptual Overview. Mike Park Erik Hoel
Big Data and Analytics: A Conceptual Overview Mike Park Erik Hoel In this technical workshop This presentation is for anyone that uses ArcGIS and is interested in analyzing large amounts of data We will
Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park
Big Data: Using ArcGIS with Apache Hadoop Erik Hoel and Mike Park Outline Overview of Hadoop Adding GIS capabilities to Hadoop Integrating Hadoop with ArcGIS Apache Hadoop What is Hadoop? Hadoop is a scalable
Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel
Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined
Big Data Spatial Analytics An Introduction
2013 Esri International User Conference July 8 12, 2013 San Diego, California Technical Workshop Big Data Spatial Analytics An Introduction Marwa Mabrouk Mansour Raad Esri iu UC2013. Technical Workshop
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12
Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using
Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop
Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social
Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.
Beyond Web Application Log Analysis using Apache TM Hadoop A Whitepaper by Orzota, Inc. 1 Web Applications As more and more software moves to a Software as a Service (SaaS) model, the web application has
Manifest for Big Data Pig, Hive & Jaql
Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created
W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after
Investigating Hadoop for Large Spatiotemporal Processing Tasks
Investigating Hadoop for Large Spatiotemporal Processing Tasks David Strohschein [email protected] Stephen Mcdonald [email protected] Benjamin Lewis [email protected] Weihe
Hadoop Big Data for Processing Data and Performing Workload
Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer
Data-Intensive Computing with Map-Reduce and Hadoop
Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan [email protected] Abstract Every day, we create 2.5 quintillion
Big Data and Apache Hadoop s MapReduce
Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23
Big Data R&D Initiative
Big Data R&D Initiative Howard Wactlar CISE Directorate National Science Foundation NIST Big Data Meeting June, 2012 Image Credit: Exploratorium. The Landscape: Smart Sensing, Reasoning and Decision Environment
Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
Hadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
BIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
Are You Ready for Big Data?
Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?
Hadoop Cluster Applications
Hadoop Overview Data analytics has become a key element of the business decision process over the last decade. Classic reporting on a dataset stored in a database was sufficient until recently, but yesterday
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
Open source Google-style large scale data analysis with Hadoop
Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical
Large scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
Implement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
Energy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
Hadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
Big Data Explained. An introduction to Big Data Science.
Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
Mr. Apichon Witayangkurn [email protected] Department of Civil Engineering The University of Tokyo
Sensor Network Messaging Service Hive/Hadoop Mr. Apichon Witayangkurn [email protected] Department of Civil Engineering The University of Tokyo Contents 1 Introduction 2 What & Why Sensor Network
A bit about Hadoop. Luca Pireddu. March 9, 2012. CRS4Distributed Computing Group. [email protected] (CRS4) Luca Pireddu March 9, 2012 1 / 18
A bit about Hadoop Luca Pireddu CRS4Distributed Computing Group March 9, 2012 [email protected] (CRS4) Luca Pireddu March 9, 2012 1 / 18 Often seen problems Often seen problems Low parallelism I/O is
Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
Big Data and Hadoop. Sreedhar C, Dr. D. Kavitha, K. Asha Rani
Big Data and Hadoop Sreedhar C, Dr. D. Kavitha, K. Asha Rani Abstract Big data has become a buzzword in the recent years. Big data is used to describe a massive volume of both structured and unstructured
An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov
An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.
Using Big Data and GIS to Model Aviation Fuel Burn
Using Big Data and GIS to Model Aviation Fuel Burn Gary M. Baker USDOT Volpe Center 2015 Transportation DataPalooza June 17, 2015 The National Transportation Systems Center Advancing transportation innovation
Application Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5
Distributed data processing in heterogeneous cloud environments R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 1 [email protected], 2 [email protected],
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
Keywords: Big Data, HDFS, Map Reduce, Hadoop
Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning
A very short Intro to Hadoop
4 Overview A very short Intro to Hadoop photo by: exfordy, flickr 5 How to Crunch a Petabyte? Lots of disks, spinning all the time Redundancy, since disks die Lots of CPU cores, working all the time Retry,
High Performance Spatial Queries and Analytics for Spatial Big Data. Fusheng Wang. Department of Biomedical Informatics Emory University
High Performance Spatial Queries and Analytics for Spatial Big Data Fusheng Wang Department of Biomedical Informatics Emory University Introduction Spatial Big Data Geo-crowdsourcing:OpenStreetMap Remote
Latest Developments in Oceanographic Applications of GIS including!
Latest Developments in Oceanographic Applications of GIS including! Near-Real-Time Interactive Map Exemplars & Scientific Empowerment Through Storytelling Dawn Wright Esri Chief Scientist Affiliate Professor,
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON BIG DATA MANAGEMENT AND ITS SECURITY PRUTHVIKA S. KADU 1, DR. H. R.
A Brief Outline on Bigdata Hadoop
A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is
Maximizing Hadoop Performance with Hardware Compression
Maximizing Hadoop Performance with Hardware Compression Robert Reiner Director of Marketing Compression and Security Exar Corporation November 2012 1 What is Big? sets whose size is beyond the ability
Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014
White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page
Hadoop & its Usage at Facebook
Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System [email protected] Presented at the The Israeli Association of Grid Technologies July 15, 2009 Outline Architecture
BIG DATA CHALLENGES AND PERSPECTIVES
BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,
HADOOP: Scalable, Flexible Data Storage and Analysis
HADOOP: Scalable, Flexible Data Storage and Analysis By Mike Olson Beginning in the early 000s, Google faced a serious challenge. Its mission to organize the world s information meant that it was crawling,
Big Data Too Big To Ignore
Big Data Too Big To Ignore Geert! Big Data Consultant and Manager! Currently finishing a 3 rd Big Data project! IBM & Cloudera Certified! IBM & Microsoft Big Data Partner 2 Agenda! Defining Big Data! Introduction
How Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
marlabs driving digital agility WHITEPAPER Big Data and Hadoop
marlabs driving digital agility WHITEPAPER Big Data and Hadoop Abstract This paper explains the significance of Hadoop, an emerging yet rapidly growing technology. The prime goal of this paper is to unveil
Scalable Cloud Computing Solutions for Next Generation Sequencing Data
Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of
Hadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed
Massive Cloud Auditing using Data Mining on Hadoop
Massive Cloud Auditing using Data Mining on Hadoop Prof. Sachin Shetty CyberBAT Team, AFRL/RIGD AFRL VFRP Tennessee State University Outline Massive Cloud Auditing Traffic Characterization Distributed
Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
Big Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013
Big Data Use Case How Rackspace is using Private Cloud for Big Data Bryan Thompson May 8th, 2013 Our Big Data Problem Consolidate all monitoring data for reporting and analytical purposes. Every device
Hadoop & its Usage at Facebook
Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System [email protected] Presented at the Storage Developer Conference, Santa Clara September 15, 2009 Outline Introduction
A Performance Analysis of Distributed Indexing using Terrier
A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search
BIG DATA USING HADOOP
+ Breakaway Session By Johnson Iyilade, Ph.D. University of Saskatchewan, Canada 23-July, 2015 BIG DATA USING HADOOP + Outline n Framing the Problem Hadoop Solves n Meet Hadoop n Storage with HDFS n Data
Data Mining in the Swamp
WHITE PAPER Page 1 of 8 Data Mining in the Swamp Taming Unruly Data with Cloud Computing By John Brothers Business Intelligence is all about making better decisions from the data you have. However, all
HDFS. Hadoop Distributed File System
HDFS Kevin Swingler Hadoop Distributed File System File system designed to store VERY large files Streaming data access Running across clusters of commodity hardware Resilient to node failure 1 Large files
Real Time Big Data Processing
Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
Map Reduce & Hadoop Recommended Text:
Big Data Map Reduce & Hadoop Recommended Text:! Large datasets are becoming more common The New York Stock Exchange generates about one terabyte of new trade data per day. Facebook hosts approximately
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents: Overview; Lambda Architecture: A Quick Introduction; Batch Layer; Serving Layer; Speed Layer
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
A programming model in Cloud: MapReduce
A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value
BIG DATA SOLUTION DATA SHEET
BIG DATA SOLUTION DATA SHEET Highlight. DATA SHEET HGrid247 BIG DATA SOLUTION Exploring your BIG DATA, get some deeper insight. It is possible! Another approach to access your BIG DATA with the latest
Market Basket Analysis Algorithm on Map/Reduce in AWS EC2
Market Basket Analysis Algorithm on Map/Reduce in AWS EC2 Jongwook Woo Computer Information Systems Department California State University Los Angeles [email protected] Abstract As the web, social networking,
Are You Ready for Big Data?
Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?
Hadoop and its Usage at Facebook. Dhruba Borthakur [email protected], June 22nd, 2009
Hadoop and its Usage at Facebook Dhruba Borthakur [email protected], June 22nd, 2009 Who Am I? Hadoop Developer Core contributor since Hadoop's infancy Focussed on Hadoop Distributed File System Facebook
Parallel Processing of cluster by Map Reduce
Parallel Processing of cluster by Map Reduce Abstract Madhavi Vaidya, Department of Computer Science Vivekanand College, Chembur, Mumbai [email protected] MapReduce is a parallel programming model
Enabling High performance Big Data platform with RDMA
Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery
Search and Real-Time Analytics on Big Data
Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its
Big Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Contents 1. What is Big Data? 2. Characteristics of Big Data 3. Enter Hadoop 4. Microsoft Big Data Solutions
SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS
Sean Lee Solution Architect, SDI, IBM Systems SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Agenda Converging Technology Forces New Generation Applications Data Management Challenges
BIRT in the World of Big Data
BIRT in the World of Big Data David Rosenbacher VP Sales Engineering Actuate Corporation 2013 Actuate Customer Days Today s Agenda and Goals Introduction to Big Data Compare with Regular Data Common Approaches
HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW
HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW 757 Maleta Lane, Suite 201 Castle Rock, CO 80108 Brett Weninger, Managing Director [email protected] Dave Smelker, Managing Principal [email protected]
Apache Hadoop FileSystem and its Usage in Facebook
Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System [email protected] Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs
A Survey on Big Data Concepts and Tools
A Survey on Big Data Concepts and Tools D. Rajasekar 1, C. Dhanamani 2, S. K. Sandhya 3 1,3 PG Scholar, 2 Assistant Professor, Department of Computer Science and Engineering, Sri Krishna College of Engineering
The Next Wave of Data Management. Is Big Data The New Normal?
The Next Wave of Data Management Is Big Data The New Normal? Table of Contents: Introduction; Separating Reality and Hype; Why Are Firms Making IT Investments In Big Data?; Trends In Data Management
Hadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data? Bulk Amount Unstructured Lots of Applications which need to handle huge amounts of data (in terms of 500+ TB per day) If a regular machine needs to
Accelerating Hadoop MapReduce Using an In-Memory Data Grid
Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 Hadoop has been widely embraced for
Prepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By: Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their self-explanatory images on the internet. Thanks
Large-Scale Data Processing
Large-Scale Data Processing Eiko Yoneki [email protected] http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase
Big Data on AWS. Services Overview. Bernie Nallamotu Principal Solutions Architect
Big Data on AWS Services Overview Bernie Nallamotu Principal Solutions Architect So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze
