The little elephant driving Big Data
- Despite the funny-sounding name, Hadoop is a serious enterprise software suite that drives Big Data
- Hadoop enables the storage and processing of very large databases in a cluster of inexpensive servers
- Internet companies such as Yahoo, Facebook, LinkedIn and many others use Hadoop to manage their databases
- There is a growing community of startups focused on expanding and commercializing Hadoop technology

DEBORAH WEINSWIG
Executive Director, Head of Global Retail & Technology
Fung Business Intelligence Centre
deborahweinswig@fung1937.com
Executive Summary

Hadoop represents the circulatory and central nervous systems of Big Data. Despite the funny-sounding name, Hadoop is a serious enterprise software suite that enables the storage and processing of very large databases in a cluster of inexpensive servers.

Why Hadoop?

We are awash in data, and the situation is likely to grow even more acute as machines, appliances and our clothing become part of the Internet of Things. Cisco Systems forecasts that the amount of Internet Protocol traffic passing through data centers will grow at a 23% CAGR to 8.6 zettabytes (that's 8.6 x 10^21 bytes) over its forecast period.

A Hadoop system has two main parts: a distributed file system that handles the storage of data across a cluster of servers (or nodes), and a management program that coordinates the storage of data and the running of programs within the individual nodes. A key distinction of Hadoop is that the individual nodes both store the data and handle processing in a parallel fashion. This provides redundancy of both storage and processing, so that if a server drops out, no data is lost and no processing is interrupted.

Hadoop is based on software originally written at Google, which Hadoop's author reverse-engineered and altruistically made freely available as open source. That led to widespread adoption by many Internet companies and enterprises. The funny-sounding name comes from the toy stuffed elephant that belonged to the son of its creator.

Hadoop's adoption built upon itself and fostered an ecosystem of Hadoop tools and add-ons. Extensions and tools for Hadoop also have odd-sounding names: Pig, Hive and HBase. A community of startups focused on expanding and commercializing Hadoop technology has also emerged. This report contains a list of 13 startups, which have raised a total of nearly $1.7 billion.
One Hadoop-focused company, Hortonworks, recently raised $115 million in an IPO, and Cloudera is the next leading contender to go public, having already raised $1.2 billion.

Most Internet names you know (Yahoo, Facebook, LinkedIn and many others) use Hadoop to manage their databases. Since Hadoop, founded in 2006, is getting a bit old in Internet years, other technologies, continuing the tradition of odd-sounding names such as Percolator, Dremel and Pregel, have emerged for handling large-scale databases. Interestingly, Google, the original creator of the underlying technology, has moved on to the aforementioned technologies and is therefore no longer a big Hadoop user.

Hadoop is the little yellow stuffed animal with the funny name that powers many of the Internet services we depend on today.
History of Hadoop

In the early 2000s, Google faced an immense technical challenge: how to organize the entire world's information, which was stored on the Internet and steadily growing in volume. No commercially available software was up to the task, and Google's custom-designed hardware was running out of steam. Google engineers Jeff Dean and Sanjay Ghemawat designed two tools to solve this problem: Google File System (GFS), for fault-tolerant, reliable and scalable storage, and Google Map/Reduce (GMR), for parallel data analysis across a large number of servers. They described these tools in published academic work.

At that time, Doug Cutting was a well-known open-source software developer who was working on a web-indexing program and was facing similar challenges. Cutting replaced the data collection and processing code in his web crawler with reverse-engineered versions of GFS and GMR and named the framework after his two-year-old son's toy elephant, Hadoop.

Learning of Cutting's work, Yahoo! invested in Hadoop's development, and Cutting decided that Hadoop would remain open source and therefore free to use, and available for expansion and improvement by everyone. By 2006, established and emerging web companies had started to use Hadoop in production systems.

Today, the Apache Software Foundation coordinates Hadoop development, and Mr. Cutting is Chief Architect at Cloudera, which was founded in 2008 to commercialize Hadoop technology. (The Apache HTTP Server software, commonly just called Apache, is the world's most widely used software for running web servers.)

What is Hadoop?

According to its current home, Apache, Hadoop is "a framework for running applications on large cluster[s] built of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion."
The first part of that description means that Hadoop lets large banks of computers analyze data; the latter part means Hadoop has built-in redundancy that can recover from the failure of a server or a rack of servers, and the framework can process data that is changing over time.

Hadoop has two main parts:

- Map/Reduce, which divides a large job and its data into many small fragments of work; each fragment can be executed or re-executed on any node in a cluster
- Hadoop Distributed File System (HDFS), which stores data within the nodes of the cluster; this file system scheme offers high aggregate bandwidth across the cluster

Prominent Hadoop users include Facebook, Yahoo! and LinkedIn.

We are awash in data, and the amount of data that needs to be processed will only get larger as the Internet of Things grows to include smart home appliances and other objects, including wearable technology that contains multiple sensors that generate reams of data. Figure 1 illustrates Cisco Systems' forecast that the amount of Internet Protocol traffic passing through data centers, measured in zettabytes, will grow at a 23% CAGR. One zettabyte is 10^21 bytes, or 1,000,000,000,000,000,000,000 bytes.
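The compound-growth arithmetic behind a CAGR figure can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the five-year horizon used in the example is an assumption chosen for demonstration, since the forecast's start and end years are not given in the text above.

```python
# Minimal sketch of compound annual growth (CAGR) arithmetic.
# The 23% rate and the 8.6 ZB end value come from the report text;
# the 5-year horizon below is an ASSUMPTION made only for illustration.

ZETTABYTE = 10**21  # bytes in one zettabyte


def growth_multiplier(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


def implied_start(end_value: float, cagr: float, years: int) -> float:
    """Starting value that reaches `end_value` after `years` of growth."""
    return end_value / growth_multiplier(cagr, years)


if __name__ == "__main__":
    years = 5  # hypothetical horizon, not stated in the source
    factor = growth_multiplier(0.23, years)
    start_zb = implied_start(8.6, 0.23, years)
    print(f"growth factor over {years} years: {factor:.2f}x")
    print(f"implied starting traffic: {start_zb:.2f} ZB")
    print(f"8.6 ZB in bytes: {8.6 * ZETTABYTE:.3e}")
```

Under that assumed horizon, a 23% CAGR compounds to roughly a 2.8x increase; a different number of years would change the implied starting point accordingly.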
Figure 1. Global Data Center IP Traffic Growth (Zettabytes per Year)
Source: Cisco Global Cloud Index

The SAS Institute outlines the benefits of Hadoop:

- It's inexpensive: Hadoop uses low-cost commodity hardware
- It's scalable: More nodes can be added to increase capacity and processing power
- It can use unstructured data: Any data type can be used
- It employs parallel processing and redundancy: Hadoop can process multiple copies of data and redirect jobs from malfunctioning servers

Hadoop has its limitations, including:

- Other methods for rationalizing clusters with data center infrastructure are being developed
- Data security is fragmented
- Map/Reduce is batch oriented
- Its ecosystem lacks easy-to-use, full-feature tools for data integration and other functions
- Skilled Hadoop professionals are few and expensive

Technology continues to evolve, and next-generation alternatives to Hadoop are emerging. Google, the pioneer of the technology underlying Hadoop, is moving away from Map/Reduce and is embracing other technologies such as Percolator, Dremel and Pregel.
How Does Hadoop Work?

Figure 2 shows the three key parts of a Hadoop cluster.

- Client Machines load data into the cluster, submit Map/Reduce jobs that describe how the data should be processed, and then retrieve and view the results of the finished jobs
- Master Nodes oversee data storage with HDFS and the running of parallel computations via Map/Reduce
- Slave Nodes store the data and run the computations

Figure 2. Hadoop Server Roles
Source: BradHedlund.com
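The storage side of these roles can be sketched as a toy model: a master that only keeps track of where data lives, and slave nodes that hold redundant copies. The code below is a simplified illustration, not HDFS itself. The function names, the node names, the round-robin placement and the replication factor of three are all assumptions made for the example; real HDFS placement is more elaborate.

```python
# Toy model of master/slave storage roles (NOT the real HDFS API).
# Assumptions: round-robin placement and a replication factor of 3.
from __future__ import annotations


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Client side: cut a large file into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str],
                 replication: int = 3) -> dict[int, list[str]]:
    """Master side: record which slave nodes hold a copy of each block."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: dict[int, list[str]],
                           failed: set[str]) -> bool:
    """True if every block still has at least one copy on a live node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 6)]           # hypothetical slaves
    blocks = split_into_blocks(b"x" * 1000, block_size=256)
    placement = place_blocks(len(blocks), nodes)
    print(placement)
    print(readable_after_failure(placement, {"node1", "node2"}))
```

With three copies of every block on three different nodes, losing any two nodes still leaves one readable copy of each block, which is the redundancy property the report describes.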
Figure 3 illustrates the Hadoop Distributed File System, which breaks up a large file into pieces that are redundantly stored on several different servers, offering the ability to recover completely from the failure of one (or even two) servers.

Figure 3. Source: Cloudera

Figure 4 illustrates the Map/Reduce software framework, which similarly breaks a problem into smaller pieces that are redundantly processed, offering the ability to complete the analysis in the event of the failure of one or more servers.

Figure 4. Source: Cloudera
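The Map/Reduce idea of breaking a problem into smaller pieces can be illustrated with the classic word-count exercise. The sketch below runs in a single Python process and does not use Hadoop's actual API; the function names are hypothetical, and the sequential loop stands in for work that a real cluster would spread across many nodes.

```python
# Single-process sketch of the map -> shuffle -> reduce pattern.
# This is NOT Hadoop's API; on a cluster each fragment would be mapped
# on a different node, and a failed fragment could simply be re-run.
from __future__ import annotations

from collections import defaultdict


def map_phase(fragment: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input fragment."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(fragments: list[str]) -> dict[str, int]:
    """Run the three phases over a list of independent input fragments."""
    pairs: list[tuple[str, int]] = []
    for fragment in fragments:  # each fragment is independent of the others
        pairs.extend(map_phase(fragment))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data big", "data cluster"]))
```

Because each fragment is mapped without reference to any other, the same fragment can be processed again elsewhere if a server fails, which is what makes the redundant processing described above possible.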
Hadoop Components

A Hadoop setup contains the following core software modules:

Hadoop Module | Function
Common | Hadoop libraries and utilities
Distributed File System | Storing data across the cluster
YARN | Managing and scheduling activities in the cluster
Map/Reduce | Managing data processing across multiple servers

More Unusual Names in the Hadoop Ecosystem: Apache Pig, Hive and HBase

The success of Hadoop has sparked an ecosystem of related software:

- Apache Pig: A high-level platform for creating Map/Reduce programs using Hadoop
- Apache Hive: A data warehouse infrastructure on top of Hadoop for data summarization, query and analysis
- Apache HBase: An open-source, non-relational distributed database written in Java

Market Opportunity

IDC estimates that the worldwide Big Data technology market, which comprises hardware, software and services, will reach $32 billion. Gartner estimates that the global data-related enterprise software market will hit $110 billion. Of the Gartner figure, startup Cloudera projects that $30 billion represents analytical workloads and operational data stores (used for analysis rather than commerce), in contrast to transactional workloads. Cloudera says that this market is the most immediately addressable by Hadoop technology, as well as one of the fastest-growing segments of the data-related enterprise market.
Who Uses It?

As of 2008, many of the world's best-known Internet companies, such as eBay, Facebook, LinkedIn, Yahoo! and others, had already adopted Hadoop as the software foundation for their big-data processing activities. Figure 6 shows the top-20 selected Hadoop users, sorted by the number of nodes. These nodes represent 90% of the more than 17,000 Hadoop nodes counted by Apache. Data for Hadoop users Amazon, Google, IBM and Twitter were not available.

Figure 6. Selected Top Hadoop Users (Number of Nodes), in chart order: Yahoo!, LinkedIn, Facebook, Spotify, Criteo, Inmobi, eBay, CRS4, Adknowledge, Neptune, AOL, FOX Audience, Specific Media, Search Wikia, ecircle, Lydia News Analysis, A9 (Amazon), ARA.COM.TR, Cornell Univ., Last.fm
Source: Apache

Top Hadoop Technology Companies
Source: edureka!
Hadoop Startups

Hadoop startups have been well funded by venture capitalists. Figure 7 shows selected Hadoop startups, whose funding totals nearly $1.7 billion, with Cloudera receiving more than half of the total. Cloudera reportedly generated more than $100 million in revenue in 2014 and was recently valued at $4.1 billion.

Figure 7. Selected Venture-Capital Investments in Hadoop Startups

Company | Description | Location | Total Funding ($ Million)
Cloudera | Innovator and largest contributor to Hadoop community | Palo Alto, CA | 1,200
MapR Technologies | Enterprise-grade platform for mission-critical and real-time production | San Jose, CA | 174
Platfora | Analytics that transforms raw data into interactive, in-memory business intelligence | San Mateo, CA | 65.2
Altiscale | Hadoop-as-a-service | Palo Alto, CA | 42
Trifacta | Platform to transform raw, complex data for analysis | San Francisco, CA | 41.3
Datameer | End-to-end Big Data analytics application native for Hadoop | San Francisco, CA | 36.8
DataTorrent | Real-time stream processing platform | Santa Clara, CA | 23.8
Alpine Data Labs | Solutions to simplify the process of building predictive models in Big Data | San Francisco, CA | 23.5
Splice Machine | Hadoop-based, SQL-compliant database | San Francisco, CA | 22
Qubole | Self-service platform for Big Data analytics built on Amazon, Microsoft and Google clouds | Mountain View, CA | 20
Cask | Brings virtualization to Hadoop data and apps | Palo Alto, CA | 12.5
Nuevora | Analytics solutions for marketing effectiveness, customer management and risk mitigation | San Ramon, CA | 2.3
Xplenty | Platform enabling the transformation of data into business insights | Tel Aviv | 2

Source: CrunchBase

Hortonworks, which makes business software focused on the development and support of Apache Hadoop, raised $100 million in an initial public offering on December 11.

Conclusion

At the time of its introduction, Hadoop brought Google's technology for the low-cost analysis of very large data sets on a cluster of inexpensive PC hardware to the world, thanks to its author's history in the open-source software community. Subsequently, many Internet search engines and large enterprises adopted the software. In addition, a community developed around the software, made up of tools and startups created to enhance the technology; it has led to one IPO so far. The Hadoop community continues to thrive, even though Hadoop is somewhat aged by Internet standards and some newer alternatives have emerged. That's quite an achievement for a piece of software named after a yellow stuffed elephant.
Deborah Weinswig, CPA
Executive Director, Head of Global Retail & Technology
Fung Business Intelligence Centre

Cam Bolden; Marie Driscoll, CFA; John Harmon, CFA; Amy Hedrick; Aragorn Ho; John Mercer; Charlie Poon; Kiril Popov; Stephanie Reilly; Lan Rosengard; Jing Wang
More informationAccelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
More informationHadoop: Embracing future hardware
Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop
More informationCSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
More informationApplication and practice of parallel cloud computing in ISP. Guangzhou Institute of China Telecom Zhilan Huang 2011-10
Application and practice of parallel cloud computing in ISP Guangzhou Institute of China Telecom Zhilan Huang 2011-10 Outline Mass data management problem Applications of parallel cloud computing in ISPs
More informationHadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
More informationInfomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
More informationBuilding Your Big Data Team
Building Your Big Data Team With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements.
More informationL1: Introduction to Hadoop
L1: Introduction to Hadoop Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: December 1, 2014 Today we are going to learn... 1 General
More informationThe future: Big Data, IoT, VR, AR. Leif Granholm Tekla / Trimble buildings Senior Vice President / BIM Ambassador
The future: Big Data, IoT, VR, AR Leif Granholm Tekla / Trimble buildings Senior Vice President / BIM Ambassador What is Big Data? 2 Big Data is when the amount of data becomes part of the problem 3 Big
More informationWorkshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
More informationHadoop Distributed File System. Jordan Prosch, Matt Kipps
Hadoop Distributed File System Jordan Prosch, Matt Kipps Outline - Background - Architecture - Comments & Suggestions Background What is HDFS? Part of Apache Hadoop - distributed storage What is Hadoop?
More informationBig Data Workshop. dattamsha.com
Big Data Workshop About Praveen Has more than15 years of experience working on various technologies. Is a Cloudera Certified Developer for Apache Hadoop CDH4 (CCD-410) with 95% score and got through the
More informationHow Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationOracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
More informationWhy Big Data in the Cloud?
Have 40 Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data
More informationHadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela
Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance
More informationAll You Wanted to Know About Big Data Projects Chida Sadayappan @schida. Jan 2014
All You Wanted to Know About Big Data Projects Chida Sadayappan @schida Jan 2014 1 WHAT WE DISCUSS HERE AGENDA > > > > > > Need History Open Source - Hadoop BigData EcoSystem Use Cases Managing BigData
More informationBrochure More information from http://www.researchandmarkets.com/reports/3065119/
Brochure More information from http://www.researchandmarkets.com/reports/3065119/ Global Hadoop Market applications, Geography, Haas, Strategy, Industry Overview, Size, regional analysis, Share, Global
More informationHadoop-BAM and SeqPig
Hadoop-BAM and SeqPig Keijo Heljanko 1, André Schumacher 1,2, Ridvan Döngelci 1, Luca Pireddu 3, Matti Niemenmaa 1, Aleksi Kallio 4, Eija Korpelainen 4, and Gianluigi Zanetti 3 1 Department of Computer
More informationApproaches for parallel data loading and data querying
78 Approaches for parallel data loading and data querying Approaches for parallel data loading and data querying Vlad DIACONITA The Bucharest Academy of Economic Studies diaconita.vlad@ie.ase.ro This paper
More informationScalable Cloud Computing Solutions for Next Generation Sequencing Data
Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of
More informationBig Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
More informationModernizing Your Data Warehouse for Hadoop
Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking
More informationQLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment
More informationFrom Internet Data Centers to Data Centers in the Cloud
From Internet Data Centers to Data Centers in the Cloud This case study is a short extract from a keynote address given to the Doctoral Symposium at Middleware 2009 by Lucy Cherkasova of HP Research Labs
More informationA PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM
A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM Ramesh Maharjan and Manoj Shakya Department of Computer Science and Engineering Dhulikhel, Kavre, Nepal lazymesh@gmail.com,
More information