Big Data
- Jane Lindsey
Transcription
1 Big Data
Kevin Kalmbach, Principal Sales Consultant, Public Sector Engineered Systems
2 Program Agenda
- What is Big Data and why is it important?
- What is your Big Data strategy?
- Oracle Big Data offering
4 Big Data Defined
- Volume: very large quantities of data
- Velocity: extremely fast streams of data
- Variety: wide range of datatype characteristics
- Value: high potential business value if harnessed
Example sources shown on the slide: social media, blogs, smart metering
6 Discovering New Value in Unconventional Data
- Manage the structured data from your applications.
- Manage and extract value from the unstructured data you generate (logs, sensors, feeds, call center, ...).
9 What is your Big Data Strategy?
- How are you planning to make use of quasi-structured data coming from social media sites, web logs, or sensors and automated data gathering devices to make better decisions?
- How are you planning to leverage your existing business intelligence infrastructure so that you can widely disseminate this data?
- Where do you believe the potential return on investment could come from in analyzing all of your data?
10 Hadoop Ecosystem
11 Some Facts about Apache Hadoop
- It's open source (using the Apache license)
- Based on work done by Google in the early 2000s, specifically on papers describing the Google File System (GFS), published in 2003, and MapReduce, published in 2004
- Currently around 40 core Hadoop committers from ~10 companies: Cloudera, Yahoo!, Facebook, Apple and more
- Hundreds of contributors writing features and fixing bugs
- Users include IBM, eBay, Twitter, Amazon, Netflix, and Visa
- Vendor integrations: Teradata (AsterData, HortonWorks), IBM (Netezza), Quest, HP (Vertica), MicroStrategy, plus many smaller companies
- Many related projects, applications, and tools
12 Maximize the Value of Enterprise Big Data
Oracle Integrated Solution Stack for Big Data (diagram labels):
- HDFS, Hadoop (MapReduce), Oracle NoSQL Database
- Oracle Loader for Hadoop, Oracle Data Integrator
- Data Warehouse, In-Database Analytics
- Business Intelligence Applications, Enterprise Applications
Stages: Acquire, Organize, Analyze, Decide
13 Hadoop Components
- Hadoop consists of two core components:
  - The Hadoop Distributed File System (HDFS)
  - The MapReduce application framework
- Other projects based around core Hadoop are referred to as the Hadoop Ecosystem
  - These include Pig, Hive, HBase, Flume, Oozie, Sqoop, etc.
  - Tools for ingest, retrieval, query and representation
- A set of machines running HDFS and MapReduce is a Hadoop cluster
  - Individual machines are known as nodes
  - A cluster can have as few as one node or as many as several thousand
  - Facebook has 4,500 nodes, while the median number is 150
  - More nodes = better performance!
14 Hadoop Core Concepts
- Data Distribution
- Parallel Code Execution
- Extreme Fault Tolerance
15 Hadoop Core Concepts: Data Distribution
- Based on clusters of servers with local storage
- Shared-nothing architecture: minimal communication between nodes
- MapReduce is the functional programming paradigm used to process data in the Hadoop cluster
  - MapReduce is a batch process, not real-time
  - It consists of two primary programmable phases:
    1. Map: reads the file and extracts intermediate data
    2. Reduce: aggregates/combines the output of the mappers
- Data movement and network usage are minimized
  - Data is initially loaded in large blocks of 64 MB or 128 MB and distributed among the nodes of the system
  - Computation is done on data stored locally on a node whenever possible
  - No data transfer over the network is required for initial processing
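The two MapReduce phases described on this slide can be illustrated with a small single-process sketch. This is not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and word counting is chosen only as a familiar example. In a real cluster the framework runs many mappers and reducers in parallel and performs the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit an intermediate (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework between the phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values that the mappers emitted for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of text lines."""
    intermediate = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

Calling `word_count(["big data", "Big value"])` counts "big" twice and the other words once each.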
16 Hadoop Core Concepts: Fault Tolerance
- If a cluster node fails, the master detects the failure and reassigns the work to a different node
- Restarting a task does not require communication with nodes working on other portions of the data
- If a failed node restarts, it is automatically added back to the system and assigned new tasks
- If a node appears to be running slowly, the master can redundantly execute another instance of the same task and use the results from whichever instance finishes first
- Replicating all data at least 3 times allows work to be reassigned to another node that still holds the data locally, minimizing or eliminating the network impact of failures
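The reassignment idea on this slide can be sketched as follows. It is a simplified model, not the scheduler Hadoop actually uses: `reassign_tasks` and its dictionary arguments are hypothetical names, and the sketch simply picks the first surviving replica holder so that the restarted task can still read its block locally.

```python
def reassign_tasks(assignments, replicas, failed_node):
    """Move tasks off a failed node onto another node that holds the same block.

    assignments: block id -> node currently running the task for that block
    replicas:    block id -> list of nodes that store a copy of the block
    """
    updated = {}
    for block, node in assignments.items():
        if node != failed_node:
            # Tasks on healthy nodes are left alone; no coordination is needed.
            updated[block] = node
            continue
        survivors = [n for n in replicas[block] if n != failed_node]
        if not survivors:
            raise RuntimeError(f"no surviving replica for block {block}")
        # Pick a node that already stores the block, so the read stays local.
        updated[block] = survivors[0]
    return updated
```

With three replicas per block, a single node failure always leaves at least two candidates for each affected task.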
17 Hadoop Ecosystem: The Hadoop Distributed File System (HDFS)
18 HDFS General Concepts
- HDFS is the file system responsible for storing data on the cluster
  - Written in Java (based on Google's GFS)
  - Sits on top of a native file system (ext3, ext4, xfs, etc.)
  - POSIX-like file permissions model
- Provides redundant storage for massive amounts of data
  - Stores data in very large files (hundreds of MB/GB/TB each)
  - Designed for extreme fault tolerance
- HDFS is optimized for large, streaming reads of files
  - Files in HDFS are generally write once, read many
  - Does not generally support random reads/writes
- HDFS performs best with a modest number of large files
  - Millions, rather than billions, of files
  - Each file 100 MB or more
19 How Clients Write to HDFS
20 How Files Are Read by Clients in HDFS
21 How Files Are Stored in HDFS
- Individual files are split into blocks
  - Each block is usually 64 MB or 128 MB
  - If a file is smaller than a block, the full 64 MB/128 MB will not be used
  - Blocks are stored as standard files on the native file system (xfs, etc.)
  - Blocks from a single file are distributed across the cluster
  - Files can be compressed (multiple codecs are available, with varying speed/compression-ratio tradeoffs)
- Each block is replicated to multiple nodes (3 nodes by default)
- A cluster has 2 types of nodes that operate in a master-worker pattern
  - The NameNode manages the data in the cluster
  - The DataNodes store and retrieve the data
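The block arithmetic on this slide can be worked through with a short sketch. The helper names `split_into_blocks` and `place_replicas` are hypothetical, and the round-robin placement is a deliberate simplification: real HDFS placement also takes rack topology into account.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the size of each block; the last block holds only the remaining bytes."""
    if file_size <= 0:
        return []
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }
```

For example, a 300 MB file with 128 MB blocks yields two full blocks and one 44 MB block, and each of the three blocks is then stored on three different nodes.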
22 The HDFS NameNode
- The NameNode runs the NameNode daemon, which manages the HDFS namespace metadata:
  - What directories and files are on the cluster
  - What blocks comprise a given file
  - On which DataNodes the blocks for a given file are located
- Metadata is held in RAM for fast access (this server needs lots of RAM)
- When a client application writes a file, the NameNode supplies a list of DataNodes to write the blocks to
- When a client reads a file, it communicates with the NameNode to determine which DataNodes the blocks reside on
- Listens for heartbeats from the DataNodes; if a node is dead, it is removed from the active node list
- Issues commands to the DataNodes to re-replicate data so as to maintain 3 copies
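The metadata the NameNode keeps can be pictured as two in-memory maps. The class below is a toy model under that assumption; `ToyNameNode` and its method names are invented for illustration and do not correspond to Hadoop's Java classes.

```python
class ToyNameNode:
    """Toy model of NameNode metadata: file -> blocks and block -> DataNodes."""

    def __init__(self, replication=3):
        self.replication = replication
        self.file_blocks = {}      # path -> ordered list of block ids
        self.block_locations = {}  # block id -> set of DataNode names

    def add_file(self, path, block_ids):
        """Record which blocks make up a file."""
        self.file_blocks[path] = list(block_ids)
        for block in block_ids:
            self.block_locations.setdefault(block, set())

    def report_block(self, datanode, block_id):
        """Record that a DataNode stores a copy of a block."""
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        """Answer a client read request: which DataNodes hold each block of the file."""
        return [(b, sorted(self.block_locations[b])) for b in self.file_blocks[path]]

    def remove_datanode(self, datanode):
        """Drop a dead node and return the blocks that now need re-replication."""
        under_replicated = []
        for block, nodes in self.block_locations.items():
            nodes.discard(datanode)
            if len(nodes) < self.replication:
                under_replicated.append(block)
        return sorted(under_replicated)
```

A client would call `locate` once and then contact the listed DataNodes directly; after `remove_datanode`, the returned block ids are the ones the NameNode would ask surviving nodes to copy.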
23 The HDFS NameNode: Persistence
- There is only one NameNode in an HDFS cluster, so it is a single point of failure
  - If the NameNode is down, the entire HDFS file system is unavailable
  - If the NameNode metadata is lost, the entire cluster is unrecoverable
  - It usually runs on more reliable hardware than the rest of the cluster
- HDFS metadata is persisted to 2 files for crash recovery
  - Namespace image (fsimage): stores file and directory information
  - Edit log: tracks metadata updates made in RAM
- The metadata persistence files are written both locally (RAID 1) and to a separate NFS-mounted file system for recoverability
- Block locations are reconstructed from the DataNodes when the system starts
24 The HDFS Secondary NameNode
- A daemon that runs on a separate server from the NameNode
- Its primary task is to periodically (every hour by default) merge the namespace image file with the edit log
  - It stores the result of the merge and copies it back to the NameNode
- In the event of a NameNode server failure, the metadata files can be copied from the NFS mount and the NameNode daemon started here; this server then becomes the new NameNode
- If the NFS copy is lost, the merged file on the Secondary NameNode can be used for recovery as well
- The Secondary NameNode is not a backup NameNode! There is no automatic failover!
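The periodic merge of the namespace image with the edit log can be sketched as replaying a list of operations onto a snapshot. The data shapes here are assumptions made for illustration: the image is modeled as a plain dictionary, and the operation names `"create"` and `"delete"` are hypothetical rather than the actual on-disk edit log format.

```python
def apply_edits(image, edit_log):
    """Replay logged metadata updates onto a copy of the namespace image.

    image:    dict mapping path -> metadata
    edit_log: list of (operation, path, metadata) tuples in the order they occurred
    """
    merged = dict(image)
    for operation, path, metadata in edit_log:
        if operation == "create":
            merged[path] = metadata
        elif operation == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return merged


def checkpoint(image, edit_log):
    """Produce the merged image plus an empty edit log, as after a checkpoint."""
    return apply_edits(image, edit_log), []
```

Keeping the edit log short through regular checkpoints is what keeps NameNode restart time bounded, since only the edits made after the last merge have to be replayed.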
25 The HDFS DataNodes
- Machines in the cluster that store and retrieve the file blocks
- Clients connect directly to the DataNodes to read/write data
- Blocks are stored in a set of directories specified in the Hadoop configuration files
- Blocks are distributed across DataNodes when data is initially loaded into HDFS
- Data processing (MapReduce) is done on the DataNode using locally stored blocks
- Each DataNode periodically sends a heartbeat (every 3 seconds by default) to the NameNode to confirm availability
- Every 10th message is a block report that tells the NameNode which blocks the DataNode is storing
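Heartbeat-based liveness tracking can be sketched as a table of last-seen timestamps. `HeartbeatMonitor` is a hypothetical name, and the timeout passed to it is an assumption: the slide states only the 3-second heartbeat interval, not how long the NameNode waits before declaring a node dead.

```python
class HeartbeatMonitor:
    """Track the last heartbeat per DataNode and flag nodes that have gone silent."""

    def __init__(self, timeout_seconds):
        # Assumed parameter: how long a node may stay silent before it counts as dead.
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}

    def heartbeat(self, datanode, now):
        """Record a heartbeat received from `datanode` at time `now` (in seconds)."""
        self.last_seen[datanode] = now

    def dead_nodes(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the sketch deterministic and easy to test.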
26 The HDFS DataNodes: Data Integrity
- DataNodes implement replication on file creation by forwarding blocks to the other DataNodes after writing them locally
- The default replication level is 3 (configurable)
- Block checksums are created by the DataNodes when files are written
- Clients reading or writing the data verify success by checking the checksum
- The DataNode also runs a DataBlockScanner daemon that detects corrupted blocks and heals them by using one of the replicas
- Data is generally not replicated or backed up outside of the cluster, although redundant clusters can be used with data loaded into both
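Checksum verification of stored data can be illustrated with Python's standard `zlib.crc32`. This is a sketch of the general technique only: the choice of CRC32, the 512-byte chunk size, and the function names are assumptions, since the slide does not say which checksum algorithm or granularity the DataNodes use.

```python
import zlib


def chunk_checksums(data, chunk_size=512):
    """Compute one CRC32 checksum per fixed-size chunk of a block's bytes."""
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(data, expected, chunk_size=512):
    """Return the indexes of chunks whose checksum differs from the stored value."""
    actual = chunk_checksums(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

A reader that finds a non-empty result would discard this copy of the block and fetch one of the other replicas instead.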
27 How Files Are Accessed in HDFS
- Applications read and write HDFS files directly via the Java API
- Access to HDFS from the command line is achieved with the hadoop fs command
- Typically, files are created on a local file system and then must be moved into HDFS
- Likewise, files stored in HDFS may need to be moved to a machine's local file system
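Moving files between the local file system and HDFS with the hadoop fs command can be scripted. The sketch below only builds the argument lists for the `-put`, `-get` and `-ls` subcommands; the wrapper function names and the example paths are hypothetical, and `run` works only on a machine where the `hadoop` executable is installed and configured.

```python
import subprocess


def hadoop_fs(*args):
    """Build the argument list for one `hadoop fs` invocation."""
    return ["hadoop", "fs", *args]


def put_command(local_path, hdfs_path):
    """Copy a local file into HDFS."""
    return hadoop_fs("-put", local_path, hdfs_path)


def get_command(hdfs_path, local_path):
    """Copy a file from HDFS to the local file system."""
    return hadoop_fs("-get", hdfs_path, local_path)


def ls_command(hdfs_path):
    """List the contents of an HDFS directory."""
    return hadoop_fs("-ls", hdfs_path)


def run(command):
    """Execute a built command and return its output (requires `hadoop` on the PATH)."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run(put_command("sales.csv", "/data/sales.csv"))` would upload a local file, assuming both paths exist in that environment.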
28 Oracle's Recommended Big Data Approach
1. Common platform to manage all your structured and unstructured data
  - Easily load your MapReduce output into your data warehouse (DWH)
  - Connect your Big Data with your structured data systems over InfiniBand
  - Leverage the same Oracle skills to manage all your data
2. Engineered system to save time and money
  - Preinstalled and preconfigured for large-scale Big Data management
  - Preloaded with Cloudera Distribution of Apache Hadoop, Oracle NoSQL and the statistical environment R
3. A single source of enterprise-level support for your entire platform
  - Oracle provides support for all your HW
  - Oracle provides support for all your Oracle SW and the open source SW (CDH, R)
29 Acquire and Organize Big Data: Oracle Big Data Appliance
- 18 Sun X4270 M2 servers
  - 864 GB main memory
  - 216 cores
  - 648 TB storage
- Software
  - Oracle Linux
  - Java HotSpot VM
  - Cloudera CDH
  - Cloudera Manager
  - Open source R distribution
  - Oracle NoSQL Database CE
- Big Data Connectors
  - Oracle Loader for Hadoop
  - Oracle Data Integrator Application Adapter for Hadoop
  - Oracle R for Hadoop Connector
  - Oracle Direct Connector for HDFS (or EE*)
* Stand-alone software, can be pre-installed and pre-configured with BDA
30 Cloudera Management Console
- Reduces the risks of running Hadoop in production
- Improves consistency, compliance and administrative overhead
- Production support for CDH and certified integrations (MySQL, Postgres, Oracle)
- Cloudera Manager components: Authorization Manager, Activity Monitor, Service Monitor, Resource Manager, Service & Configuration Manager
31 Cloudera Management Console
- Identify system issues and performance shortfalls by analyzing task distribution
- Trace issues down to individual tasks and logs
- Centralized administration of all system permissions
- Automated generation of configuration settings with built-in validations to prevent errors or downtime
- Drill-down capability to the detailed status of individual nodes
- Automatically tracks services on hosts and their resource requirements
- Add new service instances with a few clicks
32 Cloudera Management Console: Central Point of Management for All the Resources in the Hadoop Stack
33 Cloudera Management Console: Service Dashboards Give Consolidated Status
34 Visualization at the Speed of Thought: Oracle Exalytics
- 40 Intel processor cores
- 1 TB main memory
- 40 Gb InfiniBand connection to Oracle Exadata
- Oracle TimesTen In-Memory Database
  - Adaptive in-memory caching of analytics
  - In-memory columnar compression
- Oracle Business Intelligence Foundation Suite
35 Oracle Endeca Information Discovery
Understand the complete picture with context from any source
Why did sales fall? Was it bad reviews? An unsuccessful marketing campaign? Quality issues?
- BI Star Schemas: Metrics: Margin = Sales - Cost; Dimensions: Customer, Product
- Transactions table with columns TxnID, ProductID, Amount, Category (surviving values: Mountain Bike, Road Bike, 1399)
- Enterprise Applications, unstructured text fields: "Customer reported receiving wheel with bent spokes"
- Enterprise Content System, MS Office documents and internal notes: "Marketing Strategy Document Chapter 1."
- Websites, product reviews: "The new 506 from Acme is a solid road bike. 3.5 stars."
- Social Media, customer comments: "Thinking about buying a new road bike. Anyone know anything about the Acme 506?"
36 Oracle Endeca Information Discovery
Rapid, intuitive exploration and analysis of data from any combination of structured and unstructured sources
- Benefits
  - Unprecedented information visibility
  - Leverage existing BI investments
  - Self-service data discovery
  - Reduced IT costs, better business decisions
- Unique features
  - Unstructured analytics
  - Contextual search, navigation, analytics
  - Dynamic data and metadata
  - Content acquisition and text enrichment
  - In-memory performance
37 Benefits of Oracle Big Data
- Integrate new data types into your existing business intelligence infrastructure
  - Start managing all your unstructured data: sensor, social network, feeds, logs, call center, ...
- Take action using all available data sources
  - Leverage insight from your structured and unstructured data
  - Exploratory analysis
- Save time and money with engineered systems
  - Avoid putting together the HW and SW pieces yourself
- Get one point of support for all your business intelligence components, including Big Data
  - One source of support for all your Big Data HW and SW components
Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld
Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case
Hadoop Distributed File System (HDFS) Overview
2012 coreservlets.com and Dima May Hadoop Distributed File System (HDFS) Overview Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/ Also see the customized
Tap into Hadoop and Other No SQL Sources
Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data
Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014
Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/
Hadoop and ecosystem * 本 文 中 的 言 论 仅 代 表 作 者 个 人 观 点 * 本 文 中 的 一 些 图 例 来 自 于 互 联 网. Information Management. Information Management IBM CDL Lab
IBM CDL Lab Hadoop and ecosystem * 本 文 中 的 言 论 仅 代 表 作 者 个 人 观 点 * 本 文 中 的 一 些 图 例 来 自 于 互 联 网 Information Management 2012 IBM Corporation Agenda Hadoop 技 术 Hadoop 概 述 Hadoop 1.x Hadoop 2.x Hadoop 生 态
CIO Guide How to Use Hadoop with Your SAP Software Landscape
SAP Solutions CIO Guide How to Use with Your SAP Software Landscape February 2013 Table of Contents 3 Executive Summary 4 Introduction and Scope 6 Big Data: A Definition A Conventional Disk-Based RDBMs
Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
TUT NoSQL Seminar (Oracle) Big Data
Timo Raitalaakso +358 40 848 0148 [email protected] TUT NoSQL Seminar (Oracle) Big Data 11.12.2012 Timo Raitalaakso MSc 2000 Work: Solita since 2001 Senior Database Specialist Oracle ACE 2012 Blog: http://rafudb.blogspot.com
I/O Considerations in Big Data Analytics
Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very
www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage
www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?
Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
How To Create A Business Intelligence (Bi)
Oracle Business Analytics Overview Markus Päivinen Business Analytics Country Leader, Finland May 2014 1 Presentation content What are the requirements for modern BI Trend in Business Analytics Big Data
Intro to Map/Reduce a.k.a. Hadoop
Intro to Map/Reduce a.k.a. Hadoop Based on: Mining of Massive Datasets by Ra jaraman and Ullman, Cambridge University Press, 2011 Data Mining for the masses by North, Global Text Project, 2012 Slides by
Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks
Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013
The Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies
Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights Big Data, Advanced Analytics:
