2 Day with Development Master Class
Big Data Management System
DW & Big Data Global Leaders Program
Jean-Pierre Dijcks, Big Data Product Management, Server Technologies
3 Part 1: Foundation and Architecture of a BDMS
Part 2: Streaming & Batch Data Ingest and Tooling
4 Storing Data in HDFS and its relation to performance
Space Usage vs. Type Complexity
- Data types like JSON are popular, especially when exchanging data or when capturing messages
- Simple JSON documents can be left in their full state
- If the documents are deeply nested, it pays to flatten them upon ingest
- The consequence is, of course, an expansion of the data size
- But joins and typical analytics on Hadoop will perform better on simpler objects
- In regulated industries, it pays to keep the original JSON as well as the decomposed structure. Be sure to compress the original and store it in a source directory
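As an illustration of the flatten-on-ingest idea, here is a minimal Python sketch. The dotted key naming, the function names `flatten` and `archive_original`, and the sample document are all assumptions made for this example; the slides do not prescribe a particular flattening scheme or compression format.

```python
import gzip
import json


def flatten(doc, parent_key="", sep="."):
    """Flatten nested dicts/lists into one level, joining keys with `sep`."""
    if isinstance(doc, dict):
        items = doc.items()
    elif isinstance(doc, list):
        items = ((str(i), v) for i, v in enumerate(doc))
    else:
        return {parent_key: doc}
    flat = {}
    for key, value in items:
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)) and value:
            flat.update(flatten(value, full_key, sep))  # recurse into nesting
        else:
            flat[full_key] = value  # scalars and empty containers stay as-is
    return flat


def archive_original(raw_json):
    """Compress the untouched source document for the source directory."""
    return gzip.compress(raw_json.encode("utf-8"))


# Hypothetical sample message
raw = '{"id": 7, "customer": {"name": "Ann", "tags": ["a", "b"]}}'
flat = flatten(json.loads(raw))
compressed = archive_original(raw)
```

The flattened record has more keys than the original (the data-size expansion mentioned above), while the gzip copy keeps the original recoverable byte for byte.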
5 General Classification
- Stream: Flume, GoldenGate, Kafka
- Batch, push: HDFS Put, (S)FTP, NFS
- Batch, pull: wget/curl & HDFS Put, wget/curl & (S)FTP, NFS
6 Batch Loading Data into HDFS: Pushing Data
7 Don't do this
- HDFS in this case should replace any additional filers
[Diagram label: SAN or NAS filers]
8 Instead try to do this
- Add either an FTP or a Hadoop client to the source
- Major benefits from this simple change:
  - Reduces amount of NAS/SAN storage => cost savings
  - Reduces complexity
  - Reduces data proliferation (improved security)
[Diagram label: Hadoop or FTP client goes here]
9 Using Hadoop Client to Load Data
- HDFS Put issued from Hadoop Client on SRC server
[Diagram labels: Source Server, Client, Big Data Appliance, HDFS nodes, Linux FS, HDFS]
10 Using Hadoop Client to Load Data
- HDFS Put issued from Hadoop Client on SRC server
- Enables direct HDFS writes without intermediate file staging on Linux FS
- Easy to scale: initiate concurrent puts for multiple files
- HDFS will leverage multiple target servers and ingest faster
[Diagram labels: Source Server, Client, Big Data Appliance, HDFS nodes, Linux FS, HDFS]
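The "initiate concurrent puts for multiple files" point can be sketched as follows. This is a hedged illustration, not a recommended loader: it assumes the `hdfs` command-line client is installed on the source server, and the function names, the worker count and the injectable `runner` parameter are choices made for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def put_command(local_path, hdfs_dir):
    """Build the 'hdfs dfs -put' command line for one local file."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def run_put(cmd):
    """Run one put and return its exit code (0 means success)."""
    return subprocess.run(cmd, check=False).returncode


def parallel_put(local_paths, hdfs_dir, workers=4, runner=run_put):
    """Issue several puts at once; returns {local_path: exit_code}."""
    commands = [put_command(path, hdfs_dir) for path in local_paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(runner, commands))  # map preserves input order
    return dict(zip(local_paths, codes))
```

Calling `parallel_put(["/data/a.csv", "/data/b.csv"], "/landing")` (hypothetical paths) would start both puts at the same time. Passing a stub as `runner` allows a dry run without a cluster.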
11 FTP-ing onto Local Linux File System
Basic Flow
Install FTP server on BDA node(s)
1. FTP files onto local Linux FS on BDA
   - Something like /u12
   - Some FTP clients can write to WebHDFS
2. Use HDFS Put to load data from Linux FS into HDFS
3. Remove files from Linux FS
4. Repeat
[Diagram labels: Big Data Appliance, HDFS nodes, Linux FS, HDFS]
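Steps 2 to 4 of this flow can be sketched as one pass over the landing directory. This is a minimal sketch under assumptions: `hdfs_put` and `drain_landing_dir` are hypothetical names, the `hdfs` client is assumed to be on the PATH, and a file is deleted only after its put reports success so that a failed load can be retried on the next pass.

```python
import os
import subprocess


def hdfs_put(local_path, hdfs_dir):
    """Return True if 'hdfs dfs -put' exits with status 0."""
    result = subprocess.run(["hdfs", "dfs", "-put", local_path, hdfs_dir])
    return result.returncode == 0


def drain_landing_dir(landing_dir, hdfs_dir, put=hdfs_put):
    """Load each landed file into HDFS, then remove it from the Linux FS."""
    loaded = []
    for name in sorted(os.listdir(landing_dir)):
        path = os.path.join(landing_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        if put(path, hdfs_dir):
            os.remove(path)  # step 3: free the local space
            loaded.append(name)
    return loaded
```

Running this function on a schedule (cron, for example) would cover the "Repeat" step. Files still being written by the FTP server would need extra handling that this sketch leaves out.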
12 FTP: Managing Space for Linux and HDFS on Ingest Nodes
- You cannot (today) de-allocate a few disks from HDFS on BDA
- You should therefore:
  - Set a quota on how large HDFS can grow on the ingest nodes
  - Set a quota at the Linux level to regulate space
- Sizing depends on:
  - The ingest and cleanup schedule
  - The ingest size
  - Peak ingest sizes
[Diagram labels: Big Data Appliance, HDFS nodes, Linux FS, HDFS]
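To make the sizing factors concrete, here is a back-of-the-envelope sketch. The formula (peak ingest rate times the interval between cleanups, plus headroom) and the 25% default headroom are assumptions for illustration only; the slides list the inputs but give no formula.

```python
def landing_space_gb(peak_ingest_gb_per_hour, hours_between_cleanups, headroom=1.25):
    """Estimate Linux FS landing space: the worst case that accumulates between cleanups."""
    if peak_ingest_gb_per_hour < 0 or hours_between_cleanups < 0:
        raise ValueError("inputs must be non-negative")
    return peak_ingest_gb_per_hour * hours_between_cleanups * headroom


# Hypothetical numbers: 200 GB/hour at peak, cleanup every 6 hours
estimate = landing_space_gb(200, 6)
```

Sizing against the peak rather than the average ingest keeps the landing area from filling up when a burst arrives just before a cleanup run.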
13 FTP: High Availability
- Run multiple FTP servers on multiple BDA nodes
- Provide a load balancer like HAProxy (included with Oracle Linux)
[Diagram labels: Source Server, HAProxy, Big Data Appliance, HDFS nodes, Linux FS, HDFS]
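What the load balancer contributes can be modelled with a toy round-robin selector that skips servers marked as down. This is not HAProxy configuration or its API; the class name and the node names in the usage note are hypothetical and only illustrate the behaviour.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate over FTP servers, skipping any that are marked unhealthy."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._ring = cycle(self._servers)

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self):
        # Try each server at most once per call
        for _ in range(len(self._servers)):
            candidate = next(self._ring)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy FTP servers available")
```

With three nodes, say `bda01`, `bda02` and `bda03`, marking `bda02` as down makes the rotation alternate between the remaining two, which is the availability property the slide is after.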
14 Batch Loading Data into HDFS: Pulling Data
15 Pulling data with wget or curl and Hadoop Client
- HDFS Put issued from Hadoop Client on SRC server
- Use wget or curl to initiate data transfer and load
[Diagram labels: Source Server, Client, Big Data Appliance, HDFS nodes, Linux FS, HDFS]
16 Pulling data with wget or curl and Hadoop Client
- HDFS Put issued from Hadoop Client on SRC server
- Use wget or curl to initiate data transfer and load
- Pipe straight through to HDFS put
- Can use FTP/HTTP as well
- All observations from previous section apply
[Diagram labels: Source Server, Client, Big Data Appliance, HDFS nodes, Linux FS, HDFS]
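"Pipe straight through to HDFS put" can be sketched by connecting the download's standard output to the standard input of `hdfs dfs -put -`, so nothing is staged on the local file system. The helper names are hypothetical, and the sketch assumes both `curl` and the `hdfs` client are installed where it runs.

```python
import subprocess


def pull_to_hdfs_commands(url, hdfs_path):
    """curl writes the download to stdout; 'hdfs dfs -put -' reads from stdin."""
    producer = ["curl", "-sS", "-f", url]
    consumer = ["hdfs", "dfs", "-put", "-", hdfs_path]
    return producer, consumer


def pipe_through(producer_cmd, consumer_cmd):
    """Stream producer stdout into consumer stdin; return both exit codes."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    producer.stdout.close()  # the consumer now holds the only read end
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc
```

A transfer would then be `pipe_through(*pull_to_hdfs_commands(url, hdfs_path))`, with both exit codes checked, because a failed download can still leave a partial file in HDFS.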
17 Grabbing Data from Databases
- Big Data SQL: Copy to BDA, Table Access to BDA
- SQL / object based: Sqoop
- Change capture: GoldenGate
[Diagram labels: ORCL, Big Data Appliance, HDFS nodes]
18
1) Avoid any additional external staging systems, as these systems reduce scalability
2) Opt for tools and methods that write directly into HDFS, like HDFS put
19 Moving Mainframe Data into HDFS: Batch Files
20 Using GoldenGate to Replicate from Mainframe
- GG can replicate from MF database
- Apply directly into HDFS or HBase
[Diagram labels: Mainframe, GoldenGate, Big Data Appliance, HDFS nodes]
21 Mainframe Data
- General assumption: any data collection on MF needs to be non-intrusive due to security and cost (MIPS) reasons
- Existing jobs typically generate files
- SyncSort is one of the leading MF tools
- FTP (via ETL tools) from MF to recipient systems
[Diagram labels: Mainframe, SyncSort]
22 Using file transfers to move from Mainframe
- Follow the push and pull mechanisms discussed earlier
[Diagram labels: Mainframe, SyncSort, Pull or Push Data, Big Data Appliance, HDFS nodes]
23 Using file transfers to move from Mainframe
Most MF files will be in EBCDIC format and need to be converted to ASCII
1. Land on local disk (Linux FS)
2. Put files into HDFS
3. Convert from EBCDIC to ASCII using standard tooling (ex: SyncSort) on Hadoop
4. Optional: Copy ASCII file and compress together with original EBCDIC files
5. Archive original file (with ASCII file if step 3 was done)
6. Delete original files from Linux FS
7. Repeat
[Diagram labels: Big Data Appliance, HDFS nodes]
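For plain text fields, the conversion in step 3 is a character-set translation. The sketch below uses Python's built-in `cp037` codec as an assumed EBCDIC code page; real mainframe files may use a different code page (`cp500`, for instance) and often contain packed-decimal or binary fields that need record-layout-aware tooling, which is one reason the slides point to proven tools such as SyncSort.

```python
def ebcdic_to_ascii(data, codepage="cp037"):
    """Decode EBCDIC bytes and re-encode as ASCII.

    Characters without an ASCII equivalent are replaced with '?'.
    """
    return data.decode(codepage).encode("ascii", errors="replace")


# Hypothetical record: the text "HELLO 123" as EBCDIC (cp037) bytes
ebcdic_record = b"\xc8\xc5\xd3\xd3\xd6\x40\xf1\xf2\xf3"
ascii_record = ebcdic_to_ascii(ebcdic_record)
```

Keeping the original EBCDIC bytes alongside the converted output, as steps 4 and 5 suggest, makes it possible to redo the conversion if the code page guess turns out to be wrong.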
24
1) Keep transfer software as simple as possible
2) Move as much processing of files as possible from MF to BDA, and use proven tools for EBCDIC-to-ASCII conversions
25 Streaming Data: Product Approach
26 Various tooling options
- Apache Kafka seems to be a (new) favorite
- Oracle GoldenGate just added a big data option, enabling streaming from GG sources into HDFS and Hive, for example
- Oracle Event Processing enables a rich developer environment and low-latency stream processing
- See the respective documentation for details, usage and restrictions
- Note the distinction between transport and processing: OEP is an example of stream processing, whereas Kafka is stream transport
27 What should I use now that I am streaming data?
- Chances are you have no choice:
  - Your sources are publishing data onto a messaging bus
  - Your organization already has a streaming system in place
- Nevertheless, the following section will attempt to clarify this question
28 Apache Flume (NG)
Currently one of the most common tools, with many pre-built sources and sinks. Some other interesting aspects:
- Scalable, with fan-in and fan-out capabilities
- Direct write into HDFS
- Can evaluate simple in-stream actions
- Part of CDH and supported as such
Use for streaming when:
- Simple actions need to be evaluated
- Reasonable latency is OK
- Scalability is key
- You are using this for other data sources
29 Oracle Event Processing
Low latency, with an easy-to-use visual modeling environment and its own DSL called Continuous Query Language (CQL):
- Available for data center as well as embedded, enabling large fan-in setups for IoT-like systems
- Direct write into HDFS as well as Oracle NoSQL DB
- Can evaluate complex in-stream actions, leveraging readings from NoSQL and Oracle Database, and can leverage, for example, Oracle Spatial
- Focuses on very low latency and complex actions
Use for streaming when:
- You need low latency, embedded and complex actions, expanding to IoT
- You are looking for mature tooling and an easy-to-use DSL
30 Apache Kafka
Highly scalable messaging system (LinkedIn):
- Pub-sub mechanism
- Distributed and highly resilient
- Highly scalable even when serving a mix of batch and online consumers
- No action evaluation capabilities (needs external tooling for this)
Use for streaming when:
- You are looking for a scalable messaging system
- You are dealing with very high volumes
- You can code a number of things when needed
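The "mix of batch and online consumers" point rests on each consumer keeping its own position in a shared, append-only log. The toy model below shows only that idea. It is an in-memory illustration with made-up names (`TopicLog`, `publish`, `poll`), not the Kafka client API, and it ignores partitions, persistence and distribution.

```python
class TopicLog:
    """Append-only message log with an independent read offset per consumer."""

    def __init__(self):
        self._messages = []
        self._offsets = {}

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, consumer, max_messages=None):
        """Return unread messages for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        end = len(self._messages)
        if max_messages is not None:
            end = min(end, start + max_messages)
        self._offsets[consumer] = end
        return self._messages[start:end]
```

An online consumer can poll one message at a time while a batch consumer reads everything in a single call. Neither affects what the other sees, because publishing never removes messages.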
31 Conclusion
- Use Flume for specific use cases: rolling log files
  - Why? Flume has a lot of specific code available to deal with a large number of log formats and writes directly into HDFS
- Processing: use OEP when you need event processing
  - Complex rules are applied across the spectrum
  - You need embedded systems (standardize)
- Transportation: use Kafka when:
  - You have the skills or can acquire them
  - You are looking for massive scale queuing / streaming
32 Streaming Data into HDFS: Pushing Data
33 Flume: Streaming logs to HDFS
- Note: Flume enables simple event processing as well as direct movement into HDFS or other sinks
[Diagram labels: Webserver, Log4j, Flume Client, Flume Agent, Flume HDFS Sink, Big Data Appliance, HDFS nodes]
34 Flume: Streaming logs to HDFS
Flume Concepts
- Client: captures and transmits events to the next hop
- Agent: agents can write to other agents through sinks
- Source: receives events and delivers these to one or more channels
- Channel: receives the event, which gets drained by sinks
- Sink: either finishes a flow (ex. HDFS sink) or transmits to the next agent
[Diagram labels: Flume Client, source, channel, Flume Sink]
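The relationship between source, channel and sink can be mimicked with a toy model. It illustrates the concepts only and is not Flume code or configuration; the `Agent` class and the list standing in for HDFS are assumptions made for this example.

```python
from collections import deque


class Agent:
    """One hop: a source feeds a channel, and a sink drains it."""

    def __init__(self, sink):
        self.channel = deque()  # buffers events between source and sink
        self.sink = sink        # callable that consumes one event

    def source(self, event):
        """Receive an event and deliver it to the channel."""
        self.channel.append(event)

    def drain(self):
        """Let the sink empty the channel in arrival order."""
        while self.channel:
            self.sink(self.channel.popleft())


hdfs = []                             # stand-in for an HDFS sink target
collector = Agent(sink=hdfs.append)   # final hop: its sink finishes the flow
edge = Agent(sink=collector.source)   # first hop: its sink forwards to the next agent

for line in ["GET /a", "GET /b"]:     # the client transmits events
    edge.source(line)
edge.drain()
collector.drain()
```

The first agent's sink is simply the next agent's source, which is how agents "write to other agents through sinks" in this model.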
35 Flume: Streaming logs to HDFS
Splitting Streams / Multi-Consumer
- Same data flows to both HDFS clusters
[Diagram labels: Webserver, Log4j, Flume Client, Flume Agent, Flume Source, Flume Channel (Production), Flume Channel (DR Site), Flume HDFS Sink (x2), HDFS nodes (x2)]
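Stream splitting can be pictured as one source copying every event into two channels, each drained by its own sink. The sketch below is a conceptual toy with hypothetical names, not Flume configuration; the two lists stand in for the production and DR-site HDFS clusters.

```python
from collections import deque


def fan_out(events, channels):
    """Copy every event into each channel (replication, not load sharing)."""
    for event in events:
        for channel in channels:
            channel.append(event)


def drain(channel, target):
    """Move everything buffered in a channel to its sink target, in order."""
    while channel:
        target.append(channel.popleft())


production_channel, dr_channel = deque(), deque()
production_hdfs, dr_hdfs = [], []

fan_out(["e1", "e2", "e3"], [production_channel, dr_channel])
drain(production_channel, production_hdfs)
drain(dr_channel, dr_hdfs)
```

Because each channel buffers independently, a slow or unreachable DR site would only delay its own drain and would not hold back the production sink in this model.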
36 Standardize as much as possible towards a single technology for ease of management (see next topic)
37 Landing Streaming Data
38 Land in HDFS or NoSQL?
Driven by query requirements:
- Do I need to see individual transactions as they land?
- Do I need key-based access in real time?
- Can I wait for HDFS to write to disk?
[Diagram labels: Stream, NoSQL nodes, HDFS nodes]
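The three questions can be read as a simple decision rule. The mapping below is one interpretation, stated here as an assumption: any requirement for immediate visibility or key-based real-time access, or an inability to wait for HDFS writes, points to NoSQL, and otherwise HDFS alone is enough. The function name and return values are hypothetical.

```python
def landing_store(see_transactions_as_they_land, key_based_realtime_access, can_wait_for_hdfs_write):
    """Pick a landing store from the three query-requirement questions."""
    needs_nosql = (
        see_transactions_as_they_land
        or key_based_realtime_access
        or not can_wait_for_hdfs_write
    )
    return "NoSQL" if needs_nosql else "HDFS"
```

Defaulting to HDFS unless a requirement forces NoSQL keeps the architecture simpler, since a separate store is only added when a query requirement demands it.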
39 The need for a separate NoSQL store does complicate architectures, so only do this if required
40 Streaming: Some Example Architectures
41 OEP, NoSQL Database, Hadoop
[Diagram labels: Embedded OEP on Sensors; OEP on GTW Devices; NoSQL DB to catch data and deliver models to OEP; Big Data Appliance; HDFS nodes]
42 OEP, NoSQL, Hadoop
- OEP instances are not linked and act upon a partition of inputs
- Add Coherence distributed memory grid to enable data sharing between all OEP instances
[Diagram labels: Embedded OEP on Sensors, OEP on GTW Devices]
43 Flume, Kafka, Hadoop
[Diagram labels: Flume Client & Agents, Kafka Cluster, Flume HDFS Sink, Big Data Appliance, HDFS nodes]
44 Future State? Kafka, Hadoop
[Diagram labels: Kafka Producers, Kafka Cluster, Kafka Consumers, Big Data Appliance, HDFS nodes]
45
- The tooling for streaming is in flux; Kafka looks like it is going to stick around
- When in doubt, look at vendor options, as they are often better documented and supported
46 HDFS Data into Databases
47 From HDFS to Database
- Oracle Big Data SQL: enables transparent SQL access to the end user across BDA + Exadata. Covered in the next section!
- Big Data Connectors:
  - Oracle SQL Connector for HDFS
  - Oracle Loader for Hadoop
- Sqoop
[Diagram labels: Big Data Appliance, HDFS nodes, ORCL]
48 A few comments
- Sqoop is widely used, but is also widely complained about
  - Handle with care, know what you are doing
- Big Data Connectors:
  - Better performance than Sqoop; preferred option for Oracle Database loads
- Oracle Data Integrator:
  - When licensing Big Data Connectors on Big Data Appliance, ODI is included as a restricted-use license. This applies when all transformations are done on BDA (none on Oracle DB, for example)
49 Use (ETL) tools where you can, as they simplify implementation and enable you to shift implementation paradigms more quickly
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08
Sentimental Analysis using Hadoop Phase 2: Week 2
Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular
An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP Oracle ESG Data Systems Architecture
An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP ESG Data Systems Architecture Big Data & Analytics as a Service Components Unstructured Data / Sparse Data of Value
Bringing Big Data to People
Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process
Modernizing Your Data Warehouse for Hadoop
Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist [email protected] O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking
Hadoop Big Data for Processing Data and Performing Workload
Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer
Hadoop for MySQL DBAs. Copyright 2011 Cloudera. All rights reserved. Not to be reproduced without prior written consent.
Hadoop for MySQL DBAs + 1 About me Sarah Sproehnle, Director of Educational Services @ Cloudera Spent 5 years at MySQL At Cloudera for the past 2 years [email protected] 2 What is Hadoop? An open-source
HDP Enabling the Modern Data Architecture
HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,
So What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities
Technology Insight Paper Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities By John Webster February 2015 Enabling you to make the best technology decisions Enabling
Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard
Hadoop and Relational base The Best of Both Worlds for Analytics Greg Battas Hewlett Packard The Evolution of Analytics Mainframe EDW Proprietary MPP Unix SMP MPP Appliance Hadoop? Questions Is Hadoop
Tungsten Replicator, more open than ever!
Tungsten Replicator, more open than ever! MC Brown, Senior Product Line Manager September, 2015 2014 VMware Inc. All rights reserved. We Face An Age Old Problem BRS/Search 2 It s Gotten Worse 3 Much Worse
