Emerging Trends in Big Data

1 Emerging Trends in Big Data
Peter Linnell, Big Data, SUSE; Apache Bigtop PMC
plinnell@suse.com, plinnell@apache.org

2 A little bit about me
- Scribus founder and core team member since 2001
- Ex-Cloudera Kitchen Team, "baking" Hadoop
- openSUSE community member since 2006
- openSUSE Board member
- Apache Bigtop founder and PMC member
- Packager and contributor for many open source apps
- Day job: SUSE Systems Engineer in Silicon Valley
- High Performance Computing / Big Data fan

3 Dilbert on Big Data

4 Hype Cycle

5 Linux is the Foundation for Big Data
- Scale
- Low-cost commodity hardware
- No lock-in
- Coopetition

7 Big Data: The Jargon List
- Hadoop Core: Hadoop is a data operating system. Apache Hadoop is an open source software ecosystem built around the core Hadoop technology.
- NoSQL: a way of storing data, often in memory, so that it can be searched quickly.
- Data has a temperature:
  - Cold: data stored nearby
  - Hot / Fast: data in memory or in intelligent caching
  - Live: data accessible to Big Data tools
  - Dead: offline data
- ACID: Atomicity, Consistency, Isolation, Durability
- Sharding: splitting one dataset across multiple servers; see Wikipedia, the details are too complicated :-)
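Sharding splits one logical dataset across several servers (shards) so that each holds only part of the data. A minimal sketch of hash-based sharding, assuming a fixed shard count and using MD5 purely as a stable hash; the function names and keys are hypothetical, not any particular database's API.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to one of num_shards shards using a stable hash."""
    # MD5 gives the same digest in every process, unlike Python's built-in
    # hash(), which is randomized per process for strings.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by the shard that owns them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Hypothetical customer keys spread over three shards.
for shard, members in partition(["alice", "bob", "carol", "dave", "erin"], 3).items():
    print(shard, members)
```

With plain modulo hashing, changing `num_shards` moves most keys to a different shard; token rings such as Cassandra's are designed to avoid that.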

8 Big Data Challenges
- Existing data workflows are siloed
- Data is siloed: formats, proprietary applications
- Sensitive data concerns
- Regulatory blockages
- Budget constraints
- Planning lead times

9 Big Data Challenges (cont.)
- Data scrubbing (cleaning) is the step rarely mentioned, yet it can be one of the biggest challenges.
- Big Data is hungry for memory and storage.
- Jobs can run longer than some typical mainframe or batch jobs.
- Hadoop turns the usual computing model of bringing data to the processing power on its head: you bring the compute power to where the data resides.

10 Examples of Big Data volumes
- Scientific measurements (e.g. particle collision results from the Large Hadron Collider at CERN)
- Financial data: stock information, share-price statistics, stock-related press coverage, etc.
- Medical data: genome databases, patient files in hospitals, information about pharmaceuticals
- Indexed web or social media content
- Environmental records, e.g. weather
- Web server access logs
- Sales data

11 Five main use cases for Big Data
- Transparency: insights into ongoing business operations
- Decision testing: what happened (or will happen) when (or if) we made (or make) this decision?
- Individualization in real time: tailoring offerings and services to customer wishes in real time to increase customer satisfaction and reduce customer churn
- Intelligent process control and automation
- Innovative data-driven business models
From Big Data in Action

12 How to distinguish between several kinds of Big Data?
- Amount of data: large (n terabytes), very large (n petabytes) or gigantic (n exabytes)?
- Structured data (e.g. relational, column-separated) or unstructured data (e.g. documents, web pages)?
- How complex is the data model?
- Transactional or non-transactional? Is full data integrity (ACID) required?
- Usage patterns: just lots of reads, or also many inserts, updates and deletions?
- Usage performance: real time, short delays or long delays?
- A combination of several of the questions above

13 Hadoop vs SQL (RDBMS)

Hadoop                    | SQL (RDBMS)
No predefined schema      | Schema defined in advance
Fast loading              | Data transformed
Simpler data structures   | Fast reading
Flexible and agile        | Standards/governance

The real innovation is the capability to explore the original raw data.
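The "no predefined schema" side can be illustrated with schema-on-read: raw records are stored untouched and a schema is applied only when a query reads them. A minimal plain-Python sketch, assuming a hypothetical comma-separated web log; in the Hadoop world, Hive plays this role over files stored in HDFS.

```python
import csv
import io

# Hypothetical raw web log, loaded as-is: nothing is validated on write.
RAW = """2015-06-01,web01,GET,/index.html,200
2015-06-01,web02,GET,/missing,404
not a valid record
2015-06-02,web01,POST,/login,302
"""


def read_with_schema(raw_text):
    """Apply a schema only when the data is read (schema-on-read)."""
    fields = ["date", "host", "method", "path", "status"]
    for row in csv.reader(io.StringIO(raw_text)):
        if len(row) != len(fields):
            continue  # skip rows that do not fit this particular view of the data
        record = dict(zip(fields, row))
        record["status"] = int(record["status"])
        yield record


# A different query could read the same raw lines with a different schema.
errors = [r for r in read_with_schema(RAW) if r["status"] >= 400]
print(errors)
```

The trade-off mirrors the table: loading is fast because nothing is transformed up front, but every reader pays the parsing cost and must cope with malformed rows.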

14 When to pick Hadoop vs RDBMS
- Scalability is important
- Speed is important
- Structured or unstructured data
- ACID transactions
- Interactive analytics
- Complex data processing
A sports car is faster, but a truck can carry more.

15 Apache Hadoop Strengths
- Huge data volumes
- Unstructured data
- Reliable
- Scalable
- Lowest cost
- Open source
- No hardware lock-in
- Batch processing

16 Apache Hadoop Weaknesses
- Not very efficient at small scale
- Real time is challenging at the moment (WIP)
- Requires skilled engineers and operations staff
- Less mature than SQL
- Weakly defined user roles in the data access model (WIP)

17 What About NoSQL/NewSQL?
- Can be a cost-effective replacement for, or supplement to, traditional proprietary databases.
- There are several, e.g. MongoDB, Accumulo and Cassandra, each trying to solve different problems.
- Each has strengths and weaknesses to evaluate.

18 Linux Challenges
- Scalability: we're hitting the limits of physics with current technology.
- The need for better fault tolerance in the OS, now helped by live kernel patching in Linux 4.0.
- The future will bring us exascale challenges; think 3-7 years down the road.
- Java scalability? Stutter affects Hadoop.

19 Emerging Trends in Big Data
- Streaming: accessing data in near real time for capture and analysis
- Fast Data: in-memory or intelligent caching, e.g. Spark, SAP HANA, HP Haven
- Connectors are becoming ubiquitous.
- Machine learning is becoming more accessible.
- Despite lesser performance, the cloud is becoming a more usable option for production.

20 Evaluation Thoughts
- Is Big Data a solution in search of a problem?
- Evaluate the need for real-time data vs. near real time.
- Do we have the right questions to ask?
- How can Big Data workflows be integrated with our existing infrastructure?
- What other agencies might have useful data?
- Pilot, pilot, pilot...

21 Evaluation Thoughts (cont.): Pilot, pilot, pilot...

22 SUSE Big Data Partner Ecosystem
- Integrated solutions: SAP HANA, Teradata Aster Big Analytics Appliance
- Hadoop distributions: Intel, Cloudera, Hortonworks, WANdisco
- Database: InterSystems CACHÉ

23 Bigtop
- Packaging, QA testing and integration stack for Apache Hadoop components
- Made up of engineers from most of the Hadoop distros (Cloudera, Hortonworks and WANdisco), along with SUSE and independent contributors
- Almost unique among Apache projects in that integrating other projects is its goal
- All major Hadoop distros base their products on Bigtop

24 Why SUSE for Big Data?
- SUSE has a decade-plus of leadership in HPC/supercomputing for Linux: an estimated 50% of the Top 500. Titan, the biggest, runs SLES.
- SLES 12 has the most modern optimized kernel for Big Data workloads.
- We have Tier 1 support and relationships with all major open source Hadoop distributors.
- The competition sees Big Data as an opportunity to sell proprietary solutions. We care about this market.

25 Why SUSE for Big Data? (cont.)
- Capable of supporting 64 TB (yes, TB) of RAM on one system.
- SLES 12 has the most modern optimized kernel for Big Data workloads.
- Excellent deployment and management tools.
- The competition sees Big Data as an opportunity to sell proprietary solutions. We care about this market.

26 SUSE & Hortonworks
- Joint flyer
- Partner site
- Modern Data Architecture

27 SUSE Big Data Lab
Big Data cluster in Provo, UT, for:
- Benchmarking
- Software certification
- Integration / test
- Reference architectures
- Demo system
The cluster is remotely accessible.

28 Learn More
- Visit our web site
- Read our whitepapers:
  - Deploying Hadoop on SLES
  - Deploy and Manage Hadoop with SUSE Manager
- Contact us

29 Questions?

30 Corporate Headquarters (Worldwide): Maxfeldstrasse, Nuremberg, Germany

31 Appendix

32 Hadoop Core Components

33 Typical Hadoop Distribution

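34 How Hadoop Works at Its Core
[Figure: HDFS architecture. A client sends metadata operations to the NameNode, which holds metadata (name, replicas, ...), e.g. /home/foo/data, 3, ...; clients read and write blocks directly on DataNodes spread across Rack 1 and Rack 2, and blocks are replicated between DataNodes.]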

35 Hadoop is only one part, but an important part
- The compute layer of big data
- Supports running applications on large clusters of commodity hardware
- Provides a distributed file system (HDFS) that stores data on the compute nodes
- Enables applications to work with thousands of computers and petabytes of data
- Lots of momentum: IBM, Microsoft, Oracle, SAP, EMC, HP and Teradata have built solutions on Hadoop, or at least connectors to Hadoop
- Ecosystem of Hadoop players: Intel, Cloudera, Hortonworks, WANdisco, MapR, Greenplum
- Apache support

36 NameNode
The NameNode (NN) stores all metadata:
- Information about file locations in HDFS
- Information about file ownership and permissions
- Names of the individual blocks
- Location of the blocks (kept in memory and rebuilt from DataNode block reports rather than persisted)
Metadata is stored on disk and read when the NameNode daemon starts.

37 NameNode (cont.)
- The metadata file is called fsimage.
- Block locations are not stored in fsimage.
- Changes to the metadata are made in RAM.
- Changes are also written to a log file on disk called edits.
- Each Hadoop cluster has a single NameNode (in Hadoop 1.x; later versions add an optional standby NameNode for high availability).
- The Secondary NameNode is not a fail-over NameNode.
- The NameNode is therefore a single point of failure (SPOF).

38 Secondary NameNode (master)
- The Secondary NameNode (2NN) is not a fail-over NameNode!
- It performs memory-intensive administrative functions for the NameNode.
- It periodically combines the prior file system snapshot (fsimage) and the edit log into a new snapshot.
- The new snapshot is transmitted back to the NameNode.
- In a large installation, the Secondary NameNode should run on a separate machine.
- It requires as much RAM as the NameNode.
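The checkpoint the Secondary NameNode performs amounts to replaying the edit log on top of the last snapshot. A toy sketch, assuming a namespace reduced to path → replication factor and hypothetical operation names; the real fsimage and edits files are binary formats with many more record types.

```python
def apply_edit(namespace: dict, op: tuple) -> None:
    """Apply one edit-log entry to an in-memory namespace (path -> replication)."""
    kind, path, *args = op
    if kind in ("create", "set_replication"):
        namespace[path] = args[0]
    elif kind == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown edit {kind!r}")


def checkpoint(fsimage: dict, edits: list) -> dict:
    """Merge the previous snapshot with the edit log into a new snapshot."""
    namespace = dict(fsimage)  # copy, so the old snapshot stays untouched
    for op in edits:
        apply_edit(namespace, op)
    return namespace


old_image = {"/home/foo/data": 3}
edit_log = [
    ("create", "/tmp/a", 2),
    ("set_replication", "/home/foo/data", 2),
    ("delete", "/tmp/a"),
]
new_image = checkpoint(old_image, edit_log)
print(new_image)  # {'/home/foo/data': 2}
```

Merging offline keeps the edit log short, so a restarting NameNode has less to replay; that is also why the merging machine needs as much RAM as the NameNode.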

39 DataNode
- DataNode (slave)
- JobTracker (master): exactly one per cluster
- TaskTracker (slave): one or more per cluster

40 Running Jobs
- A client submits a job to the JobTracker.
- The JobTracker assigns a job ID.
- The client calculates the input splits for the job.
- The client adds the job code and configuration to HDFS.
- The JobTracker creates a Map task for each input split.
- TaskTrackers send periodic heartbeats to the JobTracker; these heartbeats also signal readiness to run tasks.
- The JobTracker then assigns tasks to these TaskTrackers.
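Because the JobTracker creates one Map task per input split, the number of splits largely decides a job's map parallelism. A simplified sketch, assuming a 128 MB split size (in practice usually the HDFS block size); Hadoop's own FileInputFormat adds extra rules, such as letting the last split run slightly over the split size.

```python
def compute_splits(file_size: int, split_size: int):
    """Return (offset, length) pairs that together cover a file of file_size bytes."""
    splits = []
    offset = 0
    while offset < file_size:
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


MB = 1024 * 1024
splits = compute_splits(file_size=300 * MB, split_size=128 * MB)
# One Map task would be created per split.
print(len(splits), [length // MB for _, length in splits])  # 3 [128, 128, 44]
```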

41 Running Jobs (cont.)
- The TaskTracker then forks a new JVM to run the task; this isolates the TaskTracker from bugs or faulty code.
- A single instance of task execution is called a task attempt.
- Status information is periodically sent back to the JobTracker.
- Each block is stored on multiple different nodes for redundancy; the default is three replicas.
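The three default replicas are placed with rack awareness, so that losing a whole rack does not lose a block. A simplified sketch of HDFS's default placement policy, assuming the writer is itself a DataNode and every rack has at least two nodes; node and rack names are hypothetical, and the real policy also weighs free space and load.

```python
import random


def place_replicas(writer, topology, rng=random):
    """Pick 3 DataNodes following HDFS's default rack-aware policy (simplified).

    topology maps rack name -> list of DataNode names.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # 1st replica: the writer's own node (assumes the writer is a DataNode).
    first = writer
    # 2nd replica: a node on a different rack.
    remote_racks = [r for r in topology if r != rack_of[first]]
    second_rack = rng.choice(remote_racks)
    second = rng.choice(topology[second_rack])
    # 3rd replica: a different node on the same rack as the 2nd.
    third = rng.choice([n for n in topology[second_rack] if n != second])
    return [first, second, third]


topology = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
}
print(place_replicas("dn2", topology))
```

Keeping two replicas on one remote rack limits cross-rack write traffic while still surviving the loss of either rack.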

42 Anatomy of a File Write
1. Client connects to the NameNode.
2. NameNode places an entry for the file in its metadata and returns the block name and a list of DataNodes to the client.
3. Client connects to the first DataNode and starts sending data.
4. As data is received by the first DataNode, it connects to the second and starts sending data.
5. The second DataNode similarly connects to the third.
6. Ack packets from the pipeline are sent back to the client.
7. Client reports to the NameNode when the block is written.
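Steps 3 to 6 form a replication pipeline: each DataNode stores a packet and forwards it to the next, while acknowledgements travel back toward the client. A toy, single-process sketch with hypothetical class and node names; real HDFS streams packets over the network asynchronously and tracks outstanding acks in a separate queue.

```python
class DataNode:
    """Toy DataNode: stores each packet, then forwards it down the pipeline."""

    def __init__(self, name):
        self.name = name
        self.packets = []

    def receive(self, packet, downstream):
        self.packets.append(packet)              # store the packet locally
        acks = []
        if downstream:                           # steps 4-5: forward to the next DataNode
            acks = downstream[0].receive(packet, downstream[1:])
        return [f"ack:{self.name}"] + acks       # step 6: acks flow back upstream


def write_block(packets, pipeline):
    """Client side (step 3): stream packets to the first DataNode, check the acks."""
    for packet in packets:
        acks = pipeline[0].receive(packet, pipeline[1:])
        if len(acks) != len(pipeline):
            raise RuntimeError("a DataNode in the pipeline did not acknowledge")
    # Step 7 would follow here: the client tells the NameNode the block is complete.
    return "block complete"


nodes = [DataNode("dn1"), DataNode("dn4"), DataNode("dn5")]
print(write_block([b"packet-1", b"packet-2"], nodes))  # block complete
print([len(n.packets) for n in nodes])                 # [2, 2, 2]
```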

43 Hadoop Core Operations Review
[Figure: HDFS architecture. Client metadata operations go to the NameNode (name, replicas, ...), e.g. /home/foo/data, 3, ...; block reads, writes and replication take place across DataNodes on Rack 1 and Rack 2.]

44 Expanding on Core Hadoop

45 Hive, HBase and Sqoop
- Hive: high-level abstraction on top of MapReduce; allows users to query data using HiveQL, a language very similar to standard SQL
- HBase: a distributed, sparse, column-oriented data store
- Sqoop: the Hadoop ingestion engine, the basis of connectors for Teradata, Informatica, DB2 and many others
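"Sparse, column-oriented" means each row stores only the cells it actually has, grouped into column families, with every cell versioned by timestamp. A toy in-memory sketch of that data model, assuming hypothetical class and method names; it is not the HBase client API.

```python
from collections import defaultdict


class SparseTable:
    """Toy HBase-style table: row -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)      # column families are declared up front
        self.rows = defaultdict(dict)      # only cells that exist take up space

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        self.rows[row].setdefault(column, {})[ts] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if the cell is absent."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]


t = SparseTable(["info", "likes"])
t.put("user1", "info:name", "alejandro", ts=1)
t.put("user1", "likes:golf", "yes", ts=1)
t.put("user2", "info:name", "peter", ts=1)
t.put("user2", "info:name", "Peter", ts=2)
print(t.get("user2", "info:name"))   # Peter
print(t.get("user2", "likes:golf"))  # None: the cell simply does not exist
```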

46 Oozie
- Workflow scheduler system to manage Apache Hadoop jobs
- Workflow jobs are Directed Acyclic Graphs (DAGs) of actions
- Coordinator jobs are recurrent Workflow jobs triggered by time (frequency) and data availability
- Integrated with the rest of the Hadoop stack
- Supports several types of Hadoop jobs out of the box (such as Java MapReduce, Streaming MapReduce, Pig, Hive, Sqoop and DistCp)
- Also supports system-specific jobs (such as Java programs and shell scripts)
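Because a workflow is a DAG of actions, an action can start once every action it depends on has finished, and independent actions can run in parallel. A plain-Python sketch of that ordering using the standard library's graphlib (Python 3.9+), with hypothetical action names; Oozie itself expresses this with explicit fork/join control nodes in an XML workflow definition, which is not shown here.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
workflow = {
    "sqoop-import": set(),
    "pig-clean": {"sqoop-import"},
    "hive-aggregate": {"pig-clean"},
    "mapreduce-index": {"pig-clean"},
    "email-report": {"hive-aggregate", "mapreduce-index"},
}


def stages(workflow):
    """Group actions into stages; every action in a stage can run in parallel."""
    ts = TopologicalSorter(workflow)
    ts.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    result = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # actions whose dependencies have all finished
        result.append(ready)
        ts.done(*ready)
    return result


for stage in stages(workflow):
    print("run in parallel:", stage)
```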

47 Flume
- Distributed, reliable and available service for efficiently collecting, aggregating and moving large amounts of log data
- Simple and flexible architecture based on streaming data flows
- Robust and fault tolerant, with tunable reliability mechanisms and many fail-over and recovery mechanisms
- Uses a simple, extensible data model that allows for online analytic applications

48 Mahout
- The goal of the Apache Mahout machine learning library is to build scalable machine learning libraries.
- Currently Mahout mainly supports three use cases:
  - Recommendation mining takes users' behavior and from that tries to find items users might like.
  - Clustering takes, for example, text documents and groups them into sets of topically related documents.
  - Classification learns from existing categorized documents what documents of a specific category look like, and is able to assign unlabeled documents to the (hopefully) correct category.
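Recommendation mining can be approximated by item co-occurrence: items that often appear together in other users' histories are suggested to a user who has only some of them. A minimal plain-Python sketch of that idea with hypothetical data; it does not use Mahout's API, and Mahout's own recommenders are more elaborate and run distributed.

```python
from collections import Counter
from itertools import combinations

# Hypothetical user -> items interactions.
history = {
    "ann": {"hadoop-book", "spark-book", "linux-mug"},
    "bob": {"hadoop-book", "spark-book"},
    "cat": {"hadoop-book", "linux-mug"},
    "dan": {"spark-book", "r-book"},
}


def cooccurrence(history):
    """Count how often each ordered pair of items appears in the same history."""
    counts = Counter()
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(user, history, top_n=2):
    """Score unseen items by how often they co-occur with the user's items."""
    counts = cooccurrence(history)
    seen = history[user]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
    return [item for item, _ in scores.most_common(top_n)]


print(recommend("bob", history))  # ['linux-mug', 'r-book']
```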

49 Whirr
- Set of libraries for launching Hadoop instances on clouds
- A cloud-neutral way to run services: you don't have to worry about the idiosyncrasies of each provider
- A common service API: the details of provisioning are particular to the service
- Smart defaults for services: you can get a properly configured system running quickly, while still being able to override settings as needed

50 Giraph
- Iterative graph processing system built for high scalability
- Currently used at Facebook to analyze the social graph formed by users and their connections
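Giraph follows Google's Pregel model: computation runs in synchronous supersteps in which each vertex reads its incoming messages, updates its value and messages its neighbours, and the job ends when no vertex has anything left to send. A single-process sketch of that model computing connected components by minimum-label propagation; the function and graph are hypothetical and this is not Giraph's Java API.

```python
def connected_components(edges, vertices):
    """Vertex-centric label propagation in synchronous supersteps (Pregel style).

    Each vertex starts with its own id as label; in each superstep, vertices that
    changed send their label to their neighbours, and every vertex adopts the
    smallest label it receives. The job halts when no vertex changes.
    """
    neighbours = {v: set() for v in vertices}
    for a, b in edges:          # treat edges as undirected
        neighbours[a].add(b)
        neighbours[b].add(a)

    label = {v: v for v in vertices}
    active = set(vertices)      # vertices that changed in the last superstep
    supersteps = 0
    while active:
        supersteps += 1
        # Message phase: active vertices send their label to their neighbours.
        inbox = {v: [] for v in vertices}
        for v in active:
            for n in neighbours[v]:
                inbox[n].append(label[v])
        # Compute phase: adopt the minimum incoming label if it is smaller.
        active = set()
        for v, msgs in inbox.items():
            if msgs and min(msgs) < label[v]:
                label[v] = min(msgs)
                active.add(v)
    return label, supersteps


edges = [(1, 2), (2, 3), (4, 5)]
labels, steps = connected_components(edges, vertices=[1, 2, 3, 4, 5, 6])
print(labels)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 6}
```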

51 Apache Pig
- Platform for analyzing large data sets, consisting of a high-level language for expressing data analysis programs
- The language layer currently consists of a textual language called Pig Latin, which has the following key properties:
  - Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand and maintain.
  - Extensibility: users can create their own functions to do special-purpose processing.

52 Ambari
- The project goal is to develop software that simplifies Hadoop cluster management:
  - Provisioning a Hadoop cluster
  - Managing a Hadoop cluster
  - Monitoring a Hadoop cluster
- Ambari leverages well-known technology like Ganglia and Nagios under the covers.
- Provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs

53 HUE (Hadoop User Experience)
- Graphical front end to Hadoop tools for launching, editing and monitoring jobs
- Provides shortcuts to various command-line shells for working directly with components
- Can be integrated with authentication services like Kerberos or Active Directory

54 R Statistical Language
- Statistical language
- Open source licensed
- Similar to Octave or MATLAB
- Not currently packaged for SLES or openSUSE

55 Shark/Spark
- Spark is a fast, in-memory cluster computing framework developed at UC Berkeley's AMPLab.
- Spark was initially developed for two applications where keeping data in memory helps: iterative algorithms, which are common in machine learning, and interactive data mining.
- Shark uses Spark to run Hive queries in near real time.
- Up to 100x faster than MapReduce in some cases.
- Going into most Hadoop distros now or soon.

56 ZooKeeper
A coordination service: a centralized service for
- Maintaining configuration information
- Naming
- Providing distributed synchronization
- Delivering group services

57 NoSQL: Cassandra
- Enterprise provider is DataStax
- Keyspace: container for column families
- High performance, highly scalable, available: no SPOF
- Data is distributed and replicated across nodes by hashing
- Query by column (requires an index); SQL-like
- Native support for Apache Hadoop
- Flexible schema: can be changed at runtime
- No transactions, no JOINs
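Distribution and replication by hashing in Cassandra is based on a token ring: each key is hashed to a token, the first node whose token is at or after it owns the key, and with SimpleStrategy the next nodes clockwise hold further replicas. A simplified sketch, assuming one token per node, MD5 as the hash (Cassandra's default partitioner is Murmur3) and hypothetical node names.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Stable 64-bit token for a key (MD5 here, purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Each node sits on the ring at the position given by its own token.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        """Owner of the key's token, plus the next rf-1 nodes clockwise."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))
```

Unlike plain modulo sharding, adding a node to the ring only takes over part of one neighbour's token range, so most keys stay where they are.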

58 NoSQL (cont.): Accumulo
- Hugely scalable NoSQL database developed at the NSA
- Like HBase, a BigTable clone; join-less
- Runs on top of Hadoop and works with Hadoop MapReduce
- Used for scanning large two-dimensional tables
- The only NoSQL DB with cell-level security (access control)
- Accumulo, HBase and Cassandra are part of the Hadoop ecosystem; HBase is supported by the Hadoop provider.

59 NoSQL (cont.): MongoDB
- Enterprise provider: MongoDB Inc. (formerly known as 10gen)
- Non-relational data store for JSON documents, e.g.
  {"name": "alejandro"}
  {"name": "alejandro", "Age": 31, "likes": ["soccer", "golf", "Beach"]}
- Schemaless: collection instead of table, document instead of row
- Does not support JOINs or transactions (across multiple documents)
- Not as fast as memcached, not as functional as an RDBMS: it sits in the middle.

60 NoSQL (cont.): MongoDB
- Provides the "mongo" shell (a JavaScript interpreter), plus tools and drivers for easy API access
- Supports replication and sharding
- Supports an aggregation framework, MapReduce and a Hadoop plugin
- Maximum document size is 16 MB; GridFS stores larger data plus metadata
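GridFS works around the 16 MB document limit by storing a file as a sequence of small chunk documents plus one document of file metadata (by default in the fs.chunks and fs.files collections). A pure-Python sketch of the chunking idea only, assuming the 255 KiB default chunk size; it does not talk to MongoDB, and in practice a driver (for example PyMongo's gridfs module) does this work.

```python
def to_chunks(data: bytes, chunk_size: int = 255 * 1024):
    """Split a payload into GridFS-style chunk documents plus one file document."""
    chunks = [
        {"n": i, "data": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]
    file_doc = {"length": len(data), "chunkSize": chunk_size}
    return file_doc, chunks


def from_chunks(chunks) -> bytes:
    """Reassemble the payload by concatenating chunks in order of n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


payload = bytes(600 * 1024)  # 600 KiB of zero bytes
file_doc, chunks = to_chunks(payload)
print(file_doc["length"], len(chunks))  # 614400 3
print(from_chunks(chunks) == payload)   # True
```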

61 Web UI Ports for Users

Daemon                     | Configuration parameter
NameNode                   | dfs.http.address
DataNode                   | dfs.datanode.http.address
Secondary NameNode         | dfs.secondary.http.address
Backup/Checkpoint Node     | dfs.backup.http.address
JobTracker                 | mapred.job.tracker.http.address
TaskTracker                | mapred.task.tracker.http.address
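Each parameter in the table holds a host:port value in the cluster's XML configuration, so the actual web UI address can be read from there. A minimal standard-library sketch, assuming a hypothetical hdfs-site.xml fragment whose host and port are placeholders (not Hadoop's defaults); it relies only on the configuration/property/name/value layout of Hadoop's *-site.xml files.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml fragment; host and port are placeholders.
HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>namenode.example.com:12345</value>
  </property>
</configuration>
"""


def read_hadoop_conf(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml text."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }


def web_ui_url(conf, key):
    """Build a web UI URL from a host:port configuration parameter."""
    host, _, port = conf[key].rpartition(":")
    return f"http://{host}:{port}/"


conf = read_hadoop_conf(HDFS_SITE)
print(web_ui_url(conf, "dfs.http.address"))  # http://namenode.example.com:12345/
```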

62 tation techarticle/dm-1209hadoopbigdata/

63 Resources
- SUSE Big Data website
- SUSE Big Data Flyer: se_foundation_for_big_data_solution.pdf
- SUSE Big Data contacts: Business: Frank Rego; Technical: Peter Linnell

65 Unpublished Work of SUSE. All Rights Reserved.
This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.

General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.


More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

MySQL and Hadoop. Percona Live 2014 Chris Schneider

MySQL and Hadoop. Percona Live 2014 Chris Schneider MySQL and Hadoop Percona Live 2014 Chris Schneider About Me Chris Schneider, Database Architect @ Groupon Spent the last 10 years building MySQL architecture for multiple companies Worked with Hadoop for

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

Apache HBase. Crazy dances on the elephant back

Apache HBase. Crazy dances on the elephant back Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti

More information

We are watching SUSE

We are watching SUSE We are watching SUSE Monitoring at SUSE and in SUSE Linux Enterprise Server 12 Martin Čaj Linux System Administrator Prague Czech Republic mcaj@suse.com Joachim Werner Senior Product Manager Nürnberg Germany

More information

Advanced Systems Management with Machinery

Advanced Systems Management with Machinery Advanced Systems Management with Machinery Andreas Jaeger Thomas Göttlicher Senior Product Manager aj@suse.com Software Engineer tgoettlicher@suse.com Who Are We? Andreas Jaeger Product Manager Thomas

More information

SUSE Linux uutuudet - kuulumiset SUSECon:sta

SUSE Linux uutuudet - kuulumiset SUSECon:sta SUSE Linux uutuudet - kuulumiset SUSECon:sta Olli Tuominen Technology Specialist olli.tuominen@suse.com 2 SUSECon 13 4 days, 95 Sessions Keynotes, Breakout Sessions,Technology Showcase Case Studies, Technical

More information

Internals of Hadoop Application Framework and Distributed File System

Internals of Hadoop Application Framework and Distributed File System International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

A very short Intro to Hadoop

A very short Intro to Hadoop 4 Overview A very short Intro to Hadoop photo by: exfordy, flickr 5 How to Crunch a Petabyte? Lots of disks, spinning all the time Redundancy, since disks die Lots of CPU cores, working all the time Retry,

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

Build Platform as a Service (PaaS) with SUSE Studio, WSO2 Middleware, and EC2 Chris Haddad

Build Platform as a Service (PaaS) with SUSE Studio, WSO2 Middleware, and EC2 Chris Haddad Build Platform as a Service (PaaS) with SUSE Studio, WSO2 Middleware, and EC2 Chris Haddad VP, Platform Evangelism WSO2 chris@wso2.com Section Break Text Here (32pt) Cloudy Goals Improve efficiency and

More information

DevOps and SUSE From check-in to deployment

DevOps and SUSE From check-in to deployment DevOps and SUSE From check-in to deployment Rodolfo Bejarano SUSE Systems Engineer rodolfo.bejarano@suse.com Rick Ashford SUSE Systems Engineer rick.ashford@suse.com 2 Agenda 3 Introductions Development

More information

BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand?

BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand? BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand? The Big Data Buzz big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

Certified Big Data and Apache Hadoop Developer VS-1221

Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification

More information

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved. EMC Federation Big Data Solutions 1 Introduction to data analytics Federation offering 2 Traditional Analytics! Traditional type of data analysis, sometimes called Business Intelligence! Type of analytics

More information

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction

More information

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013

More information

MapReduce with Apache Hadoop Analysing Big Data

MapReduce with Apache Hadoop Analysing Big Data MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside gavin.heavyside@journeydynamics.com About Journey Dynamics Founded in 2006 to develop software technology to address the issues

More information

Using SUSE Cloud to Orchestrate Multiple Hypervisors and Storage at ADP

Using SUSE Cloud to Orchestrate Multiple Hypervisors and Storage at ADP Using SUSE Cloud to Orchestrate Multiple Hypervisors and Storage at ADP Agenda ADP Cloud Vision and Requirements Introduction to SUSE Cloud Overview Whats New VMWare intergration HyperV intergration ADP

More information

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah Apache Hadoop: The Pla/orm for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah 1 The Problems with Current Data Systems BI Reports + Interac7ve Apps RDBMS (aggregated

More information

Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC,

Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC, Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC, Bellevue, WA Legal disclaimer The information in this

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763 International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

Using SUSE Linux Enterprise to "Focus In" on Retail Optical Sales

Using SUSE Linux Enterprise to Focus In on Retail Optical Sales Using SUSE Linux Enterprise to "Focus In" on Retail Optical Sales Patrick Mullin Scott Steele Senior Technical Specialist SUSE Consulting pmullin@suse.com Point of Sale Manager National Vision, Inc. scott.steele@nationalvision.com

More information

Cost-Effective Business Intelligence with Red Hat and Open Source

Cost-Effective Business Intelligence with Red Hat and Open Source Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,

More information

Apache Hadoop: Past, Present, and Future

Apache Hadoop: Past, Present, and Future The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer aaa@cloudera.com, twitter: @awadallah Hadoop Past

More information

Data Services Advisory

Data Services Advisory Data Services Advisory Modern Datastores An Introduction Created by: Strategy and Transformation Services Modified Date: 8/27/2014 Classification: DRAFT SAFE HARBOR STATEMENT This presentation contains

More information

High Availability Storage

High Availability Storage High Availability Storage High Availability Extensions Goldwyn Rodrigues High Availability Storage Engineer SUSE High Availability Extensions Highly available services for mission critical systems Integrated

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Complete Java Classes Hadoop Syllabus Contact No: 8888022204

Complete Java Classes Hadoop Syllabus Contact No: 8888022204 1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What

More information

Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?

More information

Relax-and-Recover. Johannes Meixner. on SUSE Linux Enterprise 12. <jsmeix@suse.com>

Relax-and-Recover. Johannes Meixner. on SUSE Linux Enterprise 12. <jsmeix@suse.com> Relax-and-Recover on SUSE Linux Enterprise 12 Johannes Meixner Topics What is Relax-and-Recover? What means disaster recovery here? How does disaster recovery work? How does Relax-and-Recover

More information

BBM467 Data Intensive ApplicaAons

BBM467 Data Intensive ApplicaAons Hace7epe Üniversitesi Bilgisayar Mühendisliği Bölümü BBM467 Data Intensive ApplicaAons Dr. Fuat Akal akal@hace7epe.edu.tr Problem How do you scale up applicaaons? Run jobs processing 100 s of terabytes

More information

Implementing Linux Authentication and Authorisation Using SSSD

Implementing Linux Authentication and Authorisation Using SSSD Implementing Linux Authentication and Authorisation Using SSSD Lawrence Kearney Enterprise Service and Integration Specialist Technology Transfer Partnership (TTP) lawrence.kearney@earthlink.net Mark Robinson

More information

White Paper: What You Need To Know About Hadoop

White Paper: What You Need To Know About Hadoop CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack

More information

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 We Do Hadoop Fall 2014 Page 1 HDP delivers a comprehensive data management platform GOVERNANCE Hortonworks Data Platform

More information

Big Data Technologies Compared June 2014

Big Data Technologies Compared June 2014 Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development

More information

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 Lecture 2 (08/31, 09/02, 09/09): Hadoop Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 K. Zhang BUDT 758 What we ll cover Overview Architecture o Hadoop

More information

The Future of Data Management with Hadoop and the Enterprise Data Hub

The Future of Data Management with Hadoop and the Enterprise Data Hub The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required. What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees

More information

Installing, Tuning, and Deploying Oracle Database on SUSE Linux Enterprise Server 12 Technical Introduction

Installing, Tuning, and Deploying Oracle Database on SUSE Linux Enterprise Server 12 Technical Introduction Installing, Tuning, and Deploying Oracle Database on SUSE Linux Enterprise Server 12 Technical Introduction Arun Singh Sr. Technical Manager Arun.Singh@suse.com Agenda 2 Introduction SUSE Components Oracle

More information

BIRT in the World of Big Data

BIRT in the World of Big Data BIRT in the World of Big Data David Rosenbacher VP Sales Engineering Actuate Corporation 2013 Actuate Customer Days Today s Agenda and Goals Introduction to Big Data Compare with Regular Data Common Approaches

More information