Fighting Cyber Fraud with Hadoop. Niel Dunnage Senior Solutions Architect

Size: px
Start display at page:

Download "Fighting Cyber Fraud with Hadoop. Niel Dunnage Senior Solutions Architect"

Transcription

1 Fighting Cyber Fraud with Hadoop Niel Dunnage Senior Solutions Architect 1

2 Summary Big Data is an increasingly powerful enterprise asset with many potential user cases in this case we ll explore the relationship between big data and cyber security Cloudera, Inc. All rights reserved.

3 Quick facts Founded 2008, by former employees of Employees Over 750 Global 24x7 Support Professional Services Follow-the-sun capability; Pro-active & Predictive Support Programs Dedicated Support Engineers; Support Centers in NA, Europe & Asia World class services delivery teams worldwide Mission Critical Thousands of enterprise customers rely on Cloudera: 50% of the Fortune 50; 65% of the Fortune 500, Top Defense & Intelligence Agencies The Largest Ecosystem Cloudera University Open Source Leaders Over 1200 Members of our Partner Program, ClouderaConnect Over 40,000 people trained around the world Cloudera employees are founders of most of the Apache Hadoop ecosystem projects, and leading contributors to all of them 3

4 Leading the way in data management powered by Hadoop CLOUDERA FOUNDED BY MIKE OLSON AMR AWADALLAH & JEFF HAMMERBACHER CLOUDERA RELEASES CDH THE FIRST COMMERCIAL APACHE HADOOP DISTRIBUTION CLOUDERA REACHES 100 PRODUCTION CUSTOMERS CLOUDERA ENTERPRISE 4 THE STANDARD FOR HADOOP IN THE ENTERPRISE CLOUDERA IMPALA CLOUDERA NAVIGATOR CLOUDERA SEARCH THE ENTERPRISE DATA HUB LAUNCHED CDH Cloudera Manager CLOUDERA ENTERPRISE 4 ASK BIGGER QUESTIONS ENTERPRISE DATA HUB HADOOP CREATOR DOUG CUTTING JOINS CLOUDERA CLOUDERA MANAGER: FIRST MANAGEMENT APPLICATION FOR HADOOP CLOUDERA UNIVERSITY EXPANDS TO 140 COUNTRIES CLOUDERA CONNECT REACHES 300 PARTNERS TOM REILLY JOINS AS CEO OVER 800 PARTNERS IN CLOUDERA CONNECT 4

5 Agenda Data: - The new oil How to scale out with unreliable workers How enterprises pool and share storage and computation resources Enterprise data governance and growing up Hadoop security Deploying machine learning at scale Empowering creative data science Cloudera, Inc. All rights reserved.

6 This Morning Cloudera, Inc. All rights reserved.

7 Cyber Security:- Data is a valuable commodity DDOS Data Exfiltration Confidential customer records Transaction data Reputation attack False flag Fake data Insider Threat Operations designed to deceive in such a way that the operations appear as though they are being carried out by entities, groups or nations other than those who actually planned and executed them hacker has released a database dump from a US fantasy football website ( claiming that it was timed to coincide with the NFL has continued to support OpSaudi, claiming further attacks on websites connected to Saudi Aramco. Anonymous Italy and Operation Green Rights (OpGR) have released the contents of an account connected to an Italian steel producer, in connection to accusations of pollution against the company Cloudera, Inc. All rights reserved.

8 Cloudera s Approach to Hadoop Security Comprehensive Standards-based Authentication Centralized, Granular Authorization Native Data Protection End-to-End Data Audit and Lineage Compliance-Ready Meet compliance requirements HIPAA, PCI-DSS, Encryption and key management Transparent Security at the core Minimal performance impact Compatible with new components Insight with compliance Cloudera, Inc. All rights reserved.

9 Enterprise Data Hub Users Cases OSINT Analysis Fraud Detection Log Processing Performance Management Risk Manageme nt Innovation and Advantage Ask bigger questions in the pursuit of discovering something incredible Operational Efficiency Perform existing workloads faster, cheaper, better ETL Acceleration Active Archive EDW Optimization Deep Exploratory BI Historical Compliance Cloudera, Inc. All Rights Reserved.

10 Our Design Strategy The Enterprise Data Hub A fully integrated Hadoop ecosystem One pool of data One metadata model One security framework One set of system resources Metadata, Navigator Select CPU_Met from application WHERE (USAGE > 1000) LEFT OUTER JOIN ON application_id where application_type IS Non_Critical Batch Processing Spark, MAPREDUCE, HIVE & PIG Interactive SQL CLOUDERA IMPALA HDFS TEXT, RCFILE, PARQUET, AVRO, ETC. Interactive Search CLOUDERA SEARCH Engines Machine Learning Spark Mlib,MAHOUT, Oryx Resource Management YARN Storage Integration graph.vertices.filter{case(id, _) => id== }.collect Math & Statistics SAS, R Hbase/ Accumulo RECORDS REST (Webhdfs), File (Fuse) Flume, Sqoop Stream Processing Spark streaming Security, Navigator, Sentry Cloudera, Inc. All rights reserved.

11 Offence:- Fraud Detection Fully Automated at scale User Cases Distributed parallel execution with chained joins Historical processing at scale Machine Learning, malware/anomaly detection, spam filters etc Combined real time and batch predictors 12 12

12 Big Data Economics Ask bigger questions Agile (2 week cycle) Linear scaling Robust and economic crypto security Creative fail fast innovation Powers productivity insights Increasing infrastructure ROI Increasing business ROI Defeating fraudulent activity Evaluating risk Innovate Predict Ingest Discover Cloudera, Inc. All Rights Reserved.

13 Data Ingest NRT Ingest Flume Optimized to flow real time event data into the Hadoop cluster Spark Streaming for near real time micro batch aggregations Twitter streaming Kafka Log API Bulk Load Sqoop for structured Fuse file system access API Web / Hue Data Enrichment Flume interceptors Kite Morplines module Configuration based interceptors that can enrich data. For example extracting facets, entity extraction applying regulatory tags collect Client Client Client Client enrich Agent Agent Agent buffer store Cloudera, Inc. All rights reserved.

14 Near Real time Access to threats View the geographic distribution of Slowloris DDOS taken from Apache web server logs Help isolate unpatched servers Identify source of attacks LogUtils.createStream(...).filter(_.getText.contains( 408 Error")).countByWindow(Seconds(10)) stream.join(historiccounts).filter { case (word, (curcount, oldcount)) => curcount > oldcount } Cloudera, Inc. All rights reserved.

15 Machine Learning Real-time large-scale machine learning predictive analytics infrastructure build on Hadoop Collaborative filtering and recommendation Classification and regression, Clustering 16 16

16 VARs and Monte Carlo Simulations Under reasonable circumstances, how much can you expect to lose? Monte Carlo simulation, involves posing thousands or millions of random market scenarios and observing how they tend to affect a portfolio of financial instruments VAR based on TimePeriod, Portfolio and Confidence level This technique is easily parallelizable and as such is a great fit for Hadoop and Spark in particular Until recently required complex MPI C++ code Easily implemented in Hadoop and feasible across hierarchies of financial instruments (P&L Accounts) Backtest to validate the VAR Curation of Market Factors is important Can shape portfolio investments for instruments that trial as loss making Cloudera, Inc. All rights reserved.

17 Applying VAR Techniques to Cyber Threat Monitoring with Hadoop Historical event data processing at scale Hadoop as a service shared with financial governance applications Treat s spent on vendors and software like instruments and portfolios? Anomaly detection of network traffic by learning what is normal Siloed applications have previously made it hard to have a tangible value of finanicial risk. Risk calculations tend towards the subjective ie low (FIS APT) high (insider threat) Cloudera, Inc. All rights reserved.

18 Internal Threat Dashboard Overall Risk Assessment: Risk Per Category: Online Banking Access: Public Records: Financial transaction rate: Online Activity: Social Media Activity: Regular purchases Foreign Travel: Ranked List of High Risk Personnel: Name Risk Score Kim Burgess 94 Guy Hughes 93 Jeff Maclaen 87 Ed Snowden 86 Mary Smith 82 Open Cases: Customers with Risk Scores that Recently Changed Name Old Score New Score John Smith Rob Jones Jim Fisher Henry Johnson Sue Leefield Name Risk Score Customers Dodgy Ecomm.biz 94 John Smith, Rob Jones. Brentford Shopping Centre 93 Jim Fisher, Henry Johnson 19

19 20 Analytics

20 Defense: - Security Features Hadoop Security: - Kerberos simplified deployment with Cloudera Manager Sentry: - provides unified authorization with a single policy for Hive, Impala and Search HDFS Extended ACL s and HBase cell level access control Navigator encrypt and key trustee deliver compliant data security Via Gazzang acquisition Navigator provides data management layer including audit, access control reviews, data classification and discovery, and lineage Cloudera, Inc. All rights reserved.

21 Kerberos Security Perimeter Security Guarding Technical Network access to the cluster itself Concepts: Authentication isolation Kerberos Kerberos: A computer network authentication protocol that works on basis of tickets to allow nodes to prove identity to each other in a secure manner using encryption extensively Messages are exchanged between: Client Server Kerberos Key Distribution Center (KDC). Note this is not part of Hadoop, but most Linux Distros come with MIT Kerberos KDC. Passwords are not sent across network, Instead passwords are used to compute encryption keys Authentication status is cached (don t need to send credentials with each request) Timestamps are essential to Kerberos (make sure system clocks are synchronized!) Cloudera, Inc. All rights reserved.

22 Apache Sentry Access Security Access Defining what users and applications can do with data Technical Concepts: Authorization Permissions Sentry Sentry provides unified authorization across multiple access paths A single authorization policy will be enforced for Impala, Hive and Search Role based access at Server, Database, Table or View granularity Multi-tenant: Separate policies for each database / schema Cloudera, Inc. All rights reserved.

23 Cloudera Navigator Visibility Visibility Reporting on where data came from and how it s being used Technical Concepts: Auditing Lineage Cloudera Navigator Auditing and Access Management View, granting and revoke permissions across the Hadoop stack Identify access to a data asset around the time of security breach Generate alert when a restricted data asset is accessed Lineage Given a data set, trace back to the original source Understand the downstream impact of purging/modifying a data set Metadata Tagging and Discovery Search through metadata to find data sets of interest Given a data set, view schema, metadata and policies Lifecycle Management Automate periodic ingestion of data Compress/encrypt a data set at rest Purge a dataset/replicate data set to a remote site Cloudera, Inc. All rights reserved.

24 Cloudera, Inc. All rights reserved.

25 Encryption at rest Navigator Encrypt and Key Trustee Encrypt any File, Directory AES-256 Encryption Unique Access controls Process Based, NOT users / groups 100% Transparent Separation of Duties Key Management AES encryption keys stored on separate Key Trustee server Key manager breach, data is safe Data Server breach, data is safe Process Based ACL s Linux File, Directory AES-256 Encryption Linux Server / VM Encrypt client GPG Linux Server / VM Key Trustee Server 26 Gazzang gazzang.com/products/cloudencrypt-for-aws

Fighting Cyber Fraud with Hadoop. Niel Dunnage Senior Solutions Architect

Fighting Cyber Fraud with Hadoop. Niel Dunnage Senior Solutions Architect Fighting Cyber Fraud with Hadoop Niel Dunnage Senior Solutions Architect 1 Summary Big Data is an increasingly powerful enterprise asset and this talk will explore the relationship between big data and

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

The Future of Data Management with Hadoop and the Enterprise Data Hub

The Future of Data Management with Hadoop and the Enterprise Data Hub The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees

More information

The Enterprise Data Hub and The Modern Information Architecture

The Enterprise Data Hub and The Modern Information Architecture The Enterprise Data Hub and The Modern Information Architecture Dr. Amr Awadallah CTO & Co-Founder, Cloudera Twitter: @awadallah 1 2013 Cloudera, Inc. All rights reserved. Cloudera Overview The Leader

More information

Cloudera Enterprise Data Hub. GCloud Service Definition Lot 3: Software as a Service

Cloudera Enterprise Data Hub. GCloud Service Definition Lot 3: Software as a Service Cloudera Enterprise Data Hub GCloud Service Definition Lot 3: Software as a Service December 2014 1 SERVICE OVERVIEW & SOLUTION... 4 1.1 Service Overview... 4 1.2 Introduction to Cloudera... 5 1.3 Cloudera

More information

Hadoop Trends and Practical Use Cases. April 2014

Hadoop Trends and Practical Use Cases. April 2014 Hadoop Trends and Practical Use Cases John Howey Cloudera jhowey@cloudera.com Kevin Lewis Cloudera klewis@cloudera.com April 2014 1 Agenda Hadoop Overview Latest Trends in Hadoop Enterprise Ready Beyond

More information

Data Security in Hadoop

Data Security in Hadoop Data Security in Hadoop Eric Mizell Director, Solution Engineering Page 1 What is Data Security? Data Security for Hadoop allows you to administer a singular policy for authentication of users, authorize

More information

The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer, Cofounder @mikeolson

The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer, Cofounder @mikeolson The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer, Cofounder @mikeolson 1 A New Platform for Pervasive Analytics Multiple big data opportunities

More information

More Data in Less Time

More Data in Less Time More Data in Less Time Leveraging Cloudera CDH as an Operational Data Store Daniel Tydecks, Systems Engineering DACH & CE Goals of an Operational Data Store Load Data Sources Traditional Architecture Operational

More information

Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com

Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com Cloudera The Leader in Big Data Management Powered by Apache Hadoop The Leading Open Source Distribution of Apache

More information

INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES

INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES Data Consolidation and Multi-Tenancy in Financial Services CLOUDERA INDUSTRY BRIEF 2 Table of Contents Introduction 3 Security

More information

Cloudera Enterprise Data Hub in Telecom:

Cloudera Enterprise Data Hub in Telecom: Cloudera Enterprise Data Hub in Telecom: Three Customer Case Studies Version: 103 Table of Contents Introduction 3 Cloudera Enterprise Data Hub for Telcos 4 Cloudera Enterprise Data Hub in Telecom: Customer

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information

Driving Growth in Insurance With a Big Data Architecture

Driving Growth in Insurance With a Big Data Architecture Driving Growth in Insurance With a Big Data Architecture The SAS and Cloudera Advantage Version: 103 Table of Contents Overview 3 Current Data Challenges for Insurers 3 Unlocking the Power of Big Data

More information

Upcoming Announcements

Upcoming Announcements Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within

More information

Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com

Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Apache Sentry Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture

More information

Deploying an Operational Data Store Designed for Big Data

Deploying an Operational Data Store Designed for Big Data Deploying an Operational Data Store Designed for Big Data A fast, secure, and scalable data staging environment with no data volume or variety constraints Sponsored by: Version: 102 Table of Contents Introduction

More information

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce

More information

June 2011. Production Hadoop systems in the enterprise

June 2011. Production Hadoop systems in the enterprise June 2011 Production Hadoop systems in the enterprise 1 What Hadoop changes about data 2 The system past and present 3 Living with it your present and future 4 Q&A 2 2011 Cloudera, Inc. All Rights Reserved.

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Data Governance in the Hadoop Data Lake. Michael Lang May 2015 Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales

More information

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera Version: 103 Table of Contents Introduction 3 Importance of Security 3 Growing Pains 3 Security Requirements

More information

Big Data Management and Security

Big Data Management and Security Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value

More information

Communicating with the Elephant in the Data Center

Communicating with the Elephant in the Data Center Communicating with the Elephant in the Data Center Who am I? Instructor Consultant Opensource Advocate http://www.laubersoltions.com sml@laubersolutions.com Twitter: @laubersm Freenode: laubersm Outline

More information

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?

More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera Version: 102 Table of Contents Introduction 3 Importance of Security 3 Growing Pains 3 Security Requirements

More information

Oracle Big Data Fundamentals Ed 1 NEW

Oracle Big Data Fundamentals Ed 1 NEW Oracle University Contact Us: +90 212 329 6779 Oracle Big Data Fundamentals Ed 1 NEW Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big

More information

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture. Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in

More information

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013 Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 We Do Hadoop Fall 2014 Page 1 HDP delivers a comprehensive data management platform GOVERNANCE Hortonworks Data Platform

More information

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012 Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team rlancaster@orbitz.com @rob1lancaster Organizer of Chicago

More information

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015 Data Governance in the Hadoop Data Lake Kiran Kamreddy May 2015 One Data Lake: Many Definitions A centralized repository of raw data into which many data-producing streams flow and from which downstream

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.

More information

How to avoid building a data swamp

How to avoid building a data swamp How to avoid building a data swamp Case studies in Hadoop data management and governance Mark Donsky, Product Management, Cloudera Naren Korenu, Engineering, Cloudera 1 Abstract DELETE How can you make

More information

Are You Big Data Ready?

Are You Big Data Ready? ACS 2015 Annual Canberra Conference Are You Big Data Ready? Vladimir Videnovic Business Solutions Director Oracle Big Data and Analytics Introduction Introduction What is Big Data? If you can't explain

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

TE's Analytics on Hadoop and SAP HANA Using SAP Vora TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

#TalendSandbox for Big Data

#TalendSandbox for Big Data Evalua&on von Apache Hadoop mit der #TalendSandbox for Big Data Julien Clarysse @whatdoesdatado @talend 2015 Talend Inc. 1 Connecting the Data-Driven Enterprise 2 Talend Overview Founded in 2006 BRAND

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop

More information

Interactive data analytics drive insights

Interactive data analytics drive insights Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has

More information

Certified Big Data and Apache Hadoop Developer VS-1221

Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification

More information

Ensure PCI DSS compliance for your Hadoop environment. A Hortonworks White Paper October 2015

Ensure PCI DSS compliance for your Hadoop environment. A Hortonworks White Paper October 2015 Ensure PCI DSS compliance for your Hadoop environment A Hortonworks White Paper October 2015 2 Contents Overview Why PCI matters to your business Building support for PCI compliance into your Hadoop environment

More information

Dominik Wagenknecht Accenture

Dominik Wagenknecht Accenture Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna

More information

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management Big Data and New Paradigms in Information Management Vladimir Videnovic Institute for Information Management 2 "I am certainly not an advocate for frequent and untried changes laws and institutions must

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

MULTITENANCY AND THE ENTERPRISE DATA HUB:

MULTITENANCY AND THE ENTERPRISE DATA HUB: MULTITENANCY AND THE ENTERPRISE DATA HUB: Version: Q414-105 Table of Content Introduction 3 Business Objectives for Multitenant Environments 3 Standard Isolation Models of an EDH 4 Elements of a Multitenant

More information

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah Apache Hadoop: The Pla/orm for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah 1 The Problems with Current Data Systems BI Reports + Interac7ve Apps RDBMS (aggregated

More information

and Hadoop Technology

and Hadoop Technology SAS and Hadoop Technology Overview SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS and Hadoop Technology: Overview. Cary, NC: SAS Institute

More information

Oracle Big Data Building A Big Data Management System

Oracle Big Data Building A Big Data Management System Oracle Big Building A Big Management System Copyright 2015, Oracle and/or its affiliates. All rights reserved. Effi Psychogiou ECEMEA Big Product Director May, 2015 Safe Harbor Statement The following

More information

Accelerating Enterprise Big Data Success. Tim Stevens, VP of Business and Corporate Development Cloudera

Accelerating Enterprise Big Data Success. Tim Stevens, VP of Business and Corporate Development Cloudera Accelerating Enterprise Big Data Success Tim Stevens, VP of Business and Corporate Development Cloudera 1 Big Opportunity: Extract value from data Revenue Growth x = 50 Billion 35 ZB Cost Savings Margin

More information

Big Data must become a first class citizen in the enterprise

Big Data must become a first class citizen in the enterprise Big Data must become a first class citizen in the enterprise An Ovum white paper for Cloudera Publication Date: 14 January 2014 Author: Tony Baer SUMMARY Catalyst Ovum view Big Data analytics have caught

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

White paper. The Big Data Security Gap: Protecting the Hadoop Cluster

White paper. The Big Data Security Gap: Protecting the Hadoop Cluster The Big Data Security Gap: Protecting the Hadoop Cluster Introduction While the open source framework has enabled the footprint of Hadoop to logically expand, enterprise organizations face deployment and

More information

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 1 Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 2 Pivotal s Full Approach It s More Than Just Hadoop Pivotal Data Labs 3 Why Pivotal Exists First Movers Solve the Big Data Utility Gap

More information

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

More information

White Paper: What You Need To Know About Hadoop

White Paper: What You Need To Know About Hadoop CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack

More information

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385 brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 1 Hadoop in a heartbeat 3 2 Introduction to YARN 22 PART 2 DATA LOGISTICS...59 3 Data serialization working with text and beyond 61 4 Organizing and

More information

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

Financial, Telco, Retail, & Manufacturing: Hadoop Business Services for Industries

Financial, Telco, Retail, & Manufacturing: Hadoop Business Services for Industries Financial, Telco, Retail, & Manufacturing: Hadoop Business Services for Industries Ho Wing Leong, ASEAN 1 Cloudera company snapshot Founded Company Employees Today World Class Support Mission CriQcal 2008,

More information

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.

More information

SECURING YOUR ENTERPRISE HADOOP ECOSYSTEM

SECURING YOUR ENTERPRISE HADOOP ECOSYSTEM WHITE PAPER SECURING YOUR ENTERPRISE HADOOP ECOSYSTEM Realizing Data Security for the Enterprise with Cloudera Securing Your Enterprise Hadoop Ecosystem CLOUDERA WHITE PAPER 2 Table of Contents Introduction

More information

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering 1Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation

More information

Hortonworks CISC Innovation day

Hortonworks CISC Innovation day Hortonworks CISC Innovation day Simon gregory sgregory@hortonworks.com Here was the ask Hortonworks' data reposition - how this works and the types of data you work with. 1: Data Types & Value. What have

More information

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

Addressing Open Source Big Data, Hadoop, and MapReduce limitations Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

White Paper: Enhancing Functionality and Security of Enterprise Data Holdings

White Paper: Enhancing Functionality and Security of Enterprise Data Holdings White Paper: Enhancing Functionality and Security of Enterprise Data Holdings Examining New Mission- Enabling Design Patterns Made Possible by the Cloudera- Intel Partnership Inside: Improving Return on

More information

WHAT S NEW IN SAS 9.4

WHAT S NEW IN SAS 9.4 WHAT S NEW IN SAS 9.4 PLATFORM, HPA & SAS GRID COMPUTING MICHAEL GODDARD CHIEF ARCHITECT SAS INSTITUTE, NEW ZEALAND SAS 9.4 WHAT S NEW IN THE PLATFORM Platform update SAS Grid Computing update Hadoop support

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Securing Hadoop in an Enterprise Context

Securing Hadoop in an Enterprise Context Securing Hadoop in an Enterprise Context Hellmar Becker, Senior IT Specialist Apache: Big Data conference Budapest, September 29, 2015 Who am I? 2 Securing Hadoop in an Enterprise Context 1. The Challenge

More information

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1 Oracle Big Data Appliance Releases 2.5 and 3.0 Ralf Lange Global ISV & OEM Sales Agenda Quick Overview on BDA and its Positioning Product Details and Updates Security and Encryption New Hadoop Versions

More information

LOG INTELLIGENCE FOR SECURITY AND COMPLIANCE

LOG INTELLIGENCE FOR SECURITY AND COMPLIANCE PRODUCT BRIEF uugiven today s environment of sophisticated security threats, big data security intelligence solutions and regulatory compliance demands, the need for a log intelligence solution has become

More information

Datameer Big Data Governance

Datameer Big Data Governance TECHNICAL BRIEF Datameer Big Data Governance Bringing open-architected and forward-compatible governance controls to Hadoop analytics As big data moves toward greater mainstream adoption, its compliance

More information

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

A Modern Data Architecture with Apache Hadoop

A Modern Data Architecture with Apache Hadoop Modern Data Architecture with Apache Hadoop Talend Big Data Presented by Hortonworks and Talend Executive Summary Apache Hadoop didn t disrupt the datacenter, the data did. Shortly after Corporate IT functions

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management

PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management INTRODUCTION Traditional perimeter defense solutions fail against sophisticated adversaries who target their

More information

Integrating a Big Data Platform into Government:

Integrating a Big Data Platform into Government: Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government

More information

The Digital Enterprise Demands a Modern Integration Approach. Nada daveiga, Sr. Dir. of Technical Sales Tony LaVasseur, Territory Leader

The Digital Enterprise Demands a Modern Integration Approach. Nada daveiga, Sr. Dir. of Technical Sales Tony LaVasseur, Territory Leader The Digital Enterprise Demands a Modern Integration Approach Nada daveiga, Sr. Dir. of Technical Sales Tony LaVasseur, Territory Leader Yesterday s approach to data and application integration is a barrier

More information

Dell In-Memory Appliance for Cloudera Enterprise

Dell In-Memory Appliance for Cloudera Enterprise Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert Armando_Acosta@Dell.com/

More information

Real Time Data Processing using Spark Streaming

Real Time Data Processing using Spark Streaming Real Time Data Processing using Spark Streaming Hari Shreedharan, Software Engineer @ Cloudera Committer/PMC Member, Apache Flume Committer, Apache Sqoop Contributor, Apache Spark Author, Using Flume (O

More information

Encryption and Anonymization in Hadoop

Encryption and Anonymization in Hadoop Encryption and Anonymization in Hadoop Current and Future needs Sept-28-2015 Page 1 ApacheCon, Budapest Agenda Need for data protection Encryption and Anonymization Current State of Encryption in Hadoop

More information

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how EMC Elastic Cloud Storage (ECS ) can be used to streamline the Hadoop data analytics

More information

Cisco IT Hadoop Journey

Cisco IT Hadoop Journey Cisco IT Hadoop Journey Srini Desikan, Program Manager IT 2015 MapR Technologies 1 Agenda Hadoop Platform Timeline Key Decisions / Lessons Learnt Data Lake Hadoop s place in IT Data Platforms Use Cases

More information

White Paper: Evaluating Big Data Analytical Capabilities For Government Use

White Paper: Evaluating Big Data Analytical Capabilities For Government Use CTOlabs.com White Paper: Evaluating Big Data Analytical Capabilities For Government Use March 2012 A White Paper providing context and guidance you can use Inside: The Big Data Tool Landscape Big Data

More information

Apache Hadoop: Past, Present, and Future

Apache Hadoop: Past, Present, and Future The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer aaa@cloudera.com, twitter: @awadallah Hadoop Past

More information

Optimized for the Industrial Internet: GE s Industrial Data Lake Platform

Optimized for the Industrial Internet: GE s Industrial Data Lake Platform Optimized for the Industrial Internet: GE s Industrial Lake Platform Agenda The Opportunity The Solution The Challenges The Results Solutions for Industrial Internet, deep domain expertise 2 GESoftware.com

More information

Big Data for Investment Research Management

Big Data for Investment Research Management IDT Partners www.idtpartners.com Big Data for Investment Research Management Discover how IDT Partners helps Financial Services, Market Research, and Investment Management firms turn big data into actionable

More information

Paxata Security Overview

Paxata Security Overview Paxata Security Overview Ensuring your most trusted data remains secure Nenshad Bardoliwalla Co-Founder and Vice President of Products nenshad@paxata.com Table of Contents: Introduction...3 Secure Data

More information