Building Your Big Data Team
With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements. Platforms like Hadoop promise scalable data storage and processing at lower cost than traditional data platforms, but the newer technology requires new skill sets to plan, build, and run a Big Data program. Companies not only have to determine the right Big Data technology strategy, they need to determine the right staffing strategy to be successful.

What comprises a great Big Data team? Luckily the answer isn't that far from what many companies already have in house managing their data warehouse or data services functions. The core skill sets and experience in the areas of system administration, database administration, database application architecture/engineering, and data warehousing are all transferable. What usually makes someone able to successfully make the leap from, say, Oracle DBA to Hadoop administrator is their motivation to learn and their curiosity about what's happening under the covers when data are loaded or a query is run.

Here are the key roles your (Hadoop) Big Data team needs to fill to be complete. The first three roles are about building and operating a working Hadoop platform:

1. Data Center and OS Administrator

This person is the one who stands up machines, virtual or otherwise, and presents a fully functional system with working disks, networking, etc. They're usually also responsible for all the typical data center services (LDAP/AD, DNS, etc.) and know how to leverage vendor-specific tools (e.g., Kickstart and Cobbler, or vSphere) to provision large numbers of machines quickly.
This role is becoming increasingly sophisticated and/or outsourced with the move to infrastructure-as-a-service approaches to provisioning, with the data center role installing hardware and maintaining the provisioning system, and developers interfacing with that system rather than with humans.

About DesignMind: DesignMind uses leading edge technologies, including SQL Server, SharePoint, .NET, Tableau, Hadoop, and Platfora to develop big data solutions. Headquartered in San Francisco, we help businesses leverage their data to gain competitive advantage.

A white paper by Mark Kidwell, Principal Big Data Consultant, DesignMind, September 2014. Copyright 2014 DesignMind. All rights reserved.
While this role usually isn't (and shouldn't be) part of the core Big Data team, it's mentioned here because it's absolutely critical to building a Big Data platform. Anyone architecting a new platform needs to partner closely with the data center team to specify requirements and validate the build out.

BACKGROUND/PREVIOUS ROLE: This role typically already exists, either in the organization or via a cloud hosted solution's web site.

TRAINING/RETOOLING: If you still run your own data center operations, definitely look at large scale virtualization and server management technologies that support self-service provisioning if you haven't already.

2. Hadoop Administrator

This person is responsible for setup, configuration, and ongoing operation of the software comprising your Hadoop stack. This person wears a lot of hats, and typically has a DevOps background where they're comfortable writing code and/or configuration with tools like Ansible to manage their Hadoop infrastructure. Skills in Linux and MySQL/PostgreSQL administration, security, high availability, disaster recovery, and maintaining software across a cluster of machines are an absolute must. For the above reasons, many traditional DBAs aren't a good fit for this type of role even with training (but see below).

BACKGROUND/PREVIOUS ROLE: Linux compute/web cluster admin, possibly a DBA or storage admin.

TRAINING/RETOOLING: Hadoop admin class from Cloudera, Ansible training from ansible.com, plus lots of lab time to play with the tools.
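To illustrate the Ansible-driven, infrastructure-as-code style of cluster management described above, here is a minimal playbook sketch. It is not from the white paper: the host group, package names (Cloudera-style CDH package names are assumed here), and file paths are all assumptions you would adjust for your own cluster and vendor.

```yaml
# Minimal sketch -- host group, package names, and paths are assumptions.
- hosts: hadoop_workers
  become: yes
  tasks:
    - name: Install DataNode and NodeManager packages
      yum:
        name:
          - hadoop-hdfs-datanode
          - hadoop-yarn-nodemanager
        state: present

    - name: Push a common Hadoop configuration to every node
      template:
        src: templates/core-site.xml.j2
        dest: /etc/hadoop/conf/core-site.xml
      notify: restart datanode

  handlers:
    - name: restart datanode
      service:
        name: hadoop-hdfs-datanode
        state: restarted
```

The point is less the specific modules than the workflow: cluster configuration lives in version control and is applied uniformly across machines, rather than being hand-edited per node.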
3. HDFS/Hive/Impala/HBase/MapReduce/Spark Admin

Larger Big Data environments require specialization, as it's unusual to find an expert on all the different components. Tuning Hive or Impala to perform at scale is very different from tuning HBase, and different again from the skills required to have a working high availability implementation of Hive or HttpFS using HAProxy. Securing Hadoop properly can also require dedicated knowledge for each component and vendor, as Cloudera's Sentry (Hive/Impala/Search security) is a different animal than Hortonworks's Knox perimeter security. Some DBAs can successfully make the transition to this area if they understand the function of each component.
BACKGROUND/PREVIOUS ROLE: Linux cluster admin, MySQL/Postgres/Oracle DBA. See Gwen Shapira's excellent blog on Cloudera's site called the Hadoop FAQ for Oracle DBAs.

TRAINING/RETOOLING: Hadoop admin and HBase classes from Cloudera, plus lots of lab time to play with all the various technologies.

The next four roles are concerned with getting data into Hadoop, doing something with it, and getting it back out:

4. ETL / Data Integration Developer

This role is similar to most data warehouse environments: get data from source systems of record (other databases, flat files, etc.), transform it into something useful, and load it into a target system for querying by apps or users. The difference is the tool chain and interfaces the Hadoop platform provides, and the fact that Hadoop workloads are larger. Hadoop provides basic tools like Sqoop and Hive that any decent ETL developer who can write bash scripts and SQL can use, but understanding that best practice is really ELT (pushing the heavy lifting of transformation into Hadoop) and knowing which file formats to use for optimal Hive and Impala query performance are both the types of things that have to be learned.
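The Sqoop-then-transform (ELT) pattern described above might look like the following command sketch, which presumes a live Hadoop cluster. Hostnames, credentials, and table names are hypothetical, and Parquet is shown as one common columnar format choice for Hive/Impala performance, not a universal rule.

```shell
# Sketch only -- connection details and table names are hypothetical.
# 1) Extract and load: land the raw rows in HDFS with Sqoop.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl --password-file /user/etl/.dbpass \
  --table orders \
  --target-dir /data/raw/orders

# 2) Transform inside Hadoop (the "T" in ELT): rewrite the raw data as a
#    Parquet table, a columnar format that Hive and Impala scan much more
#    efficiently than plain text.
hive -e "
  CREATE TABLE sales.orders_parquet
  STORED AS PARQUET
  AS SELECT * FROM sales.orders_raw;
"
```

The design choice worth noticing is that the heavy transformation runs inside the cluster itself, rather than in an external ETL server that would become a bottleneck at Hadoop-scale volumes.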
A dangerous anti-pattern is to use generic data integration tools and developers to load your Big Data platform. They usually don't do a good job producing high performance data warehouses on non-Hadoop databases, and they won't do a good job here.

BACKGROUND/PREVIOUS ROLE: Data warehouse ETL developer with experience on high-end platforms like Netezza or Teradata.

TRAINING/RETOOLING: Cloudera Data Analyst class covering Hive, Pig and Impala, and possibly the Developer class.

5. Data Architect

This role is typically responsible for data modeling, metadata, and data stewardship functions, and is important to keep your Big Data platform from devolving into a big mess of files and haphazardly managed tables. In a sense they own the data hosted on the platform; they often have the most experience with the business domain and understand the data's meaning (and how to query it) more than anyone else except possibly the business analyst. The difference between a traditional warehouse data architect and this role is Hadoop's typical use in storing and processing unstructured data that falls outside the traditional role's domain, such as access logs and data from a message bus. For less complex environments, this role is only a part-time job and is typically shared with the lead platform architect or business analyst.

An important caveat for data architects and governance functions in general: assuming your Big Data platform is meant to be used for query and application workloads (vs. simple data archival), it's important to focus on delivering a usable data product quickly, rather than modeling every last data source across the enterprise and exactly how it will look as a finished product in Hadoop or Cassandra.

BACKGROUND/PREVIOUS ROLE: Data warehouse data architect.

TRAINING/RETOOLING: Cloudera Data Analyst class covering Hive, Pig and Impala.
6. Big Data Engineer / Architect

This developer is the one writing more complex data-driven applications that depend on core Hadoop applications like HBase, Spark or SolrCloud. These are serious software engineers developing products from their data for use cases like large scale web sites, search, and real-time data processing that need the performance and scalability that a Big Data platform provides. They're usually well versed in the Java software development process and the Hadoop toolchain, and are typically driving the requirements around what services the platform provides and how it performs.

For the Big Data architect role, all of the above applies, but this person is also responsible for specifying the entire solution that everyone else in the team is working to implement and run. They also have a greater understanding of and appreciation for problems with large data sets and distributed computing, and usually some system architecture skills to guide the build out of the Big Data ecosystem.

BACKGROUND/PREVIOUS ROLE: Database engineers would have been doing the same type of high performance database-backed application development, but may have been using Oracle or MySQL as a backend previously. Architects would have been leading the way in product development that depended on distributed computing and web scale systems.

TRAINING/RETOOLING: Cloudera's Data Analyst, Developer, Building Big Data Applications, and Spark classes.
7. Data Scientist

This role crosses boundaries between analyst and engineer, bringing together skills in math and statistics, machine learning, programming, and a wealth of experience/expertise in the business domain. All these combined allow a data scientist to search for insights in vast quantities of data stored in a data warehouse or Big Data platform, and perform basic research that often leads to the next data product for engineers to build. The key difference between traditional data mining experts and modern data scientists is the more extensive knowledge of tools and techniques for dealing with ever growing amounts of data. My colleague Andrew Eichenbaum has written a great article on hiring data scientists.

BACKGROUND/PREVIOUS ROLE: Data mining, statistician, applied mathematics.

TRAINING/RETOOLING: Cloudera's Intro to Data Science and Data Analyst classes.

The final roles are the traditional business-facing roles for data warehouse, BI and data services teams:

8. Data Analyst

This role is largely what you'd expect: using typical database client tools as the interface, they run queries against data warehouses, produce reports, and publish the results. The skills required to do this are a command of SQL, including the dialects used by Hive and Impala, knowledge of data visualization, and understanding how to translate business questions into something a data warehouse can answer.

BACKGROUND/PREVIOUS ROLE: Data analysts can luckily continue to use a lot of the same analytics tools and techniques they're used to, and benefit from improvements in both performance and functionality.

TRAINING/RETOOLING: Cloudera's Data Analyst class covering Hive, Pig and Impala.
9. Business Analyst

Like the data analyst, this role isn't too different from what already exists in most companies today. Gathering requirements from end users, describing use cases and specifying product behavior will never go away, and are even more important when the variety of data grows along with volume. What's most different is the tools involved, their capabilities, and the general approach. The business analyst needs to be sensitive to the difference in end-user tools with Hadoop, and determine the training needed for users.
BACKGROUND/PREVIOUS ROLE: Business/system analyst for previous data products, like a data warehouse/BI solution, or a traditional database-backed web application.

TRAINING/RETOOLING: Possibly Cloudera's Data Analyst class covering Hive, Pig and Impala, especially if this role will provide support to end users and analysts.

What's the Right Approach?

All of the above roles mention Hadoop and target data warehousing use cases, but many of the same guidelines apply if Apache Cassandra or a similar NoSQL platform is being built out. And of course there are roles that are needed for any software or information management project: project managers, QA/QC, etc. For those roles the one-day Cloudera Hadoop Essentials class might be the best intro.

So with the above roles defined, what's the right approach to building out a Big Data team? A Big Data program needs a core group of at least one each of an architect, admin, engineer and analyst to be successful, and for a team that small there's still a lot of crossover between roles. If there's an existing data services or data warehouse team taking on the Big Data problem, then it's important to identify who will take on those roles and train them, or identify gaps that have to be filled by new hires. Of course the team needs to scale up as workload is added (users, apps and data), also accounting for existing needs.

Transforming into a Big Data capable organization is a challenging prospect, so it's also a good idea to start with smaller POCs and use a lab environment to test the waters. The time invested in training and building out the team will pay off when it's time to leverage your data at scale with the production platform, improving the odds of your Big Data program being a success.
Mark Kidwell specializes in designing, building, and deploying custom end-to-end Big Data solutions and data warehouses.

DesignMind, 150 Spear Street, Suite 700, San Francisco, CA. designmind.com
Using Tableau Software with Hortonworks Data Platform
Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data
WHAT S NEW IN SAS 9.4
WHAT S NEW IN SAS 9.4 PLATFORM, HPA & SAS GRID COMPUTING MICHAEL GODDARD CHIEF ARCHITECT SAS INSTITUTE, NEW ZEALAND SAS 9.4 WHAT S NEW IN THE PLATFORM Platform update SAS Grid Computing update Hadoop support
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?
Apache Hadoop: Past, Present, and Future
The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer [email protected], twitter: @awadallah Hadoop Past
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
Buying vs. Building Business Analytics. A decision resource for technology and product teams
Buying vs. Building Business Analytics A decision resource for technology and product teams Introduction Providing analytics functionality to your end users can create a number of benefits. Actionable
Peninsula Strategy. Creating Strategy and Implementing Change
Peninsula Strategy Creating Strategy and Implementing Change PS - Synopsis Professional Services firm Industries include Financial Services, High Technology, Healthcare & Security Headquartered in San
TOP 8 TRENDS FOR 2016 BIG DATA
The year 2015 was an important one in the world of big data. What used to be hype became the norm as more businesses realized that data, in all forms and sizes, is critical to making the best possible
The Enterprise Data Hub and The Modern Information Architecture
The Enterprise Data Hub and The Modern Information Architecture Dr. Amr Awadallah CTO & Co-Founder, Cloudera Twitter: @awadallah 1 2013 Cloudera, Inc. All rights reserved. Cloudera Overview The Leader
Data Analytics Infrastructure
Data Analytics Infrastructure Data Science SG Nov 2015 Meetup Le Nguyen The Dat @lenguyenthedat Backgrounds ZALORA Group (2013 2014) o Biggest online fashion retails in South East Asia o Data Infrastructure
W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances
INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA
BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM. An Overview
BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM An Overview Contents Contents... 1 BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM... 1 Program Overview... 4 Curriculum... 5 Module 1: Big Data: Hadoop
Net Developer Role Description Responsibilities Qualifications
Net Developer We are seeking a skilled ASP.NET/VB.NET developer with a background in building scalable, predictable, high-quality and high-performance web applications on the Microsoft technology stack.
Apache Hadoop: The Big Data Refinery
Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data
Interactive data analytics drive insights
Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has
BIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
