New Modeling Challenges: Big Data, Hadoop, Cloud
- Kevin Lee
- 8 years ago
Transcription
1 New Modeling Challenges: Big Data, Hadoop, Cloud
Karen Lopez
Love Your Data
Senior Project Manager & Architect
2 Abstract
We data architects have done a great job maturing the data management profession: we have tools with decades of usability, technical features and relational database support. Our methods have been tested and tailored across many industries. We have books, training materials, online communities and more. And yet the advent of new methods and tools is putting us at risk of becoming irrelevant on modern data-driven projects. What do we do when our teams use Hadoop and other NoSQL technologies? Will our methods work? What about our tools? Do those teams even need a data architect? Are data management skills no longer needed? How can we leverage our existing enterprise data models?
In this presentation Karen covers the basics of Big Data, Hadoop, and NoSQL, then shows how modern data architects should respond to the ever-changing environment of data architectures. She'll help you separate the myths from the misunderstandings and show you how to continue to leverage your data modeling assets.
3 Disclosure
I'm a Data Architect / Modeler
I am biased
You have been warned
NoSQLNow! - D'Antoni / Lopez
POLL: Who Are You?
4 POLL: Architect Much?
POLL: Cloud Much?
5 POLL: SQL Much? NoSQL Much? Hadoop Much?
What we will be doing
Terminology, Foundations and Thoughts
Modern Architectural Components
Architecting versus building
Cost, benefit, risk
Demos!
6 What would you like to do?
Outcomes
What: Concepts, Features
Where: Existing Tech, New Tech, Hybrid
When: Evaluate, Pilot, Production
Why: Benefits, Trade-offs
Who: Architects, Devs, DBAs
How: Modeling, Dev, Test
How much: Costs, Trade-offs
7 Terminology and Thoughts Because we need a vocabulary to share ideas Hadoop Cloud Polyglot Schemaless Scale Up / Out Volume Read-Optimized Analytics Unstructured data Write-Optimized Data versus Processing BIG DATA Variable Data Data Reservoir Logical Data Model Data Lake 7
8 Polyschematic SQL Server HA / DR On Premises Volume External Data JBIC data ROI Relational Read-Optimized Persistence Velocity Business Intelligence Physical Data Model Data Swamp Architect Fit Business Technology Data Protection Cost, Benefit, Risk 8
9 BIG DATA
[x] Vs
Data so big it's awkward to work with
Always capitalized Big Data
A confusing term because it defines what it IS NOT.
NoSQL
Scale
Not SQL? Not Relational? Not Only Relational?
A confusing term because it defines what it IS NOT.
10 Terminology
ACID: Atomic, Consistent, Isolated, Durable
BASE: Basically available, Soft state, Eventual consistency
Eventual consistency
Schemaless
Constraints / Have-to / MUST / OBEY / Rigid / Inflexible
Relational
Tables with rows
same columns
with the same datatypes
with the same constraints
with the same domains
On purpose
With many benefits
Write-optimized
This is a FEATURE
Transaction-optimized
Data integrity
Data quality
Consistent
11 Data Modeling Now What Data Models?
Reverse engineered data models (diagrams)
Physical Data Models
Faux Logical Data Models
Logical Data Models
Conceptual Data Models
Data Models Traditional Process
Conceptual (Data) Model
Logical Data Model
Physical Data Model(s)
[Diagram: physical data models for multiple OLTP and MART systems]
Aug 2014
12 Relational
Traditional Data Architect Involvement
Project Initiation
Architecture and Infrastructure Design
SW Requirements
Development
Deployment
13 The Big Data Story
Lots of data
Coming at us fast
Lots of variety in format & quality
We want all the data
Highly available
It's web scale
What do we really mean by scale?
Bringing computing to the data
Massively parallel processing
Cheap, commodity hardware, but lots of it
Optimized for Query/Reads/Questions/Telling stories
14 Can we fit another buzzword in? Cloud
Enable on-demand scaling
Pay as you go pricing
Click to deploy
Service licensing, not product licensing, if any
Managed by others, not your data center
But The Cloud is Different
Fine Tuning nope
Patching, maintaining
Skills
Professional development
Optimized subsystems
Putting off upgrades
How you work
Who you work for
How you get new features
How you stay up to date
How you think about problems
15 We've been down this road before
Traditional transactional applications
Reporting-optimized tables/structures
Data Warehouse / Dimensional Modeling
There was a lot of contention
16 NoSQL, Not Only SQL
Relational
Graph
Columnar/Column Family
Key Value
Document Databases
Others
Graph Databases
17 Key Value Pair Columnar InfoAdvisors - infoadvisors.com Aug
18 Modern Architectural Components & Concepts
Hadoop Ecosystem (a Zoo)
Pipeline / Workflow (Oozie)
Event Pipeline (Flume)
NoSQL Database (HBase)
Metadata (HCatalog)
Scripting (Pig)
Graph (Pegasus)
Stats processing (RHadoop)
Query (Hive)
Distributed Processing (MapReduce)
Distributed Storage (HDFS)
Machine Learning (Mahout)
Data Integration (ODBC / SQOOP / REST)
Legend
Red = Core Hadoop
Blue = Data processing
Purple = Microsoft integration points and value adds
Yellow = Data Movement
Green = Packages
19 Scripting (Pig)
SQL-like Query (HiveQL)
Hadoop Zoo data warehouse
Distributed Processing (MapReduce)
SQL-like Query (Impala)
Distributed Storage (HDFS)
More Hadoop
STORM
Real time
Streaming
Topologies vs. MapReduce Jobs
20 MapReduce
Shuffle and sort data
Get from large data to smaller data
Parallel processing
Hive
SQL-like query language
Abstraction on top of MapReduce
Metastructure on top of HDFS
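The MapReduce bullets above (map in parallel, shuffle and sort, reduce large data to smaller data) can be illustrated with a minimal single-process sketch. This is plain Python, not the Hadoop API; the function names and the sample lines are assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle step: group values by key, then sort the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle/sort and reduce in sequence (no real parallelism here)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(mapped))


# Hypothetical input: on a real cluster each line could live on a different node.
sample_lines = ["big data big cluster", "data lake"]
word_counts = run_job(sample_lines)
print(word_counts)
```

On a cluster the map and reduce calls would run on many nodes at once, close to where the data is stored; only the grouping logic is shown here.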
21 HBase
Column Family NoSQL database
Key Value Store
Hundreds of Millions/Billions of rows
Based on Google Bigtable
Another type of HDInsight Cluster
Large, Schemaless (really, Schema on read)
Optimized for retrieving specific rows from large datasets
Strictly Consistent
Pig
Pig Latin
GRUNT
    # Create the Pig job definition
    $0 = '$0';
    $QueryString = "LOGS = LOAD 'wasb:///example/data/sample.log';" +
        "LEVELS = foreach LOGS generate REGEX_EXTRACT($0, '(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)', 1) as LOGLEVEL;" +
        "FILTEREDLEVELS = FILTER LEVELS by LOGLEVEL is not null;" +
        "GROUPEDLEVELS = GROUP FILTEREDLEVELS by LOGLEVEL;" +
        "FREQUENCIES = foreach GROUPEDLEVELS generate group as LOGLEVEL, COUNT(FILTEREDLEVELS.LOGLEVEL) as COUNT;" +
        "RESULT = order FREQUENCIES by COUNT desc;" +
        "DUMP RESULT;"
    $pigjobdefinition = New-AzureHDInsightPigJobDefinition -Query $QueryString -StatusFolder $statusfolder
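To make the steps of the Pig script on this slide easier to follow, here is a rough Python sketch of the same pipeline: extract a log level per line, drop lines without one, group, count, and order by count descending. It is an illustration only, not how Pig executes the job; the sample log lines are invented, and the regular expression assumes the alternation of the six level names.

```python
import re
from collections import Counter

# Assumed pattern: any one of the six log level names.
LEVEL_PATTERN = re.compile(r"(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)")


def log_level_frequencies(lines):
    """Return (level, count) pairs ordered by count, highest first."""
    # LEVELS: extract the first log level found in each line, or None.
    levels = []
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        levels.append(match.group(1) if match else None)
    # FILTEREDLEVELS: keep only the lines where a level was found.
    filtered = [level for level in levels if level is not None]
    # GROUPEDLEVELS + FREQUENCIES: group by level and count each group.
    frequencies = Counter(filtered)
    # RESULT: order by count, descending.
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stand-in for the contents of sample.log.
sample_log = [
    "2014-08-01 12:00:01 INFO starting",
    "2014-08-01 12:00:02 ERROR disk failure",
    "2014-08-01 12:00:03 INFO retrying",
    "a line without any level",
    "2014-08-01 12:00:04 INFO finished",
    "2014-08-01 12:00:05 ERROR timeout",
    "2014-08-01 12:00:06 WARN slow response",
]
result = log_level_frequencies(sample_log)
print(result)
```

The point of the comparison is that each Pig relation (LEVELS, FILTEREDLEVELS, and so on) corresponds to one small transformation of the data.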
22 YARN
Hadoop operating system
Support more use cases: batch, interactive, stream processing.
Management Framework
Better scalability & cluster utilization: capacity guarantees, fairness, and service-level agreements.
Classic DW Architecture
[Diagram: ETL, Data Mart, EDW, Data Mart]
23 Discussion - Who
What technical skills are needed?
What professional skills are needed?
What personal characteristics are needed?
Discussion - Tools
What data tools are needed?
What data architecture tools are needed?
What project management tools are needed?
24 Discussion - Money
How much does a traditional DW/BI project cost?
How much does operations & maintenance cost?
How much does the software cost?
How much do servers cost?
Modern DW Architecture
[Diagram: Hadoop, ETL, Analytics Mart, EDW, Data Mart]
25 DocumentDB
In Azure, Database Service not a product
Document based (JSON)
SQL Queries
    {
      "glossary": {
        "title": "example glossary",
        "GlossDiv": {
          "title": "S",
          "GlossList": {
            "GlossEntry": {
              "ID": "SGML",
              "SortAs": "SGML",
              "GlossTerm": "Standard Generalized Markup Language",
              "Acronym": "SGML",
              "Abbrev": "ISO 8879:1986",
              "GlossDef": {
                "para": "A meta-markup language, used to create markup languages such as DocBook.",
                "GlossSeeAlso": ["GML", "XML"]
              },
              "GlossSee": "markup"
            }
          }
        }
      }
    }
What's new
Computing
Storage
Icing
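As a small illustration of what "document based (JSON)" means for the glossary document on the DocumentDB slide, the sketch below parses that document and walks its nested structure. It uses only Python's standard `json` module and does not use any DocumentDB API; the `get_path` helper and its dotted-path notation are hypothetical, introduced here for illustration.

```python
import json

# The glossary document from the slide, stored as one self-contained JSON value.
DOCUMENT_TEXT = """
{
  "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"]
          },
          "GlossSee": "markup"
        }
      }
    }
  }
}
"""


def get_path(document, dotted_path):
    """Follow a dotted path such as 'a.b.c' through nested dictionaries."""
    current = document
    for key in dotted_path.split("."):
        current = current[key]
    return current


document = json.loads(DOCUMENT_TEXT)
entry = get_path(document, "glossary.GlossDiv.GlossList.GlossEntry")
print(entry["GlossTerm"])
print(entry["GlossDef"]["GlossSeeAlso"])
```

In a relational design the entry, its definition and its "see also" terms would likely be separate tables joined by keys; in the document the whole aggregate travels together.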
26 What's new
Analytics Mart
HDInsight & Blob Storage
HDFS or Azure Blob Storage
HDFS on compute nodes
Compute has to run
Blob storage is independent
GeoRedundancy
Reuse data
Delete computing nodes
Blob is cheaper
Elastic blob
MapReduce & HDFS
27 Scale and Nodes
Scale up versus scale out
Nodes, not beefier servers
Data stays, clusters come and go
Quick Create for playing
Custom Create for architecting a solution
"Every design decision should include cost, benefit and risk" - Karen Lopez
28 Common Themes
In-memory capabilities
Build Now, Analyze Later
Read optimized workloads
Optimized for bulk loads, not singleton updates
Most products have HA/DR built in by design
Framework
Most products have scale out capabilities
Text
Cloud vs On-Premises
Short Term Use
Rapid Scale
Test Use Cases
Pay as you go
Internet data source
On-Premises
Large long term implementations
Well known workloads
Shared clusters
Large initial investment
29 Windows vs. Other OSs (Linux) Graph Databases 29
30 There's a book
And it's FREE!
Graphdatabases.com
Document Databases
JSON, BSON, XML, YAML
31 Key Value Pair Key Value Pair 31
32 Cassandra CQL
    SELECT balance FROM accounts WHERE account_id=3476

    CREATE TABLE monkeyspecies (
      species text PRIMARY KEY,
      common_name text,
      population varint,
      average_size int
    ) WITH comment='important biological records'
      AND read_repair_chance = 1.0;
Columnar
33 Vertica SQL
    => CREATE TABLE Retail.Product_Dimension (
         Product_Key integer NOT NULL,
         Product_Description varchar(128),
         SKU_Number char(32) NOT NULL,
         Category_Description char(32),
         Department_Description char(32) NOT NULL,
         Package_Type_Description char(32),
         Package_Size char(32),
         Fat_Content integer,
         Diet_Type char(32),
         Weight integer,
         Weight_Units_of_Measure char(32),
         Shelf_Width integer,
         Shelf_Height integer,
         Shelf_Depth integer
       );

    SELECT balance FROM accounts WHERE account_id=3476
The CLOUD
Infrastructure as a Service (IaaS)
Platform as a Service (PaaS)
Database as a Service (DBaaS)
34 Demo: Hadoop and Relational, Together
HDInsight: Hadoop Clusters in the Cloud
One of the enterprise-focused implementations of Hadoop
Based on Windows Architecture
Cloud service
Demoed here as a representation of functionality, not a recommendation
35 HDInsight Demo Components
Blob Storage
Storage Container
HDInsight Cluster
PowerShell
Azure SQL DB
Demo Process
Azure Administration
Create Storage Account
Create Container
Create Optional SQL DB
Create HDInsight Cluster
Subscription
Payment
Naming Standard
Location
Metadata
External Table Options
Location
36 Let's talk billing.
Per hour
Per node
head node
security node
data nodes
Data egress, not ingress
Geo-redundancy
Blob versus HDFS
HDFS
HDInsight
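The billing bullets above (per hour, per node, by node role, with egress charged but not ingress) amount to a simple formula. The sketch below works one example; every rate and node count in it is a made-up placeholder, not an actual Azure price, and the function name is hypothetical.

```python
def estimate_cluster_cost(hours, node_counts, hourly_rates,
                          egress_gb=0.0, egress_rate_per_gb=0.0):
    """Estimate cost as (nodes x rate x hours) per role, plus data egress.

    Ingress is deliberately not charged, matching the slide's
    'data egress, not ingress' bullet.
    """
    compute = sum(
        node_counts[role] * hourly_rates[role] * hours for role in node_counts
    )
    egress = egress_gb * egress_rate_per_gb
    return compute + egress


# Hypothetical cluster and placeholder rates (currency units per node-hour).
node_counts = {"head": 2, "security": 1, "data": 4}
hourly_rates = {"head": 0.5, "security": 0.25, "data": 0.25}

# An 8-hour working session that exports 10 GB of results.
total = estimate_cluster_cost(
    hours=8,
    node_counts=node_counts,
    hourly_rates=hourly_rates,
    egress_gb=10,
    egress_rate_per_gb=0.125,
)
print(total)
```

Because compute is billed only while the cluster exists, keeping the data in blob storage and deleting the cluster between sessions changes the `hours` term rather than the rates.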
37 Blob versus HDFS
HDFS
HDInsight
Where Data Models Can Help
Create a Physical Data Model of Key Data Sources
One Entity per tab or Worksheet
Somewhat Normalized (master data + combinations)
Model the metadata
definitions
gotchas
expected domains
metadata
Published in a shareable portal
Cross model searchability
Data Lineage
Reuse
38 Demo: Sales and Rebates IRS Data 38
39 The Data Story * Demo: Columnar Graph Database 39
40 Exercise
How might a modern hybrid data architecture change YOUR ORGANIZATION?
What questions might you be able to answer that you can't answer now?
Discussion - Who
What technical skills are needed?
What professional skills are needed?
What personal characteristics are needed?
41 Discussion - Tools
What data tools are needed?
What data architecture tools are needed?
What project management tools are needed?
Discussion - Money
How much does a traditional DW/BI project cost?
How much does operations & maintenance cost?
How much does the software cost?
How much do servers cost?
42 Modern Data Architect Involvement
Project Initiation
Architecture and Infrastructure Design
SW + Data Requirements
Development
Deployment
Anti-patterns for new DBs & Data
It doesn't matter if data changes
Schemaless means faster delivery
We don't need a data architect
We don't have to steward data
We don't need ANY architects
It's self-documented
We don't have to understand data
CC Kazuhisa OTSUBO
43 External Data
Licensing
Proprietary Datasets
Open Datasets
Crowdsourced
Formats
XML
CSV
MS Word/PowerPoint/Excel
PDF
Dictionaries/Glossaries
44 10 Tips for Architects
Understand the use cases for hybrid technologies.
Evaluate / profile your data requirements for suitability for each database/datastore type.
Understand the licensing / editions for commercial database features / products.
Hadoop and other NoSQL technologies are optimized for read.
Data Modeling and Analysis still happens, just later*.
45 10 Tips for Architects
Hands on experience makes a real difference in your understanding.
Evaluate which analytical features each product supports.
Test your current development tools for support.
Test your database design / data modeling tools.
Leverage your existing metadata / models.
Finally
The business models of Open Source technologies mean there are hundreds, maybe thousands of new things to learn. That's both a pro and a con for the architect.
46 Outcomes
What: Concepts, Features
Where: Existing Tech, New Tech, Hybrid
When: Evaluate, Pilot, Production
Why: Benefits, Trade-offs
Who: Architects, Devs, DBAs
How: Modeling, Dev, Test
How much: Costs, Trade-offs
Wrap up
NoSQL = SQL + NoSQL
External Data will become more important
Modern Data Architecture = Hybrids, not either/or
47 Karen Lopez Love Your Data Senior Project Manager &
48 Thank You
Karen López, @datachick, www.datamodel.com, InfoAdvisors.com
More informationImplement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
More informationOpen Source Technologies on Microsoft Azure
Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions
More informationBringing Big Data to People
Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationHadoop Introduction. Olivier Renault Solution Engineer - Hortonworks
Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013
More informationHDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
More informationAzure Data Lake Analytics
Azure Data Lake Analytics Compose and orchestrate data services at scale Fully managed service to support orchestration of data movement and processing Connect to relational or non-relational data
More informationData Services Advisory
Data Services Advisory Modern Datastores An Introduction Created by: Strategy and Transformation Services Modified Date: 8/27/2014 Classification: DRAFT SAFE HARBOR STATEMENT This presentation contains
More informationModernizing Your Data Warehouse for Hadoop
Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking
More informationMicrosoft Azure Data Technologies: An Overview
David Chappell Microsoft Azure Data Technologies: An Overview Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Blobs... 3 Running a DBMS in a Virtual Machine... 4 SQL Database...
More informationTRAINING PROGRAM ON BIGDATA/HADOOP
Course: Training on Bigdata/Hadoop with Hands-on Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30-17:30 Hrs Venue: Eagle Photonics Pvt Ltd First Floor, Plot No 31, Sector 19C, Vashi,
More informationChukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
More informationThe Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
More informationHadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
More informationUnderstanding NoSQL Technologies on Windows Azure
David Chappell Understanding NoSQL Technologies on Windows Azure Sponsored by Microsoft Corporation Copyright 2013 Chappell & Associates Contents Data on Windows Azure: The Big Picture... 3 Windows Azure
More informationIntegrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013
Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the
More informationThe Future of Data Management with Hadoop and the Enterprise Data Hub
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees
More informationCapitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes
Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate
More informationHadoop Job Oriented Training Agenda
1 Hadoop Job Oriented Training Agenda Kapil CK hdpguru@gmail.com Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module
More informationIntegrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
More informationBig Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08
More informationOracle Database 12c Plug In. Switch On. Get SMART.
Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.
More informationYou should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.
What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees
More informationNoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
More informationUpcoming Announcements
Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within
More informationHadoop. for Oracle database professionals. Alex Gorbachev Calgary, AB September 2013
Hadoop for Oracle database professionals Alex Gorbachev Calgary, AB September 2013 Alex Gorbachev Chief Technology Officer at Pythian Blogger Cloudera Champion of Big Data OakTable Network member Oracle
More informationCOSC 6397 Big Data Analytics. 2 nd homework assignment Pig and Hive. Edgar Gabriel Spring 2015
COSC 6397 Big Data Analytics 2 nd homework assignment Pig and Hive Edgar Gabriel Spring 2015 2 nd Homework Rules Each student should deliver Source code (.java files) Documentation (.pdf,.doc,.tex or.txt
More informationData Modeling for Big Data
Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes
More informationSQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse
SQL Server 2012 PDW Ryan Simpson Technical Solution Professional PDW Microsoft Microsoft SQL Server 2012 Parallel Data Warehouse Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale
More informationNative Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
More informationPeers Techno log ies Pv t. L td. HADOOP
Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and
More informationWorkshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More informationAgile Business Intelligence Data Lake Architecture
Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step
More informationUnderstanding NoSQL on Microsoft Azure
David Chappell Understanding NoSQL on Microsoft Azure Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Data on Azure: The Big Picture... 3 Relational Technology: A Quick
More informationExtending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012
Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team rlancaster@orbitz.com @rob1lancaster Organizer of Chicago
More informationGAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION
GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.
More informationNative Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
More informationBig Data Course Highlights
Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like
More informationA Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
More informationSOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
More informationProgramming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
More informationESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
More informationCloud Scale Distributed Data Storage. Jürmo Mehine
Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented
More informationSQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationBIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
More informationbrief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385
brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 1 Hadoop in a heartbeat 3 2 Introduction to YARN 22 PART 2 DATA LOGISTICS...59 3 Data serialization working with text and beyond 61 4 Organizing and
More informationINTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
More informationData Governance in the Hadoop Data Lake. Michael Lang May 2015
Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales
More informationPro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah
Pro Apache Hadoop Second Edition Sameer Wadkar Madhu Siddalingaiah Contents J About the Authors About the Technical Reviewer Acknowledgments Introduction xix xxi xxiii xxv Chapter 1: Motivation for Big
More informationWINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS
WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS Managing and analyzing data in the cloud is just as important as it is anywhere else. To let you do this, Windows Azure provides a range of technologies
More informationThe evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect
The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through
More informationMicrosoft Big Data Solutions. Anar Taghiyev P-TSP E-mail: b-anarta@microsoft.com;
Microsoft Big Data Solutions Anar Taghiyev P-TSP E-mail: b-anarta@microsoft.com; Why/What is Big Data and Why Microsoft? Options of storage and big data processing in Microsoft Azure. Real Impact of Big
More informationSQL on NoSQL (and all of the data) With Apache Drill
SQL on NoSQL (and all of the data) With Apache Drill Richard Shaw Solutions Architect @aggress Who What Where NoSQL DB Very Nice People Open Source Distributed Storage & Compute Platform (up to 1000s of
More informationAligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed
More informationBIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets
More informationData processing goes big