Stinger Initiative: Introduction
|
|
|
- Clarissa Evans
- 10 years ago
- Views:
Transcription
1 Stinger Initiative: Introduction Interactive Query on Hadoop Chris Harris [email protected] Twitter : cj_harris5 Page 1
2 The World of Data is Changing Data Explosion 1 Zettabyte (ZB) = 1 Billion TBs 15x growth rate of machine generated data by 2020 Source: IDC By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent. Gartner, Mark Beyer, Information Management in the 21st Century Page 2
3 What is Hadoop? Hadoop is a new data platform, that can store more data, more kinds of data, and perform more flexible analyses Hadoop is open source and runs on industry standard hardware, so it's 1-2 orders of magnitude more economical than conventional data solutions Hadoop provides more cost effective storage, processing, and analysis. Some existing workloads run faster, cheaper, better Page 3
4 Hadoop Enterprise Use Cases Vertical Use Case Data Type Financial Services Telecom Retail Manufacturing New Account Risk Screens Text, Server Logs Fraud Prevention Server Logs Trading Risk Server Logs Maximize Deposit Spread Text, Server Logs Insurance Underwriting Geographic, Sensor, Text Accelerate Loan Processing Text Call Detail Records (CDRs) Machine, Geographic Infrastructure Investment Machine, Server Logs Next Product to Buy (NPTB) Clickstream Real-time Bandwidth Allocation Server Logs, Text, Sentiment New Product Development Machine, Geographic 360 View of the Customer Clickstream, Text Analyze Brand Sentiment Sentiment Localized, Personalized Promotions Geographic Website Optimization Clickstream Optimal Store Layout Sensor Supply Chain and Logistics Sensor Assembly Line Quality Assurance Sensor Proactive Maintenance Machine Crowdsourced Quality Assurance Sentiment Page
5 A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform : Yahoo! creates team under E14 to work on Hadoop Focus on INNOVATION Enterprise Hadoop 2008: Yahoo team extends focus to operations to support multiple projects & growing clusters Focus on OPERATIONS 2011: Hortonworks created to focus on Enterprise Hadoop. Starts with 24 key Hadoop engineers from Yahoo STABILITY Page 5
6 Leadership that Starts at the Core Driving next generation Hadoop YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery 420k+ lines authored since 2006 More than twice nearest contributor Deeply integrating w/ecosystem Enabling new deployment platforms (ex. Windows & Azure, Linux & VMware HA) Creating deeply engineered solutions (ex. Teradata big data appliance) All Apache, NO holdbacks 100% of code contributed to Apache Page 6
7 Operational Data Refinery Refine Explore Enrich APPLICATIONS Business Analy;cs Custom Applica;ons Enterprise Applica;ons Collect data and apply a known algorithm to it in trusted operational process DATA SYSTEMS RDBMS EDW MPP TRADITIONAL REPOS Capture Capture all data 2 Process Parse, cleanse, apply structure & transform DATA SOURCES Tradi;onal Sources (RDBMS, OLTP, OLAP) New Sources (web logs, , sensor data, social media) 1 3 Exchange Push to existing data warehouse for use with existing analytic tools Page 7
8 Key Capability in Hadoop: Late binding With tradi;onal ETL, structure must be agreed upon far in advance and is difficult to change. MACHINE GENERATED Store Transformed Data WEB LOGS, CLICK STREAMS OLTP ETL Server Data Mart / EDW Client Apps With Hadoop, capture all data, structure data as business need evolve. MACHINE GENERATED HORTONWORKS DATA PLATFORM Dynamically Apply Transforma8ons WEB LOGS, CLICK STREAMS OPERATIONAL SERVICES DATA SERVICES HADOOP CORE OLTP Hortonworks HDP Data Mart / EDW Client Apps Page 8
9 Big Data Exploration & Visualization Refine Explore Enrich APPLICATIONS Business Analy;cs Custom Applica;ons Enterprise Applica;ons Collect data and perform iterative investigation for value 3 DATA SYSTEMS RDBMS EDW MPP TRADITIONAL REPOS 2 1 Capture Capture all data 2 Process Parse, cleanse, apply structure & transform DATA SOURCES Tradi;onal Sources (RDBMS, OLTP, OLAP) New Sources (web logs, , sensor data, social media) 1 3 Exchange Explore and visualize with analytics tools supporting Hadoop Page 9
10 Visualization Tooling Robust visualization and business tooling Ensures scalability when working with large datasets Native Excel support Web browser support Mobile support Page 10
11 Enhancing the Core of Apache Hadoop Deliver high-scale storage & processing with enterprise-ready platform services HADOOP CORE PLATFORM SERVICES MAP REDUCE HDFS YARN (in 2.0) Enterprise Readiness Unique Focus Areas: Bigger, faster, more flexible Continued focus on speed & scale and enabling near-real-time apps Tested & certified at scale Run ~1300 system tests on large Yahoo clusters for every release Enterprise-ready services High availability, disaster recovery, snapshots, security, Page 11
12 Data Services for Full Data Lifecycle HADOOP CORE PLATFORM SERVICES FLUME SQOOP DATA SERVICES PIG HIVE HCATALOG WEBHDFS Distributed Storage & Processing Enterprise Readiness HBASE Provide data services to store, process & access data in many ways Unique Focus Areas: Apache HCatalog Metadata services for consistent table access to Hadoop data Apache Hive Explore & process Hadoop data via SQL & ODBC-compliant BI tools Apache HBase NoSQL database for Hadoop WebHDFS Access Hadoop files via scalable REST API Talend Open Studio for Big Data Graphical data integration tools Page 12
13 Organize Tiers and Process with Metadata Access Tier Conform, Summarize, Access Pig HiveQL Organize data based on source/derived relationships Allows for fault and rebuild process Gold Tier Work Tier Raw Tier Transform, Integrate, Storage Standardize, Cleanse, Transform Pig MapReduce Pig MapReduce HCat Provides unified metadata access to Pig, Hive & MapReduce Extract & Load WebHDFS Flume Sqoop Page 13
14 Hive Current Focus Area Non- Real-Time Interactive Batch Interactive Online systems R-T analytics CEP Parameterized Reports Drilldown Visualization Exploration Data preparation Incremental batch processing Dashboards / Scorecards Operational batch processing Enterprise Reports Data Mining Current Hive Sweet Spot 0-5s 5s 1m 1m 1h 1h+ Data Size Page 14
15 Stinger: Extending Hive s Sweetspot Non- Real-Time Interactive Batch Interactive Online systems R-T analytics CEP Parameterized Reports Drilldown Visualization Exploration Future Hive Expansion Data preparation Incremental batch processing Dashboards / Scorecards Operational batch processing Enterprise Reports Data Mining Current Hive Sweet Spot 0-5s 5s 1m 1m 1h 1h+ Data Size Improve Latency & Throughput Query engine improvements New Optimized RCFile column store Next-gen runtime (elim s M/R latency) Extend Deep Analytical Ability Analytics functions Improved SQL coverage Continued focus on core Hive use cases Page 15
16 Hadoop 2.0 The Enterprise Generation 1.0 Architected for the Large Web Properties Business Value 2.0 Architected for the Broad Enterprise Single PlaTorm Mul;ple Use Enterprise Requirements Mixed workloads Interactive Query Reliability Point in time Recovery Multi Data Center ZERO downtime Security Hadoop 2.0 Features YARN Hive on Tez Full Stack HA Snapshots Disaster Recovery Rolling Upgrades Knox Gateway BATCH INTERACTIVE ONLINE Big Data Transactions, Interactions, Observations
17 The Stinger Initiative Interactive Query on Hadoop Page 17
18 Stinger Initiative At A Glance Page 18
19 Base Optimizations: Intelligent Optimizer Introduction of In-Memory Hash Join: For joins where one side fits in memory: New in-memory-hash-join algorithm. Hive reads the small table into a hash table. Scans through the big file to produce the output. Introduction of Sort-Merge-Bucket Join: Applies when tables are bucketed on the same key. Dramatic speed improvements seen in benchmarks. Other Improvements: Lower the footprint of the fact tables in memory. Enable the optimizer to automatically pick map joins. Page 19
20 Dimensionally Structured Data Extremely common pattern in EDW. Results in large fact tables and small dimension tables. Dimension tables often small enough to fit in RAM. Sometimes called Star Schema. Page 20
21 A Query on Dimensional Data Derived from TPC-DS Query 27 SELECT col5, avg(col6) FROM fact_table join dim1 on (fact_table.col1 = dim1.col1) join dim2 on (fact_table.col2 = dim2.col1) join dim3 on (fact_table.col3 = dim3.col1) join dim4 on (fact_table.col4 = dim4.col1) GROUP BY col5 ORDER BY col5 LIMIT 100; Dramatic speedup on Hive 0.11 Page 21
22 Star Schema Join Improvements in 0.11 Page 22
23 ORCFile - Optimized Column Storage Make a better columnar storage file Tightly aligned to Hive data model Decompose complex row types into primitive fields Better compression and projection Only read bytes from HDFS for the required columns. Store column level aggregates in the files Only need to read the file meta information for common queries Stored both for file and each section of a file Aggregates: min, max, sum, average, count Allows fast access by sorted columns Ability to add bloom filters for columns Enables quick checks for whether a value is present Page 23
24 Yarn Moving Hive and Hadoop beyond MapReduce Page 24
25 Hadoop 2.0 Innovations - YARN Focus on scale and innovation Support 10,000+ computer clusters Extensible to encourage innovation Next generation execution Improves MapReduce performance Supports new frameworks beyond MapReduce Low latency, Streaming, Services Do more with a single Hadoop cluster MapReduce Tez Graph Processing Other YARN: Cluster Resource Management HDFS Redundant, Reliable Storage
26 Tez Moving Hive and Hadoop beyond MapReduce Page 26
27 Tez Low level data-processing execution engine Use it for the base of MapReduce, Hive, Pig, Cascading etc. Enables pipelining of jobs Removes task and job launch times Hive and Pig jobs no longer need to move to the end of the queue between steps in the pipeline Does not write intermediate output to HDFS Much lighter disk and network usage Built on YARN Page 27
28 Tez - Core Idea Task with pluggable Input, Processor & Output Input Processor Output Task Tez Task - <Input, Processor, Output> YARN ApplicationMaster to run DAG of Tez Tasks Page 28
29 Tez Blocks for building tasks MapReduce Map MapReduce Reduce HDFS Input Map Processor Sorted Output Shuffle Input Reduce Processor HDFS Output MapReduce Map Task MapReduce Reduce Task Intermediate Reduce for Map-Reduce-Reduce Shuffle Input Reduce Processor Sorted Output Intermediate Reduce for Map- Reduce-Reduce Page 29
30 Tez More tasks Special Pig/Hive Map In-memory Map HDFS Input Map Processor Pipeline Sorter Output HDFSIn put Map Processor Inmemory Sorted Output Tez Task Tez Task Special Pig/Hive Reduce Shuffle Skipmerge Input Reduce Processor Sorted Output Tez Task Page 30
31 Pig/Hive-MR versus Pig/Hive-Tez SELECT a.state, COUNT(*), AVERAGE(c.price) FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemid = c.itemid) GROUP BY a.state Job 1 Job 2 I/O Synchronization Barrier I/O Synchronization Barrier Single Job Job 3 Pig/Hive - MR Pig/Hive - Tez Page 31
32 FastQuery: Beyond Batch with YARN Tez Generalizes Map-Reduce Simplified execution plans process data more efficiently Always-On Tez Service Low latency processing for all Hadoop data processing Page 32
33 Recap and Questions: Hive Performance Page 33
34 Improving Hive s SQL Support Page 34
35 Stinger: Deep Analytical Capabilities SQL:2003 Window Functions OVER clauses Multiple PARTITION BY and ORDER BY supported Windowing supported (ROWS PRECEDING/FOLLOWING) Large variety of aggregates RANK FIRST_VALUE LAST_VALUE LEAD / LAG Distrubutions Page 35
36 Hive Data Type Conformance Data Types: Add fixed point NUMERIC and DECIMAL type (in progress) Add VARCHAR and CHAR types with limited field size Add DATETIME Add size ranges from 1 to 53 for FLOAT Add synonyms for compatibility BLOB for BINARY TEXT for STRING REAL for FLOAT SQL Semantics: Sub-queries in IN, NOT IN, HAVING. EXISTS and NOT EXISTS Page 36
37 Questions? Page 37
Apache Hadoop's Role in Your Big Data Architecture
Apache Hadoop's Role in Your Big Data Architecture Chris Harris EMEA, Hortonworks [email protected] Twi
Big Data Realities Hadoop in the Enterprise Architecture
Big Data Realities Hadoop in the Enterprise Architecture Paul Phillips Director, EMEA, Hortonworks [email protected] +44 (0)777 444 3857 Hortonworks Inc. 2012 Page 1 Agenda The Growth of Enterprise
HDP Enabling the Modern Data Architecture
HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,
HDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
Big Data: Making Sense of it all!
Big Data: Making Sense of it all! Jamie Engesser E-mail : [email protected] Page 1 Data Driven Business? Facts not Intuition! Data driven decisions are better decisions its as simple as that. Using
Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks
Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013
Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
Next Gen Hadoop Gather around the campfire and I will tell you a good YARN
Next Gen Hadoop Gather around the campfire and I will tell you a good YARN Akmal B. Chaudhri* Hortonworks *about.me/akmalchaudhri My background ~25 years experience in IT Developer (Reuters) Academic (City
A Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
Hadoop in the Enterprise
Hadoop in the Enterprise Modern Architecture with Hadoop 2 Jeff Markham Technical Director, APAC Hortonworks Hadoop Wave ONE: Web-scale Batch Apps relative % customers 2006 to 2012 Web-Scale Batch Applications
A Modern Data Architecture with Apache Hadoop
Modern Data Architecture with Apache Hadoop Talend Big Data Presented by Hortonworks and Talend Executive Summary Apache Hadoop didn t disrupt the datacenter, the data did. Shortly after Corporate IT functions
Upcoming Announcements
Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC [email protected] Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within
Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. [email protected], twicer: @awadallah
Apache Hadoop: The Pla/orm for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. [email protected], twicer: @awadallah 1 The Problems with Current Data Systems BI Reports + Interac7ve Apps RDBMS (aggregated
The Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed
Comprehensive Analytics on the Hortonworks Data Platform
Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page
How Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
Datenverwaltung im Wandel - Building an Enterprise Data Hub with
Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
Hadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015
Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours
Harnessing big data with Hortonworks Data Platform and Red Hat JBoss Data Virtualization
Harnessing big data with Hortonworks Data Platform and Red Hat JBoss Data Virtualization Kimberly Palko, Product Manager Red Hat JBoss Doug Reid, Director Partner Product Management Hortonworks Cojan van
Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved
Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment
Big Data Course Highlights
Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like
#TalendSandbox for Big Data
Evalua&on von Apache Hadoop mit der #TalendSandbox for Big Data Julien Clarysse @whatdoesdatado @talend 2015 Talend Inc. 1 Connecting the Data-Driven Enterprise 2 Talend Overview Founded in 2006 BRAND
Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard
Hadoop and Relational base The Best of Both Worlds for Analytics Greg Battas Hewlett Packard The Evolution of Analytics Mainframe EDW Proprietary MPP Unix SMP MPP Appliance Hadoop? Questions Is Hadoop
Hadoop Job Oriented Training Agenda
1 Hadoop Job Oriented Training Agenda Kapil CK [email protected] Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module
End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ
End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,
Hadoop, the Data Lake, and a New World of Analytics
Hadoop, the Data Lake, and a New World of Analytics Hortonworks. We do Hadoop. Spring 2014 Version 1.0 Page 1 Hortonworks Inc. 2014 Traditional Data Architecture Pressured 2.8 ZB in 2012 85% from New Data
SAP and Hortonworks Reference Architecture
SAP and Hortonworks Reference Architecture Hortonworks. We Do Hadoop. June Page 1 2014 Hortonworks Inc. 2011 2014. All Rights Reserved A Modern Data Architecture With SAP DATA SYSTEMS APPLICATIO NS Statistical
Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015
Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 We Do Hadoop Fall 2014 Page 1 HDP delivers a comprehensive data management platform GOVERNANCE Hortonworks Data Platform
BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
The Future of Data Management with Hadoop and the Enterprise Data Hub
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees
HADOOP. Revised 10/19/2015
HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...
Using distributed technologies to analyze Big Data
Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
Constructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION
GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.
Modern Data Architecture for Predictive Analytics
Modern Data Architecture for Predictive Analytics David Smith VP Marketing and Community - Revolution Analytics John Kreisa VP Strategic Marketing- Hortonworks Hortonworks Inc. 2013 Page 1 Your Presenters
Talend Big Data. Delivering instant value from all your data. Talend 2014 1
Talend Big Data Delivering instant value from all your data Talend 2014 1 I may say that this is the greatest factor: the way in which the expedition is equipped. Roald Amundsen race to the south pole,
Luncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
Analytics on Spark & Shark @Yahoo
Analytics on Spark & Shark @Yahoo PRESENTED BY Tim Tully December 3, 2013 Overview Legacy / Current Hadoop Architecture Reflection / Pain Points Why the movement towards Spark / Shark New Hybrid Environment
Modernizing Your Data Warehouse for Hadoop
Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist [email protected] O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
Tap into Hadoop and Other No SQL Sources
Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets
Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC,
Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC, Bellevue, WA Legal disclaimer The information in this
Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya
Oracle Database - Engineered for Innovation Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database 11g Release 2 Shipping since September 2009 11.2.0.3 Patch Set now
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
Unified Batch & Stream Processing Platform
Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built
TE's Analytics on Hadoop and SAP HANA Using SAP Vora
TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -
Data Security in Hadoop
Data Security in Hadoop Eric Mizell Director, Solution Engineering Page 1 What is Data Security? Data Security for Hadoop allows you to administer a singular policy for authentication of users, authorize
Apache Hadoop: Past, Present, and Future
The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer [email protected], twitter: @awadallah Hadoop Past
CSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University [email protected] 14.9-2015 1/36 Google MapReduce A scalable batch processing
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
Native Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
Implement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
Accelerating Hadoop MapReduce Using an In-Memory Data Grid
Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for
Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
Bringing Big Data to People
Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process
Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016
Big Data Approaches Making Sense of Big Data Ian Crosland Jan 2016 Accelerate Big Data ROI Even firms that are investing in Big Data are still struggling to get the most from it. Make Big Data Accessible
#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld
Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case
Trafodion Operational SQL-on-Hadoop
Trafodion Operational SQL-on-Hadoop SophiaConf 2015 Pierre Baudelle, HP EMEA TSC July 6 th, 2015 Hadoop workload profiles Operational Interactive Non-interactive Batch Real-time analytics Operational SQL
Please give me your feedback
Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &
Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes
Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate
BIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385
brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 1 Hadoop in a heartbeat 3 2 Introduction to YARN 22 PART 2 DATA LOGISTICS...59 3 Data serialization working with text and beyond 61 4 Organizing and
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
BIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
Oracle Database 12c Plug In. Switch On. Get SMART.
Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.
Evolution from Big Data to Smart Data
Evolution from Big Data to Smart Data Information is Exploding 120 HOURS VIDEO UPLOADED TO YOUTUBE 50,000 APPS DOWNLOADED 204 MILLION E-MAILS EVERY MINUTE EVERY DAY Intel Corporation 2015 The Data is Changing
Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.
Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!
Getting Started Practical Input For Your Roadmap
Getting Started Practical Input For Your Roadmap Mike Ferguson Managing Director, Intelligent Business Strategies BA4ALL Big Data & Analytics Insight Conference Stockholm, May 2015 About Mike Ferguson
Apache Hadoop: The Big Data Refinery
Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data
Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012
Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team [email protected] @rob1lancaster Organizer of Chicago
Interactive data analytics drive insights
Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?
Testing Big data is one of the biggest
Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing
The Next Wave of Data Management. Is Big Data The New Normal?
The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management
Cloudera Enterprise Data Hub in Telecom:
Cloudera Enterprise Data Hub in Telecom: Three Customer Case Studies Version: 103 Table of Contents Introduction 3 Cloudera Enterprise Data Hub for Telcos 4 Cloudera Enterprise Data Hub in Telecom: Customer
An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise
An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise Solutions Group The following is intended to outline our
The Enterprise Data Hub and The Modern Information Architecture
The Enterprise Data Hub and The Modern Information Architecture Dr. Amr Awadallah CTO & Co-Founder, Cloudera Twitter: @awadallah 1 2013 Cloudera, Inc. All rights reserved. Cloudera Overview The Leader
A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect
A very short talk about Apache Kylin Business Intelligence meets Big Data Fabian Wilckens EMEA Solutions Architect 1 The challenge today 2 Very quickly: OLAP Online Analytical Processing How many beers
Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!
Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!! Streaming Analy.cs S S S Scale- up Database Data And Compute Grid
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
Whitepaper: Solution Overview - Breakthrough Insight. Published: March 7, 2012. Applies to: Microsoft SQL Server 2012. Summary:
Whitepaper: Solution Overview - Breakthrough Insight Published: March 7, 2012 Applies to: Microsoft SQL Server 2012 Summary: Today s Business Intelligence (BI) platform must adapt to a whole new scope,
Big Data: What You Should Know. Mark Child Research Manager - Software IDC CEMA
Big Data: What You Should Know Mark Child Research Manager - Software IDC CEMA Agenda Market Dynamics Defining Big Data Technology Trends Information and Intelligence Market Realities Future Applications
Building a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering
Building a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering Self Service at scale 6 5 4 3 2 1 ? Relational? MPP? Hadoop? Linkedin data 350M Members 25B 3.5M 4.8B 2M
Apache Kylin Introduction Dec 8, 2014 @ApacheKylin
Apache Kylin Introduction Dec 8, 2014 @ApacheKylin Luke Han Sr. Product Manager [email protected] @lukehq Yang Li Architect & Tech Leader [email protected] Agenda What s Apache Kylin? Tech Highlights Performance
How To Use Big Data For Business
Big Data Maturity - The Photo and The Movie Mike Ferguson Managing Director, Intelligent Business Strategies BA4ALL Big Data & Analytics Insight Conference Stockholm, May 2015 About Mike Ferguson Mike
Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014
Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools
