Batter Up! Advanced Sports Analytics with R and Storm
1 Batter Up! Advanced Sports Analytics with R and Storm: Meeting the Real-Time Analytics Opportunity Head-On. December 11, 2014. Bill Jacobs, VP Product Marketing, Revolution; Vineet Sharma, Dir., Partner Marketing, MapR Technologies; Allen Day, Principal Data Scientist, MapR
2 Who Am I? Bill Jacobs, VP Product Marketing Revolution
3 Polling Question #1: Who Are You? (choose one) Statistician or modeler; Data Scientist; Hadoop Expert; Application builder; Data guru; Business user; Baseball fan
4 Sports Analytics as Analogy. Sports Teams Are Like Other Corporations. Great Value Achievable With Data. Vast Range of Data Sources. Timely Analysis Amplifies Value. And apologies if you came to learn whom to bet on in next year's season.
5 Game Changing Big Data Analytics Applications. Marketing: Clickstream & Campaign Analyses; Digital Media: Recommendation Engines; Social Media: Sentiment Analysis; Retail: Purchase Prediction; Insurance: Fraud, Waste and Abuse; Healthcare Delivery: Treatment Outcome Prediction; Risk Analysis: Insurance Underwriting; Manufacturing: Predictive Maintenance; Operations: Supply Chain Optimization; Econometrics: Market Prediction; Marketing: Mix and Price Optimization; Life Sciences: Pharmacogenetics; Transportation: Asset Utilization
6 Polling Question #2: What Languages or Tools Are In Use for Analytics? (check all that apply) R; SAS or SPSS; Python; Java; BI tools including: MSTR, Qlik, Tableau, Business Objects, Cognos; Salford Systems or MATLAB; H2O, RapidMiner, KNIME or similar; Other data mining tools; Other programming languages; None or Don't know
7 R Open Source - Language, Community, Collaboration - Robert Gentleman & Ross Ihaka, Version 1.0 released Million Global Users - Over 4,800 add-on Packages - Why R? R in Universities = New Talent WELCOME & INTRODUCTIONS Emerging Modeling/Visualization Lower Cost Alternative Open Source = Flexible & Innovative Access to Free Packages
8 R is exploding in popularity & functionality R Usage Growth Rexer Data Miner Survey, % of data miners report using R R is the first choice of more data miners than any other software Source:
9 Innovate with R. Most widely used data analysis software: used by 2M+ data scientists, statisticians and analysts. Most powerful statistical programming language: flexible, extensible and comprehensive for productivity. Create beautiful and unique data visualizations: as seen in New York Times, Twitter and Flowing Data. Thriving open-source community: leading edge of analytics research. Fills the talent gap: new graduates prefer R. White Paper: R is Hot, bit.ly/r-is-hot
10 Polling Question #3: How are you using R today? (choose one) Not using R; Studying R now; Initial R project(s) underway; R is widely used for exploration & modeling; R is deployed into production
11 Revolution Analytics In A Nutshell. Our Vision: R is becoming the de facto standard for enterprise predictive analytics. Our Mission: Drive enterprise adoption of R by providing enhanced R products tailored to meet enterprise challenges.
12 Revolution Analytics Builds & Delivers. Software Products: Stable Distributions; Broad Platform Support; Big Data Analytics in R; Application Integration; Deployment Platforms; Agile Development Tooling; Future Platform Support. Support & Services: Commercial Support Programs; Training Programs; Professional Services; Academic Support Programs; IP Indemnification
13 Revolution R Advantages for Analytics Professionals: Broadly-used, scalable R language; Large (2M+), collaborative, young R analytics community; Largest repository of statistical & analytical algorithms; Big data analytics capabilities; Scales from workstations to Hadoop; Transparent parallelism; Cross platform compatibility; Multi-platform architectures; Broadens career opportunities
14 Revolution R Advantages for Business Executives: Viable Alternative to Legacy Analytics Solutions; Predictable Time To Results; Simplified Licensing; All-Inclusive Environment; Lower Staffing Costs; Controllable Open Source Risks: Support, IP Infringement Protections
15 Revolution R Advantages for IT Organizations: Consistency Across Platforms Avoids Sprawl; Support for Workstations, Servers, Hadoop, EDWs and Grids; Heterogeneous Architecture Capabilities; Integrates With Major BI & Application Tools; Streamline Model Deployment; Run Complex Analytics in the Data Lake; Be a Good Citizen in shared systems; Commercial Support Reduces Project Risks; Quick Start Programs Accelerate Results; Platform Continuity Future-Proofs Architectures
16 Revolution R Enterprise: Predictive Analytics Across Huge Data in Hadoop. [Diagram labels: YARN; Exploration; Visualization; Predictive Modeling; HDFS]
17 Polling Question #4: Stage of Hadoop Adoption? (choose one) No Need; Studying; Setting-Up Hadoop; Experimenting with Hadoop; Deploying Hadoop Now; Hadoop in Production
18 Introducing: Vineet Sharma Director, Partner Marketing MapR Technologies 2014 MapR Technologies 18
19 MapR + Revolution. Leverages MapR As A Scalable Enterprise R Engine. Plus: Run RRE Analytics In MapR Hadoop Without Change. Broad Adoption of Hadoop for Big Data Analytics; MapR Enterprise Data Platform Capabilities; Rapid Adoption of R. Eliminate Need To Design Parallel Software or Think In MapReduce. Leverage All Revolution R Enterprise Pre-Parallelized Algorithms. Enable Users To Build Custom Apps That Leverage Hadoop's Parallelism. Slash Data Movement by Analyzing Data Inside the MapR Data Platform. Expand Deployment and Integration Options.
20 Desktop Users with Analytical Access to Huge Data in Hadoop. Predictive Modeling Algorithms. MapR FS Data.
21 MapR: Best Solution for Customer Success. Top Ranked; Exponential Growth; 500+ Customers; Premier Investors; >2x annual bookings; 90% software licenses; 80% of accounts expand 3X; <1% lifetime churn; >$1B in incremental revenue generated by 1 customer
22 The Power of the Open Source Community. [Diagram: Apache Hadoop and OSS ecosystem. Categories: Batch; ML, Graph; SQL; NoSQL & Search; Streaming; Data Integration & Access; Security; Workflow & Data Governance; Provisioning & Coordination. Components: Spark, Drill, Cascading, GraphX, Shark, Accumulo*, Hue, Savannah*, Pig, MLLib, Impala, Solr, Storm*, HttpFS, Juju, MapReduce v1 & v2, Mahout, Hive, HBase, Spark Streaming, Flume, Knox*, Falcon*, Whirr, YARN, Sqoop, Sentry*, Oozie, ZooKeeper, Tez*. Layers: Execution Engines; Data Governance and Operations; MapR Data Platform (MapR-FS, MapR-DB); Management. * Certification/support planned]
23 MapR Distribution for Hadoop. [Diagram: Apache Hadoop and OSS ecosystem, with the same categories and components as the previous slide, over the MapR Data Platform (MapR-FS, MapR-DB), with Management. * Certification/support planned] Enterprise-grade: High availability; Data protection; Disaster recovery. Interoperability: Standard file access; Standard database access; Pluggable services; Broad developer support. Performance: 2X to 7X higher performance; Consistent, low latency. Multi-tenancy: Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators. Security: Enterprise security authorization; Wire-level authentication; Data governance. Operational: Ability to support predictive analytics, real-time database operations, and high arrival rate data.
24 Revolution R Enterprise and MapR Hadoop. [Diagram labels: R on the Desktop; Via Browsers; BI and other Apps; Mobile; Edge Node; Apache Hadoop & OSS ecosystem (ZooKeeper, Oozie, Hue, Pig, Hive, Impala, Shark, Drill, Tez, Flume, HttpFS, Cascading, Solr, Juju, Mahout, MLLib, Knox, Sentry, Sqoop, Whirr, HBase, YARN, Storm, Spark Streaming, Spark, Falcon); YARN; MapReduce; MapR Data Platform; Management]
25 Introducing: Allen Day Principal Data Scientist MapR
26 Talk Overview. Agile Real-time Stats: R + Storm (github.com/allenday/r-storm). Agile Methods; Advanced Statistics; DEMO; How to do it?; Continuous Real-time Delivery; Q & A. github.com/allenday/hadoop-summit-r-storm-demo-public
27 Architecting R into the Storm Application Development Process
28 Quick intro: Allen Day, Principal Data Scientist. 7yr Hadoop dev, 12yr R dev/author. PhD, Human Genetics, UCLA Medicine
29 What's Storm? What's R? What's Storm? Processes a data stream, akin to UNIX pipe + tee & merge commands. Runs on a cluster; fault-tolerant and designed to scale out. Used for: real-time analytics & machine learning. What's R? Programming language with advanced statistics libraries. Does not scale out; can scale up. Used for: prototyping, data modeling, visualization. How to combine these?
30 R outside, Storm inside: not practical. Why? [Diagram labels: User; R; Storm] Model-building and QA are done on data snapshots. However, R => Hadoop is realistic. Key difference: referenced data can be static. Use MapR snapshots for dev and QA. See also: RHIPE (Purdue) and RHadoop (Revolution Analytics)
31 Storm outside, R inside: a good fit. [Diagram labels: User; Storm; R] Enables separation of concerns: independently manage modeling, ops timelines, and version control; integrate as needed. Enables role specialization: R built-ins allow faster iteration and more concise stats-type code; do DevOps with specific SW engineering tech, e.g. Java
32 Q: Who really likes statistics? A: Baseball fans. A: Team Managers = Portfolio Managers
33 Famous Vintage Data: Oakland Athletics, 2002 Season. 20 consecutive wins, the current record
34 Goal: Detect Moneyball 2002 Winning Streak
35 Methods: Change Point Detection. Find natural breakpoints in a time-series set of data points. R packages implement this: changepoint (more sensitive, but not streaming); bcp (streaming, but less sensitive)
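To make the idea of a "natural breakpoint" concrete, here is a minimal Java sketch that finds the single split point which best divides a series into two constant-mean segments, by minimizing the total squared error. It is only an illustration of the concept: it is not the algorithm used by the changepoint or bcp packages, the class and method names (`MeanShiftSketch`, `sse`, `bestSplit`) are hypothetical, and the sample values are made up rather than taken from the Oakland A's data.

```java
/** Illustrative only: least-squares search for one mean shift in a series. */
final class MeanShiftSketch {

    /** Sum of squared deviations from the segment mean over x[from, to). */
    static double sse(double[] x, int from, int to) {
        double mean = 0.0;
        for (int i = from; i < to; i++) {
            mean += x[i];
        }
        mean /= (to - from);
        double total = 0.0;
        for (int i = from; i < to; i++) {
            double d = x[i] - mean;
            total += d * d;
        }
        return total;
    }

    /**
     * Returns the index k (1 <= k < n) that minimizes the combined squared
     * error of the segments x[0, k) and x[k, n), or -1 if n < 2.
     * Ties resolve to the earliest index.
     */
    static int bestSplit(double[] x) {
        int best = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int k = 1; k < x.length; k++) {
            double cost = sse(x, 0, k) + sse(x, k, x.length);
            if (cost < bestCost) {
                bestCost = cost;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up score differences: a losing stretch followed by a winning one.
        double[] scoreDiff = {-2, -1, -3, -2, 4, 5, 3, 4};
        System.out.println("best split index: " + bestSplit(scoreDiff));
    }
}
```

A production detector would also need a rule for deciding whether the best split is significant at all, and support for multiple change points; those are the parts the R packages named above provide.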
36 Methods: R+Storm Demo Architecture (github.com/allenday/hadoop-summit-r-storm-demo-public). [Diagram labels, in order: Oakland A's Data (accelerated); Storm Bolt; R online change point detector; Storm Bolt (write to Jetty); GIFs to MapR Filesystem; Jetty Webserver; Browser (D3.js); Us]
37 Demo. 50-game sliding window/buffer to detect change points. Cumulative history with detected break points. Raw data (score difference between A's and opponent)
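The sliding window/buffer mentioned on this slide can be sketched as a fixed-capacity queue that drops the oldest game whenever a new one arrives. This is a hedged illustration, not code from the demo repository: the class name `SlidingWindow` and its methods are hypothetical, and the demo itself may buffer its data differently (for example, inside the R script).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Illustrative fixed-size buffer holding the most recent observations. */
final class SlidingWindow {
    private final int capacity;
    private final Deque<Double> buffer = new ArrayDeque<>();

    SlidingWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Appends a value, evicting the oldest one once the window is full. */
    void add(double value) {
        if (buffer.size() == capacity) {
            buffer.removeFirst();
        }
        buffer.addLast(value);
    }

    /** True once the window holds `capacity` values. */
    boolean isFull() {
        return buffer.size() == capacity;
    }

    /** Copies the current contents, oldest first, for a detector to analyze. */
    double[] snapshot() {
        double[] out = new double[buffer.size()];
        int i = 0;
        for (double v : buffer) {
            out[i++] = v;
        }
        return out;
    }

    public static void main(String[] args) {
        // A capacity of 3 keeps the output short; the slide uses 50 games.
        SlidingWindow window = new SlidingWindow(3);
        for (double scoreDiff : new double[]{1, 2, 3, 4, 5}) {
            window.add(scoreDiff);
        }
        System.out.println(Arrays.toString(window.snapshot()));
    }
}
```

In a streaming setting, each incoming tuple would be added to the window, and the detector would run on `snapshot()` once `isFull()` returns true.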
38 Methods Details: How it's done. Uses R-Storm binding: github.com/allenday/r-storm. Storm package on CRAN: cran.r-project.org/web/packages/storm. [Diagram labels: Storm (dev team); R (stats team); Storm (dev team, pure Java); Producer; Consumer]
39 Methods Details: Easy integration
R: lambda function
    storm = Storm$new();
    storm$lambda = function(s) {
      t = s$tuple;                  # incoming tuple
      t$output = vector(length=1);
      t$output[1] = "tada!";        # value to emit
      s$emit(t);
    }
Storm: extend ShellBolt
    public static class MyRBolt extends ShellBolt implements IRichBolt {
      public MyRBolt() {
        // Launch the R script as the bolt's subprocess
        super("Rscript", "my.r");
      }
    }
40 Results: Change points are identified, but none for winning streak. Not using score difference, anyway. Discussion: Time to integrate with the modeling team! A pull request on GitHub. Applicable to many other use cases, e.g. Security (fraud detection, intrusion detection); Marketing (intent to purchase / social media streams); Customer Support (help desk voice calls)
41 Polling Question #5: How important will Real-Time analytical apps become? (choose one) Uncertain; Not important; Necessary; Critical
42 Real-Time and Internet of Things: Foundation of a Compelling Trend for 2015. Big Data Analytics Meets The Internet of Things. Transactions + Human Behavior + Internet of Things: Sensors, and extracting value using Traditional Statistics, Visualization, Machine Learning, plus adaptability. Real-Time; Agile Modeling & Fast Model Execution; Production Capable, Stable and Secure; Rapidly-Evolving Data Science
43 Where Does Real Time Impact The Analytical Lifecycle? Data Engineering: Collection and Ingest; Blending. Modeling: Aggregation, Segmentation & Exploration; Model Development & Optimization; Testing & Validation. Operationalization: Deployment & Scoring; Delivery; Monitoring & Evaluation
44 Typical Analytical Lifecycle. [Diagram labels: Model; Ingest; Explore; Model; Deploy; Score; Score; Act; Measure]
45 More Complex Event Driven Analytical Cycle. [Diagram labels: Data Analytics & Process Design; Historic; Ingest; Blend; Explore; Model; Deploy; Scoring; Append; Improve; Event; Ingest; Transform; Act; Measure; Enrich]
46 Real-Time Analytics Best Practices. Develop a Common Lexicon for Real-Time. Discriminate Between Needs of Each Stage in Lifecycle: Data Ingest & Manipulation and Enrichment; Data Source / Repository Integration Needs. Processes that Fill the Lake vs. Processes that Act on the Stream: vastly different computationally; big differences in data ingest volume & latency. Start with Tractable Goals. Anticipate Growing Requirements: Microbatched >> Interactive >> Autonomy. Build for today, Architect for tomorrow
47 Real-Time Realities. Plan for Diverse Needs: Real-Time Score Retrieval, Scoring, Modeling. Wide-Ranging Performance: Microbatch - Interactive - Autonomous. Fragmentation: Data Delivery Systems Pre-Exist; Will Vary Widely by Vertical Market; Competing Proprietary Solutions. Growing Demand: Numerous high-value targets. The next step: Put big data analytics to work
48 What's Needed. Real-Time Performance plus Agility: Deployment models; Organization; Infrastructure; Analytics. Manageable Costs: Hadoop; Open Source R. Production Platform(s): Proven; Performant
49 Next Steps Resources: R foundation URL: Download Revolution R: Learn about Apache Storm: R-Storm bindings: github.com/allenday/r-storm Storm package on CRAN: cran.r-project.org/web/packages/storm Whitepaper: Revolution R for Hadoop: vering-value-big-data-revolution-r-enterprise-andhadoop or
50 Thank you GET.REVO
Find the Hidden Signal in Market Data Noise
Find the Hidden Signal in Market Data Noise Revolution Analytics Webinar, 13 March 2013 Andrie de Vries Business Services Director (Europe) @RevoAndrie [email protected] Agenda Find the Hidden
Interactive data analytics drive insights
Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has
The Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @
The Top 10 7 Hadoop Patterns and Anti-patterns Alex Holmes @ whoami Alex Holmes Software engineer Working on distributed systems for many years Hadoop since 2008 @grep_alex grepalex.com what s hadoop...
and Hadoop Technology
SAS and Hadoop Technology Overview SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS and Hadoop Technology: Overview. Cary, NC: SAS Institute
Hadoop s Advantages for! Machine! Learning and. Predictive! Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis
Webinar will begin shortly Hadoop s Advantages for Machine Learning and Predictive Analytics Presented by Hortonworks & Zementis September 10, 2014 Copyright 2014 Zementis, Inc. All rights reserved. 2
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08
SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse
SQL Server 2012 PDW Ryan Simpson Technical Solution Professional PDW Microsoft Microsoft SQL Server 2012 Parallel Data Warehouse Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale
Extend your analytic capabilities with SAP Predictive Analysis
September 9 11, 2013 Anaheim, California Extend your analytic capabilities with SAP Predictive Analysis Charles Gadalla Learning Points Advanced analytics strategy at SAP Simplifying predictive analytics
WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley
WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley Disclaimer: This material is protected under copyright act AnalytixLabs, 2011. Unauthorized use and/ or duplication of this material or
Talend Big Data. Delivering instant value from all your data. Talend 2014 1
Talend Big Data Delivering instant value from all your data Talend 2014 1 I may say that this is the greatest factor: the way in which the expedition is equipped. Roald Amundsen race to the south pole,
High Performance Predictive Analytics in R and Hadoop:
High Performance Predictive Analytics in R and Hadoop: Achieving Big Data Big Analytics Presented by: Mario E. Inchiosa, Ph.D. US Chief Scientist August 27, 2013 1 Polling Questions 1 & 2 2 Agenda Revolution
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization
Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.
Big Data Analytics 1 Priority Discussion Topics What are the most compelling business drivers behind big data analytics? Do you have or expect to have data scientists on your staff, and what will be their
Big Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
Databricks. A Primer
Databricks A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to dramatically
Cisco Data Preparation
Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and
A Case Study of Hadoop in Healthcare
Leading a Healthcare Company to the Big Data Promised Land: A Case Study of Hadoop in Healthcare Mohammad Quraishi (IT Senior Principal - Cigna) [email protected] About me BS in Computer Science and Engineering
Constructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
Databricks. A Primer
Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful
BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP
BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP Business Analytics for All Amsterdam - 2015 Value of Big Data is Being Recognized Executives beginning to see the path from data insights to revenue
Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop
Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social
