Hortonworks & SAS: Analytics everywhere.
A change in focus: organizations shift interactions from reactive (post-transaction) to proactive (pre-decision).
- A shift in Advertising: from mass branding to 1x1 targeting
- A shift in Financial Services: from educated investing to automated algorithms
- A shift in Healthcare: from mass treatment to designer medicine
- A shift in Retail: from static branding to real-time personalization
- A shift in Telco: from break-then-fix to repair before break
We estimate that within 3 years, 50% of the world's data will reside on Hadoop.
Data is doubling in size every 2-3 years. Traditional or not?
IDC figures: 2.8 ZB of data in 2012, 40 ZB by 2020, 85% from new data types, 15x machine data by 2020 (Source: IDC).
Traditional architecture: applications (business analytics, custom applications, packaged applications) on a data system of RDBMS, EDW and MPP repositories, fed by existing sources (CRM, ERP, clickstream, logs).
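As a rough consistency check on the "doubling every 2-3 years" claim, the IDC figures can be turned into an implied doubling time. This minimal sketch assumes constant exponential growth between 2012 and 2020, which the slide does not state; with 2.8 ZB growing to 40 ZB over 8 years it gives roughly 2.1 years per doubling, at the fast end of the quoted range.

```python
import math


def implied_doubling_time(start_zb: float, end_zb: float, years: float) -> float:
    """Years per doubling, assuming constant exponential growth from start_zb to end_zb."""
    growth_factor = end_zb / start_zb
    doublings = math.log2(growth_factor)  # how many times the volume doubled
    return years / doublings


# IDC figures quoted on the slide: 2.8 ZB in 2012, 40 ZB by 2020.
t = implied_doubling_time(2.8, 40.0, 2020 - 2012)
print(f"{t:.2f} years per doubling")  # prints "2.09 years per doubling"
```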
Hadoop stores and processes the data your customers currently do not or cannot. Two reasons: 1) cost profile, 2) data structure.
Sources: OLTP, ERP and CRM systems; unstructured documents and emails; server logs; sentiment and web data; sensor and machine data; clickstream; geolocation.
Hadoop enables scalable compute & storage with a compelling cost profile.
[Chart: fully-loaded cost per raw TB of data (min-max cost) for Hadoop, cloud storage, NAS, SAN, engineered systems and MPP.]
Hadoop enables scalable compute & storage for all data structures.
Current reality: apply schema on write
- Dependent on IT
- Repeatable process (SQL): determine list of questions, design solutions, collect structured data, ask questions from list, detect additional questions
Augment with Hadoop: apply schema on read
- Support a range of access patterns to data stored in HDFS (polymorphic access)
- Right engine, right job: batch, interactive, real-time, in-memory
- Iterate over structure; transform and analyze
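Schema on read means raw data is stored as it arrives and a structure is imposed only when a question is asked, so a new question does not require redesigning storage. The hypothetical Python sketch below illustrates the idea with a few made-up pipe-delimited clickstream lines; the field layout and the two schema functions are invented for illustration and are not tied to HDFS, Hive or any SAS interface.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical raw clickstream lines, kept exactly as they arrived:
# no schema is enforced when the data is written.
RAW_LINES: List[str] = [
    "2014-06-01T10:00:00|user42|/home|200",
    "2014-06-01T10:00:05|user42|/cart|200",
    "2014-06-01T10:00:09|user7|/home|404",
]

# A schema is just a function that interprets the split fields of one line.
Schema = Callable[[List[str]], Dict[str, object]]


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, object]]:
    """Apply a schema at read time: split each raw line and let the schema interpret it."""
    for line in lines:
        yield schema(line.split("|"))


def page_view_schema(fields: List[str]) -> Dict[str, object]:
    # First question: which pages did each user visit?
    return {"user": fields[1], "page": fields[2]}


def error_schema(fields: List[str]) -> Dict[str, object]:
    # A later question over the same raw data: which requests failed?
    return {"ts": fields[0], "status": int(fields[3])}


views = list(read_with_schema(RAW_LINES, page_view_schema))
errors = [r for r in read_with_schema(RAW_LINES, error_schema) if r["status"] >= 400]
```

Neither schema had to exist when the lines were written; answering a third question means writing a third function, not reloading the data.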
The Net Result: a modern data architecture capable of storing, processing, correlating, analysing, matching, aggregating, searching and exposing all data and insights...
...when integrated with the right tools capable of delivering the right results.
The Modern Data Architecture is a Plus +1.
Applications: Base SAS, Enterprise Miner
Data system: RDBMS, EDW and MPP repositories
Platform services: governance & integration, data access, data management, security, operations
Sources: OLTP, ERP and CRM systems; documents and emails; web logs and clickstreams; social networks and sentiment; machine-generated and sensor data; geolocation data
From, with and in Hadoop:
- From Hadoop: SAS accesses and extracts data from Hadoop to a SAS server for processing, and writes results back.
- With Hadoop: SAS accesses and processes Hadoop data on SAS servers while keeping the data and computations massively parallel.
- In Hadoop: SAS processes data directly in the Hadoop cluster.
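The practical difference between the first and last modes is what crosses the network. This simplified Python sketch computes the same mean two ways over hypothetical partitions: once by copying every row to a single server, and once by reducing each partition to a (sum, count) pair where it lives and combining only those pairs. It illustrates the general push-down idea only, not how SAS implements either mode.

```python
from typing import List, Tuple

# Hypothetical data spread across three cluster nodes (one list per partition).
PARTITIONS: List[List[float]] = [
    [10.0, 20.0, 30.0],
    [40.0, 50.0],
    [60.0, 70.0, 80.0, 90.0],
]


def mean_by_extraction(partitions: List[List[float]]) -> Tuple[float, int]:
    """'From Hadoop' style: copy every row to one server, then compute there.
    Returns the mean and the number of values moved."""
    rows = [x for part in partitions for x in part]  # every row crosses the network
    return sum(rows) / len(rows), len(rows)


def mean_in_cluster(partitions: List[List[float]]) -> Tuple[float, int]:
    """'In Hadoop' style: each node reduces its own partition to (sum, count);
    only those two numbers per node are moved and combined."""
    partials = [(sum(part), len(part)) for part in partitions]  # computed where the data lives
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count, 2 * len(partials)
```

Both functions return a mean of 50.0, but extraction moves 9 values while the in-cluster version moves 6; with millions of rows per node, the in-cluster version still moves only two numbers per node.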
SAS from Hadoop
[Diagram: SAS Data Management, Base SAS and Enterprise Miner access Hadoop data on disk through SAS/ACCESS to Hadoop.]
Access to Hadoop uses existing SAS interfaces:
- Standard LIBNAME syntax
- PROC HADOOP
- DATA step and PROC SQL translated to Hive
- FILENAME support
- Execute Pig scripts and MapReduce
- Push-down of certain procedures
- Custom SerDe
SAS with Hadoop
[Diagram: SAS rack architecture, with a SAS rack connected to enterprise Hadoop.]
SAS in Hadoop: in-memory analytics (and BI)
Products: Visual Analytics, Visual Statistics, In-Memory Statistics for Hadoop
[Diagram: a root node coordinates LASR worker processes over MPI; each LASR worker holds data in memory and loads SASHDAT files from disk.]
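The root-node and worker layout follows a scatter-gather pattern: each worker summarizes the data it holds in memory and the root combines the summaries. This hypothetical Python sketch shows that pattern for a sample variance, merging per-worker (count, mean, M2) results with the standard pairwise parallel-variance update (Chan et al.). Ordinary function calls stand in for MPI messages, partitions are assumed non-empty, and nothing here reflects LASR's actual internals.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


@dataclass(frozen=True)
class Moments:
    n: int       # number of observations
    mean: float  # mean of those observations
    m2: float    # sum of squared deviations from the mean


def worker_moments(values: List[float]) -> Moments:
    """Run on a worker over the (non-empty) partition it holds in memory."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)
    return Moments(n, mean, m2)


def merge(a: Moments, b: Moments) -> Moments:
    """Combine two partial results without revisiting the raw values."""
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta ** 2 * a.n * b.n / n
    return Moments(n, mean, m2)


def root_variance(partials: List[Moments]) -> float:
    """On the root node: fold the workers' partial results into a sample variance."""
    total = reduce(merge, partials)
    return total.m2 / (total.n - 1)
```

For example, worker partitions [1, 2, 3], [4, 5] and [6, 7, 8, 9] merge to a sample variance of 7.5, the same result as computing it over all nine values at once.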
Rogers Media turns big data into real-time customer insights (Telcos)
Rogers Media is a subsidiary of Rogers Communications, which owns Canada's largest publishing company. The company has more than 70 consumer and business publications. Rogers Media Inc. also owns 54 radio stations and several television properties, including terrestrial television stations and cable television channels.
Challenge: unable to analyze huge amounts of data to optimize and improve real-time customer insights.
- Understand audience: have the largest volume of data sets and audience segments/profiles in Canada while leading the Canadian marketplace in privacy and governance.
- Find audience: lead in identifying and targeting audiences across channels, platforms and devices.
- Engage audience: drive engagement across platforms and formats.
- Measure audience: exceed client expectations with transparent reporting and accurate attribution models.
Solution: Rogers Media Audience Platform
- Integration of all data collected across the organization
- Query all data in one location: a blend of online and offline data, subscriptions, e-commerce, loyalty programs, etc.
- Land massive clickstream log files: 100+ million records per day, 30 million unique IDs per month
- Use 100% of the data for analysis and visualization instead of smaller random samples
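To put the ingest figure in perspective, 100 million records per day is a steady average of roughly 1,160 records per second and about 3 billion records in a 30-day month. The sketch below does that arithmetic; it assumes records arrive evenly through the day (real traffic is likely burstier) and treats the slide's "100+ M" as a lower bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_ingest_rate(records_per_day: float) -> float:
    """Average records per second, assuming a day's records arrive evenly."""
    return records_per_day / SECONDS_PER_DAY


def records_per_month(records_per_day: float, days: int = 30) -> float:
    """Total records over a month of the given length (30 days assumed by default)."""
    return records_per_day * days


rate = average_ingest_rate(100e6)   # about 1,157 records per second
monthly = records_per_month(100e6)  # 3 billion records
```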
Resources
- Customer video: Rogers Media discusses SAS and Hadoop
- Demos: SAS Visual Analytics; Ingest SAS to Hive
- Webinars: SAS and the Modern Data Architecture
- SAS and Hortonworks use cases
- www.hortonworks.com/sas
Thank you. Questions?