Evolution from Big Data to Smart Data
Information is Exploding
  Every minute, every day:
    120 hours of video uploaded to YouTube
    50,000 apps downloaded
    204 million e-mails
  Source: Intel Corporation, 2015
The Data is Changing
                      Performance Optimized    Capacity Optimized
  Data Type           Structured               Unstructured
  Record Size         Kilobytes or less        Megabytes to Terabytes
  Data Updates        Frequent                 Rare/never
  Access Frequency    Heavy                    Light
  Metadata            Fixed                    Variable
  Scale Required      Up to Terabytes          Exabytes
  "Unstructured Data accounts for 70-80% of storage capacity growth" (Ashish Nadkarni, IDC)
  Source: IDC, 2015. Copyright 2014 IDC.
The Industry Responds
  1. Scale Out
  2. Software Defined
  3. Smart Data
Scale-Out Economics
  Start Small, Scale Large
    Start from a single node (TBs), with the ability to scale to multiple independent nodes (PBs)
    RAIN Architecture
  Granular Resource Scaling
    Add CPUs and storage independently as needed
    Take advantage of decreasing storage costs and increasing storage densities
Software Defined Storage
  Workloads: Modern Apps, Analytics, Deep Archive
  Access: Object, HDFS, File
  HyperStore Smart Storage Platform
  Sites: New York, London, Tokyo
  Features: 100% S3, Always On, Smart Protect, Multi Datacenter, Smart Policies, Enterprise Grade
The Era of Smart Data Storage
  DATA STORAGE = problem
    Passive, Delayed Analytics, Static Data
  SMART DATA STORAGE = solution
    Active, Timely Insight, Meaning, Actionable Business Value
  Diagram labels: HyperStore, Analytics, Data, Information, Object Store
Smart Data: Analytics in Place
  Sources (Internet of Things, Big Data):
    Consumer Activity (Events, GPS, WiFi)
    Device Tracking and Logs
    Social Media
  Cloudian HyperStore: Event processing platform, Analytics, Result of Analysis
  Benefits: Faster time-to-decision, Better business decisions, Analyze more
  Fast, Efficient, Cost Efficient:
    Allows for efficient bulk data analysis in place
    No redundant storage of data
    HyperStore scales out with your data, adding nodes for I/O
    Takes advantage of multi-core CPUs, which makes sense for MapReduce
    Can feed smarter data to subsequent analytic systems
Cloudian & Hortonworks
  Processing engines:
    Batch: MapReduce
    Script: Pig
    SQL: Hive/Tez, HCatalog
    NoSQL: HBase, Accumulo
    Stream: Storm
    Search: Solr
    Others: In-Memory Analytics, ISV engines
  YARN: Data Operating System
  Storage (nodes 1 to N): HDFS, S3 Native File System (URI scheme: s3n)
  Deployment: Linux, Windows, On-Premise, Cloud
  Operations:
    HDFS Shell Commands
    File I/O Operations
    Mass Upload
    ETL with Pig
    Standard MapReduce
    Analysis with Hive
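
The s3n URI scheme named on this slide addresses an object as s3n://bucket/key. As a minimal sketch (the helper names split_s3n_uri and hadoop_ls_command are hypothetical, not part of Hadoop or HyperStore), the Python below splits such a URI into bucket and key, and builds the argument list for the standard `hadoop fs -ls` shell command. It only constructs the command; actually running it assumes a Hadoop client already configured with credentials and an endpoint for the target store.

```python
from urllib.parse import urlsplit


def split_s3n_uri(uri):
    """Split 's3n://bucket/some/key' into (bucket, key)."""
    parts = urlsplit(uri)
    if parts.scheme != "s3n":
        raise ValueError(f"expected an s3n:// URI, got {uri!r}")
    if not parts.netloc:
        raise ValueError(f"missing bucket name in {uri!r}")
    # The path starts with '/', which is not part of the object key.
    return parts.netloc, parts.path.lstrip("/")


def hadoop_ls_command(uri):
    """Return the argv for listing an s3n path with the Hadoop shell."""
    split_s3n_uri(uri)  # validate the URI before building the command
    return ["hadoop", "fs", "-ls", uri]


bucket, key = split_s3n_uri("s3n://bucket/logs/2015/02/16.log")
print(bucket, key)
print(" ".join(hadoop_ls_command("s3n://bucket/logs/")))
```

The returned list can be passed to subprocess.run on a machine where the hadoop client is installed.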
Analytics and Hadoop
  Availability
    Peer-to-peer storage system
  Locality
    Data center locality
    Can enforce constraints on the location of Hadoop data and maintain locality of reference for Hadoop
    Hadoop can be run on storage nodes
  Efficiency
    Erasure coding for efficient bulk data storage
  Scale
    Cluster on demand as needed, with dynamic rebalance
    Multi-part uploading to improve large object uploads
    Rich metadata
  Example
    Pig can load filtered data directly from Cloudian HyperStore without passing through HDFS:
      A = LOAD 's3n://bucket' USING CloudianStorage();
      B = FILTER A BY (time >= '2015/02/16') AND (time <= '2015/02/20');
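
The Pig FILTER above keeps records whose time field falls between 2015/02/16 and 2015/02/20 inclusive. Because the dates are zero-padded in YYYY/MM/DD form, comparing them as strings gives the same order as comparing them as dates. The Python sketch below shows the same inclusive range filter on a few made-up in-memory records; the record layout and the function names are assumptions for illustration, and it does not read from HyperStore.

```python
from datetime import datetime


def parse_day(text):
    """Parse a 'YYYY/MM/DD' string into a date."""
    return datetime.strptime(text, "%Y/%m/%d").date()


def filter_by_day(records, start, end):
    """Keep records whose 'time' lies in [start, end], both ends inclusive."""
    lo, hi = parse_day(start), parse_day(end)
    return [r for r in records if lo <= parse_day(r["time"]) <= hi]


# Made-up sample records, for illustration only.
records = [
    {"time": "2015/02/15", "event": "login"},
    {"time": "2015/02/16", "event": "upload"},
    {"time": "2015/02/18", "event": "download"},
    {"time": "2015/02/20", "event": "delete"},
    {"time": "2015/02/21", "event": "logout"},
]

for record in filter_by_day(records, "2015/02/16", "2015/02/20"):
    print(record["time"], record["event"])
```

Parsing the strings into dates also rejects malformed values early, which a plain string comparison would silently accept.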
Use Cases: Hadoop for Internet of Things
  Clickstream data
    Analysis of which individual web pages people click on, and in what order. Clickstream analysis can reveal how users research products and also how they complete their online purchases.
  Sentiment data
    Unstructured data on opinions, emotions, and attitudes from sources like social media posts, blogs, online product reviews and customer support interactions. Organizations use sentiment analysis to understand how the public feels about something and track how those opinions change over time.
  Server log data
    Large enterprises build, manage and protect their own proprietary, distributed information networks. Server logs are the computer-generated records that report data on the operations of those networks. When there is a problem, it's one of the first places the IT team looks for a diagnosis.
  Sensor data
    From refrigerators and coffee makers to energy-measuring smart meters, sensor data is everywhere. It is created by the machinery that runs assembly lines and the cell towers that route our phone calls. It is net new data that is increasing exponentially in the information age.
  Industries: Internet Marketing, Online Commerce, Retail, Media & Entertainment, IT Organizations, Customer Support, Manufacturing, Industrial
Smart Support
  Customer: HyperStore Appliances
  Telemetrics Data: sent to Smart Support
  Cloudian: HyperStore Appliances (s3n://bucket/), Hadoop Cluster, Smart Support Analytics, Cloudian Support
Cloudian HyperStore Platform
Multi-tenancy & QoS
  Tenants: Tenant A, Tenant B, Tenant C
  Storage Policies: Data Placement, Data Access, Access Control, Tiering
  QoS limits per tenant: Storage Bytes, Storage Objects, Requests per Min, Inbound Bytes/Min, Outbound Bytes/Min
  HyperStore Software Defined Storage
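
Requests per Min is one of the per-tenant QoS dimensions listed on this slide. As a rough sketch of how such a limit can be enforced, the Python below uses a fixed one-minute window counter per tenant. The class name, the tenant names and the limits are made up for illustration, and this is a generic technique, not a description of how HyperStore implements QoS.

```python
from collections import defaultdict


class RequestsPerMinuteLimiter:
    """Fixed-window request limiter with an independent limit per tenant."""

    def __init__(self, limits):
        # limits maps tenant name -> allowed requests per one-minute window
        self.limits = dict(limits)
        # (tenant, window index) -> requests admitted in that window
        self.counts = defaultdict(int)

    def allow(self, tenant, now_seconds):
        """Return True and count the request if the tenant is under its limit."""
        window = int(now_seconds // 60)
        key = (tenant, window)
        if self.counts[key] >= self.limits[tenant]:
            return False
        self.counts[key] += 1
        return True


limiter = RequestsPerMinuteLimiter({"tenant_a": 2, "tenant_b": 1})
print(limiter.allow("tenant_a", 0))    # first request in the window
print(limiter.allow("tenant_a", 10))   # second request, still within the limit
print(limiter.allow("tenant_a", 20))   # third request in the same window
print(limiter.allow("tenant_a", 60))   # a new window has started
```

A fixed window is the simplest choice and lets a tenant burst at a window boundary; a sliding window or token bucket smooths that out at the cost of more state. This sketch also never discards old windows, which a long-running service would need to do.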
Your Choice of Deployment
  Pre-Configured Software-Defined Storage Arrays
    HSA Series
      1U and 2U models
      Scales from 24TB to multiple PBs
      Dedicated, all-in-one, on-premises storage
    FL3000 Series
      Density-optimized appliance with PB-scale architecture
      Seamless scalability on demand
      8 storage nodes in 8U
      Hot plug everything
  OR
  Stand-alone Software
  Features
    Efficient data protection with compression, replication and erasure coding
    On-premises S3 with full support for all S3 ecosystem apps
    Dynamic data tiering
    Hadoop-ready
    Geo-replication
    Multi-tenant QoS controls
    Self-healing and auto-rebalancing
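
The slide lists replication and erasure coding as data protection options. The arithmetic below sketches why erasure coding is usually described as the more space-efficient of the two: replication stores every byte once per copy, while an erasure code with k data fragments and m parity fragments stores (k + m) / k bytes per usable byte. The function names and the example parameters (3 copies, 4 + 2 fragments) are illustrative assumptions, not HyperStore defaults.

```python
def raw_capacity_replication(usable_tb, copies):
    """Raw capacity needed when every object is stored `copies` times."""
    return usable_tb * copies


def raw_capacity_erasure_coding(usable_tb, data_fragments, parity_fragments):
    """Raw capacity needed with k data fragments plus m parity fragments."""
    total_fragments = data_fragments + parity_fragments
    return usable_tb * total_fragments / data_fragments


usable = 100  # TB of user data, an arbitrary example figure
print(raw_capacity_replication(usable, copies=3))
print(raw_capacity_erasure_coding(usable, data_fragments=4, parity_fragments=2))
```

With these example values, both schemes survive the loss of two copies or fragments, but the erasure-coded layout needs half the raw capacity of three-way replication. The trade-off is extra CPU and network work to encode objects and to rebuild lost fragments.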
Cloudian HyperStore: Smart Data Storage
  Pillars: Smart Storage Operations, Smart Storage Platform, Smart Storage Analytics
  Capabilities: Smart Protect, Proactive Repair, Smart Tiering, Smart Scale, Software Defined, Forever Storage Platform, Real Time Analytics, Search & Discovery, Smart Support
1c per GB per Month
  Webscale Simplicity & Economics
  Hybrid Cloud
  Open Architecture
  Enterprise Ready
  Visit Us: Booth 415