The Vertica Database simply fast! Mastering Big Data with HP Software Lior Tzabari - Regional Sales Manager Moshe Goldberg - Vertica System Engineer Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Welcome to the World of Big Data There is strategic value in big data; with real-time analytics, organizations are able to maximize business value and efficiencies Compliance Sarbanes-Oxley, HIPAA, Basel II Geophysical Exploration Healthcare Electronic Patient Record Gene Sequencing Medical Imaging Enterprise ERP CRM Products, Customers, Suppliers, Partners Technology Sensors, LOBs, XML Mobility Social Media Financial Services High-frequency Trading Algorithmic Trading Communications Call Detail Records
HP Customers' Big Data Concerns HP survey responses - senior business and technology executives 50% 98% 34% 35% Do not have an effective information strategy in place Cannot deliver the right information at the right time to support enterprise outcomes all of the time Say that half of their information is unconnected, undiscovered and unused Are not effective at accessing enterprise information as and when needed for compliance or operational needs * Source: Coleman Parkes
How Big is Big Data? Storage capacity growing 23% per annum Computing capacity growing 54% per annum 60% of the world's population used mobile phones in 2010 30 billion pieces of content shared every month on Facebook 30 million network sensor nodes in 2010, growing by more than 30% a year 40% projected growth in global data generated per year vs. 5% growth in global IT spending Source: Big Data: The Next Frontier for Innovation, Competition and Productivity, McKinsey Global Institute Figure 1: The Digital Universe 2009-2020 (2009: 0.8 ZB*, 2020: 35 ZB*, growing by a factor of 44) *Zettabyte = 1 trillion gigabytes
What is Big Data? Extreme Information: volume, velocity, variety and complexity BIG DATA: datasets whose velocity and/or volume is beyond the capability of typical database tools to collect, store, manage and analyze (Diagram labels: Social Media, Video, Audio, Email, Texts, Mobile, Transactional Data, Targeted Engagement, Pattern-Based Analytics, IT/OT, Docs, Search Engine, Contextual Relevance, Images)
Why Should You Care About Big Data? It Can Be Monetized! Business Value Examples: $300B in annual U.S. Healthcare value Retailers can increase operating margin by 60% using Big Data Governments could save more than $149B annually (Europe alone) through improved operational efficiency New companies formed on Big Data: IT Value: Big Data and analytics projects offer higher ROI than any other IT projects Opportunity for IT, analysts, and business users to come together (Moneyball!) Leverage previous skills and investments in IT projects that collect and store information
The Big Data Paradox: Data volumes growing faster than people, skills, disk, plant and power Outdated Technology: Traditional DBMS were never designed for today's volume, velocity, complexity Ad hoc questions come from all users, even customers directly Detailed data is where the interesting things happen Shortage of People: U.S. alone faces shortage of 150,000+ people with deep analytic skills U.S. missing 1.5M managers and analysts to analyze data and make decisions
Vertica Analytics Platform Real Time Big Data Cloud Mobile Monetize Better Decisions Analysis Real Time Statistics Services Individual = SOFTWARE-based Real-Time Analytics Platform SQL & NoSQL analytics capabilities Industry Leading LOAD & QUERY Performance SIMPLE installation & use with AUTOMATIC setup and tuning Highly SCALABLE, ELASTIC and full parallelism MPP MONETIZE 100% of your data Sensor
750+ customers Financial Services Retail Communications Consumer Marketing Healthcare Online Web & Gaming
A Platform Designed for Big Data Next Generation Administration and Design Tools Columnar Compression Concurrent Load & Query Elastic Cluster SQL Analytics User-Defined Analytics Optimized Connectors Standard Interface True Column Store - RDBMS Native and Performance Optimized High Availability Real Time Massively Parallel Processing
Graphing with Vertica It's not just Social! Visualize the Power of relationships Scale, performance, and elasticity are core attributes Relationships can be people, products, markets, compounds, etc.
Big Data Analytics Not Only SQL & Structured Structured Unstructured Semi-structured Monetize 100% of your data All data sources Internal / External More data points = greater insight Common Platform Uncommon Results Real-time analytics with both SQL & NoSQL Dynamically add / change sources Scale, elasticity, and simplicity all with predictable performance
Understand the Past, Predict the Future How HP/Vertica Predicted the Oscars from Twitter Sentiment Loaded raw tweets from Twitter into Vertica prior to Oscars Performed text parsing and sentiment analysis in Vertica Scored each film category based on positive/negative mentions Accurately predicted winners in nearly every category! How much is knowing the future worth?
Vertica Analytics Platform - Monetizing Big Data Monetize Real Time Statistics Analysis Better Decisions Make smarter decisions in real time.
Telecommunications 7 of the top 10 global telecommunications firms run their business on Vertica Revenue & Service Assurance and Fraud Detection Sensor & Device management and performance monitoring Subscriber insights and targeted marketing and advertising "Vertica opened doors to analyses that otherwise were too time-intensive or impossible. A larger team of business managers now have faster, easier access to more information. That knowledge is invaluable in an aggressively competitive market like ours." - Brian Harvell, Executive Director, Comcast Network Operations
Internet Gaming/Web 2.0 Predictive & targeted engagement for every individual Pattern recognition, sentiment, and social media Capture, analyze, and store PBs of data, no pruning Real-time analysis for actionable insights NOW! "being able to run social graph analysis on tables with tens of billions of rows with a fast turn around is amazing" - Dan McCaffrey, Director of Analytics, Zynga
Financial Services Revolutionize catastrophe and risk management Real-time measurement and management to maximize asset performance Integrated offerings for financial services Institutional, Retail, Liquidity, Risk, etc. Comprehensive structured and unstructured data capabilities with 100s of clients and 1000s of analyses "understanding our portfolio used to take 3 months; with Vertica it doesn't even take an hour. We've not only saved millions, but made even more" - RMS Client
Healthcare Re-think health care in its entirety: payer, provider, and PMP $300BN annual value creation opportunity, two thirds in the form of reductions to national health care expenditure Emergence of new business models powered by Big Data (e.g. Blue Health Intelligence) Four distinct health care data silos: Pharmaceutical R&D; Clinical; Activity (claims) and cost; Patient behavior and sentiment Patient safety, protocol effectiveness, fraud detection and cost reduction: all Big Data opportunities "we went from waiting days to waiting seconds; the impact on every aspect of our business has been transformational" - Doug Porter, CIO, Blue Cross Blue Shield Association
Built from the Ground Up: The Four C's of Vertica Columnar storage and execution: achieve best data query performance with unique Vertica column store Clustering: linear scaling by adding more resources on the fly Capacity Optimization: store more data, provide more views, use less hardware Continuous performance: query and load 24x7 with zero administration
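To make the "store more data, use less hardware" claim concrete, here is a minimal Python sketch of one generic columnar technique, run-length encoding of a sorted column. The slide does not say which encodings Vertica applies, so this illustrates the general idea only; the sample rows and function names are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Hypothetical row-oriented records: (date, country, amount).
rows = [
    ("2013-01-01", "IL", 10),
    ("2013-01-01", "IL", 12),
    ("2013-01-01", "US", 7),
    ("2013-01-02", "US", 9),
    ("2013-01-02", "US", 4),
]

# Column-oriented layout: one list per attribute instead of one tuple per row.
dates, countries, amounts = (list(col) for col in zip(*rows))

# Sorted, low-cardinality columns shrink to a handful of runs.
encoded_dates = rle_encode(dates)
encoded_countries = rle_encode(countries)
```

Because each column holds values of one type and sorted data tends to repeat, five date values collapse to two runs here; a query that only needs dates never has to touch the amounts column at all.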
Ecosystem Integration Hadoop / M.R. + = Vertica Approach Support and leverage the Hadoop ecosystem rather than reinventing the MR wheel Technology Hadoop connector Squeal optimizing compiler for Pig programs Use cases Hadoop for exploratory analysis Existing MR, Pig scripts Vertica for stylized, interactive analysis With shared features, often faster than Hadoop with a fraction of HW
Automated / Unified Platform Management HADOOP Visualize Analytic resources Health / Status Cloud Cloud Provision Dynamically deploy Distribute resources Virtualized On Premise Manage Unlimited cluster sizes Geographically distributed enterprises
SQL Analytics + - Built for Big Data Features Time series gap filling and interpolation Event window functions and sessionization Social Graphing Pattern matching Event series join Statistical functions Geospatial functions Benefits High performance (Keep Data close to CPU) Low cost (Industry Standard building blocks) Ease of use (Automated + Available) Use Cases Tickstore data cleanups CDR/VOD data analysis Clickstream sessionization Data aggregation and compression Monte Carlo simulation Graph algorithms Sensor Data Process Control Time Series SmartGrid
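As an illustration of what time series gap filling with interpolation means, the following Python sketch resamples irregular readings onto a regular grid and fills missing points linearly. It is not Vertica's SQL syntax or implementation; the function name, integer timestamps, and sample values are assumptions made for the example.

```python
def gap_fill_linear(samples, step):
    """Resample (timestamp, value) pairs onto a regular grid.

    `samples` must be sorted by timestamp. Grid points that fall between
    two readings are filled by linear interpolation.
    """
    if not samples:
        return []
    result = []
    i = 0
    t = samples[0][0]
    last_t = samples[-1][0]
    while t <= last_t:
        # Move to the latest reading at or before grid point t.
        while i + 1 < len(samples) and samples[i + 1][0] <= t:
            i += 1
        t0, v0 = samples[i]
        if t0 == t or i + 1 == len(samples):
            result.append((t, v0))
        else:
            t1, v1 = samples[i + 1]
            frac = (t - t0) / (t1 - t0)
            result.append((t, v0 + frac * (v1 - v0)))
        t += step
    return result


# Hypothetical tick data with readings missing at t = 1, 2, 3.
ticks = [(0, 10.0), (4, 18.0), (5, 20.0)]
filled = gap_fill_linear(ticks, step=1)
```

The same idea underlies tick-store cleanups: once every series sits on the same regular grid, series from different sources can be joined and compared point by point.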
Geospatial Analytics Store and query using SQL: Locations as Points of Interest Networks, e.g. roads, utilities, etc. as Line Segments Regions, e.g. sales territories, high risk zones, etc. Use cases Mobile check-in and gaming services (e.g. Foursquare, SCVNGR) Asset management, insurance Public sector and intelligence
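A region query such as "which sales territory contains this location" reduces to a point-in-polygon test. The Python sketch below uses the standard ray-casting method on planar coordinates; it is an illustration of the concept, not Vertica's geospatial functions, and the territory coordinates are made up.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a ray going right crosses.

    `polygon` is a list of (x, y) vertices in order. Coordinates are treated
    as planar, which is an approximation for small geographic regions.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical sales territory as a simple square.
territory = [(0, 0), (4, 0), (4, 4), (0, 4)]
store_inside = point_in_polygon((2, 2), territory)
store_outside = point_in_polygon((5, 2), territory)
```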
Statistical Modeling Extensions Use Cases Loan default prediction Customer labeling on purchasing behavior Technology Classification logistic regression and decision trees Native Vertica implementation is MPP and high performance
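To show what logistic-regression classification for loan default prediction involves, here is a small single-feature Python sketch trained with batch gradient descent. The slide says only that Vertica's implementation is native and MPP; the training method, learning rate, feature, and data below are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=1.0, epochs=5000):
    """Fit p(default) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def default_probability(x, w, b):
    return sigmoid(w * x + b)


# Hypothetical training data: debt-to-income ratio and whether the loan defaulted.
debt_ratio = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
defaulted = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(debt_ratio, defaulted)
high_risk = default_probability(0.9, w, b)
low_risk = default_probability(0.1, w, b)
```

In an MPP setting the inner loop is the part that parallelizes: each node can sum gradients over its own rows, and only the two partial sums need to be combined per iteration.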
Vertica Analytics Platform SDK A framework for Open Source and 3rd Party plug-in Analytics Simple: concise APIs and examples accelerate deployment Flexible: operate on Structured and Unstructured data sets Efficient: In-process, fully parallel Fully leverage CPUs, Disks, Memory investments > 2,000 developers globally
SDK EXAMPLES
OLAP Rollup and Cube Calculations Present Data in Business-Friendly OLAP Form Transform data in-database for maximum efficiency and scale Present it in the form readily consumable by Business users and their favorite Business Intelligence tools Fast and Efficient Eliminate latency and storage of multiple copies MPP: tackles data sets at scale impractical or impossible on a workstation Visualize insights within a timeframe that empowers decisions Servers belong in a data room: use a mobile device and retire those noisy workstations
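For readers unfamiliar with rollup calculations, the Python sketch below computes the same result shape as a SQL ROLLUP: a total for each prefix of the dimension list, down to the grand total. It illustrates the concept only and is not the SDK example's code; the column names and sales rows are hypothetical.

```python
from collections import defaultdict


def rollup(rows, dims, measure):
    """Sum `measure` for every prefix of `dims`.

    Keys are tuples with one slot per dimension; None marks a dimension
    that has been rolled up (assumes None never occurs in the data itself).
    """
    totals = defaultdict(int)
    for row in rows:
        for depth in range(len(dims), -1, -1):
            key = tuple(row[d] for d in dims[:depth]) + (None,) * (len(dims) - depth)
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical sales facts.
sales = [
    {"region": "EMEA", "product": "A", "sales": 10},
    {"region": "EMEA", "product": "B", "sales": 5},
    {"region": "AMER", "product": "A", "sales": 7},
]

cube = rollup(sales, dims=["region", "product"], measure="sales")
```

Here `cube[("EMEA", None)]` is the EMEA subtotal and `cube[(None, None)]` is the grand total, which is the form a BI tool expects when it draws subtotal rows.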
AES Encryption Secure sensitive data, even from DBAs Secure Applies standard AES libraries Protect without impacting manageability Encrypt entire columns or individual cells Fast and Efficient Executes in parallel, in process, on multiple nodes Little to no net increase in storage requirements
In-Database Location GeoCoding Understand the position of any Address or Place Name Flatten arbitrary address formats to simple Latitude and Longitude Segment by boundary or proximity in Vertica's built-in Geospatial library Simple Lookups, or Complex Analytics In-Database Identify valuable regional or social activity trends Segment, Tag, or Group by location, e.g. postal code or near place name
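Once addresses are flattened to latitude and longitude, segmenting "by proximity" comes down to a great-circle distance test. The Python sketch below uses the haversine formula with a mean Earth radius of 6371 km; it stands in for whatever Vertica's geospatial library does internally, and the place names and coordinates are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def places_within(origin, places, radius_km):
    """Return the names of places no farther than radius_km from origin."""
    lat0, lon0 = origin
    return [
        name
        for name, (lat, lon) in places.items()
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    ]


# Hypothetical geocoded places.
geocoded = {"near_store": (0.0, 0.5), "far_store": (0.0, 5.0)}
nearby = places_within((0.0, 0.0), geocoded, radius_km=100)
```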
Web Server Log & Click Stream Analysis Scalable library functions for IIS and W3C log formats Extracts all fields from each web server log format Executes in parallel on multiple nodes, cores Bolsters Vertica's optimized in-database sessionization, pattern matching, and event series join capabilities Implemented as extensions to familiar SQL analytic syntax High-performance in-database page rank and user activity segmentation
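The two steps this slide describes, field extraction and sessionization, can be sketched in a few lines of Python. The parser relies on the `#Fields:` directive of the W3C Extended Log File Format; the 30-minute inactivity timeout, the choice of `c-ip` as the visitor key, and the sample log lines are assumptions for illustration, not details of the Vertica library.

```python
from datetime import datetime, timedelta


def parse_w3c_log(lines):
    """Turn W3C extended log lines into dicts keyed by the #Fields names."""
    fields = []
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) == len(fields):  # skip malformed entries
            records.append(dict(zip(fields, values)))
    return records


def sessionize(records, timeout=timedelta(minutes=30)):
    """Number each client's sessions; a gap longer than `timeout` starts a new one."""
    last_seen = {}
    session_no = {}
    out = []
    for rec in sorted(records, key=lambda r: (r["c-ip"], r["date"], r["time"])):
        ts = datetime.strptime(rec["date"] + " " + rec["time"], "%Y-%m-%d %H:%M:%S")
        ip = rec["c-ip"]
        if ip not in last_seen or ts - last_seen[ip] > timeout:
            session_no[ip] = session_no.get(ip, 0) + 1
        last_seen[ip] = ts
        out.append({**rec, "session": session_no[ip]})
    return out


# Hypothetical log excerpt.
log_lines = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2013-05-01 10:00:00 10.0.0.1 GET /home 200",
    "2013-05-01 10:05:00 10.0.0.1 GET /cart 200",
    "2013-05-01 11:00:00 10.0.0.1 GET /home 200",
    "2013-05-01 10:01:00 10.0.0.2 GET /home 200",
]

clicks = sessionize(parse_w3c_log(log_lines))
```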
Sentiment Analysis Package Mine customer interactions and online comments Scoring (negative/neutral/positive) on any text string Score customer service case notes and transcripts Score tweets and blogs mentioning your brand or products (or your competitor's) Manage a complete Business Communication Strategy Stay informed of customer sentiment from all internal and external sources
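The simplest way to produce a negative/neutral/positive score for a text string is a word-list count, sketched below in Python. The package described on this slide is presumably more sophisticated; the word lists and function name here are hypothetical and exist only to show the shape of the output.

```python
import re

# Tiny illustrative lexicons; a real system would use far larger ones.
POSITIVE = {"great", "love", "excellent", "fast", "amazing"}
NEGATIVE = {"bad", "slow", "hate", "terrible", "broken"}


def score_sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


labels = [
    score_sentiment("I love this, it is amazing!"),
    score_sentiment("Terrible and slow service"),
    score_sentiment("The package arrived on Tuesday"),
]
```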
XML Parsing & Transformation XML within Vertica Store and Transform XML documents in-database Generate XML documents from queries Query external Web Services directly from Vertica MPP scale: parse more documents at lower latency Avoid complexity: in-database processes are more maintainable Inherently High Availability: no investment in redundant external transformation software or gateway servers
Google Analytics & Twitter Access Libraries Acquire data on demand from within Vertica No external infrastructure to maintain Low latency access to critical information Twitter Access API Highly maintainable: store keywords in the database for visibility and easy maintenance Google Analytics connection, query, and record extraction Detailed data on demand for real-time analysis and value
Document Relevance Comparison Cluster and Tag documents for search and comparison Quickly isolate the collection of documents surrounding a topic of interest Compute relevance vectors with scalable performance Scores the relevance of a word or sentence vs. another Runs in parallel on multiple nodes, multiple cores Includes Tag Cloud Example Generates HTML with most relevant words surrounding a topic, sized by score
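One common way to score the relevance of one piece of text against another is cosine similarity between term-frequency vectors. The slide does not state which measure the package uses, so the Python sketch below is an assumed stand-in, with made-up documents.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_relevance(a, b):
    """Cosine similarity of the two texts' term vectors, from 0.0 to 1.0."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical documents ranked against a topic of interest.
topic = "column store database"
docs = ["column store analytics database", "social media sentiment"]
ranked = sorted(docs, key=lambda d: cosine_relevance(topic, d), reverse=True)
```

Each document's score depends only on that document and the topic, so the scoring step splits cleanly across nodes and cores, which fits the parallel execution the slide describes.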
Natural Language Processing Functions Common Generalized Functions for Machine Processing of Natural Language Optimized for performance and scale Used in many common search algorithms Suitable for low latency, high volume text streams in a variety of languages Used across multiple industries: Online Gaming, Telco, Security, Insurance (to name a few)
Send SMS Messages from Vertica Invoke SMS Messages from ordinary SQL Run direct marketing as the result of a SQL query Notify end users of important information in real time Automate administrative alerts Notify users of batch completion Notify administrators of maintenance conditions
Shell Command Framework Secure Accessible only where privileges are specifically granted Leverages Vertica's Role Based security model Powerful and Flexible Invoke shell commands as SQL functions Results captured and transformed for use in query Easily automate administrative tasks Easily execute on all nodes or a subset
Thank You!