HP Vertica: Real-Time Analysis of Extreme Data Volumes and Hadoop Integration
Helmut Schmitt, Sales Manager DACH

Big Data is a Massive Disruptor
"A 100-fold multiplication in the amount of data is a 10,000-fold multiplication in the number of patterns we can see in that data." - Philip Evans, Boston Consulting Group Fellow, TED Talk

Industry-leading breadth & depth of capabilities
Haven Big Data Platform
Core Big Data Business Capabilities: Access, Explore, Enrich, Analyze, Predict, Serve, Act, and more
- Contextual Search
- Data Exploration
- Image/Video Analytics
- Accelerated Analytics
- Geospatial Analytics
- Sentiment Analysis
- SQL on Hadoop
- Predictive Analytics
Deployment: On-premise, In the Cloud

DATA is an organization's most strategic asset
Monetize, Differentiate, Personalize, Monitor, Meter, Optimize, Predict, and more

...and its greatest risk
Regulate, Comply, Control, Secure, Address, Ensure

The Big Data Balance Sheet
Assets: Monetize, Differentiate, Personalize, Monitor, Meter, Optimize, Predict, and more
Liabilities: Regulate, Comply, Control, Secure, Address, Ensure

We will be the trusted partner for every organization
Serve, Store, Explore, Protect, Govern

The Big Data flow
Stages: Store & Explore, Govern & Protect, Serve
Data:
- Unstructured enterprise data repositories
- Structured enterprise data repositories
- Cloud-based repositories
- Mobile & social media
- Offsite or removable data repositories
Objectives:
- Address business & operational objectives
- Address legal & compliance objectives
- Address information management objectives
Capabilities:
- Enterprise Content Management
- Enterprise Search & Collaboration
- Information Archiving
- eDiscovery
- Legacy Data Cleanup
- Legal Holds
- Records Management
- Backup & Recovery
- Disaster Recovery
- Business Resiliency
- Long-Term Retention
Use cases:
- Business resiliency
- Operations Analytics
- Predictive Maintenance
- Smart Metering
- Patient analytics
- Fraud prevention
- Records Management
- Advertising analytics
- Legal & Compliance

Vehicle Recognition
- Used in association with ANPR
- Match make and/or model
- Easy to train
- Real-time matching
- Alert on or search for vehicles without registration
- Validate the database using the ANPR result to identify illegally plated vehicles

Core Capabilities: Built for Speed
We boost performance
What 1000x means:
- Used to take 1 hour; now takes 3.6 seconds
- Used to take 8 hours (overnight); now takes under 30 seconds
"When we did the first queries, they were done so fast, we thought they were broken." - Michael Relich, Guess?

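The two before/after figures on this slide are consistent with a 1000-fold reduction in runtime. The short sketch below only checks that arithmetic; the function name and the constant are illustrative and do not come from the slide.

```python
# Illustrative arithmetic only: apply a 1000-fold speedup to the two
# runtimes quoted on the slide.
SPEEDUP = 1000  # assumed factor, inferred from the slide's numbers


def accelerated_seconds(original_seconds: float, speedup: float = SPEEDUP) -> float:
    """Return the runtime after dividing the original runtime by the speedup."""
    return original_seconds / speedup


one_hour = 60 * 60        # 3,600 seconds
overnight = 8 * 60 * 60   # 28,800 seconds

print(accelerated_seconds(one_hour))   # 3.6
print(accelerated_seconds(overnight))  # 28.8, i.e. under 30 seconds
```
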
Secrets to Achieving Performance Increases
- Columnar Storage: speeds query time by reading only necessary data
- Compression: lowers costly I/O to boost overall performance
- MPP Scale-Out: provides high scalability on clusters with no name node or other single point of failure
- Distributed Query: any node can initiate the queries and use other nodes for work; no single point of failure
- Projections: combine high availability with special optimizations for query performance
(Diagram: three nodes, each with its own CPU, memory, and disk.)

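To make the first two points concrete, here is a minimal conceptual sketch, not Vertica's storage engine. It contrasts how many values a row layout and a column layout touch when summing a single column, and shows run-length encoding as one simple compression scheme that works well on sorted, low-cardinality columns. The sample table and all function names are hypothetical.

```python
from itertools import groupby

# Hypothetical sample table, stored row by row.
rows = [
    {"order_id": 1, "region": "DE", "amount": 120.0},
    {"order_id": 2, "region": "DE", "amount": 80.0},
    {"order_id": 3, "region": "AT", "amount": 50.0},
    {"order_id": 4, "region": "CH", "amount": 200.0},
    {"order_id": 5, "region": "CH", "amount": 10.0},
]

# Column-oriented layout of the same data: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_row_store(table, column):
    """Sum one column from the row layout; whole rows are read, so every field counts."""
    values_read = sum(len(row) for row in table)
    return sum(row[column] for row in table), values_read


def sum_column_store(table, column):
    """Sum one column from the column layout; only that column is read."""
    values = table[column]
    return sum(values), len(values)


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


print(sum_row_store(rows, "amount"))         # (460.0, 15)
print(sum_column_store(columns, "amount"))   # (460.0, 5)
print(run_length_encode(columns["region"]))  # [('DE', 2), ('AT', 1), ('CH', 2)]
```

Both layouts return the same sum, but the column layout touches a third of the values in this three-column example; the gap grows with the number of columns in the table.
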
Query Optimization Comparison
Traditional Materialized Views:
- Are secondary storage
- Are rigid: practically limited to columns and query needs; more columns = more I/O
- Are mostly batch updated
- Provide high data latency
Vertica Projections:
- Are primary storage; no base tables are required
- Can be segmented, partitioned, sorted, compressed and encoded to suit your needs
- Have a simple physical design
- Are efficient to load & maintain
- Are versatile: they can support any data model
- Allow you to work with the detailed data
- Provide near-real-time, low data latency
- Combine high availability with special optimizations for query performance
Traditional Indexes:
- Are secondary storage pointing to base table data
- Support one clustered index at most; tough to scale out
- Require complex design choices
- Are expensive to update
- Provide high data latency

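One way to picture the difference is to treat a projection as the stored data itself, held as columns sorted on a chosen key, so that a range predicate on that key becomes a binary search plus a contiguous slice instead of a lookup through a separate index into a base table. The sketch below is a conceptual illustration under that assumption, not Vertica's implementation; `build_projection`, `range_sum`, and the sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_projection(rows, sort_key, columns):
    """Store the chosen columns as parallel lists, sorted by sort_key."""
    ordered = sorted(rows, key=lambda row: row[sort_key])
    return {name: [row[name] for row in ordered] for name in columns}


def range_sum(projection, sort_key, low, high, value_column):
    """Sum value_column over rows with low <= sort_key <= high using binary search."""
    keys = projection[sort_key]
    start = bisect_left(keys, low)    # first position with key >= low
    stop = bisect_right(keys, high)   # first position with key > high
    return sum(projection[value_column][start:stop])


# Hypothetical fact table, loaded in arbitrary order.
sales = [
    {"day": 3, "store": "B", "amount": 30},
    {"day": 1, "store": "A", "amount": 10},
    {"day": 2, "store": "A", "amount": 20},
    {"day": 5, "store": "C", "amount": 50},
    {"day": 4, "store": "B", "amount": 40},
]

# The projection keeps only the columns this query pattern needs, sorted by day.
by_day = build_projection(sales, "day", ["day", "amount"])
print(range_sum(by_day, "day", 2, 4, "amount"))  # 90
```
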
Analytical Features of Vertica
Layers:
- Vertica SQL: Standard SQL-99 Conventions
- Vertica Extended-SQL: Advanced Analytics with SQL
- Vertica User Defined Extensions: Advanced Analytics using Custom Logic (C++, Java, R)
Vertica Innovations:
- Aggregate, Analytical, Window Functions
- Sessionization
- Regression Testing, Statistical Modeling, Analytics
- Time Series: Time slice, Interpolation (Constant & Linear), Gap Filling
- Event-based Windows: Conditional Change Event, Conditional True Event
- Event Series Joins
- Graph, Page Rank, Monte Carlo
- Geospatial, Statistical
- Social Media/Pulse, Text Mining, Patterns/Trends
- Pattern Matching: Match, Define, Pattern Keywords; Funnel Analysis
- Classification Algorithms, Text-mining, Geospatial (Place)
Connection: ODBC/JDBC, HIVE, Hadoop, Flex Zone

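As an illustration of what time slicing with gap filling and constant or linear interpolation means, the following sketch fills missing time slices between observed samples. It is written in plain Python rather than Vertica's SQL syntax, it assumes integer timestamps that fall on slice boundaries, and the function name `gap_fill` is hypothetical.

```python
def gap_fill(samples, step, method="linear"):
    """Return one (time, value) pair per time slice from the first to the last sample.

    samples: list of (time, value) pairs sorted by time; times are assumed
    to be integers aligned to multiples of `step` from the first sample.
    method: "linear" interpolates between neighbours, "const" carries the
    last observed value forward.
    """
    filled = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        t = t0
        while t < t1:
            if method == "linear":
                value = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
            else:
                value = v0
            filled.append((t, value))
            t += step
    filled.append(samples[-1])  # the last observed sample closes the series
    return filled


# Hypothetical readings with gaps at t=1 and t=2.
readings = [(0, 10.0), (3, 16.0), (4, 20.0)]

print(gap_fill(readings, step=1, method="linear"))
# [(0, 10.0), (1, 12.0), (2, 14.0), (3, 16.0), (4, 20.0)]
print(gap_fill(readings, step=1, method="const"))
# [(0, 10.0), (1, 10.0), (2, 10.0), (3, 16.0), (4, 20.0)]
```
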
HP Vertica Distributed R: R-based Analytics
Challenge: Customers want to use R for analytics. However, R scalability is always a question.
Solution: HP Distributed R
Benefits:
- Analyze data sets too large for standard R
- Perform complex analyses much more quickly (20x faster than Hadoop)
- Use the familiar R environment to explore data and to develop and execute algorithms
- Operate on the full data set (no downsampling)
Algorithms and use cases:
- Linear Regression (GLM): risk analysis, trend analysis, etc.
- Logistic Regression (GLM): customer response modeling, healthcare analytics (disease analysis)
- Random Forest: customer churn, market campaign analysis
- K-Means Clustering: customer segmentation, fraud detection, anomaly detection
- Page Rank: identify influencers

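The general idea behind running an algorithm on the full data set across several nodes can be sketched with simple linear regression: each partition computes small summary statistics locally, and only those summaries are combined to fit the model. This is a conceptual sketch in Python, not the Distributed R API; the partitions here are processed sequentially, and all names and data are hypothetical.

```python
def partial_stats(partition):
    """Per-partition sufficient statistics for simple linear regression."""
    n = len(partition)
    sx = sum(x for x, _ in partition)
    sy = sum(y for _, y in partition)
    sxx = sum(x * x for x, _ in partition)
    sxy = sum(x * y for x, y in partition)
    return n, sx, sy, sxx, sxy


def combine(stats):
    """Add the per-partition statistics element-wise."""
    return tuple(sum(parts) for parts in zip(*stats))


def fit_line(n, sx, sy, sxx, sxy):
    """Ordinary least squares slope and intercept from the combined statistics."""
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data following y = 2x + 1, split across three "nodes".
partitions = [
    [(0, 1), (1, 3)],
    [(2, 5), (3, 7)],
    [(4, 9)],
]

slope, intercept = fit_line(*combine(partial_stats(p) for p in partitions))
print(slope, intercept)  # 2.0 1.0
```

Only five numbers per partition cross the network in this scheme, regardless of how many rows each partition holds, which is why no downsampling is needed.
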
Introducing HP Vertica for SQL on Hadoop
HP Vertica for SQL on Hadoop offers the only full-featured query engine on Hadoop
- Same Core Engine
- Hadoop Distribution Agnostic
- Enterprise-ready Solution
- World-class Enterprise Support and Services
- Open platform
- Ready for Haven
- Competitive price point
(Diagram labels: Vertica ANSI SQL, Data Exploration, Hadoop Storage.)

One Query Engine to Serve it all
- Store data in HP Vertica or any Hadoop distribution
- Query data in place in Hadoop formats
- Co-locate and leverage existing Hadoop infrastructure
- HP Vertica performance on lower-cost infrastructure
- Single query engine across diverse formats and infrastructure
Query Engine: HP Vertica ANSI SQL
Format: Vertica Optimized (ROS, Flex Tables); Hadoop (ORC, Parquet, et al.)
File System: Vertica (EXT4); Hadoop (HDP, CDH, MapR NFS)

Which Version Is Right for You?
HP Vertica for SQL on Hadoop: "No Frills, No Brainer"
- Discover Data
- Control Costs
- Leverage Hadoop Infrastructure
HP Vertica Enterprise Edition: "All the bells and whistles"
- Boost Performance
- Faster Analytics
- Deeper Analytics
- Customize Analytics Infrastructure
Features compared (SQL on Hadoop vs. HP Vertica EE):
- For Hadoop environments only
- Full MPP SQL engine
- Includes JOINs, time series analysis and Key Value
- Management tools including workload management, database designer and back-up and restore
- Hadoop Agnostic Compatibility
- Flex Zone
- Compression and Columnar Store
- Java UDx
- Accelerated Analytics, Live Aggregate projections, Geospatial and Sentiment Analysis
- C++ UDx / UDL
- Highly Optimized HP Vertica EXT4 file system

High End Scalability
Think Big, Start Small
Vertica Community edition:
- Up to 3 nodes
- Up to 1 Terabyte
- Free for production use
Scale up to Enterprise edition:
- Add nodes on the fly
- Scale up to PB
- Embed Hadoop

Leaders don't make compromises
- Promotional Testing
- Behavior Analytics
- Claims Analyses
- Patient Analyses
- Clinical Data Analyses
- Fraud Monitoring
- Click Stream Analyses
- Network Analyses
- Customer Analytics
- Compliance Testing
- Financial Tracking
- Trading Analytics
- Loyalty Analysis
- Marketing Analytics

HP Vertica's Top Use Cases & Verticals
Verticals: Communications, Media & Entertainment; Consumer Web; Health & Life Sciences; Retail; Financial Services; Energy; Public Sector
Use cases:
- Clickstream Analytics
- Customer Analytics
- Hadoop Acceleration
- EDW Modernization
- Fraud Detection
- Transaction Analytics
- Compliance
- Security
- Operations Analytics
- Sensor Data Analytics