Interactive BI for Large Data Volumes
  Silicon India BI Conference, 2011, Mumbai
  Vivek Bhatnagar, Ingres Corporation
Agenda
  - Need for Fast Data Analysis & the Data Explosion Challenge
  - Approaches Used to Date
  - New Breakthrough: Harnessing the Power of Modern Microprocessors
  - Quick Intro: Ingres VectorWise
  - Market Offerings Today: Gartner DW MQ 2011
Today's Challenges in BI
  - Make informed decisions faster: analysis in seconds not minutes, minutes not hours
  - Data explosion: collecting more information with fewer resources; >44x growth in next 10 years
  - Existing tools: too slow, too complex, too expensive
  - Analytical databases designed in the 80s & 90s do not take advantage of today's modern hardware
Need for Fast Data Analysis
  Make the right decision in a TIMELY manner:
  - Boost sales: beat the competition
  - Improve efficiencies: lower costs
  - Maximize profits: enable earlier investments
  Examples:
  - Risk management: reduce risk before exposure results in losses
  - Customer churn: act before the customer switches
  - Fraud detection: minimize losses due to fraud
Data Explosion Challenges the Business
  [Chart: corporate data volume over time]
  - Data continues to grow at an explosive rate
  - By 2020 we will see:
    - 44x increase in digital information
    - 67x increase in the files containing this information
    - 35 trillion gigabytes of digital information in the world (a stack of DVDs reaching halfway to Mars)
  Source: IDC, "The Digital Universe: Are You Ready?", May 2010
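To put the 44x figure in yearly terms, the short calculation below derives the implied compound annual growth rate and doubling time. It is only a back-of-the-envelope sketch: the 10-year period is an assumption taken from the "next 10 years" wording earlier in the deck, and the variable names are illustrative.

```python
import math

# Figures quoted on the slide (IDC); the 10-year period is an assumption
# based on the deck's "44x growth in next 10 years" wording.
growth_factor = 44.0
years = 10

# Compound annual growth: solve annual_factor ** years == growth_factor.
annual_factor = growth_factor ** (1 / years)
annual_growth_pct = (annual_factor - 1) * 100

# Time for the data volume to double at that steady rate.
doubling_time_years = math.log(2) / math.log(annual_factor)

print(f"Implied growth per year: {annual_growth_pct:.0f}%")
print(f"Implied doubling time:   {doubling_time_years:.1f} years")
```

Under these assumptions the data volume grows by a little under half each year, which is the pace the rest of the deck measures databases against.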
Resources to Manage the Data Explosion Are Constrained
  [Chart over time: data growth 44x vs. growth in human resources to manage it 40%]
  - Resources to manage data can't grow at the same rate as data
  - Constrained resources: physical data center, human resources, budget, energy cost, expertise, impending regulation
  - Businesses require technology breakthroughs to manage the data explosion
Challenges with Current State of BI
  - Business wants faster answers, but the database is too slow
  - Data growth is exponential
  - Analytical bottleneck between the application and the traditional database
  - Workarounds are complex, risky & expensive: more hardware, OLAP cubes, indices, rollups, star schemas, tuning
  - Databases use only a fraction of CPU capability
  Confidential 2010 Ingres Corporation
Approaches for Achieving Data Analytics Performance (1980s-2000s)
  Optimizations for parallel processing and minimal data retrieval:
  - Parallel processing (MPP)
  - Column store with compression
  - Proprietary hardware (data warehouse appliances)
  [Diagram labels: Column, Decompress, Buffer Manager, Disk, RAM]
  - Acceptable performance has been achieved by using more hardware or by intelligently lowering the volume of data to be processed
  - However, none of these approaches leverages the performance features of modern processors, i.e. getting the most out of each commodity CPU
Modern CPU Instruction Capabilities
  Traditional CPU processing:
  - Single Instruction, Single Data (SISD)
  Modern CPU processing capabilities:
  - Single Instruction, Multiple Data (SIMD)
  - Out-of-order execution
  - Chip multi-threading
  - Large L2/L3 caches
  - Streaming SIMD Extensions (SSE) for efficient SIMD processing
Traditional Scalar Processing vs Vector Processing
  Traditional scalar processing (many operations):
  - One operation performed on one element at a time
  - 1 x 1 = 1, 2 x 2 = 4, 3 x 3 = 9, 4 x 4 = 16, 5 x 5 = 25, 6 x 6 = 36, 7 x 7 = 49, 8 x 8 = 64
  - n x n = n^2, one element per operation
  - Large overheads
  Vector processing (one operation):
  - One operation performed on a set of data at a time
  - [1 2 3 4 5 6 7 8] x [1 2 3 4 5 6 7 8] = [1 4 9 16 25 36 49 64]
  - n x n = n^2, applied to the whole set
  - No overheads
  - Process even 1GB per second
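The contrast on this slide can be sketched in a few lines of Python. This is an illustrative model only: pure Python does not issue SIMD instructions, so the sketch counts operator invocations to show how per-call overhead is amortized when an operator works on a vector of values. The function names and the vector size of 4 are assumptions made for the example, not VectorWise internals.

```python
def multiply_one(a, b):
    """Scalar operator: handles a single pair of values per call."""
    return a * b


def multiply_vector(left, right):
    """Vector operator: handles a whole batch of value pairs per call."""
    return [a * b for a, b in zip(left, right)]


def square_scalar(values):
    """Element-at-a-time processing: one operator call per element."""
    out, calls = [], 0
    for v in values:
        out.append(multiply_one(v, v))
        calls += 1
    return out, calls


def square_vectorized(values, vector_size=4):
    """Vector-at-a-time processing: one operator call per batch of elements."""
    out, calls = [], 0
    for start in range(0, len(values), vector_size):
        chunk = values[start:start + vector_size]
        out.extend(multiply_vector(chunk, chunk))
        calls += 1
    return out, calls


data = [1, 2, 3, 4, 5, 6, 7, 8]
scalar_result, scalar_calls = square_scalar(data)
vector_result, vector_calls = square_vectorized(data)

print(scalar_result, "operator calls:", scalar_calls)
print(vector_result, "operator calls:", vector_calls)
```

Both paths produce the same squares; the vectorized path simply pays the call overhead once per batch instead of once per element. On real hardware the inner loop over a batch is what a compiler or hand-written kernel can map onto SIMD instructions.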
Today's Database Facts
  - Most database software was originally developed in the 1970s/80s
    - Assumes SISD processing
    - Does not use SIMD and other performance features due to complexity
  - Moore's Law has applied to CPU processing power since the 1970s
    - CPU performance doubled at least every 2 years
    - Recently thanks to advanced performance features
  - Has database performance gained 2x every 2 years?
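The closing question is easier to weigh with the compounding spelled out. The sketch below assumes a steady doubling every 2 years, as the slide states; the function name is hypothetical and the numbers are plain arithmetic, not measurements of any CPU or database.

```python
def cumulative_speedup(years, doubling_period_years=2.0):
    """Speedup after `years` if performance doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# What "2x every 2 years" compounds to over longer horizons.
for years in (2, 10, 20):
    print(f"{years:>2} years -> {cumulative_speedup(years):,.0f}x")
```

A database that had tracked the CPU curve would need to be about 32 times faster after a decade on this assumption.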
New Emerging Approach: Ingres VectorWise
  On-Chip Vectorised Computing / Columnar Database

  Storage level    Time / cycles to process    Data processed
  CHIP             2-20                        10GB
  RAM              150-250                     2-3GB
  DISK             Millions                    40-400MB

  - On-chip computing: >100X faster to process data in on-chip cache than in RAM; automatic optimization of the full storage hierarchy
  - Vector processing: core-level parallel processing (100X), e.g. SIMD, superscalar
  - 2nd-gen column store: limit I/O to specific columns; efficient real-time updates
  - Smarter compression: maximizes throughput via automatically optimized on-chip compression
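The "limit I/O to specific columns" point can be illustrated with a toy comparison of row and column layouts. This is a minimal sketch under stated assumptions: the sample table, the column names, and the "values read" counter are all made up for the example and stand in for real disk or memory traffic; it does not describe how VectorWise stores data.

```python
# Hypothetical sample table, stored row by row.
rows = [
    {"order_id": 1, "region": "west", "amount": 120},
    {"order_id": 2, "region": "east", "amount": 80},
    {"order_id": 3, "region": "west", "amount": 200},
    {"order_id": 4, "region": "north", "amount": 50},
]


def to_columns(table_rows):
    """Re-organize row-wise records into one list per column."""
    return {name: [row[name] for row in table_rows] for name in table_rows[0]}


def sum_row_store(table_rows, column):
    """Row layout: every row is fetched in full to reach one field."""
    total, values_read = 0, 0
    for row in table_rows:
        values_read += len(row)   # all fields of the row are touched
        total += row[column]
    return total, values_read


def sum_column_store(table_columns, column):
    """Column layout: only the requested column is fetched."""
    data = table_columns[column]
    return sum(data), len(data)


columns = to_columns(rows)
row_total, row_values_read = sum_row_store(rows, "amount")
col_total, col_values_read = sum_column_store(columns, "amount")

print("row store:   ", row_total, "values read:", row_values_read)
print("column store:", col_total, "values read:", col_values_read)
```

Both layouts return the same total, but the column layout touches only one value per row. The gap widens with the number of columns in the table, which is why analytical queries that aggregate a few columns tend to favour columnar storage.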
Ingres VectorWise: The World's Fastest Database
  Ingres VectorWise is a breakthrough analytical database
  Faster:
  - 10x-70x faster analytics & reporting
  - Handle terabyte-scale data with just a single server
  - Delivers results in seconds not minutes, minutes not hours
  Simple:
  - Use your existing queries & SQL statements
  - Eliminate the buzzwords: cubes, rollups, indexes
  - Deliver IT reporting projects faster with lower cost & risk
  Smarter:
  - Revolutionary use of modern chip technology
  - Get more from low-cost commodity hardware
  - Database speeds increase with Moore's Law
Ingres VectorWise: The World's Fastest Database
  - Faster insight, faster performance
  - Faster & deeper insight
  - More agile BI
  "VectorWise could do any query we wanted in under 1 second. In a pathological case, Postgres took over 13 min."
    Dan Sanders, Advanced E-commerce Research Systems
  "Game-changing technology"
    Don Feinberg, Gartner Group
Gartner MQ 2011 for Data-warehouse DBMSs
  Increasing domination of non-traditional RDBMSs for data warehousing:
  - MPP/Appliances: Teradata/Asterdata, Netezza (IBM), Greenplum, DataAllegro
  - Columnar/MPP: Sybase IQ, Vertica (HP), 1010 Data, Infobright, Asterdata (Teradata), ParAccel
  - On-Chip Vectorised Computing/Columnar: Ingres VectorWise
  Source: Gartner Magic Quadrant for Data Warehouse Database Management Systems, Jan 2011
Any Questions?
  Reduced Costs, Greater Innovation