The World of Databases Is Changing: Technologies and Opportunities
Giorgio Raico, Pre-Sales Consultant, Hewlett-Packard Italiana
Copyright 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Data Consumption Taxonomy
Sources (drive transactions): Transactional/OLTP systems: CRM, Order, ERP, Finance
Data Integration: Select, Extract, Transform, Integrate & Load
Data Management Platform (manage and store information): Enterprise Data Warehouse (EDW), Data Marts (DM), Unstructured Data
Business Reporting & Analytic Applications (generate insight): Visualization, Reporting, Dashboards, OLAP, Information Delivery, Reports, OLAP Apps, Exec Dashboards, Analytics, Analytic Applications
Transactional: Transaction processing. Real-time processing. Schemas structure data. ACID. DB system table. Interactive, fast response. Terabyte-scale data.
Unstructured: Analytic processing. Not designed for low-latency access. The consistency of data is weak. Distributed with data replication. Batch oriented; not interactive. Petabyte-scale data.
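The ACID property listed for transactional systems can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module and a hypothetical accounts table; neither appears in the slides, and the example only shows atomicity and consistency on a single node.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; undo everything on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first so that a failed debit has something to roll back.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])

ok = transfer(conn, "alice", "bob", 30)         # valid transfer
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the rejected transfer, the credit that had already been applied to bob is rolled back, so the table never shows a half-finished transaction.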
INFORMATION ASYMMETRY
There will always be a gap between what you want to know and what is knowable.
Information Asymmetry: A New Class of Information
Social media, call detail records, gene sequencing, sensors, customer purchasing history, algorithmic trading, click streams, game interactions
Information Asymmetry: Is Snoopy a dog?
          Standard   Unstructured
Amount    15%        85%
Growth    22%        62%
HP Restricted. For HP Internal and Partner Use Only.
New generation customer intelligence
Structured Data (what they do): Customer Profile, Transactions, Preferences, Responsiveness. Improve existing knowledge.
Unstructured Data (what they think): Customer Perception, Intentions & Suggestions, Influence & Evangelism. Expose new knowledge.
A Brief Background on Databases
As prices decline, devices proliferate
As devices proliferate, data creation explodes
Consumer content creation and consumption are increasing
What issues does today's data pose?
 - Data volume: data items often tend to be big
 - Data items are being created rapidly, with new data types
 - Structure (or lack of it)
 - Cataloging data
A Brief Background on Databases
If a general-purpose relational database can do everything, why bother? Cost and performance.
Shortcomings of relational databases today:
 - Data volumes. Parallelization: MPP, appliances, clusters
 - If you go parallel on hardware you have to coordinate: shared nothing
 - The trade-off when sorting: columnar databases
 - The hard drive bottleneck: in-memory databases
 - Reduce latency: flash technology
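The shared-nothing idea on this slide can be sketched as follows. This is an illustration only, not any vendor's implementation: the node count, the `node_for` hash function and the sample rows are assumptions made for the example. Each node owns a disjoint partition, aggregates it locally, and a coordinator merges the partial results.

```python
from collections import Counter
from zlib import crc32

NUM_NODES = 3  # hypothetical cluster size

def node_for(key: str) -> int:
    """Deterministically map a key to one node (hash partitioning)."""
    return crc32(key.encode("utf-8")) % NUM_NODES

def partition(rows):
    """Distribute rows so that every node owns a disjoint slice of the data."""
    nodes = [[] for _ in range(NUM_NODES)]
    for customer, amount in rows:
        nodes[node_for(customer)].append((customer, amount))
    return nodes

def local_totals(node_rows):
    """Work a node can do alone, without touching other nodes' data."""
    totals = Counter()
    for customer, amount in node_rows:
        totals[customer] += amount
    return totals

def cluster_totals(nodes):
    """Coordination step: merge the partial results from every node."""
    merged = Counter()
    for node_rows in nodes:
        merged.update(local_totals(node_rows))
    return dict(merged)

rows = [("anna", 10), ("bruno", 5), ("anna", 7), ("carla", 3), ("bruno", 1)]
nodes = partition(rows)
totals = cluster_totals(nodes)
```

Because all rows for one customer land on the same node, the per-customer sums need no cross-node communication until the final merge.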
A Brief Background on Databases
New data types and use cases are creating new companies and new businesses
Extracting meaning from unstructured data and human-created data
Meta-tagging: even unstructured data needs structured analysis
 - Transactional data
 - Machine data (web logs, click logs)
 - Unstructured text data (blogs, posted text, social)
 - Other unstructured data (photo, video, voice)
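As a rough illustration of giving machine data structure, the sketch below turns web log lines into tagged fields. The log layout (Common Log Format), the sample lines and the field names are assumptions for the example; the slides do not specify a format.

```python
import re
from collections import Counter

# Assumed layout: host ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict of tagged fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

sample = [
    '192.0.2.10 - - [10/Oct/2011:13:55:36 +0200] "GET /products/42 HTTP/1.1" 200 2326',
    '192.0.2.11 - - [10/Oct/2011:13:55:40 +0200] "GET /missing HTTP/1.1" 404 -',
    'not a log line',
]
records = [r for r in map(parse_line, sample) if r is not None]
status_counts = Counter(r["status"] for r in records)
```

Once each line carries named fields, ordinary structured analysis (counting, grouping, joining) becomes possible on data that started as free text.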
Moving Forward: Drive the Change
HP will support your choices:
 - Change HW/OS without DB update
 - Change HW/OS with DB update
 - Change DB, same HW/OS
 - Change DB and HW/OS
Database Options
Workloads: OLTP/SAP, OLTP, DW/DM, BI
Integrity HP-UX options: Sybase ASE, IBM DB2, Oracle, EnterpriseDB, Sybase ASE, IBM DB2, NonStop SQL, Oracle, EnterpriseDB, Sybase IQ, IBM DB2, Oracle, Datonix, Datonix Query Object
x86 options: Sybase ASE, MS SQL Server, Oracle, MS SQL Server, Sybase ASE, EnterpriseDB, Oracle, MS SQL Server, EnterpriseDB, Sybase IQ, SAP HANA, Oracle, Sybase IQ, SAP HANA, Vertica
Database Options
Database: Description
Sybase ASE: Sybase ASE is a row-oriented relational database that supports SAP and non-SAP operational workloads. It can also be used for data warehouse implementations. Sybase can be implemented on Integrity or ProLiant platforms. Supported operating systems are HP-UX, Linux and Windows.
HP NonStop SQL: HP NonStop SQL is an excellent choice for customers who want to migrate off Oracle RAC for their mission-critical enterprise applications, offering the highest levels of platform and database availability and resiliency. NonStop SQL is fully integrated with the NonStop hardware and software and built upon a scalable, shared-nothing, fault-tolerant architecture. It is ANSI standards-based and ideal for consolidating OLTP and DW workloads.
EnterpriseDB Advanced Server 9.0: EnterpriseDB (EDB) is a database company that supports open source PostgreSQL. Its Advanced Server is a row-oriented database that is compatible with the Oracle database. EDB can run on HP-UX, Linux or Windows. It can be deployed on Integrity or ProLiant platforms for OLTP or data warehouse workloads.
Microsoft SQL Server: SQL Server has complementary systems that are packaged with the SQL RDBMS. These include an ETL tool (SQL Server Integration Services, or SSIS), a Reporting Server, an OLAP and data mining server (Analysis Services), and several messaging technologies, specifically Service Broker and Notification Services. It is a robust enterprise database that can run on an x86 platform in a Windows environment. It can support OLTP and OLAP as well as CRM or SAP workloads.
Database Options (cont.)
Database: Description
SAP HANA: SAP HANA is a new solution for high-performance analytics and BI. SAP has announced that it will also support SAP OLTP by the end of 2012. It can be used to migrate or improve some conventional Oracle technology currently in use by customers. HANA includes both a database memory manager and analytics capabilities, as well as the ability to integrate with other sources and targets.
Sybase IQ: Sybase IQ is a column-oriented relational database that supports BI data store workloads. This is the preferred Sybase solution for BI workloads.
Vertica: The HP Vertica Analytics System is a fully integrated analytical mart solution. To learn more, go to http://www.hp.com/go/vertica
Cross/Z: The Datonix/Query Object is an event database management system with a columnar-hierarchical-relational-MOLAP architecture.
Database types
Defining the world of analytics
Big Data originating with analytics beyond BI
 - Traversing enormous, diverse data types to spot patterns
 - 10s to 100s of terabytes (TB), petabytes (PB), and yes, even exabytes (EB)
 - Business needing faster, real-time analytic results (seconds to minutes vs. hours to days)
Combining data from silos
 - Analyzing diverse data types and sources
 - Connecting data from various business units (cross-analyze, access, and reference)
Growing at an exponential rate
 - Structured data: data stored in databases
 - Unstructured data: all other data, including emails, social media, blogs, free-form feedback, documents, transactions, multimedia (images, videos, etc.)
 - 90% of enterprise information is unstructured
 - Data size is a constantly moving target
HP Analytics and big data solutions: deep and robust insight, end to end
Vertica: real-time, SQL-compliant, ad hoc analytics. Connectors enable information transfer between Hadoop and Vertica.
Autonomy IDOL: in-depth, context-based analysis of big data. Builds additional rich, contextual metadata.
Hadoop: efficient, low-cost, open source repository to store and analyze vast amounts of data.
HP Insight CMU: push-button simplicity. Low cost and optimized performance with real-time and historical monitoring.
Unstructured connectors: social media, customer feedback forms, emails, machine-generated data
Structured connectors: databases, warehouses, ERP, CRM
Stack: Autonomy IDOL; Vertica; Hadoop distributions (Cloudera, MapR, Hortonworks); Red Hat Enterprise Linux; HP Converged Infrastructure; TS Consulting Services
Ad hoc analysis: RDBMS, analytics, dashboards, Excel, visualization tools
HP Vertica Analytics System
Real-time and ad hoc analytics for next generation business insight of Hadoop solution
 - Analytics for real-time business intelligence
 - Limitless scaling: add nodes for capacity and performance
 - Extreme compression
 - Columnar
 - Simplicity, MPP, no single point of failure
 - Bi-directional Hadoop data connectors
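Why a columnar layout compresses well and scans quickly can be shown with a small sketch. It is a generic illustration using run-length encoding and hypothetical sample rows, and it does not describe how Vertica stores or encodes data internally.

```python
from itertools import groupby

def to_columns(rows, names):
    """Pivot row tuples into one list per column (a columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def rle_encode(values):
    """Run-length encode a column: consecutive repeats become (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

# Hypothetical rows, already sorted by region so equal values are adjacent.
rows = [
    ("north", "open", 10),
    ("north", "open", 20),
    ("north", "closed", 5),
    ("south", "open", 7),
    ("south", "open", 8),
    ("south", "open", 1),
]
columns = to_columns(rows, ("region", "status", "amount"))
encoded_region = rle_encode(columns["region"])

# A query on two columns never has to read the "status" column at all.
south_total = sum(
    amount
    for region, amount in zip(columns["region"], columns["amount"])
    if region == "south"
)
```

Six region values shrink to two (value, count) pairs, and the aggregate touches only the columns it needs; a row store would have to read every full row.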
Why do we care about Hadoop?
 - The digital universe will expand by almost half in 2012; 90% of that data is unstructured
 - Traditional systems are not designed to analyze unstructured data
 - Hadoop is designed specifically to extract business value from unstructured data
Use cases: Risk Modeling, Fraud Detection, Sentiment Analysis, Customer Retention, Web Mining
Industries: Financial Services, Government, Retail, Telecom, Media
Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
HP Solutions for Hadoop: enabling an ecosystem of end-to-end, scalable platforms
Augmenting Hadoop (HP partners): Datameer, Karmasphere
Hadoop management software: Insight CMU
Hadoop ecosystem:
 - Vertica: ad hoc, SQL-compliant analytics
 - Autonomy: meaning-based analytics
 - HP partners: Cloudera, MapR, Hortonworks
 - NoSQL: fast OLTP with range queries
HP Servers and HP Networking
Consulting Services and Support
Combining the strengths
Hadoop for exploratory analysis
 - Especially with existing MR and Pig scripts
Vertica for interactive analysis
 - For shared features, often faster than Hadoop with a fraction of the hardware resources
Vertica's Hadoop connector
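For readers unfamiliar with the MR (MapReduce) scripts mentioned here, the batch-oriented model can be sketched in plain Python. This is a single-process illustration of the map, shuffle and reduce phases; it does not use the Hadoop API, and the function names and sample records are hypothetical.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def run_job(records):
    """Run the three phases in sequence over the whole input (batch style)."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = run_job(["big data big insight", "Data beats opinion"])
```

The job reads the entire input before producing any answer, which is why this style suits exploratory batch work while an interactive engine handles ad hoc queries.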
And Now: Unstructured and Human Data
 - Sources are diverse: text, sound, XML, video and audio
 - It does not exactly match: is Snoopy a dog?
 - Meaning is dynamic
 - Meaning is multi-layered
 - Meaning is relative
 - Meaning is a common currency
Why is Processing Human Information Different?
 - Human information is made up of ideas, is diverse, and has context.
 - Ideas don't exactly match like data does; they have distance.
 - Information is not static; it's dynamic and lives everywhere.
 - Meaning is a common currency across all information types.
Information types: Social Media, Video, Audio, Email, Texts, Mobile, IT/OT, Transactional Data, Documents, Search Engine, Images
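The statement that ideas have distance rather than exact matches can be made concrete with a minimal sketch. It assumes a simple bag-of-words model and cosine similarity, chosen here only for illustration; the slides do not say which measure is used, and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Represent a text as word counts (a deliberately crude model of an idea)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Return 1.0 for identical direction, 0.0 when no words are shared."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query = bag_of_words("snoopy is a dog")
close = bag_of_words("snoopy is a cartoon dog")
far = bag_of_words("quarterly revenue grew")

# An exact comparison treats the two Snoopy sentences as simply different.
exact_match = "snoopy is a dog" == "snoopy is a cartoon dog"

similarity_close = cosine_similarity(query, close)
similarity_far = cosine_similarity(query, far)
```

The equality test says only "no match", while the similarity score ranks the second Snoopy sentence as much nearer to the query than the unrelated one.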
When the IT world started, machines could not understand the real world of rich information, so a useful, simpler analogy was created. This gave rise to the structured data world, which has proved very useful. Over the years there have been many technology changes; the T in IT has changed many times: mainframe, client-server, IP, cloud.
IT platforms operate on data with NO sentiment, meaning, or context.
Thomas Bayes, Claude Shannon
65,000+ customers
The IT industry handles 10% of the problem; we do 100%.
THANK YOU