HP Software - Big Data Challenges February 2015
The world has changed
[Figure: word cloud of representative vendors, applications and services for each computing era: Mainframe (Kilobytes); Client/server (Megabytes); The Internet (Gigabytes); Big Data, Cloud, Mobility (Zettabytes); and next, Brontobytes + Geopbytes]
Every 60 seconds:
- 204 million+ emails sent
- 100,000+ tweets
- 2 million+ Google searches
- $275,000 spent on online shopping
- 35,000 brand Likes on Facebook
- 38,000 new Tumblr blog posts
- 48 hours of new video on YouTube
- 2,000 check-ins on Foursquare
Big Data from the Internet of Things: we have gone beyond the decimal system
Today, data scientists use yottabytes to describe how much government data the NSA or FBI have on people altogether. In the near future, the geopbyte will be the measurement used to describe the data generated by the IoT.
- Megabyte: 10^6 bytes
- Gigabyte: 10^9 bytes
- Terabyte: 10^12 bytes; 500 TB of new data per day is ingested into Facebook databases
- Petabyte: 10^15 bytes; the CERN Large Hadron Collider generates 1 PB per second
- Exabyte: 10^18 bytes; 1 EB of data is created on the internet each day
- Zettabyte: 10^21 bytes; 1.3 ZB of network traffic by 2016
- Yottabyte: 10^24 bytes; this is our digital universe today
- Brontobyte: 10^27 bytes; this will be our digital universe tomorrow
- Geopbyte: 10^30 bytes; this will take us beyond our decimal system
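To make the ladder concrete, here is a minimal Python sketch that converts between these decimal units, one factor of 1,000 per step. The function names `to_bytes` and `humanize` are hypothetical, and so are the abbreviations `BB` and `GpB`: brontobyte and geopbyte are informal names rather than SI units.

```python
from decimal import Decimal

# Decimal units from the ladder above, one step per factor of 1,000.
# "BB" (brontobyte) and "GpB" (geopbyte) are informal names, not SI units;
# these abbreviations are assumptions made for illustration.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB", "BB", "GpB"]


def to_bytes(value: str, unit: str) -> int:
    """Convert a quantity such as ("1.3", "ZB") into an exact byte count."""
    exponent = 3 * UNITS.index(unit)  # e.g. ZB is index 7, so 10**21
    return int(Decimal(value).scaleb(exponent))


def humanize(n_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value at or above 1."""
    value = Decimal(n_bytes)
    index = 0
    while index < len(UNITS) - 1 and value >= 1000:
        value /= 1000
        index += 1
    return f"{value.normalize():f} {UNITS[index]}"


# Facebook's 500 TB of new data per day, accumulated over a year:
print(humanize(to_bytes("500", "TB") * 365))  # 182.5 PB
```

Quantities are passed as strings so that a value like `1.3` stays exact; a float would introduce rounding error at zettabyte scale.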
Enterprise data growth
[Figure: word cloud of vendors and applications across computing eras, from Mainframe (Kilobytes) through Client/server (Megabytes) and The Internet (Gigabytes) to Mobile, social, Big Data & the cloud (Zettabytes), alongside the costs of managing data]
Every 60 seconds, 1,820 TB of data is created.
- The volume, velocity and breadth of channels often overwhelm information management strategies, leading to dark data.
- TCO for unstructured data varies between $4/GB and $100/GB annually, but $25/GB is a good rule of thumb.*
- Storage costs are visible; soft costs, such as opportunity and risk costs, are less so, but are no less real.
*Source: ESG White Paper, The Cost of Managing Unstructured Data, May 2014
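The TCO range is easy to apply to a concrete estate. The sketch below multiplies a hypothetical data volume by the low, rule-of-thumb and high per-GB figures quoted above. The function name and the 100 TB example are assumptions, and soft costs are deliberately left out because, as noted above, they are harder to quantify.

```python
from decimal import Decimal

# Annual TCO per GB of unstructured data, using the ESG range quoted above.
TCO_PER_GB = {
    "low": Decimal("4"),
    "rule of thumb": Decimal("25"),
    "high": Decimal("100"),
}


def annual_tco(terabytes: int) -> dict[str, Decimal]:
    """Estimate the yearly cost of keeping `terabytes` of unstructured data."""
    gigabytes = Decimal(terabytes) * 1000  # assumes decimal units: 1 TB = 1,000 GB
    return {scenario: gigabytes * rate for scenario, rate in TCO_PER_GB.items()}


# A hypothetical 100 TB of file shares and mailboxes:
for scenario, cost in annual_tco(100).items():
    print(f"{scenario}: ${cost:,.0f} per year")
```

For 100 TB this prints $400,000, $2,500,000 and $10,000,000 per year, which shows how wide the cited range is in absolute terms.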
What is legacy data and dark data? Redundant, obsolete, trivial, and the unknown
Legacy data resides in:
- Legacy applications and repositories
- Unmanaged SharePoint sites, file shares and mail systems
Legacy data can contain or be:
- Redundant: duplicates and unauthorized copies
- Obsolete: no longer in use or out of date, as determined by creation, last-modified or last-accessed date and retention policy
- Trivial: file types with no content value
Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
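The three ROT tests can be approximated from file metadata. This is a minimal sketch under stated assumptions: redundancy means byte-identical content (a SHA-256 match), obsolescence means no modification or access within a hypothetical seven-year retention period, and triviality is judged by a hypothetical list of file extensions. `FileRecord` and `classify_rot` are illustrative names, not part of any HP product.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta
from pathlib import PurePosixPath


@dataclass
class FileRecord:
    path: str
    content: bytes
    last_modified: date
    last_accessed: date


# Hypothetical policy settings, chosen only for illustration.
RETENTION = timedelta(days=7 * 365)
TRIVIAL_SUFFIXES = {".tmp", ".bak"}


def classify_rot(files: list[FileRecord], today: date) -> dict[str, set[str]]:
    """Tag paths as redundant, obsolete and/or trivial; one file can carry several tags."""
    tags: dict[str, set[str]] = {"redundant": set(), "obsolete": set(), "trivial": set()}
    first_copy: dict[str, str] = {}  # content hash -> first path seen with it
    for f in files:
        digest = hashlib.sha256(f.content).hexdigest()
        if digest in first_copy:
            tags["redundant"].add(f.path)  # byte-identical duplicate of an earlier file
        else:
            first_copy[digest] = f.path
        last_used = max(f.last_modified, f.last_accessed)
        if today - last_used > RETENTION:
            tags["obsolete"].add(f.path)  # untouched for longer than the retention period
        if PurePosixPath(f.path).suffix.lower() in TRIVIAL_SUFFIXES:
            tags["trivial"].add(f.path)  # file type assumed to carry no content value
    return tags
```

Exact hashing misses near-duplicates such as re-saved or lightly edited copies; catching those would need fuzzy matching, which is beyond this sketch.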
What is dark data? What lies hidden in your enterprise data: the unknown
Beyond legacy data, dark data tends to be:
- Human-readable
- Unstructured
- Unindexed
- Unmanaged
- Inactive
- Orphaned
Dark data resides in:
- File servers
- SharePoint
- Email servers
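Several of these traits can be checked mechanically once file-server, SharePoint or mailbox inventories are exported. The sketch below flags items that are unindexed, inactive or orphaned. The `StoredItem` record, the set-based inputs and the one-year inactivity threshold are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class StoredItem:
    path: str
    owner: str
    last_accessed: date


def dark_data_flags(
    items: list[StoredItem],
    indexed_paths: set[str],
    active_users: set[str],
    today: date,
    inactive_after: timedelta = timedelta(days=365),  # hypothetical threshold
) -> dict[str, list[str]]:
    """Return, for each suspect item, which dark-data traits it shows."""
    flags: dict[str, list[str]] = {}
    for item in items:
        reasons = []
        if item.path not in indexed_paths:
            reasons.append("unindexed")  # no search index knows it exists
        if today - item.last_accessed > inactive_after:
            reasons.append("inactive")  # nobody has opened it recently
        if item.owner not in active_users:
            reasons.append("orphaned")  # owner has left or the account is disabled
        if reasons:
            flags[item.path] = reasons
    return flags
```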
The risk of ignoring legacy and dark data
Legacy and dark data sitting outside the information governance strategy exposes the organization to risk:
- Spiralling costs
  - Expanding information footprint and storage costs
  - Litigation and eDiscovery costs ("smoking gun" or inability to deliver)
- Security breaches and reputational damage
  - Sensitive information unprotected (personally identifiable information, privacy regulations)
  - Data leakage and misuse
- Poor business execution and performance
  - Incorrect context
  - Decisions based on outdated information
  - Duplicate effort spent re-creating information
Today's reality
86% of corporations cannot deliver the right information at the right time.*
- 0.5%: share of the digital universe that is actually being tagged, analyzed and leveraged¹
- 3%: share actually being tagged for Big Data value¹
- 23%: share of data that would be potentially useful if effectively engaged¹
*Source: IDC Predictions 2012: Competing for 2020
¹Source: IDC, The Digital Universe in 2020, December 2012
Data is exploding, but traditional data technologies impose limits: we need connected intelligence
Insight from 100% of the data
[Figure: Connected Intelligence across machine data, human information and structured data]
leaving a trail of digital footprints.
Accuracy and insight: engage 100% of data to gain competitive advantage
[Figure: data volumes across traditional enterprise data (CRM, ERP, data warehouse), Big Data (web, social, log files, machine data) and dark data, spanning semi-structured and unstructured data]
It takes a Big Data platform to cash in on all your data assets
Siloed data challenge:
- Only 0.5% of data in the average organization is tagged and analyzed
- Information silos everywhere
- Tools for finding and understanding information are tied to the original application and format
- Queries take too long and are too rigid, making it difficult to uncover opportunities, emerging patterns and unexpected threats
Big Data platform needs:
- Ad hoc discovery: find what's in the data without pre-structuring it
- Ubiquitous but secure data access
- Real-time data collection and analysis: any format, any data source
- An extensible platform to harness 100% of data, on-premise and in the cloud
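Ad hoc discovery without pre-structuring usually starts from an index built over the raw text rather than a predefined schema. As a minimal sketch of that idea (a plain inverted index, not the indexing used by HP Haven or IDOL), the hypothetical `build_index` and `search` functions below make any word in any document queryable.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase term to the IDs of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in TOKEN.findall(text.lower()):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents containing every query term (boolean AND)."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits
```

No schema is declared up front: an email, a ticket and a memo become searchable side by side as soon as they are indexed.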
HP Haven Big Data platform
[Figure: HP, customer and developer applications running on Haven, whose layers are defined programming interfaces; analytics, context and categorization; scalable data stores; and data connectors, drawing on sources in the cloud and on-premise: social media, video, audio, email, texts, mobile, transactional data, documents, IT/OT, search engine, images, records, compliance archives]
- Gain insight from 100% of your data
- Analyze machine, business and human data
- Connect to any existing data source system
- Scale 50-1000x faster than legacy systems
- Develop modern data-driven applications and web services
This is a rolling (up to 3 year) roadmap and is subject to change without notice
Use case #1: Smart / Safe City
Improving public safety by detecting high-risk activities and investigating threats
Deployment environment:
- Ingest data from 2,000+ CCTV cameras in Auckland
- View network of road and environmental sensors
- Social media trending, broadcast monitoring, and real-time web news
Phase 1: scene analysis and license plate recognition
Future phase: integrate HP Vertica to uncover breaking trends and facilitate incident responses
- HP IDOL eduction sends interesting data to HP Vertica for statistical analysis and slice/dice
- Combine HP Vertica's pattern matching and graph analysis at scale with HP IDOL's ability to model concepts and enrich data
Use case #2: Catch Insider Traders
Financial Services: Information Surveillance & Digital Forensics Solution
Multiple data sources:
- HP Digital Safe data
- Transactional trading data
- Financial news feeds
- Social media
- Email, voicemail recordings, instant messaging
Phase 1: complex policies, such as highlighting suspect trades where no communication can be found between related Bank A and Bank B contacts
Future phase: integrate HP Vertica for trend and anomaly detection
- HP IDOL eduction sends interesting data to HP Vertica for statistical analysis and slice/dice
- Combine HP Vertica's pattern matching and graph analysis at scale with HP IDOL's ability to model concepts and enrich data
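The Phase 1 policy can be read as an anti-join: keep every trade for which no matching communication exists. The sketch assumes hypothetical `Trade` and `Communication` records, matches contacts by exact identifier, and uses an arbitrary two-day lookback window; a real surveillance policy would also have to resolve identities across email, voice and chat systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Trade:
    trade_id: str
    trader: str  # Bank A contact
    counterparty: str  # related Bank B contact
    executed_at: datetime


@dataclass
class Communication:
    participants: frozenset[str]  # everyone on the email, call or chat
    sent_at: datetime


def trades_without_contact(
    trades: list[Trade],
    comms: list[Communication],
    lookback: timedelta = timedelta(days=2),  # hypothetical window
) -> list[str]:
    """Return IDs of trades with no recorded communication between the two contacts."""
    flagged = []
    for t in trades:
        pair = {t.trader, t.counterparty}
        spoke = any(
            pair <= c.participants and t.executed_at - lookback <= c.sent_at <= t.executed_at
            for c in comms
        )
        if not spoke:
            flagged.append(t.trade_id)
    return flagged
```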
Use case #3: Smart Retail / Voice of Customer
Prevent churn, analyze NPS surveys, react to product/warranty issues
Multiple data sources:
- Enterprise documents, email, ticketing systems, CRM cases, videos
- Customer social media, blogs, forums, user-generated content, surveys
- Public websites, news
Phase 1:
- Sentiment detection, clustering
- Eduction: people, places, credit card numbers
- Link expansion, gender detection
- Curation, tagging, alerts
Future phase: integrate HP Vertica for demographic profiling
- HP IDOL eduction sends interesting data to HP Vertica for statistical analysis and slice/dice
- Combine HP Vertica's pattern matching and graph analysis at scale with HP IDOL's ability to model concepts and enrich data
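Extracting credit card numbers from free text is a common entity-extraction task. The sketch below is a generic approach (a regular expression for candidate runs of 13 to 19 digits, plus the Luhn checksum to discard most false positives), not HP IDOL eduction or its API; the function names are hypothetical.

```python
import re

# Candidate runs of 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def extract_card_numbers(text: str) -> list[str]:
    """Return digit strings in `text` that look like card numbers and pass Luhn."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

Roughly nine in ten random digit strings fail the Luhn check, but one in ten still passes, so matches would normally be reviewed or masked rather than trusted outright.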
Real world: claims integrity
Leading health insurance company
Business need:
- Identify duplicate or inaccurate health insurance claims and transactions (e.g., overpayment)
- Multiple legacy systems containing claims data, with little integration
Solution:
- Connect legacy systems and create a common index of claims data regardless of location, type or source
- Identify unusual patterns in transactions to detect fraud or error
Business benefits:
- Massive ROI through reduction in duplicate claims paid
- Improved operational efficiency
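A common index of claims lets duplicates be found by grouping on normalized fields, regardless of which legacy system a claim came from. The `Claim` fields and the choice of grouping key below are assumptions for illustration; a production system would also need fuzzy matching for typos and partially overlapping claims.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Claim:
    source_system: str
    claim_id: str
    member_id: str
    provider_id: str
    service_date: date
    procedure_code: str
    amount: Decimal


def normalize_key(c: Claim) -> tuple:
    """Build a source-independent key; formatting differences between systems are ignored."""
    return (
        c.member_id.strip().upper(),
        c.provider_id.strip().upper(),
        c.service_date,
        c.procedure_code.strip().upper(),
        c.amount,  # Decimal("120.00") == Decimal("120.0"), and they hash alike
    )


def find_duplicate_claims(claims: list[Claim]) -> list[list[Claim]]:
    """Group claims by normalized key and return every group with more than one claim."""
    index: dict[tuple, list[Claim]] = defaultdict(list)
    for c in claims:
        index[normalize_key(c)].append(c)
    return [group for group in index.values() if len(group) > 1]
```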
Real world: expertise networks
Aircraft manufacturing
Business need:
- Employees waste 30 min/day finding information and duplicating the work of others
- Identify expertise across a global community of 35,000 engineers
- Avoid manual approaches, such as describing areas of interest and expertise in a contacts directory using predefined keywords
Solution:
- Generate user profiles automatically and in real time, based on the pages visited and documents read
- Alert employees when documents or other employees match the work they are doing
Business benefits:
- Reduced time spent retrieving information by over 90%
- Identified teams working on similar projects across the globe
- ROI within 7 months
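Automatic profiles can be approximated by counting the terms in what each person reads and comparing people (or documents) by cosine similarity. This is a swapped-in bag-of-words technique, not the concept modeling of any specific HP product; `profile`, `matches` and the 0.3 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def profile(texts: list[str]) -> Counter:
    """Build a term-frequency profile from the pages and documents a user has read."""
    terms: Counter = Counter()
    for text in texts:
        terms.update(re.findall(r"[a-z]{3,}", text.lower()))  # skip very short words
    return terms


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def matches(user: Counter, candidates: dict[str, Counter], threshold: float = 0.3) -> list[str]:
    """Return candidate names whose profiles are similar enough to alert on, best first."""
    scored = sorted(((cosine(user, p), name) for name, p in candidates.items()), reverse=True)
    return [name for score, name in scored if score >= threshold]
```

Because profiles are rebuilt from reading activity, nobody has to maintain a keyword list by hand, which is the manual approach the slide wants to avoid.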
Summary Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP Confidential.
HP Haven Big Data platform
[Figure: HP, customer and developer applications running on Haven, whose layers are defined programming interfaces; analytics, context and categorization; scalable data stores; and data connectors, drawing on sources in the cloud and on-premise: social media, video, audio, email, texts, mobile, transactional data, documents, IT/OT, search engine, images, records, compliance archives]
- Gain insight from 100% of your data
- Connect to all of your machine, business and human data sources
- Analyze at the volume and velocity of your data
- Develop modern data-driven applications
Check out the websites: www.autonomy.com, www.vertica.com, www.hp.com
Thank you