VIEWPOINT

High Performance Analytics

Industry Context and Trends

In the digital age of social media and connected devices, enterprises have a wealth of data that they can mine to discover hidden correlations and perform deeper real-time analytics. The goals are to deliver a better customer experience, improve operational efficiency, and monetize data assets to uncover new revenue streams. In addition to answering the traditional questions around sales, performance and profitability, enterprises are now looking for correlations between seemingly unrelated streams of data to gain a competitive advantage.
Figure 1: Analytics Industry Use Cases

Utilities
- Smart grid analytics
- Preventive maintenance
- Distribution load forecasting and scheduling
- Create targeted customer offerings
- Condition-based maintenance

Telecommunications
- Network data monetization
- Revenue assurance
- CDR analytics
- Location based services
- Smarter campaigns

Oil & Gas
- Production loss minimization
- Advance condition monitoring
- Enable customer energy management
- Production surveillance & optimisation

Banking & Financial Sector
- Fraud detection
- Risk modelling & management
- Insurance claim analysis
- Contact centre efficiency and problem management
- Counterparty credit risk management

Government
- Threat prediction and prevention
- Social program fraud, waste and errors
- Tax compliance: fraud and abuse
- Crime prediction and prevention

Retailers
- Online personalization
- Recommendation engines
- Marketing spend optimization
- Actionable customer insight

Manufacturing & Automotive
- Proactive equipment maintenance
- Supply chain management
- Predictive asset optimisation
- Connected vehicle
- Actionable customer insight

Discovering such hidden correlations and insights requires platforms that:
- Ingest, store and process large volumes of high-velocity data in real time from a variety of data sources
- Generate rapid insights by correlating multiple structured and unstructured data streams
- Enable data scientists to discover correlations and model algorithms that can deliver insights
- Self-learn based on new data patterns and improve the accuracy of insights

With the advent of new types of data emerging from the web and connected devices, the challenges in delivering these insights have increased many times over and put tremendous pressure on traditional Business Intelligence platforms.

Structured data grew by more than 40% per year. Traditional content types, including unstructured data, are growing by up to 80% per year. An estimated 2.8ZB of data in 2012 is expected to grow to 40ZB by 2020.
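As a quick check on the figures above, growth from 2.8ZB in 2012 to 40ZB in 2020 implies a compound annual rate of roughly 39%, which is in line with the per-year rates quoted. The sketch below shows that arithmetic. It assumes steady year-on-year compounding, and the function name is illustrative rather than taken from any source.

```python
def implied_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate implied by two data points."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the text: 2.8 ZB in 2012, 40 ZB projected for 2020.
rate = implied_annual_growth(2.8, 40.0, 2020 - 2012)
print(f"Implied growth: {rate:.1%} per year")
```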
About 85% of this data growth is expected to come from new data types, with machine-generated data projected to increase 15x by 2020. (Source: IDC)

These new forms of data pose unique challenges for business and IT, such as:
- Effectively storing large volumes of multi-structured data
- Capturing high-speed data and processing it at the right time
- Creating flexible yet high-performance data structures to answer new business questions
- Creating a platform that provides integrated, unified access to all the information

Market Offerings and Gaps

To address the challenges mentioned above, the industry is now looking at platforms that support the Volume, Velocity, Variety and Value needs of Next-Gen Business Intelligence. These platforms are typically a combination of Data and Discovery/Analytical Tools. Current offerings available in the market can be classified as follows:

- Data offerings, which provide open source platforms for building and managing Data Lakes and Big Data Warehouses. These platforms leverage components from the open source Hadoop stack and provide platform administration, governance, security and basic discovery capabilities on top of the core stack. Leading vendors in this space include Cloudera, Hortonworks and MapR.

External Document 2015 Infosys Limited
- Analytics offerings, which provide point analytical solutions to industry use cases such as Customer Analytics and Network Analytics. These offerings provide pre-built analytics on top of Hadoop ecosystems to serve specific business needs. Leading vendors in this space include Datameer, Platfora and Guavus.
- Augmented Appliances, which offer high-speed appliances on top of Hadoop storage to speed up data access. Teradata Aster, EMC Greenplum, SAP HANA, HP Vertica and IBM Big Insights are some of the leading vendors in this category.
- Data Discovery tools, which provide exploratory capabilities on top of Big Data storage platforms. They help ingest data from a variety of data sources, model it and create consumable data sets out of the underlying data. Tableau, QlikSense and Tibco Spotfire are some of the leading vendors in this space.

While these offerings help address some of the challenges posed, none of them is a silver bullet that solves all of the Big Data challenges.

Benefits and gaps by offering type:

Data
  Benefits:
  - Low cost
  - Simplify the platform management functions and allow organizations to focus on data
  Gaps:
  - No pre-built analytics
  - Minimal discovery capabilities

Analytics
  Benefits:
  - Point solutions that come with pre-built analytics
  - Industry standard algorithms
  Gaps:
  - High cost
  - Point solution that caters to specific use cases and is not meant for integrated analytics

Augmented Appliances
  Benefits:
  - MPP and in-memory capabilities offer fast response time for on-demand analytics
  Gaps:
  - High cost
  - No pre-built analytics
  - Minimal discovery capabilities

Data Discovery Tools
  Benefits:
  - Enable data scientists to access raw data and discover insights
  - Cut down on data preparation time significantly
  Gaps:
  - No data platform
  - No pre-built analytics

Infosys PoV

Infosys believes that the challenges posed by Big Data need a High Performance Analytics (HPA) platform that provides a comprehensive set of building blocks to provision data, define storage structures, create data sets for consumption, enable exploration and run analytical models against the
data. The solution should offer enough flexibility to extend the available analytical models to suit enterprise-specific needs.

The core of any HPA platform is a data management platform that can store practically unlimited amounts of data of any format, schema and type, and that is relatively inexpensive and massively scalable. Data Lakes are designed to offer this capability.

Figure 2: Logical Architecture of Data Lake

Labels recoverable from the figure:
- Source data (marked "100% Source Data"): Transaction, Master Data, Machine Data, Web Data, Reference Data, Lookup Data, Public/3rd party data
- Ingestion modes: Real-Time, Micro-Batch, Batch
- Business Data Lake zones: Raw Data Reservoir, Harmonized Data Zone, Transformed Data Zone, Enriched Data Zone, Data Pools
- Supporting components: Data Factory, Enterprise ETL Framework, In-Memory Performance Layer, Discovery Lab / Analytics Sandbox, User Access
- Processing and management: In-motion processing, Near-real-time batch processing, Data Pipeline management, Integrated Data Management & Governance, Enterprise ETL integration
- Outputs: Actionable Information, Actionable Insights

Figure callouts:
- Making Hadoop the primary component of the DW is a game-changing trend.
- Leveraging the source-once-and-reuse approach improves efficiency; reduces data silos, latency and time to value; massively improves analytics and discovery; and greatly reduces cost.
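The source-once-and-reuse idea from the figure can be illustrated with a toy, in-memory sketch. It is not a description of any actual product: the `DataLake` class, its method names and the sample records are all hypothetical. Each source is landed once in a raw zone, and every derived data set is built from that landed copy rather than from the source system.

```python
from collections import defaultdict


class DataLake:
    """Toy in-memory lake with a raw zone and derived data sets (hypothetical API)."""

    INGEST_MODES = ("real-time", "micro-batch", "batch")

    def __init__(self):
        self.raw = defaultdict(list)          # source name -> records, stored as received
        self.derived = {}                     # data set name -> derived records
        self.ingest_count = defaultdict(int)  # how often each source was pulled

    def ingest(self, source, records, mode="batch"):
        """Land records from a source in the raw zone, tagged with the ingestion mode."""
        if mode not in self.INGEST_MODES:
            raise ValueError(f"unknown ingestion mode: {mode}")
        self.ingest_count[source] += 1
        for record in records:
            self.raw[source].append({"mode": mode, "payload": record})

    def derive(self, name, source, transform):
        """Build a derived data set from the raw zone without touching the source again."""
        self.derived[name] = [transform(r["payload"]) for r in self.raw[source]]
        return self.derived[name]


lake = DataLake()
lake.ingest(
    "transactions",
    [{"id": 1, "amt": "10.0"}, {"id": 2, "amt": "5.5"}],
    mode="micro-batch",
)

# Two consumers reuse the same landed data; the source is read only once.
amounts = lake.derive("harmonized_amounts", "transactions", lambda p: float(p["amt"]))
ids = lake.derive("transaction_ids", "transactions", lambda p: p["id"])
print(amounts, ids, lake.ingest_count["transactions"])
```

Keeping the raw records untouched is the design choice that makes reuse possible: new derived data sets can be added later without another extraction from the source system.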
Data Lakes help in two ways:
- Information Discovery / Agility in Analytics: they simplify data acquisition so that business users can start discovery on raw data through discovery tools.
- Data Warehouse Expansion: they help expand the data warehouse to capture data at a lower grain and with higher diversity, which is then fed into upstream systems.

Unlike in traditional relational databases, data can be stored in its raw format. Analysts and developers then apply a structure that suits the needs of their applications at the time they access the data, using Schema on Read instead of Schema on Write.

Intelligent Data Discovery tools enable data scientists to build data models and views that can be used for the analysis of structured and unstructured data. They offer capabilities to search on metadata and create data sets for running analytics. They eliminate the need for IT involvement and reduce the time spent on data preparation.

Analytical modules built on statistical tools enable data scientists to build algorithms and models that can deliver predictive and prescriptive insights. While the analytical models will differ from enterprise to enterprise, a best-in-class analytics platform should have basic building blocks such as Sentiment Analysis, Text Mining, Fault Prediction, Fraud Detection and Risk Analytics, which can be extended based on the enterprise's specific needs.

Data Lakes combined with Data Discovery tools and Analytical Algorithms form the core of a High Performance Analytics Platform.
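The Schema on Read approach described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the sample records, field names and the `read_with_schema` helper are all hypothetical. Raw events are stored exactly as received, and each consumer applies its own structure when it reads them.

```python
import json
from datetime import datetime

# Raw events are stored exactly as they arrived; no schema is enforced on write.
# The records and field names below are hypothetical.
RAW_EVENTS = [
    '{"ts": "2015-03-01T10:15:00", "customer": "C1", "amount": "120.50", "channel": "web"}',
    '{"ts": "2015-03-01T10:17:30", "customer": "C2", "amount": "75", "device": "mobile"}',
    '{"ts": "2015-03-02T08:00:00", "customer": "C1", "amount": "19.99"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> converter) at read time.

    Fields that are missing from a raw record come back as None.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        }


# Two consumers impose different structures on the same raw data.
sales_schema = {"customer": str, "amount": float}
timeline_schema = {"ts": datetime.fromisoformat, "channel": str}

sales = list(read_with_schema(RAW_EVENTS, sales_schema))
timeline = list(read_with_schema(RAW_EVENTS, timeline_schema))

# Example consumer: total spend per customer from the sales view.
total_by_customer = {}
for row in sales:
    key = row["customer"]
    total_by_customer[key] = total_by_customer.get(key, 0.0) + row["amount"]
print(total_by_customer)
```

With Schema on Write, by contrast, the second and third records would have to be reshaped or rejected at load time because they do not carry the same fields as the first.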
Figure 3: High Performance Analytics Platform Components
(Components shown: Discovery, Data Lakes / Big Data DW, and Analytical Algorithms, which together form High Performance Analytics.)

Figure 4: High Performance Analytics Platform Implementation Phases
(Other labels in the figure: NoSQL, Maturity Phase.)

Build Platform
- Build Business Data Lake platform
- Add basic capabilities like Data Ingestion

Onboarding Data Sources
- Adding data sources
- Build Metadata Capabilities
- Build basic exploration capabilities

Standardization
- Bringing in more Data Sources
- Build Data Governance Capabilities
- Enrich Data Sets with Reference Data
- Build a semantic layer

Curative Layer
- Enable data hub layer for important operational reports
- Enable dimensional layer for analysis
- Sand pits for data discovery
- Build data services to/from existing data marts

Advanced Analytics
- Create an analytics CoE
- Scale the analytics process by business area
- Build the pool of data analysts, scientists & domain experts

The High Performance Analytics Platform addresses the challenges posed by Big Data by providing:
- Data Lakes built on commodity hardware that are cost-effective for storing large volumes of data
- A distributed processing architecture that efficiently processes large volumes of structured, semi-structured and unstructured data
- Horizontal scalability that can support future needs
- In-memory processing engines to deliver rapid insights
- Pre-built analytical models that can be extended to enterprise-specific needs, reducing the time to insight
- Data discovery and deep analytics capabilities that can uncover hidden correlations and deliver deep insights based on all available data assets
Success Stories

- Implemented a Business Data Lake for an Australian telecom major to provide insights into various lines of business such as Revenue Assurance and Marketing. This improved the ability to enable 11.2% of the total USD 26.8 billion revenue persistently, with coverage accelerated to 27%.
- Implemented a Data Lake for a leading financial major in the US, which gave a 360° view of the wholesale customer and improved prospecting effectiveness, market segmentation and positioning.
- Increased ARPU, reduced customer churn and identified new revenue streams from selling anonymized data to advertisers and retailers for a Singapore telco, by building a Big Data enabled customer analytics platform.
- Created a model to predict ATM failure with an 80% confidence level for over 8,500 ATMs, which resulted in a 14% increase in call center efficiency and an 18% cost reduction.

For more information, contact askus@infosys.com

2015 Infosys Limited, Bangalore, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.