Cloud Analytics. A Path Towards Next Generation Affordable BI. P. Radha Krishna and Kishore Indukuri Varma



Similar documents
Cloud Computing. Key Considerations for Adoption. Abstract. Ramkumar Dargha

View Point. Oracle Applications and the economics of Cloud Computing. Abstract

Data Integrity Check using Hash Functions in Cloud environment

How To Handle Big Data With A Data Scientist

Hadoop in the Hybrid Cloud

Figure 1 Cloud Computing. 1.What is Cloud: Clouds are of specific commercial interest not just on the acquiring tendency to outsource IT

Geoprocessing in Hybrid Clouds

Cloud Service Model. Selecting a cloud service model. Different cloud service models within the enterprise

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications

How to Enhance Traditional BI Architecture to Leverage Big Data

WHITEPAPER. An ECM Journey. Abstract

Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage

IMPROVED FAIR SCHEDULING ALGORITHM FOR TASKTRACKER IN HADOOP MAP-REDUCE

IJRSET 2015 SPL Volume 2, Issue 11 Pages: 29-33

Cloud Computing for SCADA

Driving Multi-channel Commerce

THE CLOUD AND ITS EFFECTS ON WEB DEVELOPMENT

An Overview on Important Aspects of Cloud Computing

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Cloud Computing: Computing as a Service. Prof. Daivashala Deshmukh Maharashtra Institute of Technology, Aurangabad

A Review of Data Mining Techniques

Supply Chain Platform as a Service: a Cloud Perspective on Business Collaboration

Secured Storage of Outsourced Data in Cloud Computing

Data Virtualization A Potential Antidote for Big Data Growing Pains

View Point. Lifting the Fog on Cloud

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

bigdata Managing Scale in Ontological Systems

CLOUD COMPUTING An Overview

Introduction to Cloud Computing

BPM for Structural Integrity Management in Oil and Gas Industry

View Point. Overcoming Challenges associated with SaaS Testing. Abstract. - Vijayanathan Naganathan, Sreesankar Sankarayya

Gradient An EII Solution From Infosys

Testing Big data is one of the biggest

The Private Cloud Your Controlled Access Infrastructure

perspective Progressive Organization

Mitra Innovation Leverages WSO2's Open Source Middleware to Build BIM Exchange Platform

Big Data With Hadoop

A Hybrid Scheduling Approach for Scalable Heterogeneous Hadoop Systems

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

VIEW POINT. Getting cloud management and sustenance right! It is not about cloud, it s about tomorrow s enterprise

Running head: CLOUD COMPUTING 1. Cloud Computing. Daniel Watrous. Management Information Systems. Northwest Nazarene University

Data Refinery with Big Data Aspects

High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances

Introduction to Data Mining

An Overview of Knowledge Discovery Database and Data mining Techniques

Universal PMML Plug-in for EMC Greenplum Database

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

HYBRID CLOUD: THE NEXT FRONTIER

PROPOSAL To Develop an Enterprise Scale Disease Modeling Web Portal For Ascel Bio Updated March 2015

View Point. Choosing the Right Cloud Based QA Environment for your Business Needs

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Heterogeneity-Aware Resource Allocation and Scheduling in the Cloud

Hadoop s Advantages for! Machine! Learning and. Predictive! Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis

Big Data - Infrastructure Considerations

A Brief Introduction to Apache Tez

Analance Data Integration Technical Whitepaper

Grid Computing Vs. Cloud Computing

Cloud Computing Services and its Application

Cloud Computing and Business Intelligence

Big Data Defined Introducing DataStack 3.0

Innovative technology for big data analytics

NoSQL and Hadoop Technologies On Oracle Cloud

Big Data Using Cloud Computing

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

Analance Data Integration Technical Whitepaper

Why DBMSs Matter More than Ever in the Big Data Era

Software as a Service (SaaS) Testing Challenges- An Indepth

Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising

Oracle Big Data SQL Technical Update

Improving MapReduce Performance in Heterogeneous Environments

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BUSINESS MANAGEMENT SUPPORT

Market Maturity. Cloud Definitions

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

CiteSeer x in the Cloud

CHAPTER 7 SUMMARY AND CONCLUSION

Harnessing the power of advanced analytics with IBM Netezza

Easy Execution of Data Mining Models through PMML

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Leveraging unstructured data for improved decision making: A retail banking perspective

ORACLE DATABASE 10G ENTERPRISE EDITION

A Survey on Aware of Local-Global Cloud Backup Storage for Personal Purpose

Mobile Cloud Middleware: A New Service for Mobile Users

Placing Your Applications in the Best Cloud Model

Big Data. Fast Forward. Putting data to productive use

Realizing the Value Proposition of Cloud Computing

Introduction to Cloud Computing

An Accenture Point of View. Oracle Exalytics brings speed and unparalleled flexibility to business analytics

Big Data 101: Harvest Real Value & Avoid Hollow Hype

Big Data for Investment Research Management

SOA and Cloud in practice - An Example Case Study

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA

An Oracle White Paper November Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

IBM System x reference architecture solutions for big data

CHAPTER 1 INTRODUCTION

A Proposed Case for the Cloud Software Engineering in Security

Transcription:

Cloud Analytics A Path Towards Next Generation Affordable BI P. Radha Krishna and Kishore Indukuri Varma Abstract Technology innovation and its adoption are two critical successful factors for any business/organization. Cloud computing is a recent technology paradigm that enables organizations or individuals to share various services in a seamless and cost-effective manner. Business Intelligence for organizations, on the other hand, is becoming a growing need to understand their business insights and trends. Currently organizations are trying to leverage BI to maximum extent; however, there is a big gap in turning BI outcome to aid their ROI expectations and further growth. This necessitates porting current data analytic applications on to the cloud due to its ability to process large datasets as well as extensive support for scalability at low cost. This article brings out the technology challenges and opportunities to enable analytics in cloud environment, which makes BI affordable for all organizations.

Cloud Paradigm Cloud computing is an emerging computing paradigm that was innovated to deploy cost effective solutions over Internet. Companies such as Google, IBM, Amazon, Yahoo and Intel have already started providing computing infrastructures for its intend use. Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet [5]. It is a scalable service delivery platform build over Service Oriented Architecture (SOA) and Virtualizations concepts. The benefit of cloud can be stated in two perspectives: (i) from cloud service provider perspective and (ii) from cloud user perspective. Cloud service providers get benefited with the better utilization of infrastructure they own. Even, they can obtain an improvement in the processing power with minimal additional cost. Thus, giving way for new business opportunities and speed up existing business processes. Users get benefited by paying only for what they consume and even they need not have insights of the technology infrastructure used in the cloud that supports them. Data analytics applications which require processing extremely large datasets while support extreme scalability are attracting huge interest towards cloud. The cloud computing generally incorporates combinations of the following: Infrastructure as a service (IaaS) - Providing servers, software, office space and network equipment as service. Platform as a service (PaaS) - Providing computing platform and solution stack as a service. Software as a service (SaaS) - Providing an application as a service. Cloud consists of huge number of servers facilitating distributed computing platform. Each server can have multiple virtual machine instances (VMs) of the applications hosted in the cloud. As customer demand for those applications changes, new servers are added to the cloud or idled and new VMs are instantiated or terminated. A major advantage with cloud is the recoverability in case of disaster. Cloud data is mostly backed up regularly and stored in a different geographic location. This eases the recovery process which otherwise is a very costly feature for companies. Cloud computing infrastructure is totally different from the current data warehouse infrastructures. Unlike the high-end servers which constitute the current infrastructure, Cloud computing offers the same functionality with low-cost. Since most analytics for an organization uses huge number of database reads than writes, there is a requirement for new relational database software architecture to efficiently store and retrieve large volumes of data for BI on Cloud. In a cloud, node failures and node changes may occur frequently. Since the cloud uses a vast number of processing units, the failures occurring in the cloud can be shielded from its users using robust policies to make the cloud as tolerant as required (or requested). The best cloud databases will replicate data automatically across the nodes in the cloud cluster, be able to continue running in the event of 1 or more node failures [6], and be capable of restoring data on recovered nodes automatically without DBA assistance. Ideally, replication will be active-active in that the redundant data may be queried to increase performance. Cloud Analytics Few decades back, the problem was the shortage in information or data. In recent past, this problem has been overcome with the advent of Internet and reduced Storage Memory cost. But a new challenge is how to analyze the data. Data is getting generated at a much faster pace than the speed at which it can be processed with the current infrastructure. Huge and dedicated servers were developed to solve this problem. But the problem is with the cost of such an infrastructure which is not affordable to all the companies for each and every specific purpose. So today, these companies are looking for Cloud computing which makes feasible for all these companies to hire on a temporary basis, the computational power and storage space for a specific purpose. This is termed as Utility Computing derived from the utilities like electricity, gas for which we only pay for what we use from a shared resource. With the growing interest in cloud, analytics over cloud has seen surge in interest. Recent reports (ex., [7], [8]) illustrate this with various architectures and analytic services possible over cloud. Availability of data and accessing it is the key success factor for value-based analytics. Migrating required data in and out of Cloud is a challenging task. 2 Infosys White Paper

In general, Business Intelligence applications such as image processing, web searches, understanding customers and their buying habits, supply chains and ranking and Bio-informatics (e.g. gene structure prediction) are data intensive applications. Cloud can be a perfect match for handling such analytical services [1, 2]. For example, Google s MapReduce [4] can be leveraged for analytics as it intelligently chunks the data into smaller storage units and distributes the computation among low-cost processing units. Several research teams have started working on creating Analytic frameworks and engines which help them provide Analytics as a Service. For example, Zementis launched the ADAPA predictive analytics decision engine on Amazon EC2, allowing its users to deploy, integrate, and execute statistical scoring models like neural networks, support vector machine (SVM), decision tree, and various regression models. Cerri et al [3] proposed Knowledge in the cloud in place of data in the cloud to support collaborative tasks which are computationally intensive and facilitate distributed, heterogeneous knowledge. According to Talia [4], knowledge services over cloud can be classified into four: 1. Single Steps: That composes a KDD process such as pre-processing, filtering, and visualization. 2. Single Data Mining Tasks: such as classification, clustering, and association rules discovery. 3. Distributed Data Mining Patterns: such as collective learning, parallel classification and meta-learning models. 4. Data Mining Applications or KDD processes: including all or some of the previous tasks expressed through a multi step workflow. Processing Units Data Servers Knowledge, Analysis Result Analysis Guidelines Triggers/Alerts Lexicon, Ontology Reports, Graphs, Charts Analysis/ Forecasting Figure 1: Illustration of Cloud for Analytics Infosys White Paper 3

As illustrated in Figure 1, unlike the usual Analytics, the analytics on cloud require separate functionality to leverage the cloud infrastructure for complex tasks. While in some scenarios, there will be a need for incremental machine learning models which can visit each of the node sequentially, iterating over all the nodes several times, there are many scenarios where the data or tasks can be parallel processed in their respective nodes and the derived knowledge is aggregated in a common interfacing node. We define a Knowledge Process as one that serves in capturing required knowledge in a virtual environment in order to work with analytical models. Analytics for cloud: Analytics has its applications for cloud economics also. i. Resource optimization: Helps in optimally scheduling resources available based on the nature of nodes waiting to be allocated and their cost factor. ii. Demand Forecasting: Helps the Architects and Managers forecast the demand and act accordingly. Estimations can even help in pricing the service which will bring in noticeable profits while minimizing expenses in terms of new hardware or infrastructure to meet the forecasted demand. iii. Billing strategy based on Seasonal analysis can even be learned into an analytics model which can automatically quote the price for any time period and computing service requirements. iv. Identifying rarely accessed data and moving it to highly compressed and low cost devices. Two-kinds of services which can be visualized for cloud analytics are: (a) Analytics as a Service (AaaS) Analytics as a Service provides clients with analytics on demand. They pay for the usage of the analytic solutions as a service by CSP. The idea here is that, CSP lists analytic solutions as a service and customers pick required solutions and leverage it for their specific purposes. For example, a customer can select sentiment analysis as a service, which helps him/her analyze their customer s sentiment on his/her products Vs competitor s products. This helps in drafting product roadmaps based on market analysis. The cost incurred in the traditional process is very high as a company has to plan its course from scratch. Instead providing AaaS will be highly cost-effective as the data is publicly available on web as blogs or review articles. (b) Models as a Service (MaaS) For realizing (near) Real-time BI, high-performance analytic capabilities (in terms of database and analytical modeling) that run over cloud are required. Models as a Service provide clients with building blocks to develop their analytical solutions by subscribing to the models available over a cloud. Various models which have extremely high CPU and memory usage tasks (e.g., Clustering models like SVM, Neural Nets, Bayesian models) can be ported to a cloud. Issues and Challenges Though cloud computing is on the verge of becoming a reality, there are several issues and challenges. Few of them are described below from analytics point of view: Tuning Knowledge models Tuning existing models for a particular application being served over the cloud will enable the clients to leverage existing skill sets to customize the model to their particular requirement. Ensuring Privacy The data or knowledge Analytic services may be processed outside the client s premises. So, one needs to ensure the privacy of the data as well as knowledge derived as a service. Supporting Column Based Indexing [6] Most of the current relational databases being used by the clients are row-based. However, the analytics over cloud which accesses data on column at a time motivate one to opt for column oriented databases in place of the row oriented databases. Existing models have to be augmented to support column-oriented databases. 4 Infosys White Paper

Ensuring Data Availability [1] Data analytics are data intensive applications and hence required new mechanisms when one or more nodes of a cloud fail. Cloud should have the capability to recover and progress in the event of multiple node failures. Specialized file systems are needed for cloud units to handle such failures. Embedding Analytical models inside Databases Analytics are also computational intensive. Moving the analytical computations inside database engine paves way for better performance and reduced cycle time. These features also help in utilizing the parallel-processing capabilities of database engine and converting cloud databases into active cloud databases. Ensuring Data Quality [3] It is a reported fact that, in P2P systems, the output of a query is most of the times incomplete due to poor quality of data. Adopting similar situations over the cloud, the data residing on multiple and highly distributed processing units creates poor data quality that may not be suitable for analytics. Ensuring Data Currency In Data Analytic applications, by the time models are generated from the data that is available, there might be more recent data on which the model built might stay invalid. Cloud can tackle this problem by reducing the cycle time between data availability and model generation. Specialized algorithms which enable fast incremental learning model generation over cloud need to be explored. Enabling Knowledge Process flow Knowledge process can move from one active node in the cloud to the other, extracting enough knowledge from it. These knowledge instances learned need to be consolidated from time to time to infer Knowledge on the cloud. Public Vs Private Cloud Clouds, synonymous to Public Clouds, facilitate sharing available computing resources to multiple businesses. One of the major issues with cloud computing today is the trust in security and privacy of business data. To overcome this issue, organizations are building their own clouds, referred to as Private Clouds. Though Private Clouds are costlier since the business owns the resources, they make organization to relax from unnecessary stress of dealing with the unknown. As standards around security and privacy are becoming stronger, we may envisage that public clouds will slowly increase their impact and importance to businesses in terms of trust. We can visualize the following scenarios from analytics point of view: Data Private, Model Private Analytic services are used for company s internal data analysis and may form a secret to the outsiders. This introduces the need for Data to be private as well as the Model to be Private. This scenario can be handled purely with private clouds. Data Public, Model Private Service Applications like web crawlers depend on data over a cloud which gets data from public resources and leverages this data for analytics internal to the business. For example, a crawler can crawl the web to meet together various service requirements by different clients in different applications and provide the analytics or statistics requested as a service. Data Public, Model Public This is a trivial case of cloud computing, for instance, blogs are now being widely used by companies to advertise their products and ask for reviews, web blogs managed by Industry verticals or specific companies are usually public along with their content. Data Private, Model Public This is a classic case for Model as a Service. In this scenario, private data (stored in a private cloud or enterprise) analysis is done using a public model residing over the cloud. For example, the standard financial models can be used from the public cloud to analyze the financial data which is confidential to the business. Infosys White Paper 5

There is a plethora of CSPs available in the market, however, there is still a need of evolving research models and ideas to meet growing demands in a more efficient and cost-effective manner. One such attempt is Cloud Views described in the next section. Cloud Views Cloud Views, introduced in this article, is a solution that overcomes the limitations of security, privacy, consistency and compliance aspects of cloud management. The concept Cloud Views evolved from the Views which can be created over a relation database. As illustrated in Figure 2, a Cloud View similar to the database views, abstracts the physical schema of the cloud limiting the degree to which the actual infrastructure is exposed to the outer world. Cloud Views are either virtual or materialized views created in a cloud environment. Each cloud view abstracts the required functionality to share cloud infrastructure. Unlike database views and workflow views, cloud views abstracts the data, application, infrastructure and all other computing resources provided over a cloud. Moreover, like other views, cloud views enable data access, privacy and security. Cloud View -Public Cloud Cloud View - Abstracted, SimplifiedSecure presentation of a cloud. 6 Infosys White Paper Figure 2: Illustrating Cloud Views The Cloud views, when developed with industry-specific standards, can create a logical infrastructure which eliminates the need for a Intranet based cloud for any company. Cloud views also enable semantic definition of the application requirements and also facilitate manifestation and support of application requirements in cloud environment. Cloud Views facilitate the following: i. Privacy and Security ii. Performance iii. Ease of application development Cloud views can be created incorporating either a community-wide (such as cloud view for financial industry) or a corporatewide model for adopting industry-specific standards as well as corporate-specific standards respectively. One point of view is the creation of a community-wide cloud is to provide cloud views for a specific industry/community at low-cost, high reliability and heavy security mode. One of the major limitation of adopting cloud computing is the lack of stringent standards around usage of cloud services. However, using cloud views, the existing standards can be adopted and deployed easily on respective cloud view. The second advantage of having these views is that the policies can be maintained at view level, rather than at cloud level as it is difficult to maintain different policies on the same cloud, which may show redundant and arising conflicts. Without

a cloud view, this would have been a very complicated task. Cloud views can also have sub-views, for instance, industry-wide view as a main view and corporate-wide view as a sub-view. Currently there are some standards available for analytics like Predictive Model Mark-up Language (PMML) created for the ADAPA engine by Zementis, which is supported by commercial vendors as well as open source tools like R. Also SOA and Web Services Resource Framework (WSRF) which can be used to define basic services for data-mining on cloud [4]. The migration and acceptance of these standards by the industry is still maturing. Conclusions Cloud computing is a recent technology paradigm that most of the infrastructure and services industries are focusing to capture potential opportunities. Analytic over cloud enables organizations to realize their analytical needs in a more affordable manner. This paper explores various technological perspectives for cloud analytics and various cloud services that can be envisaged in future. References : Ananthanarayanan R., Gupta K., Pandey P., Himabindu P., Sarkar P., Shah M., and Renu Tewari R., Cloud Analytics: Do We Really Need to Reinvent the Storage Stack?, (http://www.usenix.org/event/hotcloud09/tech/full_papers/ ananthanarayanan.pdf). Armbrust M., Fox A., Griffith R., Joseph A. D., Katz R. H., Konwinski A., Lee G., Patterson D. A., Rabkin A., Stoica I., and Zaharia M., Above the clouds: A Berkeley view of Cloud computing, Technical report UCB/EECS-2009-28, Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, USA, February 2009. Cerri, D., Valle, E., Francisco Marcos, D., Giunchiglia, F., Naor, D., Nixon, L., Teymourian, K., Obermeier, P., Rebholz- Schuhmann, D., Krummenacher, R., and Simperl, E., Towards Knowledge in the Cloud, The OTM Confederated international Workshops and Posters on the Move To Meaningful internet Systems, Spain, 2008. Dean J and Ghemawat S., MapReduce: Simplified data processing on large clusters. Sixth Symposium on Operating System Design and Implementation, pp. 137 150, 2004. http://en.wikipedia.org/ http://www.vertica.com/ http://www.forrester.com/rb/research/in-database_analytics_heart_of_predictive_enterprise/q/id/48289/t/2 http://www.reuters.com/article/idus218210392320100618 Infosys White Paper 7

For more information, contact askus@infosys.com About Infosys Many of the world's most successful organizations rely on Infosys to deliver measurable business value. Infosys provides business consulting, technology, engineering and outsourcing services to help clients in over 30 countries build tomorrow's enterprise. For more information about Infosys (NASDAQ:INFY), visit www.infosys.com. 2012 Infosys Limited, Bangalore, India. Infosys believes the information in this publication is accurate as of its publication date; suchinformation is subject to change without notice. Infosys acknowledges the proprietary rights of the trademarks and product names of other companies mentioned in this document.