TDWI RESEARCH
TDWI CHECKLIST REPORT
Using and Choosing a Cloud Solution for Data Warehousing
By Colin White
Sponsored by:
tdwi.org
JULY 2015

TABLE OF CONTENTS
FOREWORD
NUMBER ONE Understand the potential technology and business advantages of the cloud for data warehousing
NUMBER TWO Identify projects with pain points and needs that cloud data warehousing can address
NUMBER THREE Assess the differences between current cloud data warehouse technologies and services
NUMBER FOUR Identify the cloud offering that best fits with project requirements and existing skills and tools
NUMBER FIVE Assess the cost and complexity of deploying and maintaining the selected cloud data warehousing solution
NUMBER SIX Understand how the cloud data warehousing solution will integrate with the existing information technology environment
NUMBER SEVEN Look for opportunities to use cloud data warehousing to enhance the current data warehouse environment
ABOUT OUR SPONSOR
ABOUT THE AUTHOR
ABOUT TDWI RESEARCH
ABOUT THE TDWI CHECKLIST REPORT SERIES

555 S Renton Village Place, Ste. 700, Renton, WA
tdwi.org

© 2015 by TDWI, a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.
FOREWORD

Cloud computing is a hot topic, and more organizations are using the cloud for data warehousing. The cloud environment offers a pay-as-you-go, on-demand, and elastic scalability model that can provide significant benefits for both the business and IT. Compared to an on-premises IT environment, cloud computing reduces up-front project costs and enables organizations to scale their applications as required while paying only for the resources they use. The cloud is, therefore, an ideal environment for data warehousing projects given the large data volumes and unpredictable nature of the analytic workloads involved.

Determining the data warehousing projects best suited to cloud-based computing is not easy, especially given the significant changes taking place in data warehousing. Companies are now beginning to take advantage of new data sources, advances in business analytics, and enhanced database technologies, and this can require significant changes and upgrades to the data warehouse architecture.

Organizations are also anxious that without proper management, the cloud could simply become a way of bypassing IT bottlenecks and budget constraints, which could lead to data warehousing projects being deployed in the cloud that are not well suited to that environment. Data proliferation, data security, poor-quality data, and inconsistent analytics results are also concerns if there is no gatekeeper managing access to the cloud environment. It is essential, therefore, that IT and business groups work together to identify, develop, and manage those projects that can gain the most from the business and technological advantages the cloud offers for data warehousing and analytical processing.

Another challenge facing organizations moving to the cloud is choosing the right platform for deploying and supporting data warehousing as a service. A wide range of products and services is offered by both traditional and start-up vendors. Selecting the right vendor is difficult, and careful evaluation is required before committing to a particular vendor and service.

For organizations with existing data warehousing and business intelligence systems, one potential barrier to successful cloud adoption is the complexity of integrating the cloud and on-premises systems so that data can be efficiently and rapidly moved into and out of the cloud. For these companies, the ability to seamlessly integrate cloud services into their existing data warehouse environment is a critical requirement.

To address these concerns and help readers succeed in their cloud data warehousing projects, this checklist identifies the benefits the cloud offers, describes potential use cases, and presents key criteria for using and choosing a cloud solution for data warehousing.
NUMBER ONE
UNDERSTAND THE POTENTIAL TECHNOLOGY AND BUSINESS ADVANTAGES OF THE CLOUD FOR DATA WAREHOUSING

Aids data warehouse modernization. Data warehousing is changing rapidly as vendors introduce powerful new technologies that enable companies to process a wide range of new types of data and employ sophisticated analytics to gain greater in-depth insight into their business operations than ever before. To rapidly maximize the business benefits these new technologies offer and keep ahead of competitors, organizations need to deploy new analytic solutions both faster and at a lower cost. Unfortunately, traditional data warehouse approaches can become a bottleneck to achieving this fast time to value, so many organizations are modernizing their data warehouse architectures and development processes to exploit the business benefits of new data management and business analytics technologies.

The main goals of modernizing the data warehouse are to:
- Capture, manage, and analyze data from a broader set of both internal and external data sources, including data from Web operations, social media interactions, sensors, public information databases, and cloud-based operational systems
- Maintain business-user service levels despite ever-increasing and unpredictable data volumes and analytic workloads
- Increase business user self-sufficiency and support the rapid growth of mobile device use
- Provide an investigative computing environment that enables users to rapidly prototype potential analytic solutions against both existing and new types of data without adversely affecting the production data warehousing environment

The cloud environment is an important component of a data modernization effort because, as discussed later in more detail, it helps reduce costs, increase flexibility, and speed up deployment.

Pay-as-you-go pricing model reduces up-front costs. The challenge facing IT is how to modernize the data warehouse while enabling analytics solutions to be built faster and at a lower cost. New technologies often require upgrades to hardware, operating systems, data management systems, and data integration and analytic tools and applications. These upgrades frequently increase IT costs and create delays in implementing new business analytics applications. Cloud computing can help overcome the costs and delays often incurred in deploying new technologies for prototyping, developing, and operationalizing new analytic solutions. The software-as-a-service cloud model eliminates the need to install and maintain new hardware and software and reduces up-front infrastructure costs by offering pay-as-you-go pricing.

Elastic capacity supports changing resource requirements. Cloud processing and storage capacity adjusts to changes in resource and workload needs, which is especially important given the unpredictable nature of analytics workload processing needs and growth. Some cloud-computing vendors also provide additional services that make it easier for data warehousing applications to adapt to changing resource requirements.

Provides fast time to value for the business. Although it can be argued that business users should not be concerned about the underlying technologies supporting the analytics they use, there is still a direct correlation between IT benefits and business benefits. For example, if the business user requires a certain type of information, and new technologies enable this information to be made available, modeled, and analyzed rapidly at a low cost, then this is a direct benefit to the business user. The technology benefits of cloud computing to the IT department are also of business value because they enable IT to respond rapidly to the needs of the business.
NUMBER TWO
IDENTIFY PROJECTS WITH PAIN POINTS AND NEEDS THAT CLOUD DATA WAREHOUSING CAN ADDRESS

Start with business units that recognize the value of cloud computing. The best place to begin identifying potential projects is in a business unit that recognizes the benefits of cloud computing for data warehousing and that has specific pain points or needs that are not being addressed by IT for priority, resource, budget, technology, or skills reasons. The cloud data warehousing environment has been especially successful in small and midsize companies with limited IT resources and in business units that may already run some of their operational business processes in the cloud.

However, not all data warehousing projects are suited to cloud computing, and it is important to clearly identify those projects that lend themselves to a cloud environment. Examples of potential use cases include the following:
- Standalone reporting and analysis of Web, social media, or sensor data: A cloud-based reporting and analysis system is a cost-effective way of capturing, storing, and analyzing high-volume Web log, clickstream, social media (Twitter, for example), or sensor (such as telemetric devices) data.
- Analysis and visualization of e-business data and processes: Many organizations (Web retailers, online gaming companies, etc.) run their entire businesses on the Web. The applications involved in e-business are often deployed on hundreds of servers and generate terabytes of data every day. A cloud-based system is ideally suited to analyzing and visualizing all of this data to help managers analyze business operations and performance.
- Data warehouse augmentation: A cloud-based data refinery or data lake is a cost-effective way of capturing, storing, transforming, and archiving raw data while providing connectivity to an in-house data warehouse for transferring data. The cloud can also be used for investigative computing (i.e., data exploration and discovery) that can be expensive to implement on premises.

The cloud can be used for prototyping, development, and production. It is also essential that the complete project management life cycle be considered when evaluating projects. This is important because not all components of the life cycle will necessarily occur in the cloud; some projects may be prototyped or developed in the cloud but deployed on premises.

NUMBER THREE
ASSESS THE DIFFERENCES BETWEEN CURRENT CLOUD DATA WAREHOUSE TECHNOLOGIES AND SERVICES

Look for vendors that understand the unique requirements of data warehousing. When public cloud services initially became available, the market was defined as three types of service: software-as-a-service (SaaS), platform-as-a-service (PaaS), and infrastructure-as-a-service (IaaS). As cloud use has grown and more vendors have entered the market, this simple classification scheme has become inadequate, and a variety of new schemes have emerged. These newer categorization schemes may be useful for selecting a solution that addresses a specific type of technology, but they say little about implementing a data warehousing project that employs a variety of technologies. When assessing cloud vendors, do not focus on terminology. Instead, identify vendors that understand the unique requirements of data warehousing and can provide a complete solution for implementing data warehousing in the cloud.

Evaluate the components of the data life cycle the vendor supports. There are many components to the data life cycle, from data integration and management to data analysis and delivery. The actual components required will vary by project, but it is important to assess the components of the life cycle supported by the vendor or provided by partner organizations. Some vendors offer data-warehousing-as-a-service (DWaaS) or analytics-as-a-service (AaaS), which combine SaaS, PaaS, and IaaS and enable:
- Capturing and extracting data from trusted sources
- Managing and controlling data under comprehensive policy and governance guidelines
- Performing data integration, transformation, analysis, and visualization

Managed services for data warehousing is a key requirement. To simplify deployment and administration in the cloud, some vendor offerings include managed services that may involve free or fee-based consulting services or additional capabilities that simplify development and administration. The issue here is that vendors use the term "managed services" in different ways. Often these services are technology- or platform-specific and are not suited to helping implement, manage, and administer a data warehousing environment. During vendor evaluations, it's important to understand what a vendor means by managed services and whether they provide specific services for deploying and managing a data warehousing environment.
NUMBER FOUR
IDENTIFY THE CLOUD OFFERING THAT BEST FITS WITH PROJECT REQUIREMENTS AND EXISTING SKILLS AND TOOLS

Identify the best fit to system hardware and software requirements. Hardware processing and storage requirements will be largely dependent on the project's data volumes and analytics workloads. These requirements are often difficult to determine, and this is where the elasticity of the cloud is beneficial. The system software required will depend on the data warehousing software used, so be sure you understand what tools you'll need to integrate with your chosen solution.

Identify the best fit to data management requirements. Most data warehouses have been implemented with relational technology, but new developments provide open source and non-relational options (there are several Hadoop-based products, for example) that can reduce software costs and improve performance for certain types of workloads. A barrier to success with these new options is that their reliability is often misunderstood and their development costs are often underestimated.

Identify the best fit to data integration tools and applications requirements. Modern data warehousing projects often involve the integration of new data types. This data may be structured or multi-structured and may reside on internal and/or external systems. Multi-structured data, such as Web data, is more difficult to process, making integration more difficult. Data integration is one of the most resource-intensive data warehousing tasks, and your cloud solution must fit with your data integration strategy and software.

Identify the best fit to analytic tools and application requirements. The technology involved in a modern data warehouse may require installing new analytic products. The main objective here is to provide a seamless user interface to data no matter where it resides.

Assess the differences between the vendor solution and the in-house IT environment. Additional factors to consider include data import and export, workload management, authorization and security, disaster recovery, and help desk support. The tools used in a cloud solution may also differ from those used in house, and this can impact skills and education requirements for both IT and business users. Careful evaluation is also required of the vendor's pricing model and the managed services provided.

NUMBER FIVE
ASSESS THE COST AND COMPLEXITY OF DEPLOYING AND MAINTAINING THE SELECTED CLOUD DATA WAREHOUSING SOLUTION

Consider the complete application and data life cycle. The total cost of ownership (TCO) is a key metric for assessing the cost and complexity of using a particular data warehousing solution. This metric enables an organization to select the right cloud service and also determine how much can be saved by using a cloud environment. It is important to note that TCO considers more than just the project's hardware and software costs. TCO calculations must consider the complete application and data life cycle, from initial conceptual design to final operation, administration, and support.

Importing, exporting, and accessing data can be costly. It is essential to consider the costs of importing, exporting, and accessing data in a cloud service. These costs will depend on where the data resides and its volume. Companies are often surprised by the costs of data movement. Potential data growth and the archiving of less-active data are also important cost factors to consider.

Understand the managed services provided. The availability and cost of managed services vary by cloud vendor. Many vendors provide some level of services and/or tools for helping companies implement projects in the cloud. In many situations, however, these services only provide support and administration of the system hardware and software infrastructure. In the case of a data warehousing project, managed services often do not support data-related tasks in areas such as data design, acquisition, transformation, loading, exporting, archiving, and analysis.

A data-warehouse-specific service is a distinguishing factor. Support for data-warehouse-specific operations is a key distinguishing feature among cloud vendors. An end-to-end managed cloud service for data warehousing cuts time to value when implementing a new cloud project. The managed service model is especially attractive to companies with limited IT resources because it eliminates many of the standard tasks in an in-house project.
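The TCO reasoning in Number Five can be made concrete with a few lines of arithmetic. The Python sketch below is illustrative only: the `CostItem` structure, the function names, the cost categories, and every figure are hypothetical placeholders, not numbers from this report or from any vendor's price list. It sums one-time and recurring costs across the life cycle and treats data movement as its own recurring cost item, since that is the cost companies are said to overlook.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    """One life-cycle cost line (hypothetical structure for illustration)."""
    name: str
    one_time: float = 0.0  # paid once, e.g. design work or hardware purchase
    monthly: float = 0.0   # recurring, e.g. subscription or administration


def data_movement_cost(gb_per_month: float, price_per_gb: float) -> float:
    """Monthly cost of moving data into or out of a cloud service."""
    return gb_per_month * price_per_gb


def total_cost_of_ownership(items: list[CostItem], months: int) -> float:
    """Sum one-time costs plus recurring costs over the evaluation period."""
    return sum(item.one_time + item.monthly * months for item in items)


# All figures below are made-up placeholders, not real prices.
cloud = [
    CostItem("design and prototyping", one_time=20000.0),
    CostItem("cloud subscription", monthly=3000.0),
    CostItem("data export", monthly=data_movement_cost(400, 0.25)),
    CostItem("administration and support", monthly=1000.0),
]
on_premises = [
    CostItem("design and prototyping", one_time=20000.0),
    CostItem("hardware purchase", one_time=150000.0),
    CostItem("software licenses", monthly=2000.0),
    CostItem("administration and support", monthly=4000.0),
]

MONTHS = 36  # assumed evaluation period
cloud_tco = total_cost_of_ownership(cloud, MONTHS)
on_prem_tco = total_cost_of_ownership(on_premises, MONTHS)
print(f"Cloud TCO over {MONTHS} months: {cloud_tco:,.0f}")
print(f"On-premises TCO over {MONTHS} months: {on_prem_tco:,.0f}")
```

In a real evaluation the item lists would come from vendor quotes and internal budgets, and data growth and archiving would change the monthly figures over time rather than staying flat as they do here.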
NUMBER SIX
UNDERSTAND HOW THE CLOUD DATA WAREHOUSING SOLUTION WILL INTEGRATE WITH THE EXISTING INFORMATION TECHNOLOGY ENVIRONMENT

Integrated data warehouse service simplifies implementation. The main tasks in any data warehousing project involve acquiring and integrating the raw source data, managing and processing the data, and delivering the results to the systems and users that require the processed data. As in an in-house environment, cloud users have the choice of integrating various cloud products and services themselves or using an integrated, end-to-end solution. In the same way that an integrated hardware and software appliance simplifies development, deployment, and administration for on-premises projects, an integrated end-to-end cloud solution simplifies these tasks for data warehousing in the cloud.

Data integration and movement can become a barrier to success. One of the biggest barriers to cloud deployment is data integration and data movement. Ideally, the data should be processed where it resides, but even when the source data already resides in the cloud, it may still have to be moved to a different cloud system for processing, in the same way that data is moved from business transaction systems to a data warehouse in an in-house environment.

Users need to access both cloud and in-house data. An added complication is that the project may also involve a mixture of in-house and cloud data. In this case, the in-house data may be accessed dynamically (using data virtualization, for example) by a cloud application or staged from the in-house environment to the cloud for use by the cloud application. Again, this is the same as in an in-house environment where data warehouse projects are increasingly using data from a variety of sources in addition to a data warehouse. It is important to realize, however, that data movement in a cloud environment occurs across a public Internet connection, and this may have security, performance, and cost implications.

Data integration is an important success factor. It is very important in a cloud environment to look for solutions that simplify development, deployment, and administration as well as provide solid and well-performing data integration and data movement capabilities.

NUMBER SEVEN
LOOK FOR OPPORTUNITIES TO USE CLOUD DATA WAREHOUSING TO ENHANCE THE CURRENT DATA WAREHOUSE ENVIRONMENT

Enhance the existing data warehouse. When considering modernizing the traditional data warehouse and looking at the use of cloud computing for data warehousing, much of the emphasis is on building new applications and capturing and analyzing new data sources for those applications. It should not be forgotten, however, that data warehouse modernization and cloud computing can also be used to enhance the current data warehouse environment.

Apply new technologies to existing projects. New types of data can be used in existing applications to broaden and deepen an organization's understanding of the business factors that affect business operations and processes. New analytics processes enable business analysts and managers to move beyond basic reporting and descriptive (i.e., diagnostic) analytics to exploit advances in predictive analytics and data visualization. The items in this checklist can be used to evaluate the use of cloud-based data warehousing for new projects as well as for enhancing existing ones.

Overcome in-house performance and cost issues. At a more general level, cloud computing can be used to reduce the costs and/or improve the performance of existing data warehouse operations. In some cases, cloud data warehousing can even make possible what cannot be achieved with an in-house system. Increasing data volumes, aging hardware, rising software costs, and loss of skills due to staff turnover can all affect the ability to maintain existing service levels and manage costs. Equally, in many organizations, administration and maintenance costs are steadily increasing and becoming a higher percentage of the total IT budget. A cloud service provides an elastic computing environment that automatically adjusts to changing resource requirements, and outsourcing resource-constrained projects to cloud-based data warehousing can alleviate cost and service-level issues.

Enable the organization to focus on data rather than technology. Cloud computing eliminates the obstacles and pains of managing infrastructure, enabling your organization to focus on using its data rather than on dealing with the technology. The inclusion of a managed service by the cloud vendor is also a key success factor in using the cloud to reduce the costs and improve the performance of existing data warehouse operations.
ABOUT OUR SPONSOR

Snowflake Computing, the cloud data warehousing company, has reinvented the data warehouse for the cloud and today's data. The Snowflake Elastic Data Warehouse is built from the cloud up with a patent-pending new architecture that delivers the power of data warehousing, the flexibility of big data platforms, and the elasticity of the cloud at a fraction of the cost of traditional solutions. The company is backed by leading investors including Altimeter Capital, Redpoint Ventures, Sutter Hill Ventures, and Wing Ventures. Snowflake is headquartered in Silicon Valley and can be found online at snowflake.net.

ABOUT THE AUTHOR

Colin White is the founder of BI Research and president of DataBase Associates Inc. As an analyst, educator, and writer he is well known for his in-depth knowledge of data management, information integration, and business intelligence technologies and how they can be used for building the smart and agile business. With many years of IT experience, he has consulted for dozens of companies throughout the world and is a frequent speaker at leading IT events. Colin has written numerous articles and papers on deploying new and evolving information technologies for business benefit, and is a regular contributor to several leading print- and Web-based industry journals. For 10 years he was the conference chair of the Shared Insights Portals, Content Management, and Collaboration conference. He was also the conference director of the DB/EXPO trade show and conference.

ABOUT TDWI RESEARCH

TDWI Research provides research and advice for data professionals worldwide. TDWI Research focuses exclusively on business intelligence, data warehousing, and analytics issues and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of the business and technical challenges surrounding the deployment and use of business intelligence, data warehousing, and analytics solutions. TDWI Research offers in-depth research reports, commentary, inquiry services, and topical conferences as well as strategic planning services to user and vendor organizations.

ABOUT THE TDWI CHECKLIST REPORT SERIES

TDWI Checklist Reports provide an overview of success factors for a specific project in business intelligence, data warehousing, or a related data management discipline. Companies may use this overview to get organized before beginning a project or to identify goals and areas of improvement for current projects.