Analytics Outsourcing: The Hertz Experience
Hugh J. Watson, Barbara H. Wixom, and Thomas C. Pagano
EXCLUSIVELY FOR TDWI PREMIUM MEMBERS
Volume 18, Number 4
The leading publication for business intelligence and data warehousing professionals
TDWI ONSITE EDUCATION
BI Training Solutions: As Close as Your Conference Room

TDWI Onsite Education brings our vendor-neutral BI and DW training to companies worldwide, tailored to meet the specific needs of your organization. From fundamental courses to advanced techniques, plus prep courses and exams for the Certified Business Intelligence Professional (CBIP) designation, we can bring the training you need directly to your team in your own conference room.

YOUR TEAM, OUR INSTRUCTORS, YOUR LOCATION.

Contact Yvonne Baho for more information.
tdwi.org/onsite
Volume 18, Number 4

Contents

3  From the Editor
4  Analytics Outsourcing: The Hertz Experience (Hugh J. Watson, Barbara H. Wixom, and Thomas C. Pagano)
8  Three Best Practices for IT and Business Users in Big Data Projects (Fern Halper)
10 Mainframes: The (Other) Elephant in the Big Data Room (Jorge A. Lopez)
13 Filling the Demand for Data Scientists: A Five-Point Plan (John Santaferraro)
19 Marketing IT to BI Users In-House: The Importance of Small Talk (Max T. Russell)
22 BI Training: Closing the Business Analytics Gap at UT Austin (Linda L. Briggs)
25 Overcoming Data Challenges with Virtualization (Nilesh Bhatti)
32 Big Data Management Platforms: Architecting Heterogeneous Solutions (Ravi Chandran)
39 BI Experts Perspective: Aligning Business Strategy with BI Capabilities (Alicia Acebo, Jim Gallo, Jane Griffin, and Brian Valeyko)
46 Implementing an Enterprise Data Quality Strategy (Nancy Couture)
52 Instructions for Authors
53 Data Variety: The Spice of Insight (David Stodder)
56 BI StatShots

Business Intelligence Journal, Vol. 18, No. 4
Volume 18, Number 4 | tdwi.org

EDITORIAL BOARD
Editorial Director: James E. Powell, TDWI
Managing Editor: Jennifer Agee, TDWI
President: Rich Zbylut
Director, Online Products & Marketing: Melissa Parrish
Senior Graphic Designer: Bill Grimmer
Senior Editor: Hugh J. Watson, TDWI Fellow, University of Georgia
Director, TDWI Research: Philip Russom, TDWI
Director, TDWI Research: David Stodder, TDWI
Director, TDWI Research: Fern Halper, TDWI
Associate Editors: Barry Devlin, 9sight Consulting; Mark Frolick, Xavier University; Troy Hiltbrand, Idaho National Laboratory; Claudia Imhoff, TDWI Fellow, Intelligent Solutions, Inc.; Barbara Haley Wixom, TDWI Fellow, University of Virginia

Advertising Sales: Scott Geissler, [email protected]

List Rentals: 1105 Media, Inc., offers numerous e-mail, postal, and telemarketing lists targeting business intelligence and data warehousing professionals, as well as other high-tech markets. For more information, please contact our list manager, Merit Direct.

Reprints: For single article reprints (in minimum quantities), e-prints, plaques, and posters, contact PARS International, [email protected].

Copyright 2013 by 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. Mail requests to Permissions Editor, c/o Business Intelligence Journal, 1201 Monster Road SW, Suite 250, Renton, WA.

The information in this journal has not undergone any formal testing by 1105 Media, Inc., and is distributed without any warranty expressed or implied. Implementation or use of any information contained herein is the reader's sole responsibility. While the information has been reviewed for accuracy, there is no guarantee that the same or similar results may be achieved in all environments. Technical inaccuracies may result from printing errors, new developments in the industry, and/or changes or enhancements to either hardware or software components. Printed in the USA.
[ISSN ] Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

President & Chief Executive Officer: Neal Vitale
Senior Vice President & Chief Financial Officer: Richard Vitale
Executive Vice President: Michael J. Valenti
Vice President, Finance & Administration: Christopher M. Coates
Vice President, Information Technology & Application Development: Erik A. Lindgren
Vice President, Event Operations: David F. Myers
Chairman of the Board: Jeffrey S. Klein

Reaching the Staff
Staff may be reached via e-mail, telephone, fax, or mail. To e-mail any member of the staff, please use the following form: [email protected]
Renton office (weekdays, 8:30 a.m. to 5:00 p.m. PT): Monster Road SW, Suite 250, Renton, WA
Corporate office (weekdays, 8:30 a.m. to 5:30 p.m. PT): Oakdale Avenue, Suite 101, Chatsworth, CA
Business Intelligence Journal (article submission inquiries): Jennifer Agee, [email protected], tdwi.org/journalsubmissions
TDWI Premium Membership (inquiries & changes of address): [email protected], tdwi.org/premiummembership
From the Editor

Snow White's Seven Dwarfs happily whistled while they worked. These days, we whistle in amazement that all their work could be done by such a small team, and we wonder where we can find the skilled talent we need to maintain or expand our BI initiatives.

Senior editor Hugh J. Watson, Barbara Wixom, and Thomas Pagano look at how Hertz used outsourcing to solve its resource problems. The authors describe what's driving the outsourcing of BI staff and which projects the auto rental firm chose to outsource.

John Santaferraro presents a five-point plan for filling your open data scientist positions. He touches on incentive programs, technology infrastructure, and the value of an enterprisewide culture of analytics. Linda Briggs describes a new program at the University of Texas at Austin that targets the business analytics gap plaguing many organizations.

Perhaps your organization doesn't need more resources but rather needs to use its existing resources more effectively. Director of TDWI Research for advanced analytics Fern Halper looks at three best practices IT and business users can follow to work better together and achieve success in big data projects. Max T. Russell explains how IT professionals can build a stronger relationship with their user base by learning the art of small talk, a simple way to build trust and respect and help the IT team play a bigger, more important role in an organization's BI efforts.

Having the right tools and technology may also reduce the stress on resources. Nilesh Bhatti discusses the benefits and challenges of implementing data virtualization to help manage increasing data volumes. Organizations must be sure the data they manage is accurate, complete, and up to date. Nancy Couture looks at how best to implement an enterprise data quality strategy. TDWI's David Stodder looks at why organizations should harness the wide variety of data they collect.

Getting work done isn't just a matter of human resources.
Jorge Lopez explains how to get more done by leveraging mainframe data with Hadoop. Although they seem like an unlikely duo, Lopez offers some practical Hadoop use cases for mainframe users. Organizations can also run more smoothly when the business strategy aligns with BI's capabilities, which is the subject of our BI Experts Perspective column. We provide advice from Alicia Acebo, Jim Gallo, Jane Griffin, and Brian Valeyko.

Are you working smarter? Do you whistle while you work? Let us know. We welcome your feedback and comments; please send them to [email protected].
Analytics Outsourcing: The Hertz Experience
Hugh J. Watson, Barbara H. Wixom, and Thomas C. Pagano

Analytics is becoming increasingly important for many organizations. To address their need for advanced analytics, some firms are using outside organizations to help provide analytics expertise and capabilities.

Hugh J. Watson is a Professor of MIS and holder of the C. Herman and Mary Virginia Terry Chair of Business Administration in the Terry College of Business at the University of Georgia. [email protected]

Barbara H. Wixom is a principal research scientist in the MIT Sloan Center for Information Systems Research at MIT. [email protected]

Thomas C. Pagano is director, business information and data warehouse systems, for The Hertz Corporation. [email protected]

We use the term analytics outsourcing in this article to refer to the use of any external organization to provide parts of, or an entire, analytics solution. Analytics outsourcing comes in a variety of forms. For example, your enterprise may hire a consulting firm to help implement a dashboard or scorecard system. You may choose a firm that provides fraud detection through software-as-a-service. You might select a third-party supplier of data that supports your company's CRM application.

Outsourcing has been around for many years and is now a commonly accepted business practice. It started with less complex business processes such as call centers and has moved on to more knowledge-intensive processes such as analytics. This market is experiencing significant growth. Enterprises will spend an estimated $46.9 billion this year on analytics outsourcing, and IDC projects that spending will grow to $70.8 billion by 2016 (Zaidi and Dialani, 2013).

We are in the initial stages of a case study with Hertz, a leader in the rental car industry, to understand their use of analytics.
As you would expect, Hertz has long used analytics for pricing rental cars, forecasting demand, designing marketing campaigns, and so on. One current initiative uses analytics to better understand customers, communicate with them through real-time analytics, and increase customer loyalty.
In our interviews with key people at Hertz, we found many instances where the company outsourced its analytics, so we asked them to help us understand and describe the company's use of analytics outsourcing. First, however, let's consider some of the reasons why firms are turning to analytics outsourcing.

The Drivers

The reasons vary with the company and the application, of course, but they most often include a combination of the following:

Competitive advantage. An outside firm may be able to help develop applications that create a strategic advantage. It is quite likely, though, that the outside firm will be willing to provide the same services to competitors, thus negating the advantage. In the long run, true competitive advantage can normally be achieved only if the company does most, if not all, of the analytics in-house.

Organizational agility. There may be changes in market conditions, the emergence of new competitors, or changes in technology that require a fast response. An outside analytics provider may be able to act quickly.

Core competency. A firm may choose to outsource those activities, such as analytics, that are not considered to be critical to success.

Faster development. If a company doesn't have the hardware, software, or specialized skills needed for a particular project, an outside firm may be able to provide a solution more quickly. However, once the decision is made to invest in the required resources to do the project in-house, it may be possible to respond more quickly with changes or enhancements to the application later on, or to develop new applications.

Improved quality. Because they work in specific analytics areas and depend on delivering successful implementations, outsourcing firms may be able to provide higher-quality solutions than in-house personnel.

Specialized skills. Analytics may require specific skills that are not available in-house, and it may be better to contract with a firm that employs people with the needed skills and experience. Analytics outsourcing firms have a high level of specific domain expertise because of their work with a large number of clients.

Cross-industry experience. Some analytics outsourcing firms work across industries and, based on this experience, may be able to bring new approaches and technologies to a firm in a particular industry.

Cost. Analytics often requires specialized hardware, software, and skills, and it may be less expensive to outsource these to a firm that is able to spread the costs over multiple companies. In the long run, however, the costs may favor an in-house implementation.

Hertz

Hertz is a leader in the rental car business with approximately 10,400 corporate, licensee, and franchise locations in North America, Europe, Latin America, Asia, Australia, Africa, the Middle East, and New Zealand. Hertz is the number one airport rental car brand in the U.S. and operates at 111 major airports in Europe. With its recent acquisitions of Dollar and Thrifty, Hertz offers rental cars across a variety of price points.

Analytics at Hertz

Hertz has a BI and data warehousing group that maintains the company's 2 TB data warehouse. The group includes developers in the U.S. and Europe who are responsible for queries, reporting, and multidimensional modeling. It also has BI analysts who are responsible for determining information requirements. Business units have additional analysts with deep domain knowledge; the units also employ a variety of analytical specialists. Many of the business units rely on analytics and use outside firms for analytical services. Consider the following examples, the external services used, and the reasons why.
Data Warehousing

A data warehouse has been in place at Hertz for many years, but as the amount of data, the number of users, and the complexity of the analytics grew, it was no longer meeting organizational needs. After an evaluation process, Hertz selected Teradata for its new platform, along with Aprimo (a Teradata product offering) for CRM applications. Teradata's professional services staff was brought in to help with the customization of Teradata's logical data model for the transportation industry, create connections to query tools, and develop a semantic layer between Teradata and the tools. Teradata professional services were used because Hertz did not have the requisite skills in-house, but the long-term plan is to reduce this dependency by developing the needed expertise internally.

Other third parties also contributed to the selection and rollout of the Teradata and Aprimo products. Gartner was consulted on the strategic direction for data warehousing, and LoyaltyOne (discussed later) contributed to the design of the data model Hertz implemented. Teradata's professional services were selected because of their in-depth knowledge of the Teradata and Aprimo products and their experience within and across different industries. The desire to decrease the use of the services over time is primarily due to cost considerations.

Revenue Management and Pricing

A rental car is a perishable good, much like an airline seat or a hotel room, in that it generates value only if it is used. In other words, a rental car sitting on the lot is not generating any revenue. A key to success in this industry is dynamically pricing cars so that revenues and profits are optimized. Pricing is a challenging combinatorial problem because of the large number of locations, types of cars, possible rental dates, and other factors that affect pricing decisions. It is also an area where analytics has been applied for many years. Pricing systems operate using a combination of inventory data (what cars are available), demand forecasts (what cars are likely to be demanded), and mathematical programming techniques (what prices are optimal).
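To make the combination of demand forecasts and optimization concrete, here is a toy sketch in Python. It is in no way Hertz's actual system: the linear demand model, fleet size, and candidate rates are all invented for illustration, and a real revenue management system would optimize over thousands of location/car-class/date combinations at once.

```python
# Toy sketch: pick the daily rate for one car class at one location that
# maximizes expected revenue, given a simple linear demand forecast and a
# fixed fleet size. All numbers are invented for the example.

def expected_revenue(price, base_demand, sensitivity, fleet):
    demand = max(0.0, base_demand - sensitivity * price)  # forecast demand at this price
    rentals = min(demand, fleet)                          # can't rent more cars than exist
    return price * rentals

def best_price(base_demand, sensitivity, fleet, candidates):
    return max(candidates,
               key=lambda p: expected_revenue(p, base_demand, sensitivity, fleet))

candidates = range(20, 121, 5)  # candidate daily rates, in dollars
# 200 cars at an airport location; forecast: 300 rentals at $0, losing 2 per $1
p = best_price(base_demand=300, sensitivity=2.0, fleet=200, candidates=candidates)
# -> 75 (revenue of $11,250/day at 150 expected rentals)
```

The real problem is combinatorial, as the article notes, because prices interact across locations and dates; production systems use mathematical programming rather than the brute-force enumeration shown here.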
Hertz handles pricing in-house but partners with an industry leader in revenue management systems to assist in developing and maintaining its various models. Working with this industry-leading firm allows Hertz to leverage the specialized skills, experience, and insight the firm has gained operating across industries.

Customer Relationship Management

For over 20 years, Hertz has used Brierley + Partners to drive its CRM efforts. Brierley specializes in loyalty programs, customer-centric marketing services, and analytics and customer research services. It also provides production/fulfillment services such as loyalty program design, customer relationship management strategy, technology, and creative services. Hertz's CRM relies on Brierley's production team, creative staff, technology people, data quality experts, and analytics specialists. Brierley's employees are essentially a part or extension of Hertz's CRM team. Hertz works with Brierley to set the CRM strategic direction and goals, and works closely with Brierley daily on the execution of its CRM initiatives. For example, Brierley now uses the customer-centric Teradata data warehouse (Brierley used to maintain a similar data mart) to generate dashboards that show key metrics; the firm also performs customer analytics such as market segmentation analysis. Brierley works on Hertz's customer rewards program, including research and advice about issues such as whether a customer's points should be transferable to anyone else (they now are).

Hertz works with Brierley because of the advanced skills and expertise of its people. For example, some of its analytics staff have Ph.D.s and years of CRM experience. Another reason is the ability to leverage Brierley's state-of-the-art technology. Hertz benefits from Brierley's expertise and knowledge of the best practices of customers across many firms and industries. Finally, Brierley has dedicated resources to respond quickly to any issues or problems that arise.
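As a small illustration of the kind of market segmentation analysis mentioned above, a simple recency/frequency cut over rental history might look like the sketch below. The field names, thresholds, and segment labels are invented for the example; they are not Brierley's actual models, which would be far richer.

```python
# Hypothetical RFM-style segmentation over rental customers.
# Fields and thresholds are invented for illustration only.
customers = [
    {"id": 1, "days_since_rental": 12,  "rentals_per_year": 24},
    {"id": 2, "days_since_rental": 300, "rentals_per_year": 1},
    {"id": 3, "days_since_rental": 45,  "rentals_per_year": 6},
]

def segment(c):
    # Recent and frequent renters are the most valuable segment
    if c["days_since_rental"] <= 60 and c["rentals_per_year"] >= 12:
        return "loyal frequent"
    if c["days_since_rental"] <= 90:
        return "active occasional"
    return "lapsed"

segments = {c["id"]: segment(c) for c in customers}
# -> {1: 'loyal frequent', 2: 'lapsed', 3: 'active occasional'}
```

In practice such rules are replaced or supplemented by clustering and predictive models run against the warehouse, but the output is the same kind of thing: a segment label per customer that marketing can act on.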
Hertz also uses LoyaltyOne to help with its CRM initiatives. LoyaltyOne specializes in customer insight and strategy, loyalty and marketing programs, and managing the customer experience. Hertz has worked with LoyaltyOne in a variety of ways over the years, but the recent emphasis is on developing the technical requirements (working with Teradata and Brierley) to ensure Hertz will successfully execute its new CRM initiatives.

Hertz works with LoyaltyOne because of its specific skills and experience. The firm was helpful in evaluating Hertz's data architecture and designing the customer-centric data model for the data warehouse. Its people provide a valuable combination of technical and business skills and offer great ideas about what can be done with existing data.

Conclusion

Like many firms, Hertz uses a wide variety of analytics services providers. In many instances, the relationship has existed for many years. In the case of Brierley and LoyaltyOne, the major reasons for their continuing use include their specialized skills and experience, their ability to keep up with and provide the latest technology, and their knowledge of best practices across different firms and industries. Of course, good performance is also key. If outside providers fail to perform satisfactorily, they are unlikely to stay in business for long. Analytics outsourcers need to be competitive to keep their client base.

In the case of Teradata, specific skills were needed to get started and will still be needed as new source systems and subject areas are added to the warehouse and the technology evolves. For cost reasons, however, most of the skills needed for operating the warehouse will be kept in-house.

The IDC study projects a continuing rise in analytics outsourcing. It is difficult to forecast specific growth numbers, but based on the experiences at Hertz and other companies, the growth will be significant.

Reference

Zaidi, Ali, and Mukesh Dialani [2013]. Worldwide Business Analytics Services Forecast, IDC Report, April.
Three Best Practices for IT and Business Users in Big Data Projects
Fern Halper

Fern Halper is director of TDWI Research for advanced analytics, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and other big data analytics approaches.

TDWI recently built a big data maturity model and benchmark assessment tool. The goal of the model is to provide guidance for IT and business professionals on their big data journeys. The model provides a framework for companies to understand where they are, where they've been, and where they still need to go in their big data deployments. The model itself consists of five dimensions: organization, infrastructure, data management, analytics, and governance.

A great feature of the TDWI Big Data Maturity Model is the interactive benchmark assessment. At the end of the benchmark survey, you can quantify the maturity of your deployment in an objective way, understand your progress, and identify what it will take to get to the next maturity level. We have identified five levels of maturity in the big data model: nascent, pre-adoption, early adoption, corporate adoption, and mature/visionary. Of course, organizations can be at different stages of maturity in each of the five dimensions, and most are. There is also a chasm that companies need to cross to get from early adoption to corporate adoption. You can take a look at the TDWI Big Data Maturity Model assessment tool and guide at tdwi.org/bigdatamaturity.

As part of our research for the model, my co-author, Krish Krishnan, and I spoke to quite a few companies. Some are Internet-based, while others are traditional companies. They are at different stages of big data maturity. For instance, some are using relatively advanced analytics on huge amounts of structured data. Others are building out Hadoop clusters as a means of making high volumes of data storage more cost effective. Still others are primarily content-based businesses that are building out a big data infrastructure and a BI practice to support it. Only a small number could be considered mature or visionary.

In our research, we discovered how important it is for IT and the business to work together to achieve success in their big data efforts. This principle was cited by a number of organizations as a key success factor, especially in the enterprise, but also for smaller Internet companies as they start to grow. Here are just a few of the insights we heard related to business and IT working together.

Insight #1: Business needs to help identify the big data opportunities

In a TDWI survey conducted at a recent conference, we asked respondents about some of the challenges they faced around big data. The top response was identifying the right problem to solve. Many of the companies we spoke to echoed this. They stated that in order for big data and big data analytics to be widely accepted, an enterprise must find a problem that is worth solving. Chances are that problem is going to be articulated by the business. Therefore, as the seeds of a big data project start to germinate, it's important to get a business person involved so you can get their input and so they can help you articulate the problem in a way that business users will understand. This involves building relationships with the business.

Insight #2: Funding must move out of IT for big data success

In our discussions with companies, we asked several questions about funding. Some companies were at the experimentation and proof-of-concept (POC) phase and were funding the project out of an IT organization. However, those that were more mature stressed the importance of getting funding from outside of the CIO organization and moving it to a marketing or sales organization, for instance, so that the business has a vested stake in the game. One end user related a story about sitting down with executives (one at a time) and showing them what was possible with big data analytics. He was looking for someone who was asking questions about the data and analysis, because this indicated that the executive was serious about big data. Of course, this can involve a lot of show and tell. The key is to demonstrate some wins that get people excited. Executives then bring others on board. It is difficult for IT to sustain a big data effort alone.

Insight #3: Data sharing is key

In order for a company to build a big data ecosystem that drives business action, organizations have to share data. Collaboration is necessary for big data projects. Of course, there are many considerations involved in assembling big data, including people, processes, and technologies. However, sometimes companies get to a certain point with their big data programs, where they have assembled large amounts of data from across the company, and then the ax falls because of company politics.

Some companies pointed to the need for a chief data officer, someone responsible for data usage and governance at the corporate level. Others stressed the need for a well-organized data governance program. As one person put it, "Big data is a liability waiting to happen. Whose data was it? Whose data is it? Where is it going? How long will it last? These are important questions that people aren't asking." This is a clear case where business and IT need to work together to ensure that data can be shared, as well as to put the policies and practices in place for this sharing to occur. Assembling a governance team should start early in the big data process, even if you're an Internet company that is more concerned about getting a product or service out the door than about governing the data that feeds the product or service.
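To show the shape of a benchmark assessment like the one described above, here is a purely hypothetical scoring sketch. It is not TDWI's actual assessment logic: the idea of self-rating each of the five named dimensions from 1 to 5 and averaging is an invented simplification, used only to illustrate how per-dimension scores can roll up to one of the five maturity levels.

```python
# Hypothetical sketch only -- not TDWI's actual benchmark scoring.
# Each dimension is self-rated 1..5; the average maps onto the five levels.
LEVELS = ["nascent", "pre-adoption", "early adoption",
          "corporate adoption", "mature/visionary"]

def maturity(scores):
    """scores: dict mapping dimension name -> rating from 1 to 5."""
    avg = sum(scores.values()) / len(scores)
    return LEVELS[min(4, max(0, int(avg) - 1))]

level = maturity({"organization": 2, "infrastructure": 3, "data management": 2,
                  "analytics": 3, "governance": 2})
# -> 'pre-adoption'
```

A real assessment also reports per-dimension results, which matters because, as the article notes, organizations are usually at different stages in different dimensions.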
12 mainframes and Big Data Mainframes: The (Other) Elephant in the Big Data Room Jorge A. Lopez Abstract With up to 80 percent of data originating on your mainframe, you can t ignore big data trends. In this article, we recommend steps to get started using Hadoop to leverage your mainframe data. Jorge A. Lopez is the director of product marketing for Syncsort. [email protected] At first sight, mainframes and Hadoop might seem like the most unlikely duo. One appeared in the late 1950s even before the PC while the other (to this day) hasn t reached its teenage years but is already bragging about managing big data. Much has been said and written about the death of mainframe computers, but the truth is, some of the largest organizations (think of the top telcos, retailers, insurance, healthcare, and financial organizations of the world) still rely on mainframes for mission-critical applications. When talking to these organizations, it s not unusual to hear that up to 80 percent of their corporate data originates on the mainframe. That is some serious big data, and organizations cannot afford to neglect it! That s why they are making the mainframe a core piece of their big data strategies. How can such organizations get started with Hadoop? What are some practical Hadoop use cases for mainframe users? Paving the Road from Mainframe to Hadoop These four practical steps can help you and your organization start off on the right foot to leverage mainframe data with Hadoop. 10 BUSINESS INTELLIGENCE Journal vol. 18, No. 4
13 mainframes and Big Data Step 1: Build knowledge and sensitivity about mainframe and Hadoop This may sound like the most obvious step, but you would be surprised at how often this aspect is overlooked. Despite the importance of both Hadoop and mainframe technologies, it s common to find that their advocates know little or nothing about each other. Therefore, before you get started, it s important to understand the intricacies behind these two technologies and the teams that support them. Some aspects to cover include: Understand the nature of existing applications running on mainframes versus Hadoop. Mainframe applications typically include a combination of batch as well as transactional processing (OLTP). Hadoop applications are mostly batch oriented but are more analytical in purpose. The most important difference is that mainframe applications are the most mission-critical and need to accurately and reliably operate 24/7. pedigree. Therefore, any sensible approach to leverage mainframe data in Hadoop needs to be examined carefully. Decisions must be made regarding what data is fair play and about a security infrastructure that guarantees secure data access and storage. Understand your mainframe SLAs and costs. Storing and processing data in the mainframe is expensive. The costs are relatively easy to quantify because mainframes are billed in terms of CPU utilization. This is important because before mainframe data can be leveraged, it needs to be moved and transformed. Depending on the amount of data and required frequency of data loads, moving data alone can be costly. For instance, a major bank won t be happy if customers have to wait longer at ATMs because IT is moving or copying a terabyte of data. Meticulous load scheduling and capacity planning can go a long way to avoid issues on this front. Identify critical data generated by mainframes. 
Big data and Hadoop initiatives may center around capturing and processing unstructured and semi-structured data coming from Web logs, social media, and other sources that is, the data that influences or leads to a transaction. Mainframes then process and capture the transactions, also generating critical data that provides reference and valuable context to big data. Similarly, the team will need to look at where the data transformations take place. Factors such as the amount of data that you need to transfer, SLAs, and mainframe CPU utilization are part of the decision, which usually comes down to a compromise between performance and costs. In addition, be aware that any additional thirdparty software on the mainframe will use more CPU cycles and thus increase your annual mainframe costs. Address security concerns. There s a reason why the mainframe is still around: it is quite possibly the fastest, most reliable data processing platform. There s more: it s highly secure. That s the kind of system you need when processing health records or financial transactions. Therefore, mainframe developers are very keen about the security, confidentiality, reliability, and integrity of transactions. On the other hand, Hadoop developers might be more interested in agility, finding relevant trends, right-time analytics, and scalability. After all, missing a single tweet or a Web click might not be such a big deal, but the slightest error in a financial transaction can send you and your CEO to jail. Needless to say, mainframe administrators will be very reluctant to allow access or even to install third-party software without a mainframe Step 2: Be clear about the business and IT objectives In most cases, mainframes are not going away. It s not difficult to see why after considering Step 1. Instead, Hadoop presents a tremendous opportunity to uncover valuable business insights from otherwise unused (or poorly used) data. 
By doing so, you'll be able to complement the capabilities of your mainframe with increased business agility, virtually unlimited reporting, and new analytics opportunities. Why not? You can maximize the return on your mainframe investment by selectively choosing the right home for the right workload (a phrase I learned from Shawn Rogers, a recognized analyst and long-time contributor to TDWI). After all, not all of your data deserves the
first-class treatment, right? More important, by offloading some of the less critical data and batch processing from your mainframe into Hadoop, you can lower costs, provide better access to mainframe data, and deliver better service for all your mainframe users. My advice: be clear about the value and expertise that mainframe admins and developers bring to the table. You will need them on your side in order to succeed.

Step 3: Create a road map that gradually builds the skills of your organization

Leveraging mainframe data in Hadoop is not easy. I know this is not what you want to hear, but that's the reality, and you should probably beware of any person or website that tells you otherwise. Granted, some approaches are better than others and will make it easier, but it will never, totally and simply, be easy. Therefore, it's important to create a road map that allows you to gradually build the required skills within your staff, minimize risk, and capitalize on previous successes to gain more support. Here is a high-level road map that makes sense for many organizations:

Create copies of selected mainframe data sets in HDFS. Combine this data with other data sources to enrich existing analysis and create new reports, dashboards, and visualizations. The main objective here is to uncover new insights and improve decision making.

Migrate mainframe data to HDFS. Once you've built a certain level of skills and are familiar with the challenges, you can start, not by copying, but by migrating selected mainframe data to HDFS. Legacy data is usually a good initial target; then you can move on to other data sets. The benefits at this stage add up, so in addition to better insights, you can contain or reduce costs by offloading data from the mainframe into Hadoop.

Offload batch processing. A large portion of mainframe processes involve sorts, copies, reporting, and other batch operations. That sounds like the ideal workload for Hadoop, which means offloading these processes to Hadoop can actually help on many fronts, not only in reducing mainframe costs, but also by actually preserving mainframe capacity for more critical workloads. In the end, this is about making the best use of your available resources so only specific workloads get the mainframe VIP treatment.

Step 4: Create a cross-organization team and involve all stakeholders early in the game

I recently met a mainframe customer who proudly told me how he shut down a big data initiative in less than 10 minutes due to security concerns. The Hadoop team had spent months working on the project without involving the mainframe group. Now, looking for final approval, they needed the mainframe group's blessing. There's not much more to say about this other than you really need to bring key mainframe and Hadoop stakeholders together. Otherwise, you can jeopardize your big data strategy.

These are just some of the key initial steps you need to follow when embarking on a big data strategy that includes mainframes and Hadoop. If you have a mainframe, chances are a Hadoop initiative is closer than you think. Therefore, you may want to start implementing these steps, especially Step 1, sooner rather than later.
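A practical wrinkle in the road map's first step, copying mainframe data sets into HDFS, is that mainframe data is typically stored as EBCDIC-encoded, fixed-width records that must be converted before Hadoop tools can read it. The following minimal Python sketch shows the idea; the record layout and field names are hypothetical, not from any real system:

```python
# Convert an EBCDIC (code page 037) fixed-width record into named fields.
# Hypothetical layout: bytes 0-9 customer id, 10-29 name, 30-37 amount.
FIELDS = [("cust_id", 0, 10), ("name", 10, 30), ("amount", 30, 38)]

def record_to_row(raw: bytes) -> dict:
    """Decode one EBCDIC record and slice it into stripped string fields."""
    text = raw.decode("cp037")  # cp037 is a standard EBCDIC code page
    return {name: text[start:end].strip() for name, start, end in FIELDS}

# Build a sample record the way a mainframe data set would store it
sample = ("0000004217" + "Jane Doe".ljust(20) + "00012550").encode("cp037")
row = record_to_row(sample)
```

Once decoded into plain text like this, records can be written out as CSV (or a similar delimited format) and loaded into HDFS with standard Hadoop tooling.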
Filling the Demand for Data Scientists: A Five-Point Plan

John Santaferraro

John Santaferraro is VP of solutions in the ParAccel Platform Group at Actian. He has 18 years of experience in big data, analytics, and business intelligence. [email protected]

Abstract

Big data offers enterprises big opportunities. To make sense of all the information being produced and collected, enterprises have been turning to data scientists, those geeks with proficiency in parallel processing, MapReduce, petabyte-sized NoSQL databases, machine learning, and advanced statistics. There's just one hitch: a McKinsey report suggests that by 2018, a shortage of data scientists will emerge, ranging from 140,000 to 190,000 in the U.S. alone. How can enterprises cost-effectively prepare for their data-driven future in the face of this shortage? The solution is an internal program that provides the opportunity for existing data analysts, BI analysts, and business analysts to acquire the skills they need to become big data analysts. These big data analysts then perform the predictive and prescriptive analysis and discovery needed to innovate and compete effectively. Along with developing education programs, companies need to consider providing incentives for existing analysts to participate, reorganizing their analyst community to support big data analysts, deploying technology infrastructure to support analytics, and fostering an enterprisewide culture of analytics.

Introduction

The 2011 film Moneyball (based on the 2003 book by Michael Lewis) focuses on Oakland Athletics general manager Billy Beane's ultimate success in building a competitive professional baseball team using data instead of tired truisms and the instincts associated with years of baseball experience.
For me, the real hero of the film is Peter Brand, the research geek and statistician (based on real-life baseball scout and executive Paul DePodesta) who is elevated from backroom obscurity to baseball
celebrity as Beane's assistant GM because of the success of his analysis. The story of Peter Brand is being repeated in company after company as the executive suite looks to data scientists to help them obtain the benefits promised by big data. According to the Harvard Business Review, "Thousands of data scientists are already working at both start-ups and well-established companies. Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before" (Davenport and Patil, 2011). Data scientists are practitioners of data science, which, according to Data Science: An Introduction, is "an advanced discipline, requiring proficiency in parallel processing, map-reduce computing, petabyte-sized nosql databases, machine learning, and advanced statistics" (Andrus and Cook, n.d.). Data scientists were originally found in only a handful of enterprises (such as LinkedIn, Twitter, and Facebook) that needed to mine their massive social media streams, as well as companies such as Netflix and Amazon that wanted to leverage predictive analysis to recommend movies and books to their customers. Today, however, data scientists are cropping up in companies of every size and in every industry and using predictive analysis and data mining for competitive advantage.

Data Analysis Reveals Data Scientists to Be in Short Supply

Like baseball, the business world will be changed by the ascendancy of data, but there's a problem with the sudden demand for data scientists: there aren't enough of them, and the situation is getting worse. A McKinsey report suggests that by 2018, a shortage of data scientists will emerge, ranging from 140,000 to 190,000 in the U.S. alone (Manyika et al., 2011).
As the competition to hire these experts increases, so will their salaries. In the face of this shortfall and increasing cost pressure, should you take out an insurance policy by hiring data scientists now, even if you aren't ready for them, just to make sure you won't be left behind? Possibly, but probably not. This is a very expensive strategy that may deliver benefits to your company in the long term, but other solutions are available. There are several steps you can take that will provide the data expertise you need when you need it and drive your transformation into an analytics-driven company.

Businesses Already Have Analysts

The first step is to look at your existing legions of analysts to identify those with the background, talent, and desire to increase their skill set and fill the data analytics positions you will eventually have. Let's review the analysts most companies are already hiring to see which ones could take on the analytics role if given the right training, incentives, and tools.

Data Analysts

Data analysts understand where data comes from and how it can be made useful for business users. They focus on capturing, understanding, cleansing, transforming, modeling, and loading data. They may also integrate multiple data sources into a single repository, such as Hadoop or a data warehouse. Most data analysts have taken computer science courses and have a solid grounding in mathematics, possibly including statistics courses. It makes sense to look at the pool of data analysts for those who may want to expand their skill set to be able to use analytics.
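The data analyst's core loop described above, capture, cleanse, transform, and load into a single repository, can be sketched with Python's standard library, using sqlite3 as a stand-in for the warehouse. All source names and records here are illustrative, not drawn from the article:

```python
import sqlite3

# Raw records from two hypothetical sources, with typical quality problems
crm_rows = [("  Alice ", "alice@example.com", "2500"), ("Bob", None, "1800")]
web_rows = [("alice@example.com", 14), ("carol@example.com", 3)]

def cleanse(rows):
    """Trim whitespace, normalize the join key, drop rows missing an email."""
    out = []
    for name, email, spend in rows:
        if email:
            out.append((name.strip(), email.lower(), float(spend)))
    return out

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, email TEXT PRIMARY KEY, spend REAL);
    CREATE TABLE visits (email TEXT, page_views INTEGER);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", cleanse(crm_rows))
conn.executemany("INSERT INTO visits VALUES (?, ?)", web_rows)

# One repository, multiple sources joined into a business-ready view
report = conn.execute("""
    SELECT c.name, c.spend, v.page_views
    FROM customers c JOIN visits v ON c.email = v.email
""").fetchall()
```

The same pattern, at vastly larger scale and with more sources, is what data analysts perform against Hadoop or a data warehouse.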
BI Analysts

Once data has been moved into data warehouses or data marts by data analysts, business intelligence (BI) analysts perform the next level of data preparation. Although BI analysts generally have a solid understanding of data sources and types, their focus is on using BI tools such as MicroStrategy, Tableau, and QlikView to present information in a more user-friendly and visual way to make it accessible to business users. They typically generate static reports or create interactive reports or dashboards that let users drill down into the details. BI analysts are often thought of as report writers or dashboard builders, but many have a basic understanding of analytics and may have a desire to get additional training and expand their expertise in more advanced analytics.

Business Analysts

Business analysts generally don't have as deep an understanding of data sources and types as data and BI analysts, but they are able to transform the information through reports and dashboards into actionable insights for the business. For example, a supply chain business analyst uses the data to optimize supply chain processes, from sourcing materials for manufacturing through product distribution and point of sale. Customer business analysts understand the mathematics involved in segmentation, affinity, and optimization of an offer, and use data to increase the number of conversions and retain the most profitable customers. Because business analysts most directly tie the data to business insight, and likely have already dabbled in data analytics, they are particularly appropriate candidates for expanded roles.

Understanding the Role of the Big Data Analyst

All three of these analyst groups have generally been more comfortable with descriptive analysis, that is, with describing what has happened and what is happening.
What businesses need today, however, is the ability to discover new patterns and anomalies, predict scenarios, and prevent negative business impact. Predictive analysis examines what will happen: who will respond to a specific offer, under what conditions are customers most likely to leave for a competitor, and what are the characteristics of the people most likely to commit fraud? Prescriptive analysis looks at recommendations for the next best offer or action. For example, what is the best offer to make in order to retain customers while maximizing margins? What price point is necessary to double or triple sales? If a supply chain source is disrupted by a natural disaster, what is the next best source for getting the right items to the right place at the right time for the least cost? Discovery, the ability to discern something that no one else has seen or wondered about, is also an essential capability. What are some new trends impacting my industry? Why are attitudes about my product or business changing? What are the latest fraud techniques hackers are using to break into networks, and what parts of my network are most at risk? Advanced analytics, combined with new data sources and types, opens the door for a new crop of analysts: big data analysts. This new breed of analyst uses advanced analytic techniques on large and diverse data sets to uncover hidden patterns, unknown correlations, and other useful information. The skills required include a basic understanding of analytics, data mining, statistics, and natural language processing, what one might call Analytics 101. Big data analysts may not have to create a linear regression algorithm, but they should understand how linear regression works.
They probably won't write advanced clustering algorithms to look for patterns, but they need to understand how pattern matching works and where it can be used. Big data analysts must understand a wide range of analytics use cases, such as golden path, pattern matching, triggers, events, affinity, and socially aware text analytics.
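To illustrate the level of understanding expected, here is a minimal sketch of ordinary least-squares linear regression using only the closed-form slope and intercept formulas. The data points are illustrative, not from the article:

```python
def linear_fit(xs, ys):
    """Fit y = a + b*x by ordinary least squares (minimize squared residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b is covariance(x, y) divided by variance(x)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    # Intercept a makes the line pass through the point of means
    a = mean_y - b * mean_x
    return a, b

# Perfectly linear toy data generated by y = 1 + 2x
a, b = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
```

An analyst who can read this sketch, and knows when a straight-line model is and is not appropriate, has the working knowledge the role requires, even if production models come from a library rather than hand-written code.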
Big data analysts must also have a working knowledge of different types of data, including conventional sources, Web and mobile application data, machine data, log data, and sensor data, and they must also be able to understand and explore new types of data as they emerge. Only by doing so can they see the potential of integrating the different types of data to discover connections or arrive at insights that have never before been discovered. Equally important, big data analysts must understand the business, not just general business principles, but the specific industry and organization they are a part of. Only by understanding these current practices and challenges will they be able to spot opportunities for optimization and competitive advantage. Important capabilities to look for in a big data analyst candidate include:

Fast thinking: Analytics is an iterative process: query, review results, tweak the query, iterate. To be productive, the big data analyst must be able to quickly analyze results, assess their impact or value, and formulate a new path of discovery.

Innovative thinking: Big data analysis is all about thinking outside the box. It's an exploration, a discovery process. Big data analysts must be open to completely new ideas and new ways of doing things. They must be able to see new patterns and detect anomalies and outliers, and they must be able to imagine the potential of combining data that hasn't been combined before. For example, does the weather impact the sale of items that one would not normally associate with the weather (such as consumer electronics or types of meat)? Does the rising cost of consumer goods reduce the use of power and cooling?
Storytelling: A fundamental job of big data analysts is to put their insights into an understandable context for business users. They must be able to tell the story of the data. Whether it's discovering how the weather in one part of the country is affecting home prices in a seemingly unrelated geographic area or seeing a connection between fans of a certain artist or style of music and potential voters in an election, the big data analyst must be able to convince business users of the connection and its importance. The storytelling may need to be visual, using tools such as Tableau or QlikView, or in prose. People with both verbal and visual skills should be highly prized.

Creating a Big Data Analyst Program and Becoming an Analytics-Driven Enterprise

You may be able to hire big data analysts, but the strategy that will deliver the greatest long-term benefit includes educating and promoting existing analysts with a program that fosters an analytics culture and helps transform your enterprise into a truly data-driven organization. Here are five tasks you can begin today to create such a program.

1. Create educational opportunities. Start by offering big data analytics education to all interested individuals, technical and business staff alike. Think of your training program as an extended analytics center of excellence (ACE). Both business and technical teams will benefit from training in Analytics 101, developing a core understanding of how to use analytic functions. Furthermore, additional business education topics such as marketing, supply chain, or risk management will help data analysts expand their business acumen. As business users learn more about big data technology, they will be better able to understand how new tools and techniques can change the way they do business. In addition, they will learn to better communicate their needs to big data analysts and adapt their business processes to include embedded
analytics or analytics-driven action. Some of the more technical business sponsors will quickly see the potential of analytics and may even choose to pursue the big data analyst position. Finally, by opening big data programs to all participants, organizations can foster a better general understanding of the value of big data and its impact on the way they operate and the services they provide.

2. Provide incentives for participants. Provide incentives for analysts to participate in relevant education programs and reward them for reaching milestones. With the shortage of data scientists and the increasing value of their contributions, companies that provide analytics training without incentive programs run the risk of training new talent for other companies to poach.

3. Reorganize to support big data analyst success. Historically, businesses have organized their analysts in one of two ways: either tied to a particular business unit or grouped into a large pool of resources that are available to the business units as needed. Neither strategy is sufficient. Analysts tied to a particular business unit tend to be isolated and out of touch with big data technology. They tend to miss out on the collaboration that normally takes place among analysts and the water cooler conversations about new advances in technology and analytic techniques. In a similar way, analysts separated from the business units into a general resource pool often find themselves a step behind what is happening in the business units. They are never able to adequately understand the issues and challenges of the business units, and therefore can't provide the insights that are needed. They end up reacting to the needs of the business instead of being proactive in their approach to analytics.
A much better organizational approach is a hybrid of these two strategies, where some analysts are tied to a business unit, working at a more strategic level, and yet are still a part of the greater community of analysts. Connection to the business puts common goals and objectives at the forefront of their minds, making all of their analytic efforts more strategic. Collaboration with the community of analysts opens them up to constant interaction and sharing of ideas. This approach offers the best of both worlds. The centralized analysts provide the business with a growing set of capabilities driven by new technology and innovation. Meanwhile, analysts within the business units use what they learn from the community to transform the way their colleagues do business. This hybrid approach is critical for companies that want to use big data not just to provide better answers to the same questions they've always asked, but to truly explore the data and discover new insights. As part of this reorganization, consider creating an analytics center of excellence, an informal mechanism for bringing business and technology staff together to develop a common vocabulary, share insights, and create opportunities for cross-pollination. The center of excellence provides a forum to enlighten business users on the power and possibility of the data and technology. At the same time, analysts are exposed first-hand to a better understanding of business processes, requirements, and desired outcomes. One of the most promising outcomes of this kind of collaboration is a road map that drives the increase of data-driven decisions and analytics embedded in business processes for real-time action.

4. Deploy infrastructure to support analytics.
No matter how many incentives you put in place and how much education you offer, no big data analytics initiative can succeed without a technology infrastructure that supports unconstrained analytics on massive volumes of information. (Even if you eschew the entire big data analyst strategy and decide to hire data scientists tomorrow, you'll still need such a system for them!) Few existing database systems support unconstrained analytics, so most companies will need to add this technology. Instead of ripping and replacing existing systems, however, forward-thinking companies will look to add an analytic platform with the following capabilities:

Embedded analytics: A library approach to analytics, with functions embedded in the analytic database, allows the big data analyst to run sophisticated analytics with a simple SQL call. Analysts don't have to develop algorithms; they just need to know how to use them.

Agile extension: Users should be able to easily incorporate new mathematical, statistical, and data mining functions into their analytics library without interrupting analyst productivity.

Rapid iteration: The typical analyst follows an iterative process of discovery. High-speed execution of complex queries and access to many data sources allow an analyst to quickly tweak an algorithm or bring new data into the mix.

Real-time access: Big data analysts frequently need access to new, time-sensitive data as soon as possible. An analytic platform must support some kind of on-demand access to data and high-performance processing.

Extreme flexibility: The system must be flexible enough to allow analysts to run queries whenever they want, make changes to the query at will, and easily enrich or alter the data set to support the discovery process.

5. Foster a culture of analytics. After you have completed the previous four steps, you are in a position to foster a culture where analytics is valued by the entire organization. When it comes to making decisions, many enterprises value a person's title and years of experience most, whether or not that person makes decisions based on facts.
By fostering a culture of analytics, companies can begin to eliminate opinions, emotions, gut feelings, and ego from decision making so that executives base decisions on data. The impact of those decisions can be tracked, so future decisions continue to be based on the evolving reality. In the film Moneyball, the biggest obstacle to accepting data as the foundation for decisions was that it challenged the authority and influence of those with years of baseball knowledge. Today, every baseball team uses data and statistics as the foundation for building their teams, and it hasn't destroyed the game. Experienced baseball managers and scouts are still highly prized; they just have more information available to them than ever before. A data-driven company will still need influential executives with years of experience, but it's time to arm them with the facts and insights they need to make the best reality-based decisions possible. The supply of data scientists will never catch up to the demand for data-driven decisions. The answer to what could amount to one of the most strategic challenges of this decade lies in helping existing analysts become big data analysts.

References

Andrus, Calvin, and Jon Cook [no date]. Data Science: An Introduction, Wikibooks. wiki/data_science:_an_introduction

Davenport, Thomas H., and D.J. Patil [2011]. "Data Scientist: The Sexiest Job of the 21st Century," Harvard Business Review.

Manyika, James, et al. [2011]. Big Data: The Next Frontier for Innovation, Competition, and Productivity, report from McKinsey Global Institute. insights/business_technology/big_data_the_next_frontier_for_innovation
Marketing IT to BI Users In-House: The Importance of Small Talk

Max T. Russell

Max T. Russell is the owner of Max and Max Communications. He works behind the scenes to promote individuals and projects in a variety of industries. [email protected]

Abstract

Users have a legitimate need to know that IT professionals are real humans, not just geeks with an extraterrestrial language and minds absorbed in technological technicalities. People will be much more open to your business solutions when they can tell you're a normal human being. A computer technology student at a Big Ten university once told me, "Nobody in the computer science building can hold a normal conversation." In order to matter to your in-house customers, you have to talk about things they care about. At leisure and at work, most people are highly interested in small talk. In fact, small talk is the way into most users' hearts. If you want the best chance at adoption of any BI project you attempt, you'd better accept the fact that small talk is a big deal. Small talk is not random babbling. It is informal conversation about things that may or may not be important in themselves but are good for interpersonal relationships and for breaking up monotony. Small talk requires skill and practice. If it doesn't come easily for you, or if you have a bad habit of going deep with everything you talk about, this article illustrates the value of small talk to BI professionals who want users to think of them as partners in business solutions. I'll include concrete advice for improving your small-talk skills as well.
Learn What's Important to Users

I was 18 before I learned my older brother's special method for making conversation on the telephone with women. He would open the phone book to the Yellow Pages and call a girlfriend. While he talked, he scanned the ads for anything of interest and worked it into the conversation. He could do this for hours, but I'm sure he never talked about anything that mattered to the women. He only managed to keep them from hanging up. A smarter approach would've been to gather clues about their interests. For example, the person I replaced in one organization gave me a priceless tip for staying on the good side of the supervisor: "Ask her about her son," she said. So I did. In fact, I asked the supervisor about her son whenever I needed to establish a little more rapport. It worked every time. I learned that her son had bought some exercise equipment. Her son had switched his college major. Her son had superior math skills. Her son found a girlfriend. Her son, her son, her son. The world stopped while she told me the latest. Then we got back to work and accomplished whatever we had to do. After the small talk, everything was far easier and more unified.

Others Are Listening and Watching

An IT professional at a healthcare facility where a data management system was still under design told a nurse that she was trying to fix a data entry problem incorrectly. He immediately alienated her and another nurse listening in. While the two nurses contemplated their irritation with him, the first decided to talk about a problem she was having at home with her PC. Her problem was the sort of thing that filled her colleagues' small talk on any given day. The technologist wisely seized the chance to give her a simple solution, which erased both nurses' irritation as quickly as it had set in. More than that, they admired his social awareness, his willingness to care and talk about things that didn't have anything to do with work.
As a result, they and their fellow nurses were more willing to cooperate in the ongoing and dreaded BI advances. As long as IT continues the small talk at that location, the nurses will cooperate more than they will complain. Without them, there can be no adoption. Don't miss the point: these highly intelligent nurses would be just as impressed if IT were to present itself as normal by talking about something fun that happened at the water park, something interesting in a magazine, how much a co-worker's baby weighed four weeks after birth, or what somebody ran over on the way to work. Small talk brightens routine and cements relationships. All kinds of things happen every day. Notice them and share them with your in-house customers, even if you have to wait two weeks for the chance to do it.

How a Hamster Can Make a Difference

A friend (I'll call him Andrew) was working the help desk at an educational institution where IT was installing a data warehouse when he missed a grand opportunity. All employees at that school were under stress from ongoing technology changes and low student enrollment. A user administrator named Susan was trying to make sense of highly structured information in the new data system. She emailed Andrew: "Would the performance status change the results of the student placement report?" He responded: "I think the question actually is, 'Would the performance status you choose in the parameters affect the outcome of the student placement report?'" Susan couldn't help but wonder how many IT workers ought to be trimmed in the next round of job cuts. Andrew could have sent a powerful professional message about IT if he had gone to Susan's office, told her he was there to answer her question, perhaps first talked about his daughter's hamster that had escaped into the furnace ductwork, and then handled the IT problem.
This course of action would have been the talk of the entire user management team and their departments. They would still be telling and laughing about their own stories of pets that have escaped, and remembering Andrew's daughter's hamster. Instead, they are still laughing in astonishment at IT's antisocial behavior and trying to figure out how to keep IT out of their hair. By being alert to your surroundings, you will find literally thousands of conversation starters. Small as they may seem, they can play a big role in your BI efforts because they help users relate to you and look forward to seeing you again.

Two Minutes Will Go a Long Way

You only need to survive two minutes of small talk to prove you're adequately social. Any reasonable employee knows you have work to do. Excuse yourself and get on with the job. Here are some lighthearted examples with a small-talk feel. They could be in messages or in person:

"Now that you've learned those shortcuts for your Android, let me also show you some shortcuts for using our visualization program so you can help me show your department how good it is."

"My daughter was frightened when her hamster disappeared, but our new BI program shouldn't scare you. Let me show you how your performance data affects your student placement report."

It's easy to transition away from small talk. For example, you could say something like, "It's good to hear about your trip. Now I need to show you our visualization program. The images aren't as exciting as your vacation snapshots, but it comes close for the work you do every day. Let's explore some of the latest features of this incredible software. Watch this."

All of these conversation snippets are in the spirit of small talk. They show that:

You care about people

You can listen to their choice of topic or supply one of your own

You can remain professional and on task
BI Training: Closing the Business Analytics Gap at UT Austin

Linda L. Briggs

Abstract

As big data creates a need for more in-depth analytics, companies face a growing need for skilled workers with the right blend of hard quantitative skills and soft business skills. With high-profile sponsors including Walmart and Deloitte Consulting, a new master of science program at the University of Texas aims to help bridge that gap.

Addressing the growing need for workers with a combination of data analytics and business skills, a new master of science program at the University of Texas at Austin focuses on business analytics. The program, part of the university's McCombs School of Business, is backed by funding from high-profile corporate sponsors such as Walmart and consulting firm Deloitte; representatives from sponsor firms also serve on the program's executive council to provide insight and advice as the program develops.

Fifty-two students, with a wide mix of backgrounds including computer science, engineering, business, economics, and mathematics, completed the two-week introductory boot camp and began classes in early September. About half are coming into the program straight from an undergraduate degree; the remaining students have, on average, just a few years of work experience. They will graduate after a year with a master of science degree in business analytics.

With an emphasis on both business and technology skills, the program has been specifically designed to address the needs of companies facing a shortage of workers with the right mix of skills for in-depth business analytics, especially for big data. According to Mike Hasler, director of the program, the school has increasingly heard from employers and recruiters about the need for skilled workers who can dive deeply into data and gain insights from analysis. "It is a talent that's lacking in the industry," Hasler said.
"We've had several of our industry sponsors, recruiters, and employers come to us and say, 'What can we do to help fill that gap?'"

Hasler is also a lecturer in supply chain and operations at the business school and associate academic director for the school's supply chain management center of excellence. He holds a doctorate in resource development and spent years in the private sector focused on supply chain issues and technology in the automotive industry.
The new program includes several corporate sponsors that provide financial support and play a major role in helping develop the curriculum, build faculty, and attract students. The sponsors will also supply real-world data sets and current business problems for students.

One such sponsor is Walmart, one of the largest companies in the world and an obvious flag-bearer for the increasing need for workers with strong analytics and business skills. Linda Vytlacil, vice president of global customer insights and analytics for Walmart, leads the development of customer analytics strategy as well as the enablement of Walmart's international markets. She said that although Walmart can and does hire workers with pure science backgrounds (such as mathematicians and statisticians), the company definitely sees a need for those who can integrate business decisions with technology strengths. "That's really important for us," Vytlacil said. The company also trains internally, but she noted that Walmart sees great value in partnering with well-designed educational programs.

Hard versus Soft Skills

Although the job title "data scientist" has gained tremendous currency recently, Vytlacil said Walmart tends to be more interested in job function than title. What the company is looking for in graduates from programs such as UT Austin's goes beyond data scientists in both title and skills. Workers hired into Walmart for their analytics skills might have titles such as customer analytics professional, manager of customer analytics, or even simply program manager. "We do hire under the data scientist job title," Vytlacil said, "but it's just one of many titles used. It's quite wide and varied, and frankly, still developing." The UT Austin program is particularly attractive, she added, because Hasler and his team have done a good job of integrating business-skill training along with technical skills.
"Walmart really, really wants the hard skills, the skills in statistical methods [and] applied analyst techniques," Vytlacil said, "but we also need the soft skills... strong business judgment, strategic thinking, communications and influence, and, above all, the ability to recognize that the data doesn't tell a story but that we employ the data to solve a business problem."

Another sponsor of the new program is management consulting firm Deloitte Consulting LLP, one of the largest such firms in the world. Deloitte employs about 200,000 workers worldwide and offers a range of services including auditing, consulting, financial and tax advice, and risk management. According to Jonathan Trichel, a principal with Deloitte and the national service line leader of Deloitte's U.S. strategy practice, his company, like Walmart, needs employees with the ability to tie hard analytics in with business strategy. "That's the real challenge a lot of our clients face."

Although doing the quantitative work of handling large data sets, including modeling, certainly isn't simple, Trichel said, a bigger need is business skills, including the ability to first frame the problem set. Why does the analysis need to happen, and to what end? What are you trying to find out? Based on the insights, what do you do, how do you execute on them, and how do you measure what you're doing as a result of the analytics?

That softer side of working with big data is what companies often need help with more than the math side, he said. Clients have lots of questions about how to align [their] business strategy with [their] analytics needs, or the other way around.

Another area where Deloitte helps clients, and needs specific skills in its workers, is in setting up the governance around how analytics functions within a business, Trichel said.
That includes answering questions such as: Who gets to decide which project goes first? How are limited resources spent analyzing data? Have they done a good job of actually executing on the findings and measuring that, with all of the organizational governance around it?

Workers with the skills being taught in the new master's program are the employees Deloitte needs to continue growing, Trichel said. As one of the largest MBA recruiters
in the country (Deloitte hires 300 to 400 MBAs a year), the company sees a notable lack of programs that specifically teach business analytics. Although the firm has had luck in finding statisticians and holders of quantitative degrees such as mathematics, Trichel said, the type of master's degree crafted by UT Austin "combines the quantitative skills with the business skills in a way that we have had a hard time finding."

Mixing Business and Technical Skills

Finding the right mix of skills, combining both business understanding and technical knowledge, is just one of the challenges in filling business analytics spots. The University of Texas program is special, Hasler explained, because it focuses heavily on the business environment. "We're really trying to get people with quantitative skills and give them a mindset for business analytics. We're solving problems in the business domain."

Students come from a wide range of backgrounds including business, economics, engineering, computer science, mathematics, and even psychology and sociology, but one thing all the students share, he pointed out, is a quantitative background. For those with skills on the business side, that might mean they have focused on quantitative areas such as supply chain management, finance, accounting, and statistics.

"We're looking at this program as one that can take people with those quantitative skills and quantitative aptitude and enhance and deepen those skills in the analysis part, specifically in the business environment," Hasler said. The program has no specific vertical focus, he said; rather, "We're completely comfortable with the fact that the tools that we're teaching them in this program can be used in a variety of analytical areas, whether that's medicine, the life sciences, geology, or oil exploration."
Initial courses offered in the full-time, three-semester program (which will graduate its first students in August 2014) include programming, decision analysis and optimization, database management, and advanced data analytics focused on predictive modeling. A course in financial management will help teach students the language of business. Along with more of what Hasler called "toolbox" courses (such as advanced predictive analytics and a higher-level data analytics course focusing on machine learning) will be applications-oriented courses in social media analytics, marketing analytics, and supply chain analytics.

The program wraps up with a capstone course that ties everything together and uses large, real-world data sets from sponsors such as Deloitte and Walmart. The aim is to give students real data and real-world business problems to solve using the tools and the applications they've learned in the program. "Chewing on crunchy data problems together at the university with our clients, potentially in a lab, stands to help everyone," Trichel said.

Hasler prefers to be cautiously optimistic about employment possibilities, but he has already seen plenty of interest from potential employers. Along with partners Deloitte and Walmart, the university has been in contact with nearly 100 companies that have expressed interest in hiring or recruiting its students across a variety of industry groups: retailing, oil and gas, other consulting firms, high technology, medicine, and healthcare. "It cuts across different companies and industries.... I prefer to under-promise and over-deliver to my students when it comes to the recruiting side of things. However, the response we've received so far gives us, we think, good reason to be optimistic."

Linda Briggs writes about technology in corporate, education, and government markets. She is based in San Diego. [email protected]
Overcoming Data Challenges with Virtualization

Nilesh Bhatti

Nilesh Dalsukhbhai Bhatti is a technology architect with Infosys Ltd. His focus area is solution architecture and technical consulting on business intelligence and data warehousing issues. [email protected]

Abstract

In today's competitive business world, the success of any enterprise depends on how quickly and easily it makes the best decisions. Most enterprises rely on business intelligence systems that provide management with a one-stop shop for performing trend analysis and evaluating KPIs against multiple dimensions. Building a BI application is an expensive and time-consuming process, and once built, an application may not be easy to maintain. At the same time, enterprises are under constant pressure to improve processes, reduce costs, and manage increasingly disparate source systems. Data virtualization technology addresses these challenges. The traditional data warehouse system focuses on building a centralized repository of aggregated or summarized historical data, whereas data virtualization builds a direct connection to multiple disparate source systems and provides a virtual environment for accessing integrated data. This approach requires less time to build, is less expensive to develop, and helps keep BI systems agile. This article highlights several use cases, challenges, and best practices for data virtualization.

What is Data Virtualization?

Data virtualization is a process that provides a unified, virtualized view of data as if it resided in a single data store, integrating multiple, disparate source systems in real time or near real time, regardless of where the sources exist: within the enterprise, outside the enterprise, or in the cloud. Data virtualization provides an abstraction layer that hides technical aspects of the application, such as where the data is stored and what format it uses.
Because of this abstraction layer, applications do not need to know where all the data is physically stored, how the data should be
integrated, where the database servers are running, or which database language to use.

Figure 1: A standard data virtualization architecture. (A data virtualization server exposes virtual views, via Web services, SQL, and JDBC/ODBC, over heterogeneous and physically distributed source systems, with centralized data transformation and integration.)

Why Data Virtualization?

There are several reasons why organizations are embracing data virtualization.

1. Data integration after mergers and acquisitions. When two companies are combined, they may need to merge their business intelligence and analytical systems to enable a complete, consolidated view of key information. Physical consolidation of this data requires higher storage costs, lengthens implementation tasks, requires additional maintenance effort and staff, and increases delays in data delivery. Data virtualization is the ideal choice to overcome these challenges.

2. Real-time data. A typical BI environment consists of a centralized data warehouse and multiple data marts; these store historical and summarized or aggregated data. To make a decision, management often needs to drill down to the detail level of this data, which data virtualization enables by quickly and easily integrating the summarized data from the data warehouse with real-time, detail-level data from operational systems.

3. Increased agility and reduced data integration complexity. Enterprises are finding it critical to make their business intelligence systems more agile. Data virtualization improves BI agility by simplifying the complex data integration of disparate source systems, virtual data mart creation, and virtual operational data store (ODS) layer creation.

4. Cost and time savings. Because of exploding data volumes and the increase in fragmented data sources, organizations are realizing that physical consolidation of data is not the solution of choice for all data integration needs. Physical consolidation of data from disparate source systems involves higher storage and licensing costs, longer response times, increased maintenance effort, and a greater need for staff resources, making this approach problematic for many critical projects.
Table 1: Factors for choosing between physically consolidating data and using data virtualization.

Quicker time to solution
  Physical consolidation: the enterprise is ready to invest time to physically integrate data from heterogeneous systems
  Data virtualization: the enterprise needs faster time to solution

Cost
  Physical consolidation: the enterprise has the necessary budget and is ready to make an infrastructure investment
  Data virtualization: the enterprise is looking for a cost-effective solution

Source system availability
  Physical consolidation: source systems are often offline
  Data virtualization: source systems are always up and running

Transformation
  Physical consolidation: the data transformation is complex
  Data virtualization: data transformations are straightforward and can probably be achieved with SQL-like functionality

Data cleansing
  Physical consolidation: data quality is poor; data requires extensive cleansing
  Data virtualization: data is in good shape, though the enterprise may need basic validation checks and reformatting tasks

Real-time data
  Physical consolidation: end users need only historical data for analysis and reporting
  Data virtualization: end users need real-time data

5. Improved data quality and governance. In a traditional data integration methodology, data is replicated from source systems to a data warehouse, data marts, and sometimes an ODS. This additional replication makes managing data quality and governance difficult. Data virtualization eliminates the need for data replication, so enterprises can concentrate on data governance at the source system level rather than at each copy of the data.

6. Big data integration. Increases in data volumes, as well as the complexities of unstructured data across industries, are pushing organizations to look for new opportunities in big data. Businesses need a platform that can manage large volumes of different kinds of data, integrate that data, and glean new insights for competitive advantage, all efficiently and cost-effectively. Building a warehouse for such a huge volume of data is not affordable, but data virtualization can overcome this challenge.
Most tools promise seamless integration of enterprise data with all types of big data, including NoSQL data stores, MPP-based appliances, and unstructured data stores.

7. Social media. With the growing popularity of social networking sites, enterprises are struggling to leverage the huge volumes of data these sites generate for business purposes such as managing customer relationships, improving customer retention, managing marketing activities, and monitoring public perception. Enterprises cannot afford to build a warehouse for such large volumes of unstructured data. Most data virtualization tools offer social media integration features, so enterprises do not have to physically add social media data to their existing data warehouses in order to work with all the data as one source.

Using Data Virtualization

Based on key business, data source, and end user considerations, Table 1 provides guidelines for choosing between physically consolidating data and using data virtualization. Let's look at several high-value data virtualization use cases.

Case 1: Using data virtualization as a tactical solution to integrate new source systems

Problem statement: With frequent changes in reporting requirements, end users often need additional data not present in the existing enterprise data warehouse (EDW) and must incorporate data from external sources.

Traditional solution: In a classic data warehouse architecture (a source system, staging layer, data warehouse, data
mart, and a presentation layer), bringing the external data into an existing system requires an enterprise to change its data model for the staging, data warehouse, and data mart layers. Once the modeling updates are complete, data is extracted, transformed, and loaded into the staging, data warehouse, and data mart layers from the new source system(s).

Figure 2: Data virtualization architecture solution to integrate a new source system.

Proposed solution: Using a data virtualization tool, an enterprise can make a remote connection to the external source and bring the required tables into the middleware virtualization tool. Abstracted and consolidated virtual views are created based on predefined business rules, and these views are exposed to end users for querying and reporting. This solution enables quick integration between an existing enterprise data warehouse and new external source systems, and it delivers real-time results without the additional storage costs or delays of complex ETL.
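As a loose illustration of the Case 1 pattern, the sketch below uses only Python's standard library. An in-memory SQLite database stands in for both the EDW and the virtualization layer, and the table names, columns, and CSV "external source" are all invented for the example; a real deployment would use a data virtualization server with remote connections rather than local tables.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in: an in-memory SQLite DB plays the existing EDW.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edw_sales (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO edw_sales VALUES (?, ?)",
                 [(1, 100.0), (2, 250.0)])

# A new external source arrives as a CSV feed (invented sample data).
external_csv = "customer_id,region\n1,EMEA\n2,APAC\n"

# "Bring in" the external table at the virtualization layer only; the EDW
# schema itself is never touched.
conn.execute("CREATE TABLE ext_customers (customer_id INTEGER, region TEXT)")
rows = csv.DictReader(io.StringIO(external_csv))
conn.executemany("INSERT INTO ext_customers VALUES (?, ?)",
                 [(int(r["customer_id"]), r["region"]) for r in rows])

# The abstracted, consolidated virtual view exposed to end users.
conn.execute("""
    CREATE VIEW v_sales_by_region AS
    SELECT e.region, SUM(s.amount) AS total
    FROM edw_sales s JOIN ext_customers e USING (customer_id)
    GROUP BY e.region
""")

result = dict(conn.execute("SELECT region, total FROM v_sales_by_region"))
print(sorted(result.items()))  # [('APAC', 250.0), ('EMEA', 100.0)]
```

The point of the sketch is the advantage claimed above: the warehouse table is never reloaded or remodeled, and only the virtual layer knows about the new source.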
Advantages of the proposed solution using data virtualization include:

- New information is made available faster, without bringing it into the existing data warehouse
- The existing data warehouse remains stable because new information remains outside it
- Easier integration of heterogeneous data; the enterprise benefits from unprecedented breadth and depth of information
- Data is instantly available to the company's dashboards, scorecards, and other visualization tools
- Reduced data replication and improved availability of real-time data

Case 2: Using data virtualization as a tactical solution to create virtual data marts and a virtual ODS, making the BI system more agile

Problem statement: In any enterprise data warehouse implementation, we have a centralized data warehouse and multiple data marts derived from it. The data marts
are the physical subset of the large data warehouse and are used by a subset of users. Similarly, an operational data store (ODS) and a lookup data repository are physically created for a specific purpose directly from source systems. Adding physical data marts or an ODS increases project and ongoing costs.

Figure 3: Data virtualization architecture solution to create virtual mart and ODS layers.

Traditional solution: Assuming a classic data warehouse architecture (as in Case 1), data marts and an ODS are created by extracting, transforming, and loading data from the data warehouse layer and different source systems, respectively.

Proposed solution: As part of the solution, a remote connection can be made to the heterogeneous sources, and the enterprise can bring the definitions of the required tables from the existing data warehouse and other source systems into the middleware virtualization tool. Abstracted and consolidated virtual marts or a virtual ODS layer are created based on the predefined business rules, and these views are exposed to end users for querying and reporting.
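A minimal sketch of the virtual mart and virtual ODS idea, again with SQLite views as stand-ins for the virtualization layer; the schema, the business rule (dept = 'retail'), and the view names are all hypothetical.

```python
import sqlite3

# Hypothetical physical warehouse holding detail rows.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dwh_orders (
    order_id INTEGER, dept TEXT, amount REAL, status TEXT)""")
conn.executemany(
    "INSERT INTO dwh_orders VALUES (?, ?, ?, ?)",
    [(1, "retail", 50.0, "open"),
     (2, "retail", 75.0, "closed"),
     (3, "wholesale", 500.0, "open")])

# Virtual data mart for one user community: a predefined business rule
# applied on demand, instead of a physically loaded and maintained mart.
conn.execute("""
    CREATE VIEW v_retail_mart AS
    SELECT order_id, amount, status
    FROM dwh_orders WHERE dept = 'retail'
""")

# Virtual ODS-style lookup over the same storage; no data is copied.
conn.execute("""
    CREATE VIEW v_open_orders AS
    SELECT order_id, dept, amount FROM dwh_orders WHERE status = 'open'
""")

mart_total = conn.execute("SELECT SUM(amount) FROM v_retail_mart").fetchone()[0]
open_count = conn.execute("SELECT COUNT(*) FROM v_open_orders").fetchone()[0]
print(mart_total, open_count)  # 125.0 2
```

Because both "marts" are views, they track the warehouse automatically; there is no ETL job to schedule and no second copy of the data to govern.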
The advantages of the proposed solution using data virtualization include:

- A unified virtual schema across diverse schemas
- Faster time to solution
- Savings in hardware and storage costs, with less effort and maintenance
- Support for agile BI reporting and real-time dashboards

Case 3: Using data virtualization as a tactical solution for integrating multiple data warehouse systems after a merger or acquisition

Problem statement: Company A acquires Company B, and both companies have their own data warehouse implementations that use different technologies. Management wants an integrated view of data from both enterprise data warehouses.
Figure 4: Data virtualization architecture solution to integrate multiple data warehouses after a merger or acquisition.

Traditional solution: The new company physically combines the multiple marts and warehouses into a single, complete, enterprisewide data warehouse using an ETL tool. This process includes designing an entire data warehouse schema, developing the ETL processes, and loading data into the warehouse and data marts.

Proposed solution: As part of the solution, a remote connection can be made to the multiple data warehouse systems to bring the required dimension and fact table definitions into the middleware virtualization tool. Integrated virtual marts are created per the predefined business rules.

The advantages of the proposed solution using data virtualization include:

- Faster time to solution
- A holistic view of a single entity (e.g., customer, product, supplier)
- Less data replication

Challenges of Data Virtualization

Every technology has its own pros and cons, and it's no different with data virtualization. For example:

- It is difficult to ensure that a result set combining data from multiple, disparate source systems is valid and accurate.
- Data virtualization means an enterprise introduces an additional layer of software between the data store and the data consumer, which requires CPU time and can hamper query performance.
- An ETL tool transforms the data and stores the result in physical storage; the data is transformed once, and the result can be reused many times. Data virtualization, in contrast, performs on-demand transformation, so results are not reused: the transformation is executed every time a virtual table is accessed.
- Every data virtualization tool uses its own proprietary language for specifying the mappings and wrappers, so switching from one data virtualization server to another is not simple.
- All virtualized views execute against the underlying remote source systems, and on-the-fly transformation may lead to performance degradation.
- All underlying source systems must be up and running whenever the data virtualization tool accesses them.

An ideal use case typically has one of the following characteristics:

- Data access from multiple, heterogeneous sources
- Use of real-time data
- Frequently changing data requirements
- A need for faster time to solution
- A need for quick prototyping

Data Virtualization Best Practices

There are several ways to ensure your enterprise reaps the greatest benefit from data virtualization.

Best Practice #1: Use the appropriate tool. Any organization investing in a new technology requires a rigorous evaluation process, and data virtualization tools are no exception. Every organization should short-list vendors based on factors such as market penetration, cost, data access, data delivery to consumers, ease of use, and the breadth of caching options for better performance. The best way to select a data virtualization tool is to carry out an evaluation proof of concept that includes all the parameters that will influence the selection.

Best Practice #2: Identify the appropriate use case. Data virtualization is not a panacea for all data integration issues. As a best practice, evaluate whether the identified use case is an ideal candidate for data virtualization.

Best Practice #3: Focus on data governance, data quality, and performance tuning. Organizations must focus on satisfying requirements for data governance, data quality, and performance tuning at a very early stage of data virtualization implementation.

Summary

Any use case requiring data access from multiple heterogeneous sources, real-time data, dynamic requirements, and a faster time to solution is ideal for data virtualization. Data virtualization delivers agile BI at a fraction of the time and cost of conventional data warehouse approaches. Businesses get a return on investment in the form of a significant reduction in hardware, storage, development, and maintenance costs.
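The repeated-transformation challenge noted above (a virtual table is recomputed on every access) is what the caching options mentioned under tool selection mitigate. Below is a toy result cache, not any vendor's API; the class and method names are invented, and a real server would add refresh schedules and partial invalidation.

```python
# Toy illustration: repeated reads of a virtual view reuse the last
# computed result until the cache is explicitly invalidated.
class VirtualView:
    def __init__(self, compute):
        self._compute = compute   # on-demand transformation (may be costly)
        self._cache = None
        self.executions = 0       # counts trips to the underlying sources

    def read(self):
        if self._cache is None:
            self.executions += 1
            self._cache = self._compute()
        return self._cache

    def invalidate(self):
        # Call when the underlying sources change (or on a refresh schedule).
        self._cache = None

view = VirtualView(lambda: sorted({"EMEA": 100.0, "APAC": 250.0}.items()))
first = view.read()     # executes the transformation against the sources
second = view.read()    # served from cache, no re-execution
view.invalidate()       # sources changed
third = view.read()     # re-executes the transformation
print(view.executions)  # 2
```

Two reads plus one post-invalidation read cost only two executions; without the cache they would cost three, which is exactly the trade-off between freshness and source-system load that performance tuning has to balance.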
Big Data Management Platforms: Architecting Heterogeneous Solutions

Ravi Chandran

Ravi Chandran is the CTO of XtremeData, Inc., which provides a massively scalable, high-performance DBMS for big data analytics.

The inflection point in computing solutions, initiated by the convergence toward virtualized x86 Linux hardware and public cloud offerings, has been accelerated by big data. Traditional IT budget planning is rapidly becoming obsolete, as enterprises now have the option of swapping high up-front CapEx (buying hardware or software) for more evenly spread OpEx (renting infrastructure or software-as-a-service). These options are timely given the challenges associated with rapid growth in data volume, growth that is orders of magnitude larger than the growth of the enterprise's traditional operational data. At the same time, no enterprise can afford to ignore the value of big data analysis. Big data has been democratized: where it was once the domain of global B2C enterprises, now businesses of all sizes have easy access to, and can benefit from, big data.

The traditional IT model of purchasing excess capacity to meet the next few years' growth will no longer work. Furthermore, CIOs are being challenged to do more with less. Data volumes are growing much faster than budgets can possibly increase, so incremental evolution of legacy technologies cannot meet demands. In recent years, market pressures have spawned a wide range of new technologies, many of which are narrowly focused on specific applications. This article focuses on structured data sets, essentially the traditional ecosystem of data warehouses and data marts extended to include new big
data sources. Unstructured or loosely structured data such as text files, documents, audio, and video have their own specialized solutions.

The challenge of structured big data analytics is to deploy solutions that scale affordably in the context of loading, joining, aggregating, analyzing, and reporting on billions of records quickly enough to meet business demands. A one-size-fits-all approach is not possible today; consequently, a big data platform strategy needs to take a heterogeneous approach.

All components of a heterogeneous solution need to meet one common criterion: the ability to be deployed on today's converged, sharable hardware infrastructure. This is the era of commodity (Linux on x86) hardware, scalable horizontally as clusters of servers and virtualized to enable sharing by multiple OSes and applications. This converged architecture applies equally to public clouds (such as Amazon Web Services and Google Compute Engine) and private clouds within the enterprise data center.

The technologies that apply to structured big data analysis may be thought of as falling into three broad classes: Hadoop, massively parallel processing (MPP) SQL engines, and specialized reporting solutions that include in-memory and column-store databases. This is shown in Figure 1.

Figure 1: A big data flow. (Hadoop ingests, processes, and stores data on 1,000s of nodes at 100s of TB; MPP SQL engines perform iterative, data-intensive processing on 100s of nodes at 10s of TB; column/in-memory stores serve specialized reporting on one or a few nodes at roughly 1 TB.)

The data flow for big data travels naturally from left to right in a funnel shape, getting smaller in volume as it travels. This diagram illustrates one particular example of data flow; not all processes will require this same sequence, but the three classes of solutions are typically the necessary components. The strengths and weaknesses of each class are described in more detail later in this article.
As depicted, Hadoop is meant to include all variants and distributions, and incorporates all the core components: the Hadoop Distributed File System (HDFS), MapReduce, Hive, Pig, ZooKeeper, and so on. Hadoop is designed for large scale, and it is not unusual for systems to comprise thousands of nodes and store hundreds of terabytes. MPP SQL engines are also designed for scale but are typically smaller: hundreds of nodes and tens of terabytes. The solutions for specialized reporting are the smallest: typically non-MPP, single-node implementations of one terabyte or less. Together, these three classes of solutions form a heterogeneous mix of tools in a big data management platform (BDMP), in contrast to a traditional, homogeneous data warehouse.
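As a toy illustration of the three-class split, the function below routes a workload to one of the classes by data volume and access pattern. The thresholds and labels are invented for illustration and only echo the rough scales quoted above; real platform sizing would follow workload testing, not two if-statements.

```python
def route_workload(volume_tb, workload):
    """Pick a BDMP component class for a workload (illustrative only)."""
    if workload == "ingest" or volume_tb > 100:
        return "hadoop"               # 1,000s of nodes, 100s of TB
    if workload == "data-intensive" or volume_tb > 1:
        return "mpp_sql"              # 100s of nodes, 10s of TB
    return "specialized_reporting"    # one or a few nodes, ~1 TB

print(route_workload(500, "ingest"))          # hadoop
print(route_workload(20, "data-intensive"))   # mpp_sql
print(route_workload(0.5, "reporting"))       # specialized_reporting
```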
The SQL Cycle

One common thread linking the three classes of solutions is SQL. Except for Hadoop, the other two classes natively support SQL. Even though Hadoop initially kicked off the NoSQL movement a few years ago, things have come full circle, as shown in Figure 2.

Figure 2: The SQL cycle. (Start: legacy DB solutions cannot handle big data; solution: NoSQL/MapReduce; problem: MapReduce is not easy to use; solution: a lite-SQL API on MapReduce; problem: lite-SQL on MapReduce is slow; solution: lite-SQL without MapReduce; problem: lite-SQL is not enough; solution: a high-performance MPP SQL engine.)

Hadoop with MapReduce was the initial solution, but the market quickly realized that the skills for MapReduce programming were scarce in comparison to SQL skills. A lite-SQL API was then implemented on top of the MapReduce framework in the form of Pig and Hive. Performance became an issue because of the limitations of the underlying MapReduce framework. Today, there are numerous efforts under way to bypass MapReduce, such as Cloudera's Impala. However, lite-SQL is not enough; SQL programmers want more. The bottom line is that the market is asking for a full-featured MPP SQL engine that can complement Hadoop.

Understanding Solution Strengths and How They Coexist

As we mentioned, SQL skills are plentiful, and SQL programmers have certain expectations about their environment, as shown in Table 1. In the context of big data analytics, not all of these skills or expectations may be strictly necessary for a successful solution; in fact, one or more of these constraints may be relaxed in order to achieve better performance and scalability. These expectations are noted as a backdrop to the more detailed analysis of the three classes of solutions (Hadoop, MPP SQL engines, and specialized reporting solutions) we discuss now.

Hadoop

Hadoop solutions are ideal for the landing and staging of large data volumes, both structured and unstructured.
The underlying HDFS brings excellent scalability and high availability to storage on inexpensive commodity hardware. Loading data into Hadoop is nothing more than writing a file into the file system, as opposed to the onerous checking and validation process involved in a
load into a relational database management system (RDBMS). The higher layers of the Hadoop stack (MapReduce, Pig, Hive) do not insist on a rigid, predetermined schema; the data is interpreted on demand by the higher layers. This makes Hadoop a great environment for rapidly ingesting and storing large volumes of data. However, Hadoop is not a relational database engine and was never intended to be. A SQL API and/or a SQL execution engine does not transform Hadoop into a true ACID-compliant (atomicity, consistency, isolation, durability) RDBMS.

Table 1: SQL skills versus expectations of SQL programmers.

Skill             | Expectation
System safeguards | Commits and rollback of transactions; concurrency via locks or MVCC; control over access privileges; audit trails for compliance and monitoring
Data quality      | Column constraint checking; referential integrity checking
Data organization | Physical control over distribution on MPP; logical control over partitions and indexes
Data manipulation | Insert/update/delete operations; fast bulk-load operation

Table 2: Hadoop and MapReduce pros and cons.

Good at: ingest of loosely structured data; appends to data; reliable storage on commodity hardware; batch-mode flow-through processing; simple SQL; very large data sets; machine learning.
Not good at: complex relationships between tables; metadata management; inserts, updates, and deletes; efficient use of hardware; interactive exploration of data; data-intensive processing (joins/groups/aggregates).

Table 3: Pros and cons of column-store databases.

Good at: compression; appends to data; static data sets; predefined querying (BI reports); subsecond responses; simple schemas.
Not good at: fast ingest of data; inserts, updates, and deletes; dynamic data sets (frequent ingests/updates); ad hoc exploration of data; data-intensive processing (joins/groups/aggregates); complex schemas.
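The load-time contrast between the two worlds, writing a file versus validating every row, is often called schema-on-read versus schema-on-write. The following is a toy Python/SQLite sketch of the idea, not either system's actual loader; all records and the `orders` schema are invented for illustration.

```python
import sqlite3

raw_records = ["1,acme,2500", "2,globex,-50", "oops,broken,row"]

# Schema-on-read (Hadoop-style): landing data is just writing bytes.
# Structure is imposed only when a reader interprets each line.
landed = list(raw_records)            # every record is accepted as-is
parsed = []
for line in landed:
    fields = line.split(",")
    try:
        parsed.append((int(fields[0]), fields[1], int(fields[2])))
    except ValueError:
        pass                          # bad rows surface only at read time

# Schema-on-write (RDBMS-style): constraints are checked during load,
# so rule-violating rows are rejected up front.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT,
    amount INTEGER CHECK (amount >= 0))""")
loaded = 0
for rec in parsed:
    try:
        db.execute("INSERT INTO orders VALUES (?, ?, ?)", rec)
        loaded += 1
    except sqlite3.IntegrityError:
        pass                          # -50 violates the CHECK constraint

assert len(landed) == 3 and len(parsed) == 2 and loaded == 1
```

All three records land in the file, two survive parsing, and only one passes the database's checks: the same data, filtered at progressively stricter (and progressively more expensive) load gates.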
Maintaining complex relationships between tables and managing user access permissions and concurrent accesses are difficult in the Hadoop environment. These solutions also struggle with data-intensive processing such as grouping, aggregating, and non-trivial joins between large tables. The pros and cons are summarized in Table 2.

For more sophisticated metadata management and complex SQL-based operations, true relational DBMS solutions offer a richer environment and much faster performance. Faster performance translates to smaller hardware footprints and, therefore, cost savings. DBMS solutions offer a well-understood, easy-to-use environment for the rapid development of analytics applications.

MPP SQL Engines and Specialized Reporting Solutions

Both of the last two classes of solutions (MPP SQL engines and specialized reporting solutions) are typically true RDBMS engines, differentiated only by row-store, column-store, or in-memory approaches. Let's look at the last approach first. In-memory solutions offer extremely fast response but are limited by the amount of physical memory installed. High-performance, in-memory DBMS solutions are typically built on the assumption of a single shared-memory space. Other specialized solutions offer sharable distributed memory (such as Memcached, Druid, and Hazelcast), but these solutions are non-RDBMS and do not easily meet the requirements of the business intelligence reporting ecosystem (presentation and analysis tools such as MicroStrategy, Cognos, BusinessObjects, and so on).
Today, in-memory RDBMS solutions are mostly restricted to the size of memory available on a single server, which constrains them to relatively small sizes. A new twist to the in-memory versus on-disk alternatives is the rise of solid-state devices (SSDs), discussed later. In summary, DBMS solutions that have been specifically engineered for in-memory operation have memory-resident data structures and make certain assumptions about the latency and speed of memory access. If these assumptions are violated, for example, by non-uniform memory access (NUMA) in a distributed system, then performance will tend to be unpredictable and erratic. On the other hand, a natively parallel (MPP) DBMS engine, designed for a shared-nothing, distributed architecture, will much better leverage the resources in a modern cluster.

That leaves us with row-store versus column-store database engines. Column-store engines have been around for decades (such as Sybase IQ) but have not succeeded in grabbing major market share. The reasons for this are fairly straightforward. Most incoming data is naturally organized as rows (records) with multiple columns (fields), and most query results return rows (groupings) with multiple columns (aggregates). The advantages of a column-store are obvious when input data has many columns and queries touch only a few: performance is much faster because only those few columns are accessed from the disk. However, there are costs inherent in the decomposition of the row into columns during load, and in the mechanism needed to reconstruct the row when required. That mechanism typically involves either maintaining the columns in some known order or maintaining keys/indexes.
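The column-store advantage, and the row-reconstruction cost that comes with it, can be shown with a toy in-memory model in Python. This is an illustration of the access pattern only, not a real storage engine; the 1,000-row, 20-column table and the two queried columns are invented.

```python
# Row store: each record's 20 fields live together, so a scan that needs
# only two columns still touches every field of every row.
rows = [{f"col{c}": r * c for c in range(20)} for r in range(1000)]
row_cells_read = sum(len(rec) for rec in rows)                 # 20,000 cells

# Column store: each column is a separate contiguous array, so the same
# query reads only the two arrays it actually needs.
columns = {f"col{c}": [r * c for r in range(1000)] for c in range(20)}
col_cells_read = len(columns["col3"]) + len(columns["col7"])   # 2,000 cells

assert row_cells_read // col_cells_read == 10    # 20 columns / 2 touched

# The flip side: reconstructing one full row from the column store needs
# a positional lookup in every column array -- the overhead described above.
row42 = {name: col[42] for name, col in columns.items()}
assert row42 == rows[42]
```

The 10x reduction in cells touched is exactly the "many columns, few queried" case; the final two lines show why writes and row fetches pay a per-column tax in the columnar layout.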
The overhead of maintaining columns means that column-stores typically cannot support write-heavy operations such as rapid, continuous ingests; inserts, updates, and deletes; and iterative creation and modification of temporary tables. Column-stores deliver efficient data compression and are best suited for write once/read many workloads with predetermined queries, as shown in Table 3. The flip side of this summary is that row-store databases do well at everything the column-stores are not good at; the designs complement each other.

Many big data analytic processes require data-intensive processing (joins, groups, aggregates) as well as frequent or continuous data ingest and iterative write operations (temporary staging tables). Row-oriented database solutions fit the bill. Traditionally, these types of data-intensive processing were performed either using an ETL tool or via batch SQL within an RDBMS. With the advent of big data, this is becoming increasingly difficult to perform using tools and environments that are separate from the converged, scalable hardware infrastructure. These ETL-like processes are now being ported to Hadoop MapReduce, and when complex processing is required, the processes are directed to an MPP row-based RDBMS deployed on the same hardware environment.

Cohabitation Brings Real-World Results

As we mentioned, Hadoop-based solutions are great for landing, staging, and transforming massive volumes of structured and unstructured data. Column-oriented databases are ideal for write once/read many data stores that require fast querying. Row-oriented MPP databases are good for mixed read/write workloads and for ongoing, heavy-lifting ELT (extract, load, then transform) within the database. Consider a real-world example of a rapidly growing firm in the exploding business of online digital advertising.
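The heavy-lifting ELT pattern this kind of firm relies on, staging data, joining online activity to offline customer records, and rolling it up entirely inside the database, can be sketched in miniature with Python's sqlite3 standing in for an MPP row-store engine. The tables, rows, and segment names here are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE clicks (user_id INTEGER, ad TEXT);
CREATE TABLE customers (user_id INTEGER, segment TEXT);
INSERT INTO clicks VALUES (1,'a'),(1,'b'),(2,'a'),(3,'c');
INSERT INTO customers VALUES (1,'gold'),(2,'silver'),(3,'gold');

-- ELT inside the database: stage into a temporary table by joining
-- online click activity to offline customer data, set-based SQL.
CREATE TEMP TABLE staged AS
  SELECT c.segment, k.ad
  FROM clicks k JOIN customers c ON k.user_id = c.user_id;
""")

# Then aggregate the staged rows into a roll-up.
rollup = dict(db.execute(
    "SELECT segment, COUNT(*) FROM staged GROUP BY segment"))
assert rollup == {"gold": 3, "silver": 1}
```

The temporary staging table plus join-and-aggregate workload is precisely the mixed read/write, data-intensive pattern that favors a row-oriented engine over a column-store.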
The company is building out a data management platform that relies on all three of these solutions to maximize productivity at a reasonable cost. It collects data about online advertising activity, aligns the data with offline customer information, and uses analytics to optimize ad placement, pricing, and campaigns. The platform ingests nearly 10 TB of granular data each day and uses Hadoop to stage the data. It also uses an MPP row-oriented database for its strengths: complicated data integration that relates online impression and clickthrough activity to offline customer information and performs various roll-ups, both on a daily and an intra-hour basis. The database engine runs on the same inexpensive Hadoop cluster hardware. Finally, the system creates multiple data marts to be published to a column-store database. These data marts exploit their fast query response times to enable external customer access for reporting purposes. Each weekend the column-store data marts are rebuilt with new data.

By using best-of-breed heterogeneous tools rather than one-size-fits-all tools, this company runs a near-real-time, analytics-driven business, providing customers with subsecond query responses at a manageable cost. The company expects that today's 20 TB data warehouse will be 200 TB in 18 months. As a sign of the times, this digital advertiser is moving portions of its system to a public cloud, where some of its data sources also reside.

Cloud on the Horizon

Cloud-based computing is dramatically changing the enterprise data center landscape. A 2011 survey by North Bridge Venture Partners projected that cloud spending will increase at a compound annual growth rate of 67 percent through 2016 (Skok et al., 2011).
The report forecast that 36 percent of data center budgets will be spent on cloud infrastructures by 2016. More than half the respondents in this survey expected to deploy hybrid clouds consisting of both public and private components. According to this study and others like it, the top three reasons reported for cloud adoption are agility, scalability, and cost (in that order). Heavyweights such as HP, Google, and AT&T are making major pushes into the public cloud market.

Big data platforms are already embracing public cloud computing in two increasingly common cases. The first is where an intercompany data supply chain (e.g., digital advertising chains linking website publishers to online ad networks, or healthcare chains linking providers to payers) involves companies whose data is already cloud-based, so moving it off the cloud creates an extra hop. In the second common case, cloud platforms are being used to create agile and economical hot-backup environments. These use cases are the tip of the arrow for the growth of cloud computing in big data analytics.

Freeware at What Price?

Amid their efforts to contain costs, today's architects and planners need to be wary of the allure of freeware and open source software. The major caution is not about the need for support, for which there are many commercial solutions. Instead, there is a hidden cost: most free software solutions were designed to provide functionality rather than high performance. It is not unusual for optimized commercial MPP RDBMS engines to outperform Hadoop-like solutions by factors of 10x to 50x, which translates to needing 10x to 50x the hardware to achieve the same performance. This is often overlooked. Because Hadoop solutions are relatively new, CIOs and other decision makers do not have established yardsticks to measure relative costs.
As long as the Hadoop solution functions correctly, the costs go unchallenged. This would be unimaginable in a traditional data warehouse vendor evaluation, where even a 20 percent cost differential would be significant, much less a factor of 10x to 50x!
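The overlooked arithmetic is easy to sketch. The prices below are entirely invented placeholders (not any vendor's numbers); the point is only that when an engine is N times slower, meeting the same throughput target takes roughly N times the nodes, and the node count can swamp a license fee.

```python
# Illustrative cost model only: all figures are hypothetical.
node_cost = 4_000        # assumed cost per commodity node, USD
license_cost = 150_000   # assumed commercial license fee, USD

def total_cost(nodes, license_fee):
    """Hardware spend plus software spend for a cluster."""
    return nodes * node_cost + license_fee

# Suppose the workload needs 8 nodes on the commercial engine.
# A free engine that is 10x slower needs ~10x the nodes for the same SLA.
free_engine = total_cost(nodes=10 * 8, license_fee=0)
paid_engine = total_cost(nodes=8, license_fee=license_cost)

assert free_engine == 320_000   # 80 nodes, no license
assert paid_engine == 182_000   # 8 nodes plus license
```

Under these made-up assumptions the "free" option costs roughly 1.8x more at a 10x performance gap, and the imbalance only grows toward the 50x end of the article's range.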
Technology Trends and Column-Stores

As noted earlier, non-volatile memory (SSD) technology is rapidly evolving: storage capacity is going up and prices are going down. In the hardware comparison of volatile memory versus non-volatile memory versus disk, the most dramatic changes are occurring in the middle, with SSDs. How will this impact the three DBMS solutions of in-memory, column-store, and row-store?

In-memory solutions rely on a uniformly accessible read/write memory space (main memory). SSDs, unlike main memory, have a block structure and different read and write characteristics. Data structures designed for main memory do not port over to SSDs easily, making it challenging to leverage SSDs without major effort.

Column-stores were fundamentally designed to work around the penalty of low disk bandwidth and the penalty of random disk reads and writes (the seek penalty). Accessing only the (compressed) columns of interest reduces both the bandwidth and the random-seek requirements of the disk system. However, SSDs fundamentally remove these same penalties, as they offer very high bandwidth (about 10 times higher than disks) and no penalty for random seeks.

Of course, column-stores can also use SSDs and get even faster performance, but at some point (perhaps sub-second), faster performance may not provide value because microsecond responses cannot be digested by human clients. Furthermore, network latencies will become greater and more significant than query times. This same logic also applies to main memory: very high bandwidth (about 100 times that of disks) and no penalty for seeks.

As main memory sizes continue to increase (modestly) and SSD sizes increase (greatly), the solutions that will benefit enterprises the most are those that can use these storage layers in a distributed, shared-nothing, non-uniform manner. This applies equally to in-memory, row-store, and column-store solutions.

Conclusion

In the time it took to read this article, tens of billions of new business transactions have taken place and many exabytes of machine-readable data have been generated. This phenomenal growth is driving major shifts in the computing marketplace. The market has responded with a wide spectrum of new technologies, and today the arena is still immature and in a state of rapid evolution. Consolidation and convergence are to be expected over the next few years. In the meantime, for structured big data analytics, a heterogeneous architecture is the best approach.

References

Skok, Michael, et al. [2011]. North Bridge Venture Partners, Future of Cloud Computing report; results presented at the GigaOM Structure conference, June. Summary at
BI Experts Perspective: Aligning Business Strategy with BI Capabilities

Alicia Acebo is president of Rock Paper Data, Inc. Jim Gallo is national director of business analytics for Information Control Corporation. Jane Griffin is managing director for Deloitte. Brian Valeyko is director of EDW, BI, and big analytics for NCR Corporation.

Alicia Acebo, Jim Gallo, Jane Griffin, and Brian Valeyko

Until his recent appointment as BI director, Bill Carroll was an analyst in the finance department of his company. He gained a reputation for being analytical, bright, hardworking, and a good communicator. The BI team reports to the CFO, so perhaps it wasn't too surprising that Bill was selected for the director's position even though he has limited education and experience in BI. The previous director wasn't able to connect well with users and managers throughout the company, and the thinking was that Bill would be more successful in this regard.

Bill inherited a three-person staff that runs the warehouse, as well as three analysts who work with users and the business units on applications. Bill's team does considerable work running queries and developing ad hoc reports. The team is also responsible for the company's dashboards, but managers have complained that the dashboards don't link well to their business strategies, and they criticized the previous director about this.

Bill has had several meetings with the CFO, who said repeatedly that Bill and his team must be sure that the business and BI strategies are aligned. Bill interprets this to mean that his team needs to support the business better in all that it does. This is fine at a high level, but Bill isn't sure how to translate this directive into action. What, exactly, should he do to make sure that there is alignment?
Some of the approaches Bill has thought about involve organization structure, governance, creating a center of excellence, the skills mix of his staff, staff compensation (linking salary to meeting business outcomes, which the CFO would love), how information requirements are determined, and application development methodology. Based on your experiences, what are the most useful and innovative things that Bill can do?

ALICIA ACEBO

The problem Bill Carroll is facing is one of the most common in business intelligence. The company has data, reports, and dashboards, but the business perception is that it does not have business intelligence. I say perception because the information might be there, but it has not been properly presented or used. Bridging this gap is the first thing Bill Carroll should address.
Figure 1: A sample process for building subject areas. The figure lays out four stages, the activities in each, and who owns them; data governance, architecture, standards, and quality span all stages.

Data discovery (mainly a BI responsibility): What data? Understand what the data means. Where is the data? Which system is the system of record for the data? How clean is the data? How well-behaved is the data?

Data model and data dictionary (mostly a BI responsibility, with some business involvement): Build the foundation. Understand the relationships between the data. Model data close to the real world (not the source system). Use third normal form and object-oriented techniques. Integrate with other subject areas as part of the enterprise model. Implement standards for names and formats.

Transformation (both BI and the business very involved and responsible): Determine right-time load capabilities (real-time vs. batch). Load the lowest level of detail available (to ensure any question can be answered). Build the data warehouse as a time machine (versioned). Build in automated monitoring capabilities. Implement business rules in only one place.

Business intelligence (mostly the business's responsibility, with BI very involved): Gather detailed business requirements for reports, analytics, and dashboards. Build the reports, analytics, and dashboards. Conduct user acceptance testing. Provide data accessibility independent of BI personnel.

Bill needs to be able to paint a high-level picture of the data warehouse in business terms so his users can understand what is available from an enterprise view, not just from reports. IT needs to proactively communicate with the business. These are Bill's clients, and there is a sales job that needs to happen! As part of this sales effort, Bill should determine who his best customers are and target them with additional support. By best customers, I mean customers that have a need and are willing to work with IT to get business results. By the same token, developers in the BI group must understand the business.
No other area in IT is more important. I strongly believe that a technical person who cannot understand the business is not valuable to any IT group. In business intelligence, the technical staff must be fluent in the business language and processes. Many companies share the issue Bill is facing as a result of technology personnel not understanding the business. How can Bill make this cross-training happen?

First, attitude. Attitude is one of the cornerstones of success. The BI team should act like a customer service department. Bill should encourage and reward this behavior and lead by example. Business users' success is the success of BI.

Bill should then select three business areas and match his three support analysts to them for at least 50 percent of their time, pairing his best customers with the developers who best understand each business area. Work with the business users to determine what they need and how they envision using BI. The BI team needs to frame the request in IT-specific terms and, in the process, encourage both sides to learn from each other and work together to build the solution. Then both sides need to stay focused and committed to delivering that solution. At the end of the project, celebrate the success together.

Business analysts and IT business intelligence analysts should be interchangeable. Encourage the business to write their own ad hoc queries and do their own analysis. BI should build a business view layer to facilitate the business users' work. Hire someone from the business side into BI to facilitate business understanding on the IT side. Help the business
hire analysts with technical skills. Blur the lines between BI, IT, and the business. Bill's reputation for being analytical, bright, hardworking, and a good communicator will facilitate building that bridge. At this point, staff compensation based on meeting business outcomes would be a great idea for both the business and the BI teams working together.

You must address several basic technical requirements to build a successful data warehouse. You need the data, and the data must be accessible, timely, and of sufficient detail to support the business's needs. I assume there is enough data to provide value to the business today. I also assume that Bill has a technical team he can depend on to provide the technical advice he needs.

This is only the start of aligning business and BI strategies. To ensure success in the long term, some changes will have to be made. Bill needs to define a process that ensures business participation, support, and ownership. Figure 1 illustrates such a process for building the different subject areas. Put the right people and processes in place and start building the environment. For such a small, new team, Bill's involvement is vital, at least at the start. The new organization structure will fall into place later. He also needs to build relationships with the business decision makers to ensure their requests are heard and expectations and priorities are set.

JIM GALLO

It seems as though Bill's company is playing the "guess what I'm thinking" game, a game that I learned a long time ago cannot be won. Bill's first order of business is to solve the core issue of aligning his team's efforts with the business strategies. To do this, business executives, not the BI team, need to take ownership and responsibility for providing clarity by first decomposing the strategies to a more discrete level so Bill's team can build solutions that align with business needs.
To this end, Bill should partner with the CFO to create a BI governance program composed of business leaders whose mission is to:

1. Clearly articulate business strategies
2. Deconstruct the strategies into a set of goals and objectives
3. Identify the core measures or key performance indicators (KPIs) for each goal and objective
4. Prioritize the list of measures
5. Create a delivery road map and release plan based on the priorities
6. Estimate the level of funding needed to deliver each part of the road map
7. Provide a continuous funding vehicle for Bill's team
8. Continuously assess and adjust the priorities based on business results and shifting goals

In this way, the organization can create a level of specificity that Bill's team can identify with and support in the next layer of an aligned delivery model. Once the company's goals and measures are understood, Bill's team can engage with the business to drive the needs to a finer level of detail. They can use the target measures and KPIs to have a purposeful dialogue with their business constituents and work with them to describe the ways in which the measures are to be consumed (e.g., by time, by customer, by product, and so on). In other words, they need to identify the dimensions. This, in turn, will help them pinpoint the core set of data that supports the business's reporting, analysis, and other needs, along with identifying the candidate source systems. What they're really doing is taking a business-driven development approach, working their way backward from business needs to source system data acquisition.

Once the discrete measures and dimensions have been incorporated into a conceptual data model, dimensional attributes can be added to create a preliminary logical data model. I use the word preliminary purposefully because the next steps should be neither heads-down design and development nor the creation of a perfect data model.
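Steps 1 through 4 above amount to building a single prioritized backlog of measures across all goals. A hypothetical Python sketch of that decomposition (the strategy, objectives, KPI names, and priority weights are all invented for illustration):

```python
# Strategy -> goals/objectives -> weighted KPIs (steps 1-3).
strategy = {
    "name": "Grow high-margin revenue",
    "goals": [
        {"objective": "Increase repeat purchases",
         "kpis": [("repeat_purchase_rate", 0.9), ("avg_order_value", 0.6)]},
        {"objective": "Reduce churn",
         "kpis": [("monthly_churn_pct", 0.8)]},
    ],
}

# Step 4: flatten into one backlog and rank by priority weight.
backlog = sorted(
    (weight, kpi, goal["objective"])
    for goal in strategy["goals"]
    for kpi, weight in goal["kpis"]
)
priorities = [kpi for _, kpi, _ in reversed(backlog)]

assert priorities == ["repeat_purchase_rate", "monthly_churn_pct",
                      "avg_order_value"]
```

The ranked list is what feeds steps 5 through 8: the road map, the funding estimates, and the ongoing reprioritization all operate on this backlog rather than on vague strategy statements.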
Bill's team should turn their attention to rapidly creating a working prototype, using a small data set culled from the candidate sources and the simplest methods possible to load the data into the target database. Similarly, they should not yet focus on the quality of the source data. Rather, their immediate goals are twofold:

1. Identify the data stewards and work with them to create the business names, definitions, and code and value sets to be used as the corporate standard
2. Use the prototype to validate business needs

By working with the data stewards, Bill can create a set of metrics and conformed dimensions that can be used across the enterprise and reused to accelerate future projects. Allowing business users to work with a prototype at the beginning of the project eliminates the risk that the final deliverable will not meet their needs. Expectations can be brought into alignment with a high degree of certainty because the users will have had an opportunity to create sample queries and reports and to exercise the dimensional hierarchies. This also helps minimize the most costly and risky part of BI solution delivery, namely data integration, because the minimal data set needed to satisfy the requirements will have been identified.

The most useful thing Bill can do, then, is to leverage his relationship with his executive sponsor (the CFO) to create a governance organization responsible for translating broad strategic initiatives into a discrete set of measures (strategy → goals and objectives → KPIs), for setting priorities, and for providing a continuous funding source. The measures become the nexus for business needs and project delivery, both of which are validated and solidified by employing rapid prototyping techniques and data stewardship as precursors to design and development.
These practices will help to create a harmonious alignment between the business and Bill's group and to change Bill's role from mind reader to valued partner.

JANE GRIFFIN

Bill certainly has a complex challenge before him. For many companies, the business environment is constantly evolving, and competitive pressures are greater than they've ever been. Bill has been given the seemingly impossible job of meeting and exceeding everyone's expectations, and he's doing it without much experience. He has the people skills and he has the smarts to succeed, but he's going to need lots of help.

The first thing Bill should do is realize that he is on a journey; it's unlikely he'll reach some sort of Oz-like destination where the company's BI needs are solved once and for all. Instead, Bill's attitude must be that his group will undertake the BI journey with the business, and that they will, in a sense, be the navigator on the trip. They'll help get the business where it needs to go in a timely, efficient manner.

Bill has several good ideas. His notion to build a BI center of excellence is spot on. Tying compensation to outcomes is good, too, with the caveat that the goals must be realistic and measurable. His realization that he probably doesn't have the right mix of skills or a solid understanding of the business needs is correct, too. He shows that he has a good understanding of the issues he faces and a desire to tackle them head-on.

The first step on the analytics journey is to get a clear picture of
the opportunities that exist for BI, especially analytics, to impact the business, and to define an overarching strategy that can help deliver BI capabilities. Bill and his team must explore the business challenges in all departments and functions and paint a clear picture of them. Next, they'll need to peg the BI strategy to meet those challenges. Then, they'll need to assess their current BI capabilities and perform a gap analysis between where they are and where they need to be. Finally, they must develop a road map to get there.

The first stop on this road map should be to build a BI/analytics center of excellence (COE). The COE is not simply a central reporting function. Instead, it serves as the governance and innovation center for the organization's BI efforts. It helps companies:

- Align their technology and corporate missions and leverage the power of their data to improve organization agility and flexibility via complex analytics and information access
- Provide quality information delivery via maintainable and scalable systems
- Guide wise, innovative investments that deliver on their promise

To build and maintain the COE, however, Bill will have to meet one critical need right away: he will need people with the right set of skills on his team. Twenty, even 10, years ago, there wasn't much interaction between IT and business users except for requirements-gathering exercises and report requests. That's changed. Now, business leads the technology effort, and management should have an almost symbiotic relationship with IT, especially when it comes to the BI/analytics needed to provide the insight to move the organization forward.
Bill's staff will need two qualities: they'll need to be highly competent technically in the latest BI tools and techniques, and they'll need to speak the language of the business. It's critical that those who staff the BI COE are able to understand the business and translate technical concepts into a language that business people can understand. That way, there will be minimal miscommunication about what the business needs are and how they will be met.

Finally, Bill must make his BI team and their development techniques flexible and agile. The operating environment for most companies changes constantly, especially because globalization is now the norm. Also, competitive pressures are forcing businesses to change on a dime, just to keep up with often-fickle customer wants and product requests. Companies with extended application development life cycles may not be nimble enough to produce information systems that provide the foresight the business needs to respond quickly to an ever-changing market. Such flexibility will require an agile development cycle that can quickly produce solutions to complex questions or issues that arise, often unforeseen, as well as systems that can be fine-tuned quickly to meet the future needs of the business.

Bill is in for quite a ride. He'll experience ups and downs, and he'll have to backtrack more than once to get it right. In the end, though, if he gets good people around him and builds a COE to deliver BI/analytics excellence to the organization, he can accomplish what he set out to do: he'll help IT form a successful partnership with the business and grow the bottom line for years to come.
BRIAN VALEYKO

Bill brings a fresh perspective to the BI team in this organization with his strong grasp of what the finance side requires from a data analytics perspective. This is a positive move, and Bill can use his skills and knowledge to provide immediate benefits to the CFO by directing the team in how best to support finance. However, to be truly successful, Bill needs to quickly learn a few major things:

- What are the current technical capabilities of his team from a platform and tool perspective? Does the team have a strong cross-functional data model within the warehouse, a business-friendly metadata layer, and a set of tools to provide all needed functions?
- What are the organizational goals outside of finance that need to be addressed, and who are the major constituents in these areas?
- What are the key metrics that drive the business? (Keep in mind that a metric must be measurable and actionable in order to provide value to the organization's information consumers.)

Armed with answers to these questions, Bill can begin to act upon the ideas he's expressed:

Organization structure. Does he have the right skills on the team to provide the necessary services to all areas of the company? If not, he must determine what training or hiring is required to get to the goal state.

Center of excellence. The current mode of operations is for the BI team to do ad hoc report creation. Bill must decide if this is the best, most scalable model, or if his team should instead develop and provide training to business users across the company, effectively teaching the strong analysts how to fish rather than providing them fish upon request.

Staff compensation. Proving the value of a BI team to the company can be difficult. When successful, a BI team elevates the capabilities of the entire organization by enabling more effective, timely decision making in a tactical mode and better strategic planning as well.
Rather than tie success to just the BI team, Bill should look at the revenue and margin growth in each area of the organization before and after the implementation of new BI capabilities (tools and training) and determine the impacts. If both developers' and users' compensation is tied to leveraging information more effectively, it encourages the partnership needed to foster a strong BI strategy.

Requirements gathering process. Understanding the roles within a company, and the data processes associated with each of those roles, is crucial to developing successful analytics. I've found that developing process maps for each business role, including the decision points within them, is a great way to represent the areas for a BI team to attack. Using the appropriate format for the role and point in the process is key to user adoption. Care in this area will help to avoid future failures (i.e., dashboards that don't link to the business strategies).

Another useful construct is the development of metric hierarchies: mapping the decision-making metrics used by each member of the executive team and understanding how these metrics are derived from lower-level operational details. By understanding how granular details from operations are aggregated into the metrics used for executive decision making, the BI team can quickly determine gaps, overlaps, and potential misunderstandings due to definition conflicts across business areas.

Bill must also understand any manually supported processes needed to derive key metrics. Once identified, Bill should work to automate the processes and eliminate the manual effort and potential data quality risks involved in generating and maintaining each metric.
Application development methodology. Given that the existing team has a history of creating ad hoc reports for users and was able to build an executive dashboard (albeit one that unsuccessfully represented the organization's strategy), the implication is that the warehouse has a fairly complete set of data available for analysis. I suggest Bill concentrate his efforts on building business-friendly metadata, creating data standards across the enterprise, and conducting training in the models and tools available to the user community. Self-service is much more scalable and allows the BI experts to inject their expertise throughout the organization in a manner that will effectively create demand as required by business users.

Bill has a great opportunity to bring financial perspective to the BI team and process. He needs to work with his team to understand what is possible given their technology stack and data models so he can communicate this information to business area leaders. Bill should also have his team train the business analyst community across the organization to be able to leverage the tools and data sets available to improve decision making within the company. The BI team should also move away from the "build it and they will come" philosophy of report and dashboard delivery and toward a "teach, guide, and enable" mode of analytic capabilities across the entire platform of tools.

By taking this approach, I believe Bill will succeed in bringing BI into alignment with the business's goals. Some areas will quickly adapt and adopt technologies to suit their needs and will leverage the BI team to help. Other areas may not change as quickly and thus will require less of the team. In time, leaders of each area will be judged by their contributions to corporate success. Those areas that have effectively used data to guide their strategic and tactical decisions will lead the pack.
Implementing an Enterprise Data Quality Strategy

Nancy Couture

Nancy Couture is vice president, business intelligence at SquareTwo Financial, an asset recovery and management firm.

Abstract
This article focuses on the steps and key considerations for implementing an enterprise data quality strategy. Key concepts include: keep it simple, start with the basics, share results, maintain visibility, and identify opportunities for improving user confidence. Once a program produces visible results, it can then be expanded over time.

Introduction
Enterprise data management is a broad discipline that entails many concepts and capabilities, including data governance, master data management, data architecture, data quality, data security, and data integration. The goal of enterprise data management is to ensure that data across the organization is well defined, understood, available, consistent, secure, and usable.

Over the years, the value of high-quality and consistent data has become recognized as an important strategic goal for many organizations. Thus, enterprise data quality has become an important part of good data management. To be truly successful, you must address data quality at the enterprise level. You must be able to trace issues back to root causes and fix them as close to the source as possible. This means that data quality programs must encompass the data warehouse, data marts, and BI (which are more visible to the company's data consumers) as well as the strategic data sources at the operations level. Data is interconnected across the organization, and it is vitally important to document and share the flow of data from sources to targets through the data governance group. These conditions are reflected in many of the deliverables of a robust and effective data governance program.

A successful data quality program will include data governance, ongoing monitoring and measuring of the
state of the data, publication of data quality metrics, and a commitment to continuous improvement. Embarking on such a program can be done simply, without significant fanfare and expense. Once your data quality program is mature enough to enable you to publish data quality metrics and trends, the program's value will become self-evident.

The Role of Data Governance
Data governance models can be simple or complex. I have had much success with simple data governance models that entail a working group of knowledgeable decision makers representing the major data consumer areas of the company. There are also complex models with identified data stewards and data governance tools that can be implemented. Regardless of your approach, the keys to successful data governance include the following processes and deliverables, which all feed into a successful enterprise data quality program:

- Business stakeholder representation
- Shared future-state vision
- Enterprise systems road map
- Business conceptual model
- Common, agreed-upon business definitions
- Identification of key data quality indicators
- Prioritization of data management initiatives

[Figure 1: A future-state vision. Reducing complexity means relationships between systems become clear and consistent. The diagram shows systems of record (per data domain) feeding a collect-and-organize stage and the EDW, which standardize and integrate data for applications and data consumers.]

Business stakeholder representation
No matter who drives the data governance process for the company, the data governance group must include representation from all areas of the company that will consume data. These representatives must be committed to making decisions, communicating these decisions to their groups as appropriate, and bringing back their groups' feedback. Ideally, the business stakeholders should drive the governance program. Often, however, IT facilitates the process to ensure that decisions are made and communicated across the organization.
Future-state vision and enterprise systems road map
These deliverables include the identification of strategic systems of record. Every organization should have an understanding of the current state of systems and data, an agreed-upon future-state vision, and a road map that identifies how to get from the current state to the future state. These current-state and future-state representations should include transaction processing systems and any associated data assets that they may feed, by data domain. In a previous article in the Business Intelligence Journal (Couture, 2012), I described an approach to develop a future-state vision and road map iteratively, starting with the documentation of the current state.

This agreed-upon future-state vision will reveal where to focus data quality efforts. It is important to focus development on those assets that have been identified in the future-state vision and thus are strategic for the organization. The future-state vision in Figure 1 clearly identifies systems of record that are needed for data integration and analytics (see the left side). As a result, the data warehouse and associated systems of record should be a
primary focus for data quality efforts, as defined by the governance group. Although data quality issues are often found in the data warehouse or resulting reports, the goal should be to fix the data at the source; thus the need for an enterprise view. Identifying strategic systems of record helps make this goal more achievable. The systems and data assets that are in the current state but do not make it into the future-state vision will require active management and decommission goals within the enterprise systems road map.

A business conceptual model
Your model should include the major business capabilities across the organization as well as key attributes to support these capabilities. Once the governance group agrees on the business conceptual model, you can focus on key areas. Your review of the conceptual model will make it easier to identify key data components that require high levels of quality and to determine their priority in any data quality assessment program.

[Figure 2: A sample business conceptual model, showing business capabilities alongside dimensions such as Account, Customer, and Date.]

A set of common, agreed-upon business definitions
Once the business conceptual model is defined, the governance group must develop definitions for each business capability and all supporting attributes. At this point, the members of the governance group are starting to use the same data language in their discussions. This also enables the group to identify, select, and prioritize the focus areas of your data quality program.

Identification of key data quality indicators
It is extremely difficult, if not impossible, to assess and report on the data quality of every component of enterprise data.
However, once the governance group has agreed upon the conceptual model of the business and the business definitions of the model components, they will be able to identify the most important focus areas for your data quality assessment program, as well as how these areas will be measured and assessed.

Prioritization of data management initiatives, including data quality
This is an ongoing governance responsibility. The governance group can start by prioritizing projects and initiatives: data to be put in the data warehouse, BI capabilities to enable, enhancements to be implemented, and so on. Eventually, once the key deliverables have been developed and agreed upon, the governance group will be
able to start to prioritize the steps associated with developing a robust data quality assessment program. This includes identifying what dimensions of data quality to address as well as what key attributes to measure, as applicable.

Data Quality Program Implementation
There are many dimensions of data quality that can be addressed as part of a data quality assessment program. Data quality itself can be defined as "fitness for use," a very broad definition that entails many aspects of quality.

"Data and information quality thinkers have adopted the word dimension to identify those aspects of data that can be measured and through which its quality can be quantified. While different experts have proposed different sets of data quality dimensions, almost all include some version of accuracy and validity, completeness, consistency, and currency or timeliness among them." (Sebastian-Coleman, 2013)

Rather than trying to focus on every dimension, start by focusing on the basics of completeness and timeliness, then move on to validity and consistency. These four dimensions can truly enhance the quality of enterprise data as well as stakeholders' confidence in the data they consume. These four are basic dimensions that can be expanded upon over time.

Completeness is first and foremost. It's an absolute necessity for any enterprise data warehouse. Stakeholders need to know that what's in the source is accounted for in the target. You can ensure completeness in a variety of ways. For example, a record-balancing capability could be developed that records a count at the end of one flow and at the beginning of another to ensure all records are accounted for (number of records in = number of records out). The ultimate goal is to validate that every record and its corresponding information from a source is handled appropriately during processing. This source-to-target validation must be monitored and reported to the organization's data consumers.
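The record-balancing idea can be sketched as a simple count reconciliation between one flow's output and the next flow's input. This is an illustrative sketch only; the function and field names below are hypothetical, not taken from any particular data quality tool:

```python
def check_completeness(source_count: int, target_count: int) -> dict:
    """Compare record counts between a source extract and its target load.

    Returns a small result record that can be logged and published
    as part of the data quality metrics (hypothetical structure).
    """
    missing = source_count - target_count
    return {
        "dimension": "completeness",
        "source_count": source_count,
        "target_count": target_count,
        "missing_records": missing,
        "passed": missing == 0,
    }

# Example: 10,000 records extracted, but only 9,997 loaded
result = check_completeness(10_000, 9_997)
print(result["passed"], result["missing_records"])  # False 3
```

In practice the two counts would be captured automatically at the end of one ETL flow and the beginning of the next, and the result record appended to a metrics store for trend reporting.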
The set of results from record balancing is one measurement. Another could be to compare the summarized data in a quantity field to the summarized amount provided in a control report. Regardless of the approach, it should be one that the governance group approves, and the results should be measured and shared with data consumers as part of the data quality metrics.

Timeliness must be a component of service-level agreements (SLAs) and must identify such criteria as acceptable levels of data latency, frequency of data updates, and data availability. SLAs should be reviewed and approved by the governance group and broadly published to data consumers. Timeliness can then be measured against these defined SLAs and shared as part of the data quality metrics.

Validity is a key data quality measure that indicates the correctness of the actual data content; for example, confirming that all the characters in a telephone number field are digits, not alphabetic characters. This is the concept that most data consumers think about when they envision data quality. Validity can be assessed through data profiling, data cleansing, and inline data quality checks.

Data profiling can be used as a starting point for measuring validity. Data profiling is a specific kind of data analysis used to discover information about a particular set of data. The process can uncover potential issues and provide valuable insight into your data. It can summarize details about large data sets from different angles. To support the concept of validity, data profiling includes the inspection of data content through column profiling or value distributions. Data profiling is often used at the beginning of a data project. However, periodic re-profiling of source data can also be useful.

Data cleansing can also be used to address data validity. Data cleansing may include identity resolution, deduplication, and name-and-address standardization.
This process is usually developed as part of the source-to-target ETL processing for a data warehouse. A thorough data quality program includes a mechanism that provides feedback to the source of the data.

To ensure a truly robust data quality program, inline data quality checks should be developed. Inline data quality monitoring entails ongoing measurement of data
as it passes through the ETL (extract, transform, load) processes that prepare the data to be loaded into the target data warehouse. The checks can be:

- Comparisons between incoming values and expected, valid values
- Comparisons of incoming data values to values defined within a stated range
- Validity checks based on specific algorithms

[Figure 3: Data quality dimensions. Completeness (a basic necessity): source-to-target validation, monitored and reported. Timeliness: defined SLAs, reviewed and approved, monitored and reported. Validity (let the business define the focus business-critical quality indicators): data profiling, data cleansing, inline data quality checks, monitored and reported (SPC); start with one or two and build design patterns. Consistency (key to continued confidence): inline data quality, trended, monitored and reported. All dimensions are supported by governance, common business definitions (MDM, a recommended first step), and a data dictionary.]

Inline data quality checks should be developed incrementally. One or two key attributes should be identified and implemented as top priority by the governance group. This is where the future-state vision and business conceptual model can assist. The inline data quality checks should be developed as reusable modules and expanded over time. For example, the governance group can identify two key attributes to monitor. One may compare valid values to expected values; the other may check that a field's value is within a defined range. Once these checks are in place and being actively monitored, the governance group may identify two additional key attributes to monitor. The initial two modules should be developed so they can be reused and implemented quickly for the two additional checks. Over time, additional validity checks can be developed (using business rules, for example, or comparing multiple attributes within the rule). These checks should also be developed to be reusable.
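As a sketch of the reusable-module idea, the two check types described above (valid-value comparison and range check) can each be written once and instantiated per attribute, with failure counts rolled up for metrics reporting. All names and sample data here are hypothetical illustrations, not part of any ETL product:

```python
from typing import Callable, Dict, Iterable

def valid_values_check(field: str, allowed: set) -> Callable[[dict], bool]:
    """Reusable module: pass only rows whose field value is in an allowed set."""
    return lambda row: row.get(field) in allowed

def range_check(field: str, low: float, high: float) -> Callable[[dict], bool]:
    """Reusable module: pass only rows whose numeric field is within a stated range."""
    return lambda row: low <= row.get(field, float("nan")) <= high

def run_checks(rows: Iterable[dict], checks: Dict[str, Callable]) -> dict:
    """Apply each named check to every row and tally failures for reporting."""
    failures = {name: 0 for name in checks}
    total = 0
    for row in rows:
        total += 1
        for name, check in checks.items():
            if not check(row):
                failures[name] += 1
    return {"rows": total, "failures": failures}

# Hypothetical sample rows with one invalid status and one out-of-range balance
rows = [
    {"status": "OPEN", "balance": 120.0},
    {"status": "??", "balance": -5.0},
]
checks = {
    "status_valid": valid_values_check("status", {"OPEN", "CLOSED"}),
    "balance_in_range": range_check("balance", 0.0, 1_000_000.0),
}
print(run_checks(rows, checks))
# {'rows': 2, 'failures': {'status_valid': 1, 'balance_in_range': 1}}
```

New check types (business rules, multi-attribute comparisons) slot in as additional callables without changing the runner, which mirrors the article's advice to build one or two modules first and reuse them as the governance group adds attributes.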
Regardless of the type of inline data quality check, the measures should be recorded at the earliest possible point in the data flow to ensure that major issues can be caught and addressed before processing is complete. You can set alerts depending on the validity checks used; for example, messages can be sent to an operations team for follow-up assessment. Likewise, certain identified events may even cause an ETL process to stop.

In one situation many years ago, our inline data quality monitoring process found an issue and automatically stopped an ETL process because the number of defaults for a particular key business attribute was much larger than expected. This was an indicator of an issue with a new data source we were processing and loading into the warehouse. We were able to identify the root cause and fix the issue before data was actually loaded into the data warehouse. As a result, we avoided several weeks of rework, not to mention the ensuing downgrade in
stakeholder confidence in the data that this issue could have caused. The results of the inline data quality checks should be measured and shared as part of the data quality metrics.

Consistency is crucial to continued consumer confidence. Once data quality metrics are being monitored and reported to the business stakeholders for completeness, timeliness, and validity, then consistency can be measured by assessing changes in these patterns over time. One way to do this is to track changes in the completeness, timeliness, and validity assessments, and to identify overall quality trends. These results should be added to the data quality metrics reporting that is shared with business stakeholders.

In a prior data warehouse project, our team measured and reported several data quality metrics and trends. Over time, we were able to show the continuing improvement in the quality of the data we were measuring. This, of course, improved business stakeholder confidence in the quality of our data.

Data Quality Metrics Lead to Data Quality Confidence
Complete transparency of data quality metrics and reporting to your organization's data consumers will lead to greater confidence in the quality of the underlying data. Often, data consumers hear of a data quality issue and exaggerate that into general negativity about the quality of the data as a whole. I knew a data consumer who used to say in every meeting that the data was unusable. As we provided him with the actual data quality metrics, he eventually realized that there were a few issues that needed to be addressed but that the overall quality of the data was acceptable. We were able to counteract his comments and eventually change his beliefs with facts.

In addition to supporting confidence in the quality of data, metrics can support other goals. One goal is to monitor the quality of data to ensure that it continues to meet expectations. Another goal is to ensure that changes in the data that might indicate a data quality issue are detected as early as possible so we can quickly assess and address them as appropriate. A third goal is to proactively identify opportunities for improvement that can be presented to the governance group and prioritized.

Stakeholder confidence will continue to increase if you are able to proactively identify issues before the data consumers find them. This is one of the greatest achievements of a robust data quality program.

The Journey Continues
Once completeness, timeliness, initial validity checks, and consistency metrics are in place, you can continue to identify key indicators, add them to your inventory, and monitor them using profiling, inline data quality checks, or other data quality assessment tools. One enterprise data warehouse initiative I was involved in started with a handful of metrics. Over the data warehouse's lifetime, several hundred measures were recorded and automatically assessed. The automated process identified variances to expectations and required follow-up only on those variances. What began as a grassroots initiative grew to a visible and important metric for the company, with very little cost or complexity at the start.

References
Couture, Nancy [2012]. "Reducing Data Management Complexity in the Enterprise," Business Intelligence Journal, Volume 17, Number 1.

Sebastian-Coleman, Laura [2013]. Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework, Morgan Kaufmann.
Editorial Calendar and Instructions for Authors

The Business Intelligence Journal is a quarterly journal that focuses on all aspects of data warehousing and business intelligence. It serves the needs of researchers and practitioners in this important field by publishing surveys of current practices, opinion pieces, conceptual frameworks, case studies that describe innovative practices or provide important insights, tutorials, technology discussions, and annotated bibliographies. The Journal publishes educational articles that do not market, advertise, or promote one particular product or company.

Editorial Topics
Journal authors are encouraged to submit articles of interest to business intelligence and data warehousing professionals, including the following timely topics:

- Agile BI
- Architecture and deployment (including cloud computing, software-as-a-service, Hadoop, MapReduce)
- BI adoption and use
- BI and big data
- Data analysis and delivery
- Data design and integration
- Data management: MDM, data quality, and data governance
- Data warehouse and database technologies
- Mobile BI
- Project management and planning
- Selling and justifying the data warehouse

Editorial Acceptance
All articles are reviewed by the Journal's editors before they are accepted for publication. The publisher will copyedit the final manuscript to conform to its standards of grammar, style, format, and length. Articles must not have been published previously, either online or in printed form. Submission of a manuscript implies the authors' assurance that the same work has not been submitted elsewhere, nor will be submitted elsewhere during the Journal's evaluation. Authors will be required to sign a release form before the article is published; this agreement is available upon request (contact [email protected]). The Journal will not publish articles that market, advertise, or promote one particular product or company.

Submissions
For more information and complete submission guidelines, please visit tdwi.org/journalsubmissions.

Materials should be submitted to:
Jennifer Agee, Managing Editor
[email protected]

Upcoming Submissions Deadlines
Volume 19, Number 2
Submission deadline: February 21, 2014
Distribution: June 2014

Volume 19, Number 3
Submission deadline: May 16, 2014
Distribution: September 2014
Data Variety: The Spice of Insight

David Stodder

David Stodder is director of TDWI Research for business intelligence, focusing on insight and best practices for organizations implementing BI, analytics, performance management, and related technologies and methods.

With big data, it's easy to obsess over sheer volume. A hospital is developing electronic health records (EHRs) that will ultimately contain 20 petabytes of patient data; a leading social network is logging and analyzing nearly 200 petabytes of raw customer data per year. Amid no shortage of controversy, the U.S. National Security Agency reports that it is touching 29 petabytes of data daily, which, the agency points out, does not even amount to 2 percent of the 1,826 petabytes of information flowing every day over the Internet.

Yet, hiding in this volume is an even more awesome big data "V": variety. Acquisition of new and nontraditional data types is a major challenge; it is one of the primary drivers behind today's incredible growth in data volumes. TDWI's Technology Survey of attendees at the August 2013 World Conference in San Diego bears this out. The 120 BI and data warehousing professionals responding to the survey identified data variety and complexity as their most intense big data challenges. Data volume was next, followed by data distribution.

Of course, volume, variety, and distribution often go together. Different data types are typically sourced from a diverse array of applications, data files, multimedia, machine data streams, content stores, and more, which taken together can add up to sizeable volumes. Healthcare providers' EHRs, for example, will evolve to include such items as patient DNA, X-rays, biometrics, and physicians' notes to go along with transactions, claims, and billing data. Right now, many of these sources are held in separate systems that are not integrated into EHRs.
Providing a single view of information across diverse and distributed
structured data sources can be tough enough; doing so for a wide variety of data types will be even more difficult.

Expanding Role for Text Analytics and Search
For most organizations, textual content still accounts for the lion's share of what lies beyond the realm of database and data warehouse systems that manage primarily structured relational data. Customer satisfaction surveys, for example, are the most common data sources monitored for customer analytics, according to TDWI Research, with call or contact center interaction records almost as prevalent (sales transactions are the second most common). Text analytics implementations have been maturing rapidly to enable organizations to increase the speed, depth, and consistency of analysis for such content, far beyond what could be done manually. Analysis of interaction sources is giving organizations insights into how they can improve the quality of customer experiences, discover sooner why service calls may be increasing, and learn how to improve loyalty. Leading firms are using text analytics and data mining to uncover patterns by looking at data from multiple sales and customer engagement channels, not just one call center.

With the importance of online commerce and interaction, organizations need to analyze website log files and clickstreams so they can see where visitors are coming from, what they are doing while visiting a site, and what actions they are taking before a purchase. Using data from cookies, they can observe where visitors went when they left and what they do when they return. The imperative to analyze these massive data files is often what brings Hadoop, MapReduce, and NoSQL key-value store technologies into the environment.
These enable data professionals to access raw data without the interference of predetermined schema. Also in the content mix are JSON data and documents; developers increasingly favor this text-based standard over XML for human-readable data interchange.

Along with text analytics, search tools and interfaces are vital for exploration and navigation of content. If integrated with BI systems, search tools can employ indexes to help users locate BI reports or other objects in structured data more efficiently than through queries. Search-based data discovery, using tagging or labeling to describe the data, is becoming an important capability in tools alongside functionality for structured, metadata-enabled examination. In addition, organizations are examining how customers use search on their online sites; they are looking at customers' search behavior to measure how efficiently they are able to find what they are looking for and whether refinements need to be made to taxonomies and other classification systems.

Making Data Variety Valuable for Users
Access to a greater variety of data can potentially enable data scientists, analysts, and business users to derive insights that they would not have uncovered in a single or small number of structured data sources. However, organizations need to avoid swamping users with yet more data that is even harder to decipher. Here are three
steps data professionals can take to improve chances for success:

- Understand the information supply chain. Find out how various types of data enter your organization. Look at how data flows through your organization and what users typically do with it. You will get a better idea of where and when different types of data are most important to users' processes and what quality issues may exist.

- Set user expectations. Users generally know what to expect from reports based on structured data. Text, content, and other multimedia sources are different; analysis of these sources is less precise, and results depend on the variables selected. Make sure users know what they are getting.

- Improve data's relevance by working closely with users. Through tighter collaboration, data professionals can shorten the path to insight by knowing which sources and types of data are most relevant to users' concerns.

The Spice of Data Variety: Use It Wisely
Data variety is essential to performing advanced analytics and realizing uncommon insights that can drive business innovation. If organizations cannot access and analyze the variety of data available to them, then amassing volumes of it will provide little value. However, technology implementation practices must be carefully aligned with users' needs to avoid overwhelming users with the powerful spice of data variety.
StatShots

TDWI Technology Survey: Agility and Business/IT Collaboration
David Stodder, TDWI Research Director

Implementation of agile software development methods was covered extensively in educational sessions at the TDWI World Conference in San Diego. Agile method implementation offers the added benefit of improving business/IT collaboration. Participation in agile teams gives business sponsors a continuous presence in development, and therefore the ability to more fully control a project's direction and results. The Technology Survey that TDWI distributed in San Diego focused on questions about business/IT collaboration and the role of agile methods. We received some great input; here's a sampling of responses:

In the TDWI community, alignment between business users and IT is better than average. On a scale of 1-5, with 1 being poor and 5 being the best, the largest share of survey respondents (43%) selected 3, and 27% chose 4. Although these results show there is room for improvement, our sampling of attendees suggests that in this community, business/IT alignment is not bad. Good alignment and collaboration are vital to establishing the cultural context for improving agility.

Shadow BI and analytics systems are a relatively high IT concern. With users in marketing, product development, and other functions and lines of business urgently needing data insights to drive strategy and operations, they are pursuing alternatives to joining the IT development backlog. After years of user-driven data mart and spreadmart development, it comes as little surprise that shadow IT systems growing up outside of IT governance remain a concern. The largest share of respondents (35%) rated it a 4 on a scale of 1-5; another 19% rated it a 5 (see Figure 1).

Increasing data quality, reducing time-to-value, and enabling self-service BI and analytics are top priorities for improving agility.
We asked attendees to rate the importance of making improvements in a dozen key areas for giving users higher information agility. Data quality came out as the highest priority; 65% said it was very important and 24% said it was somewhat important. At the other end, enabling Hadoop/NoSQL data access and analysis appears to be the least important, with 34% deeming it not important.

More than half of respondents say they are currently implementing agile methods. Supporting anecdotal research at the conference, our survey finds that agile method implementation is spreading. Just over half (53%) are currently implementing agile methods, and 12% plan to do so. About a quarter (23%) indicated interest, while just 5% have no interest in agile.

How would you rate the degree of IT concern in your organization about shadow BI and analytics systems (that is, those that line-of-business users have built and are implementing outside of IT approval) from a data governance, quality, security, and/or management standpoint? (Please rate 1 to 5, with 1 being the least amount of concern and 5 being the highest.)

Figure 1. Degree of IT concern about shadow BI and analytics systems, rated on a 1-to-5 scale plus N/A. Based on 121 respondents.
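Survey results such as those in Figure 1 reduce to a simple tabulation: count each rating on the 1-to-5 scale (plus N/A) and convert the counts to percentage shares of all respondents. As a minimal sketch, using made-up responses rather than TDWI's raw survey data, the distribution could be computed like this:

```python
from collections import Counter

def rating_distribution(responses, scale=(1, 2, 3, 4, 5)):
    """Tally 1-5 survey ratings (None = N/A) into whole-number percentage shares."""
    counts = Counter(responses)
    total = len(responses)
    dist = {r: round(100 * counts.get(r, 0) / total) for r in scale}
    dist["N/A"] = round(100 * counts.get(None, 0) / total)
    return dist

# Hypothetical responses for illustration only
responses = [4] * 7 + [5] * 4 + [3] * 5 + [2] * 2 + [1] * 1 + [None] * 1
print(rating_distribution(responses))
```

Note that independent rounding of each share means the printed percentages may not sum to exactly 100, a common quirk in published survey charts.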
CERTIFIED BUSINESS INTELLIGENCE PROFESSIONAL
TDWI CERTIFICATION

Get Recognized as an Industry Leader: Advance Your Career with CBIP

Professionals holding a TDWI CBIP certification command an average salary of $113,500, more than $8,200 greater than the average for non-certified professionals (TDWI Salary, Roles, and Responsibilities Report).

Distinguishing yourself in your career can be a difficult yet rewarding task. Let your résumé show that you have the powerful combination of experience and education that comes from the BI and DW industry's most meaningful and credible certification program. Become a Certified Business Intelligence Professional today!

Find out how to advance your career with a BI certification credential from TDWI. Take the first step: visit tdwi.org/cbip.
TDWI Partners

These solution providers have joined TDWI as special Partners and share TDWI's strong commitment to quality and content in education and knowledge transfer for business intelligence and data warehousing.