Top 3 Ways to Use Data Science

THE DATA SCIENCE DIVIDE

Ask the average business user what they know about Business Intelligence (BI) and data analytics, and most will claim to understand the concepts. Few, however, will profess to know how analytics works or to have the skills needed to put it into practice. Despite being knowledgeable about their industry and experienced in running their organizations, the majority of business users lack expertise in analytics and visualization techniques, but that doesn't stop them from wanting to have a go.

Holders of what Harvard Business Review called "the sexiest job of the 21st century" [1], data scientists are in high demand as enterprises across the globe look to gain maximum value from the data at their disposal. Unfortunately, because they have to be five parts statistician, two parts business analyst, one part graphic designer, and four more parts programmer, data scientists are an incredibly scarce commodity. A McKinsey Global Institute study predicts a shortfall of up to 190,000 people with deep analytical skills by 2018 in the US alone, together with a need for around 1.5 million more managers and analysts able to understand and make decisions using data analytics [2].

One way of addressing the shortage in expert talent is to empower general business users to discover and unlock value in data themselves. But just making tools easier for users is only half of the answer. A better approach is to work both sides of the gap: build tools that empower business users to discover and unlock value in their data and that also extend capabilities for experts, so that both groups can share the analytics workload, improve efficiency, and focus on higher-level work.

This paper describes how the data science gap evolved and suggests a solution that combines the strengths of business users and data scientists.

DATA ANALYTICS EVOLUTION AND PRACTICE

It wasn't that long ago that data was viewed as a necessary evil, something that had to be collected and stored in order for companies to do business: fulfill contracts, process transactions, and make a profit. The emphasis just a few years back was very much on the technologies of data access (spreadsheets and relational databases), with data viewed more as a cost center than an asset.

[1] Davenport, Thomas H. and D. J. Patil. "Data Scientist: The Sexiest Job of the 21st Century." Harvard Business Review, October 2012. https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
[2] Manyika, James et al. "Big data: The next frontier for innovation, competition, and productivity." McKinsey Global Institute, May 2011. http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation

After the invention of the Web and the digital revolution that followed, information became more easily accessible, which changed the way we think about data altogether. The change led to a data explosion, data warehousing, and a realization that there might be hidden relationships in all those terabytes of information. If those relationships could be uncovered and used to identify trends, predict future outcomes, and empower businesses, they might also be used to alter outcomes to great advantage.

WHAT DATA ANALYTICS CAN DO

Here's an example of the value of data analytics: A company sells a service with high turnover; customers often leave at the end of their contracts. The company collects all kinds of customer data: contact information, records of every transaction, logs of conversations on message boards, and more. No obvious cause for the high level of churn can be found in any one of these data sources. However, a deeper analysis across all communication channels reveals that customers who engage on message boards with others who have recently terminated their contracts are the most likely to leave.

This relationship was not specifically looked for or recorded in any of the available data sources. It was only uncovered using analytic tools, which made it possible to build a model that predicts potential leavers so the company can take steps to prevent that loss, for example, through targeted special offers and discounts at renewal.

THE DATA SCIENCE DIVIDE

Today's approach to data analytics is to employ specialists to do most of the work: quantitative analysts, data analysts, and, in the past few years, data scientists, reflecting the in-depth statistical expertise needed for the more complex predictive analysis that fully exploits big data. In addition to an understanding of statistics, data scientists typically need to be proficient in coding to develop analytical models and applications.
They should also possess business skills and be good communicators because they will be required to build applications that meet the needs of general business users. Design flair and an understanding of how best to visualize data are other prerequisites, making the data scientist role one of the hardest staff vacancies to fill.

HERE'S AN EXAMPLE OF WHAT TIBCO TYPICALLY LOOKS FOR IN A CANDIDATE:

Data Scientist: Skills and Experience
• Experience in data analysis and some involvement in delivering analyses as part of demonstrations, projects, or software applications.
• Comfortable speaking with senior personnel, and the ability to provide compelling presentations and demonstrations of analytics software and to document the business value of analytics projects.
• Two or more years' experience with R and some knowledge of SQL.
• Experience with other software environments such as SAS, MATLAB, Spotfire, Tableau, QlikView, SPSS, KNIME, or other data mining software is a plus.
• Experience with software components for data preparation and integration such as composite, programming, or scripting environments using .NET, Java, Python, or JavaScript is a plus.
• A Master's degree with classes in statistics. Graduate classes in time series analysis, longitudinal methods, and/or data mining are a plus.
• Proficiency in the Windows environment; Linux experience is a plus.

DUMBING DOWN VS. ENLARGING THE CAPABILITIES

Some 75% of knowledge workers are unwilling or simply unable to use data analytics to make critical business decisions. Dumbing down analytics software products to help non-specialist users do non-specialist analyses ignores strategic business needs:
• Analytics supporting better decision-making across the organization
• Greater self-service empowerment
• Smarter use of expert resources

BRIDGING THE GAP

Despite being knowledgeable about their industry and experienced in running their organizations, the majority of business users lack expertise in analytics and visualization techniques, but that doesn't necessarily stop them from wanting to have a go.

"I'll know it when I see it" is what many users say when digging through data for an interesting trend, seeking the root cause of a problem, or maybe just trying to get to know their data as they define business metrics or set targets. However, to "see it," the average business professional still has to rely on specialists to translate their requirements into analytics dashboards and applications. This can be a lengthy process, and one that may or may not deliver the desired results. Then, even when they have the dashboard or application they asked for, making changes and exploring the data in other ways usually requires going back to specialists for help, basically restarting the process.

[Figure: Data analytics workflow. Business users (clients) raise requests for a data analytics application; a data infrastructure specialist arranges access to appropriate data sources; a data scientist (expert) builds the analytic model; a business data analyst applies the analytic model to create dashboards and apps; business users review results and request changes. Caption: Typical workflow needed to develop an analytics dashboard or application.]

With all the back and forth and effort needed, it's little wonder that some 75% of knowledge workers are unwilling or simply unable to use data analytics to make critical business decisions.
It just takes too long, involves too many people, and is wide open to failure unless the workflow is precisely and expertly managed.

Addressing these time and skills issues by shortening the circuitous, time-consuming development process has become a top priority for data analytics vendors. Most, however, have opted to concentrate on dumbing down their products so that non-specialist users with limited expertise in statistics, coding, and data visualization can handle data discovery and visualization tasks for themselves.

The dumbing-down approach may help non-specialists do non-specialist analyses, but it ignores the strategic need for better analytics across the organization. So here are three ways that TIBCO Spotfire can be used to close the data science gap:

1. RECOMMENDATIONS

Spotfire Recommendations is a built-in analytics wizard that enables anyone, with no real expertise required, to create best-practice visualizations or entire data dashboards.

We don't claim that Spotfire Recommendations does everything. You still have to connect Spotfire applications to the data to be analyzed, and it doesn't build complex predictive models. However, by leveraging best-practice rules about what charts to use for different types of data, when to use aggregations, how to use time series correctly, and so on, it can help choose how best to visualize the results of a data analysis.

Figure 1. Recommendations 1: After loading US Department of Housing and Urban Development data on the homeless population, clicking on the Recommended visualizations icon, and selecting "homeless" and "state" in the data panel (left side), Spotfire returns these visualizations, any of which can be saved to a canvas. Selecting other options in the data panel provides additional visualizations.

Because the visualizations fully render the actual data, users can browse for insights instead of clicking or dragging to configure plots. As the user chooses from the suggestions, Spotfire builds a complete dashboard of linked, configurable graphics with supporting filters and controls to discover and explore the data in more detail.

Recommendations can have significant and far-reaching implications. It reduces or eliminates the need to know how to build a visualization, which in turn frees up specialists to concentrate on more complex tasks. Giving users the ability to explore data through point-and-click lets them move quickly up the learning curve and begin sharing insights.
Recommendations can also help analysts and data scientists fill gaps in their knowledge and expertise, while dramatically accelerating the creation of more fully featured data dashboards and applications for users.
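
The idea of best-practice rules that map data types to chart types can be illustrated with a toy sketch. This is not Spotfire's rule set or API; the `Column` type, the four column kinds, the `recommend` function, and the rules themselves are hypothetical, chosen only to show how selecting a measure and a dimension could yield a short list of candidate charts. Treating "state" as a geographic column is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A selected data column (hypothetical type for this sketch)."""
    name: str
    kind: str  # one of "numeric", "categorical", "time", "geo"


def recommend(columns):
    """Map the kinds of the selected columns to candidate charts (toy rules)."""
    numeric = [c.name for c in columns if c.kind == "numeric"]
    others = [c for c in columns if c.kind != "numeric"]

    if len(numeric) == 1 and not others:
        return [f"histogram of {numeric[0]}"]
    if len(numeric) == 2 and not others:
        return [f"scatter plot of {numeric[0]} vs {numeric[1]}"]
    if len(numeric) == 1 and len(others) == 1:
        measure, dim = numeric[0], others[0]
        if dim.kind == "time":
            # Time on the x-axis, measure aggregated per period.
            return [f"line chart of sum({measure}) over {dim.name}"]
        if dim.kind == "geo":
            return [
                f"map colored by sum({measure}) per {dim.name}",
                f"bar chart of sum({measure}) by {dim.name}",
            ]
        if dim.kind == "categorical":
            return [f"bar chart of sum({measure}) by {dim.name}"]
    return []  # no rule matched; a real tool would have many more rules


# Selecting "homeless" and "state", as in the Figure 1 walkthrough.
picks = recommend([Column("homeless", "numeric"), Column("state", "geo")])
for suggestion in picks:
    print(suggestion)
```

Keeping the rules keyed on column kinds rather than on specific column names is what would let the same small rule set serve any data set a user loads.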

CLOSING THE DATA SCIENCE DIVIDE WITH SOFTWARE

Today's approach to data analytics is to employ specialists to do most of the work: quantitative analysts, data analysts, and, in the past few years, data scientists, reflecting the in-depth statistical expertise needed for the more complex predictive analysis that fully exploits big data. But it doesn't really take a data scientist to use the three software features described in this paper.

Figure 2. Recommendations 2: In just a few minutes, a user can assemble their chosen visualizations to build a dashboard for further analysis and reporting. This one includes a map of homeless shelter utilization by state, trends of homeless and available beds, beds by shelter type, top states for bed utilization, and tables of homeless and bed utilization. The dashboard addresses the question: Do we have enough shelters for the homeless?

2. STATISTICAL FEATURES AND VISUAL ANALYTICS

Another key Spotfire strength, and an advantage over less capable BI products, is its data visualization and exploration features. While most other products concentrate on tools for running analytic models and visualizing the results, Spotfire helps bridge the skills gap with easy-to-use statistical features.

One example is how K-means Clustering works in Spotfire to help users explore data. The line chart showing trends over a six-month period is difficult to read (Figure 3). By right-clicking and applying a K-means Clustering algorithm, the user assigns the lines to cluster groups, and similar patterns among the groups emerge (Figures 4 and 5).

Figure 3. Line chart: Six-month trend lines of adjusted stock closing prices. Right-clicking and choosing K-means Clustering groups the data so that similarities can be further explored, as shown in Figure 4.

Figure 4. Cluster groups: Trends emerge, but the data is still fuzzy because daily close prices are noisy. Adjusting the Y axis by applying a moving average with an interval of 30 smooths the trend lines, as shown in Figure 5.

Figure 5. Increasing trend: Cluster 6 shows stocks with a reasonable increasing trend over time. Dragging a window around this cluster brings up the list of stocks. The resulting analysis can be easily exported.
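
For readers who want to see the steps behind Figures 3 to 5 outside a point-and-click tool, the following is a minimal sketch, not Spotfire's implementation. It assumes synthetic price series with hypothetical ticker names (no real market data), a plain Lloyd's-style K-means with a deterministic farthest-first initialization chosen here for reproducibility, and a trailing moving average; only the interval of 30 comes from the text above.

```python
import random


def moving_average(series, window):
    """Trailing moving average; early points average the values seen so far."""
    out, running = [], 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]
        out.append(running / min(i + 1, window))
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm with farthest-first initialization; returns labels."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c])) for row in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def make_prices(drift, rng, days=126, start=100.0):
    """Synthetic daily closes: a constant daily drift plus small Gaussian noise."""
    prices = [start]
    for _ in range(days - 1):
        prices.append(prices[-1] * (1 + drift + rng.gauss(0, 0.003)))
    return prices


rng = random.Random(42)
names, series = [], []
for group, drift in {"UP": 0.004, "FLAT": 0.0, "DOWN": -0.004}.items():
    for i in range(4):
        names.append(f"{group}{i}")  # hypothetical tickers
        series.append(make_prices(drift, rng))

# Rebase each series to its first close so clusters reflect shape, not price level.
rebased = [[p / s[0] for p in s] for s in series]
labels = kmeans(rebased, k=3)

# Smooth with a 30-day moving average, as described for Figure 5.
smoothed = [moving_average(s, 30) for s in rebased]


def cluster_rise(cluster):
    """Average rise of the smoothed lines in one cluster."""
    members = [sm for sm, lab in zip(smoothed, labels) if lab == cluster]
    return sum(m[-1] - m[0] for m in members) / len(members)


rising_cluster = max(set(labels), key=cluster_rise)
rising_names = [n for n, lab in zip(names, labels) if lab == rising_cluster]
print(rising_names)
```

Rebasing before clustering is a design choice of this sketch: without it, K-means would mostly group stocks by absolute price rather than by the trend pattern the figures are meant to reveal.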

EMBEDDED PREDICTIVE ANALYTICS

R. Lacy, an independent oil and gas exploration and production company, has integrated predictive analytics into its Spotfire processes.

"The engineering department is using Spotfire to put our fingertips on the data we need to help the company maximize its assets. Combining our mapping, decline curve analysis, and data analytics into one package in Spotfire has helped us become more efficient in our acquisition efforts and target the highest performing areas. It's really having an impact on overall profitability."
Brent Haas, Vice President of Engineering, R. Lacy

3. PREDICTIVE ANALYTICS

Predictive analytics can be used to increase confidence in decision-making by discovering meaningful patterns, anticipating emerging trends, managing risk, and forecasting behavior to increase upsell rates or decrease churn.

Spotfire helps business users use predictive analytics. The process starts with a data analytics professional prototyping and testing an analytic in their environment of choice. Instead of building an application, however, they upload the analytic to Spotfire Statistics Services, which makes it generally available to other Spotfire developers and analysts. Without any coding or deep understanding of the functions involved, developers can quickly integrate the analytic into a Spotfire application and share it with a wide community of business users across the organization.

To learn more about how Spotfire supports greater self-service empowerment, saves time, and increases analytics and decision-making capabilities, look for "Empowering the Masses with Analytics."

Global Headquarters
3307 Hillview Avenue
Palo Alto, CA 94304
+1 650-846-1000 TEL
+1 800-420-8450
+1 650-846-1005 FAX
www.tibco.com

TIBCO Software empowers executives, developers, and business users with Fast Data solutions that make the right data available in real time for faster answers, better decisions, and smarter action.
Over the past 15 years, thousands of businesses across the globe have relied on TIBCO technology to integrate their applications and ecosystems, analyze their data, and create real-time solutions. Learn how TIBCO turns data, big or small, into differentiation at www.tibco.com.

©2016, TIBCO Software Inc. All rights reserved. TIBCO, the TIBCO logo, and Spotfire are trademarks or registered trademarks of TIBCO Software Inc. or its subsidiaries in the United States and/or other countries. All other product and company names and marks in this document are the property of their respective owners and mentioned for identification purposes only.
01/25/16