Knowledge creation in marketing: the role of predictive analytics

Joe F. Hair Jr
Kennesaw State University, Atlanta, Georgia, USA

Abstract
Purpose: The purpose of this paper is to provide an overview of predictive analytics, summarize how it is impacting knowledge creation, and suggest future developments in data mining and predictive analytics for both organizations and researchers.
Design/methodology/approach: Survival in a knowledge-based economy is derived from the ability to convert information to knowledge. To do so, researchers and managers increasingly are relying on the field of predictive analytics. Data mining identifies and confirms relationships between explanatory and criterion variables. Predictive analytics uses confirmed relationships between variables to predict future outcomes. The predictions are most often values suggesting the likelihood a particular behavior or event will take place in the future.
Findings: Data mining and predictive analytics are increasingly popular because of the substantial contributions they can make in converting information to knowledge. Marketing is among the most frequent applications of the techniques, and whether you think about product development, advertising, distribution and retailing, or research and business intelligence, data mining and predictive analytics increasingly are being applied.
Originality/value: In the future, we can expect predictive analytics to increasingly be applied to databases in all fields and to revolutionize the ability to identify, understand and predict future developments; data analysts will increasingly rely on mixed-data models that examine both structured (numbers) and unstructured (text and images) data; statistical tools will be more powerful and easier to use; future applications will be global and real time; demand for data analysts will increase, as will the need for students to learn data analysis methods; and scholarly researchers will need to improve their quantitative skills so the large amounts of information available can be used to create knowledge instead of information overload.
Keywords: Predictive process, Data handling, Multivariate analysis, Market research, Knowledge creation
Paper type: Viewpoint

European Business Review, Vol. 19 No. 4, 2007. Emerald Group Publishing Limited.

A cornerstone of innovation in the knowledge-based economy is information. But to use information to improve decision-making and stimulate innovation, we must convert it to knowledge. The challenge of converting information to knowledge was first identified by Tom Peters in his book Thriving on Chaos (1991). He said we are drowning in information and starved for knowledge! The challenge has increased dramatically in recent years, with experts estimating that in the next three years humanity will generate more data than it has in the past 1,000 years (Mark, 2006). Until the last few years, most information just disappeared. It either was not collected, or was overlooked as a resource. But was the information discarded because it was not useful? No. Much of it was potentially valuable for organizations and researchers. The information was not collected, or was simply thrown away, because it was not economical to collect, store, analyze or interpret. In the past decade, more than one-half of the human race has moved its work, shopping, playing and chatting online,
creating mountains of digital data that once would have languished on scraps of paper or vanished as forgotten conversations (Goldman, 2006). Today, virtually all organizations can convert what was a waste by-product into a resource that improves organizational decision-making, creates knowledge and provides added value to customers. And researchers increasingly have tremendous opportunities to identify and examine patterns in information to create valuable knowledge for society. Survival in a knowledge-based economy is derived from the ability to convert information to knowledge. To do so, researchers and managers increasingly are relying on the field of predictive analytics. The purpose of this paper is to provide an overview of predictive analytics, summarize how it is impacting knowledge creation, and suggest future developments in data mining and predictive analytics for both organizations and researchers.

What is predictive analytics?
Predictive analytics is not a revolutionary approach. The statistical techniques underlying the field were developed in the 1920s, while the concept of exploratory data analysis was proposed by the statistician John Tukey of Princeton University in the mid-1970s. Predictive analytics uses confirmed relationships between explanatory and criterion variables from past occurrences to predict future outcomes. The predictions are most often values suggesting the likelihood a particular behavior or event will take place in the future (http://en.wikipedia.org/wiki/predictive_analytics). Predictive analytics and data mining are sometimes viewed as one and the same. But in reality they are separate, interacting processes. Data mining first searches for data patterns and identifies promising relationships. The relationships are based on searching data (numbers), text (words or phrases), web movements (click-through and time-spent patterns), visual images and so on.
Predictive analytics then uses confirmed relationships to predict future trends, events and behavior patterns. Data mining searches large volumes of data for patterns (http://en.wikipedia.org/wiki/data_mining). It is generally thought of as involving two phases. The first phase is considered exploratory because the objective is to discover interesting, non-obvious relationships hidden in a database that have a high potential for creating knowledge, improving decision-making and providing more value to customers. This is an automated process that uses computer-based methods, referred to as machine-learning methods, to identify patterns of information in data. Many of the methods, such as neural networking and genetic algorithms, emerged from artificial intelligence and are used because they are efficient in exploring large databases. The second phase of data mining involves testing and confirmation of relationships revealed through the discovery process. It is a semi-automatic, human-learning process as opposed to a machine-learning approach. Hypotheses developed in the initial discovery phase are assessed. Even weak or not well-understood relationships can be examined, revised, confirmed or rejected. When relationships are identified and confirmed through data mining, they can then be used in predictive analytics. Predictive analytics is most often thought of as predictive modeling. But increasingly the term includes descriptive and decision modeling as well. All three modeling approaches involve extensive data analysis, but have different purposes and rely on different statistical techniques.
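The two-phase process described above can be sketched in a few lines of code. This is a deliberately minimal illustration using simple correlation rather than the machine-learning methods named in the text (neural networks, genetic algorithms); the customer fields, thresholds and synthetic data are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(1)
# Hypothetical customer records: store visits drive spend; age is pure noise.
data = [{"visits": v, "age": random.randint(20, 70),
         "spend": 12 * v + random.gauss(0, 5)}
        for v in (random.randint(0, 30) for _ in range(400))]

train, holdout = data[:300], data[300:]

# Phase 1 (automated discovery): flag explanatory variables strongly
# correlated with the criterion variable "spend" in the training split.
candidates = [f for f in ("visits", "age")
              if abs(pearson([r[f] for r in train],
                             [r["spend"] for r in train])) > 0.5]

# Phase 2 (confirmation): re-test each candidate on held-out records
# before it is trusted for prediction.
confirmed = [f for f in candidates
             if abs(pearson([r[f] for r in holdout],
                            [r["spend"] for r in holdout])) > 0.5]
print(confirmed)  # the constructed driver, visits, should survive both phases
```

The point of the second pass is the confirmation step the text describes: a relationship discovered automatically is only trusted for prediction after it survives testing on data in which it was not discovered.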
Predictive models analyze past performance to assess how likely a customer is to exhibit a specific behavior in the future. The models also seek out subtle data patterns to answer questions about customer performance, as in fraud detection models. Predictive models often perform calculations during live transactions, for example, to evaluate the risk or opportunity of a given customer or transaction when an ATM or credit card is used. Descriptive models describe relationships in data and are used to classify customers, prospects, events or activities into groups. Unlike predictive models that focus on predicting a single customer behavior (such as credit risk), descriptive models identify many different relationships between customers, products or activities. But descriptive models do not rank-order customers by their likelihood of taking a particular action the way predictive models do. Instead, descriptive models are used offline, for example, to categorize customers by product or service preferences, behavior or stage in their life cycle. Decision models describe the relationship between all the elements of a decision. This includes what is known about the data and relationships, the decision itself, and the predicted outcome of the decision. The objective is to predict the results of decisions involving many variables. The models are used to maximize certain outcomes while minimizing others, i.e. to achieve optimization for a particular situation. Decision models are generally used offline to develop a set of business rules that are likely to produce the desired outcome for customers, events, activities or organizations. By using predictive analytics, researchers and data analysts help organizations solve problems or pursue opportunities, and identify relationships as they currently exist or as they are likely to exist in the future. The models developed using predictive analytics are very useful in knowledge creation.
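As a concrete sketch of the first category, the snippet below scores a card transaction for fraud risk during a live transaction, as described above. The weights, intercept and transaction fields are invented for illustration; a real predictive model would estimate them from historical outcomes, e.g. by logistic regression.

```python
import math

# Hypothetical logistic-model coefficients, as if fitted on past transactions.
WEIGHTS = {"amount_over_usual": 1.8, "foreign_terminal": 1.2, "night_hour": 0.6}
INTERCEPT = -3.0

def fraud_score(txn):
    """Return the modeled probability that a card transaction is fraudulent."""
    z = INTERCEPT + sum(WEIGHTS[k] * txn[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))  # logistic link maps score to a probability

# Scoring during a live transaction, as when an ATM or credit card is used:
routine = {"amount_over_usual": 0, "foreign_terminal": 0, "night_hour": 0}
suspect = {"amount_over_usual": 1, "foreign_terminal": 1, "night_hour": 1}
print(round(fraud_score(routine), 3), round(fraud_score(suspect), 3))
```

Because the function is a fixed formula, it can be evaluated in milliseconds per transaction, which is what makes real-time scoring at the point of sale feasible.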
But to properly apply the knowledge created by predictive analytics requires a great deal of expertise.

Factors stimulating application of predictive analytics
Several developments have stimulated the application of data mining and predictive analytics. First, as noted earlier, there is increased data from many sources, including retail point-of-sale purchases, online transactions, medical, educational and governmental records, global positioning systems, RFID (radio frequency identification) devices, wireless electronic data sensors, and so on. Indeed, in many instances it costs little to collect data today since it is a by-product of information technology developments that are an integral part of modern organizations, including government agencies and almost all types of businesses. And once data are collected, it also costs very little to store them in data warehouses. For example, the Teradata division of NCR notes that as recently as the early 1990s the cost of storing 1 megabyte of data was $15, whereas today it costs less than a penny (London, 2002). In fact, a hard drive for home use holding a terabyte of information costs only about US$ .
Another factor stimulating the application of predictive analytics is the lower cost of electronic communications. Historically, data stored in different locations was seldom shared; it was examined on site, which limited its value. But today, with the internet and related communications technology, the cost of transferring data is minimal. For example, UPS, the global package shipper, handled approximately 600,000 phone inquiries about package shipping status during the 1997 Christmas season, costing about $3 each, for a total cost of $1.8 million. But by 2001 the company had migrated to
an internet-based system that handled over ten million inquiries costing about three cents each, for a total cost of $300,000 (Lynch, 2006). Improved client/user interfaces have increased accessibility to software that executes predictive analytics. It is no longer necessary to write computer instructions and hope no mistakes are made. With windows-based, menu-driven software, users can easily point-and-click their way to a solution. Moreover, the latest versions of software include many diagnostics that enable users to evaluate the findings and make needed adjustments. These features improve accessibility but open the door for incorrect application of the methods. Thus, extensive training is needed to ensure the appropriate tool is applied, decision rules are properly selected, and findings are correctly interpreted.

Traditional research versus predictive analytics
Controversy has arisen over the application of these new analytical tools. Traditional quantitative researchers use theory to develop hypotheses that are then tested. But the newer tools do not start with theory. Instead they are data driven. As shown in Figure 1, traditional research is based on a theoretical foundation. But data mining and predictive analytics begin by identifying relationships in data (Figure 2). Hypotheses are then developed and tested as part of model building and validation. Once a model is validated, it is used in predictive analytics. Theory may or may not be part of model building. Researchers who use data mining and predictive analytics claim that data warehouses are so large it is difficult to fully examine the data and relationships in order to apply theories to hypothesis development and testing. In fact, today's largest data warehouses have a capacity of 300 or more terabytes, and many organizations have 100 terabytes of storage capacity.
Figure 1. Traditional research approach: Theory -> Develop Hypotheses -> Test Hypotheses
Figure 2. Data mining and predictive analytics: Data -> Relationships -> Develop Hypotheses -> Model Building -> Test Hypotheses -> Model Validation

For example, Vodafone, Tesco, IBM and WalMart all claim to have multi-terabyte data warehouses. As a basis of comparison, a single terabyte is equal to 1,024 gigabytes, or one trillion bytes (a gigabyte is often informally referred to as a "gig", e.g. my computer has a 100 gig hard drive). Another way of understanding the concept of a terabyte: a single terabyte is equal to 50,000 trees made into paper and printed, and 10 terabytes is equal to all the print collections in the USA Library of Congress (London, 2002). However you evaluate the capacity of data warehouses, one certainly must conclude they contain a tremendous amount of information, enough to overwhelm the data analyst unless the proper software and statistical tools are used. The sheer size of these databases can be amazing. Ford Motor Company has 50 million names in its customer database. Citicorp has 30 million names, Kimberly Clark 10 million, and Kraft Foods 2.5 million customer records. General Motors has 12 million
GM credit card holders in a database containing detailed data on customer buying habits. For each name in the database, the companies have an average of 20 separate pieces of information that can be translated into effective strategies to better serve customers. As an example, Nokia examined six billion data points from customer surveys to create its do-it-all multimedia phones under the Nseries brand (Edwards and Ihlwan, 2006). Similarly, Vodafone typically examines 20 or more predictive models every month in an effort to better understand and predict likely market developments in the telecommunications industry. Traditional researchers criticize data mining and predictive analytics as unscientific because the techniques are data driven instead of theory driven. They say research based on data mining may search for relationships to support preconceived notions, or researchers may develop stories to fit relationships found in the data. But even if this does not happen, traditional researchers argue that judgment may not be used in applying the findings of these newer tools, because too often inexperienced researchers or naïve managers believe that if the numbers suggest a believable relationship they must be correct. The debate over data-driven versus theory-driven approaches to conducting research will not be resolved quickly. But looking beyond these issues, most researchers agree that data mining and predictive analytics have many advantages. Among the most important is that relationships that would otherwise remain hidden are identified by the tools, and decision-making is therefore more informed and better. The reason for improved decision-making is a more complete understanding of the data and its underlying patterns, and more accurate predictions. As a result, decisions are more consistent and thus more accurate.
When researchers and managers increase the accuracy of their decision-making, the outcome is increased customer satisfaction and retention for the organization, and therefore lower costs and higher profits.

Predictive analytics applications
Data mining and predictive analytics are increasingly popular because of the substantial contributions they can make in controlling costs and growing revenue. Marketing is among the most frequent applications of the techniques. Many organizations are using the techniques to help manage all phases of the customer life cycle, including acquiring new customers, increasing revenue from existing customers, and retaining good customers. By determining the characteristics of good customers, companies can target prospects with similar characteristics. Profiling customers who have bought a particular product can focus attention on similar customers that have not bought a product or service (cross-selling). Identifying customers that have left enables a company to understand customers that are at risk of leaving (reducing churn or attrition), because it is usually far less expensive to retain a customer than to acquire a new one. Whether you think about product development, advertising, distribution and retailing, or research and business intelligence, data mining and predictive analytics increasingly are being applied.

Marketing and sales
Using customer data from their large databases, organizations can help individuals in their marketing and sales divisions to better understand customer behavior and predict future purchasing patterns. Among the most frequent applications in marketing are
assessing lifetime customer value and future customer profitability, determining the optimum sales message to attract attention, stimulate interest and motivate purchase, and identifying cross-selling opportunities. For example, overnight package shipper FedEx uses predictive analytics to develop models that predict how customers will respond to price changes and new services, which customers are likely to switch to a competitor, and how much revenue will be generated by new storefront or drop-box locations. In the past, retailers have used customer data to determine their merchandise mix and predict sales. But today retailers have much more data, from point-of-sale scanners, web-enabled kiosks, digital signage, electronic shelf tags, RFID tracking devices, click-stream patterns on their web sites, and so on, and with predictive analytics more complex models are possible. These models are able to evaluate data that were unavailable or unusable in the past and include them to plan merchandising strategies more accurately. Areas where predictive analytics shows the most promise for retailers include predicting merchandise assortment and depth, deciding store layout, determining pricing strategies, controlling inventory and shrinkage, evaluating the effectiveness of promotions and coupons, determining the best direct mail approaches, and understanding how sounds (music in clothing stores) and scents (food smells in supermarkets) influence purchase likelihood. Sales managers use data mining to identify sales performance by geographic area, product type and buying characteristics, as well as channel strategies. Then demographics, lifestyle variables and purchasing behavior are used to define what new products should be introduced into the market and where, as well as which supply chains are more efficient and how to keep store shelves stocked with the right items.
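A minimal sketch of one such direct-mail application: ranking customers for a mailing by recency, frequency and monetary value (a classic RFM score). The field names, weights and scaling below are hypothetical, not taken from any company mentioned in the text.

```python
# Illustrative customer records; in practice these come from the transaction database.
customers = [
    {"id": "C1", "days_since_purchase": 12,  "orders_last_year": 9, "total_spend": 740.0},
    {"id": "C2", "days_since_purchase": 200, "orders_last_year": 1, "total_spend": 35.0},
    {"id": "C3", "days_since_purchase": 45,  "orders_last_year": 4, "total_spend": 310.0},
]

def rfm_score(c):
    """Blend recency, frequency and monetary value into a single 0-1 score."""
    recency = max(0.0, 1 - c["days_since_purchase"] / 365)  # fresher is better
    frequency = min(1.0, c["orders_last_year"] / 12)
    monetary = min(1.0, c["total_spend"] / 1000)
    return 0.4 * recency + 0.3 * frequency + 0.3 * monetary

# Rank-order prospects for the mailing, best first.
mail_list = sorted(customers, key=rfm_score, reverse=True)
print([c["id"] for c in mail_list])
```

Rank-ordering by a modeled score is exactly what distinguishes a predictive model from the descriptive grouping discussed earlier: the output is a priority list, not just segments.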
What-if analyses enable organizations to identify how profits will be affected by changes in inflation and pricing patterns, as well as the impact of increasing the number of employees in one or more divisions of the company. Finally, behavioral metrics developed using predictive analytics models can graphically reflect the selected sales information and create what-if scenarios to define and confirm the right combinations of new product distribution.

Advertising
Traditional advertising media include television, radio, newspapers, magazines and outdoor. Other than personal selling, a little more than ten years ago these media represented over 98 percent of how businesses communicated with customers. Today, non-traditional media such as the internet, podcasting, blogs, product placement, wireless, video games, event marketing, video on demand and search ads exceed 20 percent of many companies' annual ad budgets and are the fastest growing. What is driving this trend? Certainly it is the increased amount of time the marketplace spends interacting with non-traditional media. For example, individuals generally spend about eight hours a week on the internet compared to about 20 hours a week watching television. But equally important, data mining and predictive analytics can more accurately analyze and predict the impact of ad spending on these new media versus traditional media. Expenditures are growing rapidly for all non-traditional media, but the growth of search advertising is the highest. Consider this number: 800 million people a day. A big number indeed, a very big number! What does that represent? Many things,
but it certainly could be described as a market opportunity! But for what? Perhaps the most obvious opportunity is advertising. And thinking more broadly, it really is an opportunity for communicating better with the market, building brand awareness, growing relationships, and meeting many other communication-related objectives. What is 800 million? It is the number of searches per day globally on general purpose search engines (about 550,000 per minute!), with Google at 400 million plus searches per day, Yahoo 250 million, and the rest of the search engines combined about 150 million searches per day. Google, thus, has about a 50 percent share, Yahoo about 25 percent, MSN 9 percent, AOL and Ask 5 percent each, and the others very little (Sullivan, 2006). You may think Google is running away with the search engine market, and the company is doing well. But Yahoo has almost 200 million active registered users, its users are the most engaged of any portal, and Yahoo knows the online habits of all these users. In fact, as a benchmark of its impact, the New York Times averages about 4.6 million readers a day and USA Today about 5.6 million readers, but about nine million people a day get their news online from Yahoo (Vogelstein, 2005). Why is searching such a good opportunity to communicate with customers? Because increasingly customers are searching online before they make a purchase. Indeed, recent studies show that almost 60 percent of customers search online before making a purchase, and of those, 66 percent say they regularly use TV and the internet simultaneously. That is, they sit in front of their televisions and use their laptops. Studies also show that once online, 80 percent of internet traffic begins at a search engine. Thus, search engines are where the customer is connecting the dots, and if companies are not there, customers will not see them and will be less likely to become engaged with the companies offline (Oser, 2006).
Marketers can create awareness and stimulate interest using traditional media, but if the customer cannot find the brand online they likely will buy a competitor's product or service. Marketers are realizing the value of online advertising, particularly search ads. Search advertising is about 40 percent of online advertising worldwide, but in the USA and Britain it is more than 50 percent (Pfanner, 2006). Moreover, search advertising is expected to more than triple by 2010, to almost $80 billion annually worldwide. The big driver in the growth of online ad spending is the metrics: the ability to measure customer response more precisely using predictive analytics. In addition to developing demographic profiles of customers, both Yahoo and Google can predict the probable response rate to ads, the time of day and day of the week the ads are likely to be the most effective, and, through click-stream analysis, who the potential buyers are at various stages of the consideration process for a particular product or service. Let us consider two examples. During the 2002 Super Bowl, telecommunications company AT&T spent millions of dollars to introduce a product called M-life (a new mobile initiative). Thousands of viewers went online because there was almost no awareness of M-life outside the company. But AT&T overlooked search engines and did not invest in any key words related to the ads. The result: consumers remained clueless about the new mobile concept. In contrast, during the 2006 Super Bowl, fast food company Burger King was highly successful using search advertising in conjunction with its TV ads. The company first bought extensive keyword coverage across both Google and Yahoo, including everything from Burger King to Whopper to Whopperette, the words used most
often in response to the TV ads they ran. But in addition, the company bought the phrase "Super Bowl commercials". And that was not all. The pages that searchers landed on took people to fun, interactive content, including a game, behind-the-scenes video for the ads, and so on. The impact was tremendous because they knew what they were doing! (Oser, 2006). Search engines are not only changing; they are changing our lives. And of course, the success of the search engines is based on the information they collect and what they do with that information, which increasingly is to process it using data mining and predictive analytics.

Marketing research and business intelligence
The descriptive, decision modeling and predictive capabilities of data mining and predictive analytics enable researchers to provide decision-making insights that are moving from an art to a science. For example, businesses are using predictive analytics to determine likely responses to advertising messages, distribution alternatives, and pricing strategies. This includes not only whether customers are likely to show an interest in a product or service, but the rate at which inquiries, web site visits or store visits are converted into actual sales. This is particularly true as researchers learn how to utilize both structured and unstructured data in their models. Marketing researchers are using technology to collect data in new ways that enhance their ability to understand and predict customer behavior. For example, the Portable People Meter is a pager-like device that clips to the belt of individuals who participate in panel-based research. The device listens to its environment and processes signals inserted into the audio channels of radio and television broadcasts.
The time and duration of the signals are recorded, and when the device is returned to its charger base each night it transmits the data to a database owned by the New York companies Arbitron and Nielsen Media Research. The result is the ability to accurately determine what messages people are exposed to and their reactions (McQuivey, 2004). The data are then combined with other information and used in developing more accurate predictive analytic models. While companies of all sizes are accumulating data and the skills to analyze it, this does not mean the data are always effectively utilized. Unfortunately, the author has worked with many companies that have huge amounts of unused data. For example, one global insurance company the author met with recently had the following information stored in a Microsoft Access database: overall satisfaction with the customer service department; why the customer contacted the company; how many times the customer contacted the company regarding a particular issue; was the customer service representative (CSR) courteous; did the CSR treat the customer as a valued customer; did the CSR take the time to listen and help the customer; was the CSR knowledgeable; was the CSR empowered to make decisions; if the company responded in writing, was the letter clear; overall satisfaction with the company; will the customer continue to do business with the company; and would the customer recommend the company to a friend or relative. The data are analyzed only to ensure that customer response percentages are within a tolerable range. Management has determined the acceptable ranges through years of experience as well as by benchmarking other companies' numbers. These percentages are reviewed not only by division management, but also by management at corporate headquarters. At this point, this is the extent of the analysis being performed on the data obtained in these customer satisfaction surveys.
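Even with simple tools, the stored survey answers could be related to the outcome questions (overall satisfaction, intent to continue, willingness to recommend). The sketch below, run on synthetic 1-5 scale responses, ranks CSR-related items by their correlation with a composite of the three outcomes; the field names, data and driver structure are illustrative assumptions, not the insurer's actual results.

```python
import random

random.seed(7)

def clamp(x):
    """Keep a survey response on the 1-5 scale."""
    return min(5, max(1, x))

# Synthetic records; by construction, "listened" drives the outcome questions
# while the other items are noise.
records = []
for _ in range(300):
    listened = random.randint(1, 5)
    outcomes = [clamp(listened + random.choice((-1, 0, 0, 1))) for _ in range(3)]
    records.append({"courteous": random.randint(1, 5),
                    "listened": listened,
                    "letter_clear": random.randint(1, 5),
                    "composite": sum(outcomes) / 3})

def corr_with_composite(field):
    """Pearson correlation of one survey item with the outcome composite."""
    xs = [r[field] for r in records]
    ys = [r["composite"] for r in records]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Rank the CSR items by how strongly they track the outcome composite.
drivers = sorted(("courteous", "listened", "letter_clear"),
                 key=corr_with_composite, reverse=True)
print(drivers)
```

A real driver analysis would use multiple regression or a comparable predictive model rather than single correlations, but even this level of analysis goes beyond checking whether percentages fall inside a tolerable range.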
One of the most important metrics used to assess customer satisfaction for the company is the Loyalty Metric. The construct is made up of a combination of responses to the three questions regarding overall satisfaction with the company, likelihood of continuing to do business with the company, and likelihood of recommending the company to friends and relatives. This metric is one of the most highly scrutinized numbers by the powers-that-be at the company. The focus is constantly on ensuring that this Loyalty Metric is within the defined tolerable range, in fact in the upper end of the defined acceptable range. But in spite of the importance given to the Loyalty Metric, no analysis was being performed to understand the impact of the other data on the Loyalty Metric. Yet many of the other survey answers would significantly impact how customers rate the company on the three factors making up the Loyalty Metric. Currently, no statistical analysis is being performed to understand whether any of these relationships exist or are statistically significant, even though this is a prime opportunity to utilize predictive analytics.

Applications in other areas
In addition to marketing, data mining and predictive analytics offer value across a broad spectrum of fields. Telecommunications and credit card companies are two of the leaders in applying predictive analytics to detect fraudulent use of their services. Insurance companies and stock exchanges are also interested in applying this technology to reduce fraud. Medical applications are another fruitful area. For example, predictive analytics is being used to predict the effectiveness of surgical procedures, medical tests or medications, and to identify and profile individuals likely to develop specific medical conditions, such as cancer, heart disease or high blood pressure, as well as the severity of a particular illness.
As electronic health records become more prevalent, healthcare databases will provide a rich source of information for predictive analytics. Blue Cross Blue Shield, a US-based healthcare insurer, has used a neural net-based predictive model to analyze claims data and predict which health care resources individual members will need months and even years into the future. Similarly, Children's Memorial Research Center in Chicago, IL has applied data mining to classify pediatric brain tumors. Then, using predictive analytics, genomic research, and tools that search electronic medical sources for relevant information, doctors determine the best therapy and predict the probability a tumor will recur. In law enforcement, predictive analytics has been used to determine where and when crimes are most likely to occur, using a database of past calls to police, arrests, and crime incidents. The models consider weather data as well as local festivals, sporting events, and other events. Law enforcement officials can inquire about specific types of crimes, such as determining which neighborhoods are most likely to experience robberies or auto thefts. For example, the police have been able to predict the likelihood of robberies in specific nightclub parking locations near closing time, since robbers consider inebriated club-goers to be easy marks. Current law enforcement applications focus on time, place, and type of crime, but the type of weapon used in past crimes, the types of drugs being sold on the streets, and other relevant details are possible additional variables. Finally, experts in the USA and the UK are using predictive analytics and behavioral analysis to examine trends and visual pattern recognition to detect signs of an impending terrorist attack. The thought is terrorists could be photographed on security
cameras and identified through their behavior if they loiter in a specific place, for example, before they have a chance to carry out an attack (Whiting, 2006). Companies in the financial markets use the techniques to determine market and industry characteristics, currency exchange rates, and trends in the stock and bond markets as well as the real estate and capital markets, and to predict individual company and stock performance. Similarly, pharmaceutical firms are mining large databases of chemical compounds and genetic material to discover substances that might be candidates for development as medicines for the treatment of diseases. Government tax authorities in the USA, UK and many countries in Europe use predictive analytics to identify individuals and companies that appear to be illegally avoiding taxes (Finucane, 2004). Finally, predictions of volcanic eruptions, tropical storms, earthquakes, tsunamis, global warming, and similar weather-related events increasingly depend on these new analytical techniques.

Likely future developments
So what does the future hold for the emerging field of predictive analytics? Data will continue to increase exponentially, but data quality will need to improve. There is no indication the amount of data will decrease. Indeed, it appears evident that it will continue to increase substantially. The task ahead, then, is to ensure the quality of data increases. Much progress has been made in recent years, but the pressure to cut costs and make quick decisions will slow efforts to improve the quality of data. If used properly, the huge databases created by businesses worldwide will revolutionize the ability to identify, understand and predict customer needs and market developments.
Indeed, access to the information captured and made manageable by modern databases is the business equivalent of the scientific breakthroughs of the last century, when the fundamental building blocks of matter were cracked open and understood (Mitchel, 1998). To date, most predictive analytics applications have concentrated on structured data (numeric and categorical), such as date last paid, geographic location, average payment, days delinquent or average balance, and the number and type of products and services purchased. Numeric values such as these are readily understandable and computable. For example, analyses easily recognize that a customer who uses a product or service ten or more times a year is in a different category than one who seldom or never uses the product.

Predictive analytics will increasingly rely on mixed-data models that examine both structured and unstructured data. For example, the unstructured text that accompanies many customer records, such as "customer hung up the phone", "customer said she has been sick" and "customer said the check was in the mail", has not been usable. Customer service reps often record these comments as well as brief shorthand notes. But while the comments are understandable to those who record them, they are not immediately understandable to a computer in the way structured data is. Recently, however, unstructured data and analysis techniques have been incorporated into predictive modeling. For example, collections centers for banks and credit card companies are beginning to use mixed-data models that are as much as 60 percent more accurate than the structured-data-only models used previously. This increased accuracy equates to millions of dollars in increased collections. Similar results have been achieved in predicting customer losses in financial services, churn in wireless telecommunications, and fraud in insurance and healthcare (Rubel, 2006).
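As a concrete illustration of how a mixed-data model combines the two kinds of input, the sketch below turns a rep's free-text note into computable indicator features and merges them with structured account fields before producing a likelihood score. All field names, phrases, and weights are hypothetical; a real system would learn its weights from historical outcomes rather than hard-coding them:

```python
# Toy mixed-data scoring sketch (hypothetical features and weights).
import math

def text_flags(note: str) -> dict:
    """Turn an unstructured rep note into computable indicator features."""
    note = note.lower()
    return {
        "hung_up": 1.0 if "hung up" in note else 0.0,
        "promised_payment": 1.0 if "check was in the mail" in note else 0.0,
    }

def features(record: dict) -> dict:
    """Merge scaled structured fields with text-derived flags into one vector."""
    f = {
        "days_delinquent": record["days_delinquent"] / 30.0,
        "avg_balance": record["avg_balance"] / 1000.0,
    }
    f.update(text_flags(record["note"]))
    return f

# Illustrative weights; in practice these would be fit to past collection outcomes.
WEIGHTS = {"days_delinquent": 0.9, "avg_balance": 0.3,
           "hung_up": 1.2, "promised_payment": -0.8}
BIAS = -1.5

def default_probability(record: dict) -> float:
    """Logistic score: likelihood the account will not pay."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features(record).items())
    return 1.0 / (1.0 + math.exp(-z))

record = {"days_delinquent": 60, "avg_balance": 2500,
          "note": "Customer hung up the phone"}
print(round(default_probability(record), 3))  # 0.905
```

The point of the sketch is that the text-derived flags enter the score exactly like numeric fields, which is how previously unusable comments become predictive signal in the mixed-data models described above.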
Unstructured data, including video and graphic images from social networking web sites such as YouTube and MySpace, will increasingly be incorporated into data mining and predictive analytics models (Goldman, 2006). Finally, the blog search engine Technorati.com tracks 55 million blogs worldwide, applying data mining technology to identify patterns (Salzer, 2007).

Statistical tools will be more powerful and easier to use, thus facilitating more widespread application of predictive analytics. Graphical software interfaces have made data mining and predictive analytics much more accessible and user-friendly. At the same time, most universities have raised their requirements in statistics and data analysis. The combination of more accessible software and more individuals with advanced analytical training, along with the pressure for more accurate decisions, will drive the spread of predictive analytics.

Many developments will lower the cost per contact of advertising and improve customer service. Hardware and software improvements will enable marketers to identify and target customers more effectively and efficiently, thus lowering costs. The major obstacle will be the upfront cost of acquiring and implementing the technology and hiring qualified data analysts. But many companies, particularly medium-sized to larger ones, can afford the investment. As companies learn to use this new technology to identify and target customers, they can better match customer needs in product and service development, and communicate more effectively with targeted segments, spending media budgets in a way that reduces wasted dollars and enhances customer experiences. This process will be further enhanced because customers will be in a much better position to provide feedback on how to improve products and services. Companies need to consider customers' reactions to these technologies, and not just the cost savings.
Not all market segments react favorably to innovative technologies, particularly technology solutions driven more by cost savings than by relationship building. As an example, voice mail has enabled most firms to save substantial amounts of money answering their phones, but it has created substantial frustration among customers who must navigate multiple levels of voice mail, endure long waits before actually talking to a customer service representative, and deal with the outsourcing of phone answering to low-cost countries where the native tongue differs from the caller's, causing communication difficulties.

Another area that must be resolved is privacy. Data collection and storage technology is advancing faster than the laws and regulations that govern its use. Many individuals consider the use of their data an invasion of their privacy. On the other hand, when the data are used properly, they provide many benefits. Organizations and individuals need to find a reasonable compromise regarding which information can be used and how.

Future applications will be global and real time. The internet has facilitated not only global trade, but also the quick and easy collection and sharing of information. Manufacturers and service companies throughout the world are gathering more information, examining it using more sophisticated analytical techniques, and using the knowledge gained to make better decisions, thereby creating a truly globally competitive market.

Demand for data analysts will increase, as will the need for students to learn data analysis methods. The foundation of data mining and predictive analytics is statistics and mathematics. In the past, many students have avoided course work that included
quantitative and statistical methods. This has been particularly true in the USA and Europe. As more organizations begin using these techniques, the need for individuals with advanced training in this field will increase dramatically.

Scholarly researchers will need to improve their quantitative skills. But this will not diminish the role of qualitative skills. In fact, successful application of quantitative techniques will be based upon improved qualitative research. Academic and industry researchers will need to understand and apply the most rigorous scientific methods to ensure the reliability and validity of their approaches, and to achieve the predictive accuracy necessary to be globally competitive. By doing so, the large amounts of information available can be used to create knowledge instead of information overload.

References
Edwards, C. and Ihlwan, M. (2006), "The future of tech", Business Week, December 4, p. 78.
Finucane, M. (2004), "States increasingly using data mining to find tax scofflaws", Baton Rouge Sunday Advocate, April 4, p. 6D.
Goldman, N. (2006), "Math will rock your world", Business Week Online, January 23.
London, S. (2002), "Customer information is the new commodity", Financial Times, p. 12.
Lynch, D. (2006), "Thanks to its CEO, UPS doesn't just deliver", USA Today, July 24, p. B1.
Mark, E. (2006), "Fragile digital data in danger of fading past history's reach", Atlanta Journal-Constitution, June 7, p. A1.
McQuivey, J. (2004), "Technology monitors people in new ways", Marketing News, September 15, p. 23.
Mitchel, A. (1998), "Information, databases and", Marketing Week.
Oser, K. (2006), "Why marketers need to take search seriously", Advertising Age, New York, NY (accessed August).
Pfanner, E. (2006), "Search ads: spreading the word", International Herald Tribune, September, p. 3.
Rubel, T. (2006), "The next wave in customer relationship analytics", DM Review Special Report (accessed December).
Salzer, J. (2007), "Politicos, readers interact via blogs", The Atlanta Journal-Constitution, January 14, p. D1.
Sullivan, D. (2006), "Searches per day" (accessed August).
Vogelstein, F. (2005), "Yahoo's brilliant solution", Fortune, August 8.
Whiting, R. (2006), "Businesses mine data to predict what happens next", Intelligent Enterprise (accessed December 2006).

Further reading
Peters, T. (1987), Thriving on Chaos, Alfred A. Knopf, New York, NY.
About the author
Joe F. Hair Jr is Professor of Marketing at Kennesaw State University. He formerly held the Copeland Endowed Chair of Entrepreneurship at Louisiana State University. He has published more than 40 books, including the market leaders Multivariate Data Analysis (Prentice-Hall, 6th edn, 2006), which has been cited more than 6,500 times, and Principles of Marketing (Thomson Learning, 9th edn, 2008), used at over 500 universities globally. In addition to publishing over 100 refereed manuscripts, he has presented executive education and management training programs for numerous companies, been retained as a consultant and expert witness for a wide variety of firms, and is frequently an invited speaker on challenges and strategies. Joe F. Hair Jr can be contacted at: