A Revolution that Will Transform How We Live, Work and Think. By Viktor Mayer-Schönberger and Kenneth Cukier

Even though computers have been around for decades, it's only in the last 10 years or so that computing power and storage have advanced far enough to enable the emergence of big data. Viktor Mayer-Schönberger, professor of Internet governance and regulation at the Oxford Internet Institute, Oxford University, and Kenneth Cukier, data editor at The Economist, are leading authorities on big data. At this moment, they say, we're merely at the dawn of the big data revolution. The transformation and its impacts will be huge: no less than an important step in humankind's quest to quantify and understand the world.

At its heart, big data is surprisingly simple: it's about applying math to huge quantities of data to infer probabilities. But only very recently have we had access to both the vast amounts of data and the enormous computing power required to apply math in this way. The idea of big data came first from genomics and astronomy, which experienced data explosions in the 2000s. Now Internet companies such as Google are taking the lead with the latest processing technologies, powered by their access to stores of data as well as a strong profit motive.

How much data exists today, and how fast is the volume growing? There are an estimated 1,200 exabytes of stored information worldwide. A full-length movie takes up about one gigabyte, and an exabyte is one billion gigabytes, so that total is roughly 1.2 trillion movies' worth of data. And the amount of digital information in existence doubles every three years. Compare that with the era of the Gutenberg press, invented circa 1439, when it took about 50 years for the stockpile of information in Europe to double.

A good example of how big data is changing our world is the way the 2009 H1N1 flu virus was tracked.
Health officials using traditional tracking methods couldn't keep up with the spread of the flu. Google, on the other hand, could. Google engineers used the terms people typed into Internet searches to pinpoint the regions where the flu was hitting and to estimate the number of sufferers, providing the health care system with invaluable real-time data.

Along with increases in the volume of data (now growing four times faster than the world's economy) and in computing power (growing nine times faster), we're seeing a change in the way that data is perceived and used. It's now a raw material of business, a vital economic input used to create a new form of economic value.

To purchase a personal subscription or corporate license, please visit us at www. or fill out the simple request form at www./contact-us. All Rights reserved. Reproduction or redistribution in whole or in part without prior written permission of The Business Source is strictly prohibited.

In Big Data, the authors explore three major shifts that are transforming how we analyze information. First, we no longer have to make do with a small sample and hope it's representative; instead, we can analyze all the relevant data. Second, we aren't restricted to data that is easy to classify and use; we can also incorporate a huge volume of messy data to gain a whole new level of understanding. Third, we can shift from searching for causes to discovering correlations: knowing what is happening, even when we don't know why.

We will also embrace the concept of datafication. By transforming information on just about everything into a quantifiable format, we'll be able to use information in many new ways. Big data is changing society, business and markets, and data will become a corporate asset, much like brands or intellectual property. The authors also explore the ethical implications of big data, including privacy and the potential for certain institutions (like insurance companies and health care providers) to tailor services to individuals based on predictive algorithms, instead of offering them uniformly across all customer types.

More: Moving From Data Samples to Using All of the Data

Big data doesn't necessarily mean huge volumes of data. Instead, it refers to using all of the data rather than a sample. In the past, we've had to extrapolate meaning from small data samples because collecting and managing larger amounts of data was difficult. Sampling was first introduced more than 300 years ago by John Graunt of London, England, who used it to infer the size of London's population at the time of the plague instead of counting each person. In the U.S., inventor Herman Hollerith was tasked with finding a way to use his punch cards and tabulation machines to conduct the 1890 census.
He was able to reduce the tabulation time from eight years to one, launching automated data processing and sowing the seeds of what would later become IBM. Statisticians later found that choosing a sample at random, rather than simply making it larger, is what makes sampling precise. This opened the door to cheaper, more manageable data collection. But analyzing the full dataset is far more powerful, because it lets you drill down into smaller and smaller subsets of the data. Google Flu Trends is based on billions of Internet search queries. Using all this data rather than a small sample improves the analysis down to the level of predicting the spread of flu in a particular city, say the authors. Looking at all of the data, rather than a sample, also makes it possible to spot details and patterns that sampling would miss. For the money transfer company Xoom, an analysis of all the data on its money transfers revealed an unexpected pattern of transactions that turned out to be fraud carried out by a criminal group.
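
To make the contrast between sampling and using all the data concrete, here is a small simulation in Python. It is a minimal sketch under invented assumptions, not a reconstruction of Xoom's analysis: the transfer amounts are randomly generated, a hypothetical fraud ring is represented by 20 transfers that all share one unusual amount, and the hypothetical "pattern detector" simply flags any amount that occurs at least five times. A 1 percent random sample estimates the average transfer well, but it will almost never contain enough of the 20 suspicious transfers for the repeated amount to stand out, whereas a scan of the full dataset finds all of them.

```python
import random
import statistics
from collections import Counter


def make_transfers(n_normal, n_suspicious, seed=0):
    """Hypothetical data: random ordinary transfer amounts plus a small
    cluster of identical amounts standing in for a fraud ring."""
    rng = random.Random(seed)
    amounts = [rng.uniform(10, 500) for _ in range(n_normal)]
    amounts += [999.99] * n_suspicious  # the rare, repeated pattern
    rng.shuffle(amounts)
    return amounts


def repeated_amounts(amounts, min_count):
    """Crude, hypothetical pattern detector: amounts seen at least min_count times."""
    counts = Counter(amounts)
    return {amount: n for amount, n in counts.items() if n >= min_count}


transfers = make_transfers(n_normal=100_000, n_suspicious=20)
sample = random.Random(1).sample(transfers, k=1_000)  # ~1% simple random sample

full_mean = statistics.mean(transfers)
sample_mean = statistics.mean(sample)
print(f"mean amount: full data {full_mean:.2f}, 1% sample {sample_mean:.2f}")

# The sample estimates the average well, but the rare pattern rarely shows up in it.
print("repeated amounts in full data:", repeated_amounts(transfers, min_count=5))
print("repeated amounts in sample:   ", repeated_amounts(sample, min_count=5))
```

The five-occurrence threshold is an arbitrary choice for illustration; the point is only that a rare pattern leaves too few traces in a small sample to be noticed, while the full dataset keeps every trace.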

And sometimes you don't know what you're looking for until you find it. A complete dataset lets you assess data from different perspectives or examine parts of it more closely.

Messy: Embracing Inexactitude in Exchange for Greater Insights

We humans are naturally inclined to measure things, such as time and space, as precisely as possible. Scholars and astronomers were the first to try to capture knowledge through exacting measurements. But when it comes to big data, it's necessary to accept a degree of messiness, or imprecision, in order to include greater amounts of data. It's a tradeoff: we exchange some accuracy for a bigger, broader picture.

Technology is improving too. Moore's Law observes that the number of transistors that can be squeezed onto a chip doubles roughly every two years, and the capability of the algorithms that power our systems is also increasing. Interestingly, though, more data contributes more to results than better algorithms do. Researchers at Microsoft showed this in 2000, when they found that four different grammar-checking algorithms all performed better as they were given more words to learn from, with the most impressive results at the highest data volumes.

The evolution of computer translation shows how larger volumes of data lead to better results. In the 1950s, computer scientists first tried programming in grammar rules and dictionaries. In the 1980s, engineers began having computers use statistical probability to translate from one language to another. In the 1990s, IBM used ten years of Canadian parliamentary transcripts, published in both French and English, to lay the groundwork for statistical machine translation, using math to calculate the most likely translation. In 2006, Google entered the translation field and used the entire Internet as its data source.
The huge success of Google Translate (now covering more than 60 languages) owes more to the volume of data in its training set, estimated at a trillion words, than to the sophistication of its algorithm.

The way we categorize content is also becoming messier. Historically, we've used hierarchical systems, such as the Dewey Decimal system for classifying books. This type of taxonomy doesn't work well with data that is constantly changing. The photo-sharing site Flickr, for example, uses user-defined tags to label its more than six billion photos. Database design is likewise evolving beyond the records and preset fields that make up hierarchies of information. Companies like Visa and ZestFinance are using open-source systems to process large amounts of data quickly, if more messily. Only five percent of digital data fits neatly into a traditional database; by accepting messiness, researchers can tap the other 95 percent, which includes videos and web pages, and gain a much broader view. ZestFinance, a loan provider, analyzes a large number of variables when deciding whether to grant a loan and works around gaps where data is missing. As a result, its loan default rate was a third lower than the industry average in 2012.

Correlation: Finding Meaning in the Relationship Between Two Data Values

A correlation is the statistical relationship between two data values. If the correlation is strong, a change in one value is likely to be accompanied by a change in the other. A good illustration is Google Flu Trends: an increase in Internet searches for flu-related terms in a particular area indicates that more people in that location have the illness. To analyze a trend, we need to identify a proxy for it; in the case of flu, the proxies are the search terms people use. When we identify the right proxy, we can accurately assess the present and, in many ways, foretell the future.

The correlation concept was introduced in 1888 by Sir Francis Galton, who noticed a relationship between people's height and the length of their forearms and showed that its strength could be measured mathematically. For a long time, experts trying to prove something started with a hypothesis, picked the proxies they thought were involved, and then collected and analyzed data to see whether it supported the hypothesis. Now, it's no longer necessary to start with a hypothesis. Instead, we can let correlation analysis reveal the best proxies, such as which Internet search terms track the flu most closely, and uncover insights we didn't think to look for.

Amazon pioneered the approach of recommending books and other products to customers based on their shopping preferences. When it compared the sales generated by recommendations written by its staff of expert book reviewers with those generated by computer, the computer recommendations were far more likely to result in sales.

Other companies are successfully monetizing correlation analysis of big data. The credit-reporting company Experian analyzed its database of credit histories against anonymized tax data from the Internal Revenue Service.
Confirming someone's income through government sources costs about $10, but based on its correlation analysis, Experian can offer an estimate for $1.

A predictive model developed by Deloitte Consulting for the insurance company Aviva will use consumer-marketing information and credit reports as proxies for actual blood and urine tests for some insurance applicants. Health risks can be accurately identified by analyzing lifestyle information, including hobbies and the number of hours spent watching TV. The savings will be considerable: lab tests cost about $125 per person, while the correlation analysis costs about $5 apiece.

Predictive analytics aims to foresee events or trends before they happen. Sensors on bridges, motors or machines can monitor data and identify patterns that point to trouble, so that problems can be fixed before a catastrophic failure. Data analysis is also being used to predict crimes, and many parole boards use predictive analysis to decide whether to release someone from prison. Health care is benefiting from correlation analysis too. When all the data generated by patients (heart rate, temperature, blood oxygen level and more) is collected and analyzed, it's possible to detect the tiny changes that signal a change in condition well before doctors can see it.

Datafication: Quantifying Everything So It Can Be Analyzed

Data is something that can be recorded, analyzed and reorganized. Datafication means putting something into a quantifiable format so that it can be tabulated and analyzed. It's different from digitization, which is the conversion of analog information into the ones and zeros understood by computers. The roots of datafication go back to the earliest civilizations; keeping records, for example, helped farmers predict weather and crop yields. With the advent of mathematics, data could be analyzed rather than simply recorded and reviewed.

Google's effort to put as many books as it could on the Internet illustrates how digitization differs from datafication. First, Google scanned every page of the books it intended to put online. This produced an image file of each page, which a person could read but a computer could not analyze. Then Google used optical character-recognition software to read the words and paragraphs on each page. Once the books were datafied, computers and algorithms could analyze all of their content to discover all sorts of new things, such as when a phrase was first used or how the use of certain words has changed over time. Google also used this datafied material to improve its translation service.

Location can also be datafied. UPS uses geo-loco data to locate its vehicles, track employees and improve routes; by analyzing this location data, it saved three million gallons of fuel in 2011. Geo-loco information collected by cell phones, whether or not their owners realize it, can be used to promote local businesses or restaurants. But the possibilities of collecting geo-loco data from cell phones are bigger.
For example, traffic jams can be inferred from the number and speed of phones moving along a highway. AirSage is a company that turns 15 billion geo-loco records into real-time traffic reports for U.S. cities. Others, such as Sense Networks and Skyhook, use geo-loco data to determine which parts of a city have the best nightlife, or how many people are taking part in a protest. GreenGoose is a startup that sells small sensors that can be attached to just about anything to track how it is used. The datafication of how consumers use particular products is of huge interest to many companies.

The Evolution of Data Into a Good That Is Sold

In the world of big data, data is evolving from something that supports transactions into the good that is sold. Thanks to technology, data is available at a much lower cost than ever before. It never wears out, and it can be used for multiple purposes. Many companies collect data to learn about a specific trend or product, but the beauty of data is that it can be reused, by the company that collected it or by countless others, to discover many other things. The option value of data represents all of its potential uses. In the big-data age, data is like a magical diamond mine that keeps on giving long after its principal value has been tapped, say the authors. There are three powerful ways to extract the option value of data: basic reuse, merging datasets, and extensible data.

Reuse of Data: Data collected for one purpose can be reused for another. An example is Hitwise, a web-traffic-measurement company that analyzes Internet searches to provide its clients with information on consumer preferences. Another example is the Bank of England, which analyzes property-related search queries to see whether housing prices are going up or down. These organizations obtain and reuse data from Internet searches to tease out the facts relevant to their work. Reuse represents a valuable opportunity for organizations that collect data but don't use it widely. For consultants at McKinsey & Company, the data collected by a shipping-company client inspired a new business division that sells business and economic forecasts.

Merging Datasets: Combining two sets of data can provide even more information. A Danish study combined cell phone records with lists of cancer patients to see whether there was a link between cell phone use and cancer (it found none). Internet mashups like Zillow combine different sources of information to present a visual representation of home values in specific neighborhoods.

Extensible Data: Designing data collection from the outset so that the data can serve multiple purposes makes it possible to get more use from it. Google does this with Google Street View: as its cars capture images, they also collect GPS data, Wi-Fi information and more.
It's usually inexpensive to add data points, so it makes sense to collect, and keep, more data than you might currently need.

Who's Making Money From Big Data?

So far, three kinds of big data companies have emerged:
1. Companies that have the data. An example is Twitter, which has the data but relies on outside firms to license it to others.
2. Companies that specialize in analyzing the data. Companies like Teradata do data analytics for a wide variety of clients, including Walmart.
3. Companies that have the ideas and the big data mindset. Decide.com has the big data vision: its founders came up with the idea and process of analyzing online data to show consumers the lowest prices and pricing trends in real time.

Who's Who in the Value Chain

Data holders control access to the information, either for their own use or to license to others. Data specialists like Accenture conduct complex analysis for their clients. Then there are individuals and companies with big-data mindsets. For now, these visionaries are in the power position: their skills and ideas currently hold the most value. But a new category is emerging: data intermediaries. These companies collect data from numerous sources and aggregate it into something of value. Inrix collects geo-loco and mobile phone data from 100 million vehicles in Europe and North America. It combines this with information on weather, traffic patterns and events to anticipate how traffic will move. The resulting traffic-condition reports are sent to car navigation systems and to vehicle fleets.

One of the biggest impacts of big data will be the diminishing role of instinct. Big data, not the expert who operates on gut feeling, will show businesses the way to go. Researchers at MIT have found that companies that make decisions based on data are as much as six percent more productive than their non-data-driven counterparts. The structure of entire industries will be reshaped, say the authors, as big data becomes a competitive advantage.

The Risks to Privacy

The increasing loss of privacy is one downside of big data. Another danger is that organizations may use big data to punish people for what an algorithm predicts they will do, rather than for what they have actually done. Police are already using algorithmic models to plan their patrols. People who shop at online retailers may consent to data collection for the purpose of a particular purchase. But when the retailer licenses the data to someone else, what happens to that consent? In many cases, data is anonymized before it is handed over to a licensee or released publicly.
That approach failed for Netflix and AOL, though: after each company released supposedly anonymized data, outsiders were able to correlate it with other information available online and identify individual people. Traditional ways of guaranteeing privacy, such as opting out, anonymization and consent, don't work in the context of big data. And it's not just corporations that are amassing data; governments are doing it too. Perhaps, say Mayer-Schönberger and Cukier, the users of data should be accountable for ensuring its confidentiality, and if they fail to protect anonymity, they should be held to account through laws, fines or criminal prosecution. The authors also suggest creating a new profession, the algorithmist, charged with monitoring how organizations use big data and its impact on individuals.
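
The re-identification problem described above can be illustrated with a simple record linkage: joining "anonymized" records to a named public dataset on the fields the two have in common. The Python sketch below is hypothetical throughout: the records are invented, and the shared fields (ZIP code, birth year and sex) are an assumption chosen for illustration. It is not the method used against the Netflix or AOL data, which relied on other information such as movie ratings and search queries.

```python
# Hypothetical "anonymized" release: names removed, other attributes kept.
released = [
    {"zip": "02139", "birth_year": 1975, "sex": "F", "ratings": {"Movie A": 5, "Movie B": 1}},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "ratings": {"Movie C": 4}},
    {"zip": "10001", "birth_year": 1975, "sex": "F", "ratings": {"Movie D": 2}},
]

# Hypothetical public data that still carries names (e.g. profiles posted online).
public = [
    {"name": "Alice", "zip": "02139", "birth_year": 1975, "sex": "F"},
    {"name": "Bob", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"name": "Carol", "zip": "94105", "birth_year": 1990, "sex": "F"},
]

# Fields assumed to be present in both datasets (an assumption of this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """The combination of shared fields used to link the two datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(released, public):
    """Attach a name to each released record whose shared fields match
    exactly one named public record."""
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(key(person), []).append(person["name"])

    reidentified = {}
    for record in released:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # a unique match defeats the anonymization
            reidentified[names[0]] = record["ratings"]
    return reidentified


print(link(released, public))
```

Running it attaches names to two of the three "anonymous" rows (Alice's and Bob's), because each of their combinations of shared fields matches exactly one named person; the third row has no match in the public data and stays anonymous. Removing names alone therefore offers weak protection once other datasets can be joined to the released one.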

Big Data Will Bring About Big Changes

The possession of knowledge, which once meant an understanding of the past, is coming to mean an ability to predict the future, say Mayer-Schönberger and Cukier. They prophesy that the changes wrought by big data will be no less important than those that resulted from other pivotal developments, like the invention of the Gutenberg press and the launch of the Internet. Climate change, health care and economic development are all going to be impacted by developments in big data. But the human contribution (ingenuity and ambition) will still be what compels us, as a society, to move forward.