CIPHER Briefing
The Importance of Analytics
July 2014

"Renting 1 machine for 1,000 hours will be nearly equivalent to renting 1,000 machines for 1 hour in the cloud. This will enable users and organizations to rapidly accomplish complex tasks that were previously prohibited by cost or time constraints."
- Microsoft Research, The Economics Of The Cloud, 2010

90 Long Acre London, WC2E 9RA
+44 (0) 20 7420 0221
CIPHER Briefings

CIPHER Briefings provide more information about the data and computer science techniques used to produce CIPHER. This briefing reviews the development of computer-based analytics and explains why analytics is not about removing human insight from the process, but about isolating and executing tasks that are better done by machines.

Steve Harris, CTO, AISTEMOS Limited
July 2014

Executive Summary

Computer-based analytics lies at the heart of CIPHER. It is fundamental to CIPHER's ability to aggregate, organise and analyse vast quantities of patent data in real time, and to deliver simple visualisations and actionable insight. With that in mind, it is useful to understand what analytics is, and why recent advances in everything from analytics techniques to computing power mean that the time is right to apply powerful analytics to patent data.

1. Different types of analytics

There are a number of different types of analytics, which can be employed to solve various problems, including:

Statistics: The one most people recognise. It's familiar, but still very powerful. Most people would say that modern statistics started in the 17th century, and it is a field that is still developing.

Machine learning: Training computers to recognise patterns. The software either learns from pre-prepared examples or spots patterns in the data presented to it. Foundational work was done in the 1920s and 1930s, building on advances in statistics, though machine learning didn't become a significant field in its own right until the 1960s, with the emergence of increasingly practical digital computers. Before that point a "computer" was a person who performed calculations for a living. The field only became truly practical in the 2000s, and affordable to apply in real time over significant datasets relatively recently.

Natural Language Processing: Trying to extract meaning or structure from text sources. This can be used for indexing, matching, or spotting key data. The first systems arrived in the 1960s, but they only became practical relatively recently, as cheaper computing power became available and as they began to draw on results and input from machine learning algorithms.

There are also specialised types of analytics for dealing with certain types of data, e.g. graph-theoretic approaches, which focus on algorithms relating to graph structures. Note that in this sense, "graph" means a network of nodes and edges (lines), rather than a chart.

"90% of the world's data was created in the last two years. 80% of the data is unstructured."
- Gartner Research, 2013

2. The power of analytics

Scalability

Analytics is important because it can be called on at any time, with no notice, to produce results; humans cannot be scaled so easily. Software cannot give the same kind of insight that you can get from a person, but you can do a lot with brute force mathematics. If a single computer can process every patent published in the last 20 years (AISTEMOS currently has close to 50m publications in its database) in 15 minutes, 1000 computers can do it, and aggregate the results, in under a second. By contrast, a person could give you insight into a single patent in, say, 10 minutes, but it would take 1000 people four years to read them all, and no individual would be able to compare notes with the other 999 efficiently.

Cost efficiency

Because of computing as a service (also known as cloud computing), throwing 1000 computers at a problem is now economical. In mid-2014 a virtual machine suitable for number-crunching analytics costs in the region of 20¢ per hour, so for around $200 you can process every patent over 3000 times.
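The figures quoted under Scalability and Cost efficiency can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope illustration, not part of CIPHER. It assumes work divides evenly across machines with no aggregation overhead, takes the hourly rate to be $0.20 (the rate implied by $200 for 1,000 machine-hours), and assumes a reader works roughly 2,000 hours a year. The function and constant names are hypothetical.

```python
# Back-of-envelope check of the scalability and cost figures in this section.
# Values marked "assumed" are not stated in the briefing.

PATENTS = 50_000_000             # "close to 50m publications"
MACHINES = 1_000
SINGLE_MACHINE_PASS_MIN = 15     # one machine processes every patent in 15 minutes
PRICE_PER_MACHINE_HOUR = 0.20    # USD; implied by ~$200 for 1,000 machine-hours

PEOPLE = 1_000
HUMAN_MIN_PER_PATENT = 10        # "say, 10 minutes" per patent
WORKING_HOURS_PER_YEAR = 2_000   # assumed: about 8 hours a day, 250 days a year


def cluster_pass_seconds(machines: int = MACHINES) -> float:
    """Seconds for one full pass if the work splits evenly across machines."""
    return SINGLE_MACHINE_PASS_MIN * 60 / machines


def passes_for_budget(budget_usd: float) -> float:
    """Full passes over the data that a budget buys, ignoring overhead."""
    machine_hours = budget_usd / PRICE_PER_MACHINE_HOUR
    passes_per_machine_hour = 60 / SINGLE_MACHINE_PASS_MIN
    return machine_hours * passes_per_machine_hour


def human_reading_years(people: int = PEOPLE) -> float:
    """Working years for a team to read every patent once."""
    total_hours = PATENTS * HUMAN_MIN_PER_PATENT / 60
    return total_hours / people / WORKING_HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"One pass on {MACHINES} machines: {cluster_pass_seconds():.1f} s")
    print(f"Passes bought by $200: {passes_for_budget(200):.0f}")
    print(f"Years for {PEOPLE} readers: {human_reading_years():.1f}")
```

Under these assumptions a full pass takes 0.9 seconds on 1,000 machines, $200 buys about 4,000 idealised passes (in line with the more conservative "over 3000" quoted here), and 1,000 readers need a little over four working years.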
Until recently, having 1000 computers around for occasional bursts of activity was totally uneconomical, especially because of the hardware specification required to run machine learning algorithms. Now you can rent them for an hour, then put them back in the pool. By comparison, $200 might get you a junior lawyer or consultant for an hour (substantially less time if you need a patent attorney, lawyer or consultant with long and international experience in the field), in which time they could give you deeper insight into a handful of patents, but could not do the millions of simple, far-reaching tasks that computers are good at.

42: Number of price reductions for cloud storage and computing for Amazon Web Services since 2008

Computer versus human

Some of the things analytics algorithms do seem very clever to humans, because humans are not good at using brute force mathematics to solve problems. For example, the algorithm we use to find our Comparable organisations looks at a very large amount of data and performs a vast number of computations in order to arrive at the answer, but this is exactly the sort of task that computers excel at. What appears clever is really just a very, very large number of simple operations done on a very large amount of data. A human asked to solve the same problem would never attempt to solve it that way, as it would take many lifetimes.

Therefore, we should use computers for what computers are good at (number crunching, brute force, perfect recall) and humans for what humans are good at (insight, analogy, referencing back to past experience).

"96% of companies feel that analytics will become more important to their organizations in the next three years, but they also felt that a great deal of data is still not being used for decision-making."
- Deloitte Analytics Survey, 2013

3. Unlocking new opportunities¹

Analytics reveal facts and patterns that give cause to wonder. By aggregating the world's patent data and related events such as licensing and litigation, CIPHER enables the world's business community to visualise the drivers of risk and value across the innovation landscape. While CIPHER can reveal the "what", it will always remain the domain of human intelligence to investigate the "why", and to implement the appropriate response to the analytics and insight presented.

¹ Terminology adopted in "Big data solutions to determining IP risk and value", Nigel Swycher, IAM Magazine, July/August 2014
Glossary

Algorithm: An algorithm is a step-by-step procedure for calculations. Algorithms are used for calculation, data processing, and automated reasoning.

Cloud computing: A cloud computing system is one where computers are made available for rental by the hour or day from a large pool of hardware. The computers are virtualised in such a way that they appear to be separate machines connected on a simple network, when in fact they are drawn together from slices of much larger machines that are spatially distributed. The advantage of this is that, rather than facing the upfront purchase and ongoing maintenance cost of a large cluster of computers, users can rent machines for only the time required and return them to the pool when they are no longer needed.

Comparables: A CIPHER Comparable is an organisation that owns a patent portfolio that is substantially similar to the Target's portfolio. We consider both the degree of overlap in technologies between the portfolios and the overall size of the portfolio, in order to find organisations that can be most usefully compared to the Target.

Graph theory: Graph theory deals with the processing of mathematical graphs (i.e. networks). Graphs are composed of nodes, or vertices, connected by edges, or lines. Graphs can be used to represent many real-world phenomena; a classic example in the IP world is modelling citations between patents.
Machine learning: Machine learning systems are algorithms that can learn from data, either from data provided with human input (supervised learning) or by spotting patterns in the data themselves (unsupervised learning).

Natural language processing: Natural language processing systems attempt to process text written in a natural language, e.g. English. They either take a statistical approach, or they attempt to understand the grammar and sentence structure. NLP techniques can be applied to text to try to get to the underlying meaning, to summarise it, or to determine its topic.

Sources & Further Reading

Wikipedia References:
http://en.wikipedia.org/wiki/natural_language_processing
http://en.wikipedia.org/wiki/cloud_computing
http://en.wikipedia.org/wiki/graph_theory

Big Data: A Revolution That Will Transform How We Live, Work and Think
Viktor Mayer-Schonberger, Kenneth Cukier (2013)

Big Data Solutions To Determining IP Risk And Value
Nigel Swycher, IAM Magazine, July/August (2014)

The Analytics Advantage
Deloitte Research (2013)
Available online at: http://deloitte.wsj.com/cfo/files/2014/06/2013analytics-advantage-survey.pdf

The Economics Of The Cloud
Rolf Harms, Michael Yamartino, Microsoft Research (2010)
Available online at: http://www.microsoft.com/enus/news/presskits/cloud/docs/the-economicsof-the-cloud.pdf

© 2014 Aistemos