A Proposal for the use of Artificial Intelligence in Spend-Analytics



Mark Bishop, Sebastian Danicic, John Howroyd and Andrew Martin

Our core team

Mark Bishop PhD studied Cybernetics and Computer Science at the University of Reading. He is Professor of Cognitive Computing at Goldsmiths, University of London, and from 2010 to 2014 was Chair of the Society for the Study of Artificial Intelligence and the Simulation of Behaviour (AISB), the largest artificial intelligence society in the United Kingdom. He has published widely in artificial intelligence, machine learning and neural computing.

John Howroyd PhD studied Mathematics at Oxford University and University College London. As well as being an expert mathematician, John has published widely in computer science, in particular in program analysis. He has also worked as Head of Research on a major project developing a spend-analytics system for NHS trusts. The particular problems solved for this system involved dealing with noisy and incomplete data: John and his team devised new techniques for automatically enriching this data in a structured way using external sources. He has in-depth knowledge of Bayesian networks, classification and clustering methods, and is also an experienced database engineer specialising in efficiency.

Sebastian Danicic PhD studied Pure Mathematics at Queen Mary College, London, and Computer Science at the University of Oxford and Imperial College London. He is Reader in Computer Science at Goldsmiths, University of London. He is a vastly experienced researcher with publications in program analysis, theoretical computer science, complexity of algorithms and software watermarking. He is Director of the Program Analysis and Transformation Group at Goldsmiths.

Andrew Martin MSc studied Computer Science and Cybernetics at the University of Reading and Cognitive Computing at Goldsmiths, University of London, under Mark Bishop. He is a current PhD student at Goldsmiths, University of London, researching artificial intelligence in the context of 4E cognitive science, a software contractor, and the current Secretary of the AISB.

Together we have broad experience of many aspects of mathematics, computer science and artificial intelligence. We have had considerable success working together as a team, developing both new research ideas and deliverables for customers.

Background

In our document entitled "The Centre for Intelligent Data Analytics: research goals" of Dec 2013 we identified the following research areas where advanced Artificial Intelligence (AI) techniques can assist the delivery of medium- and long-term strategic goals for our partner's Analytics.

Semantics

At the heart of spend analysis is the general problem of forming an accurate, detailed semantic understanding of items from the raw text information that is available to the system (e.g. product descriptions). This data must be analysed using the existing knowledge base; there may, however, sometimes not be enough current context to understand this data unambiguously, in which case it may be necessary to enrich the information via additional user interaction and/or web spidering. To help solve such semantic issues there is scope for the application of new AI techniques: for example, deep learning, reservoir computing and the newly emerging area of quantum linguistics [1].

[1] Maruyama reports: "Quantum linguistics emerged from the spirit of categorical quantum mechanics, integrating Lambek pregroup grammar, which is qualitative, and the vector space model of meaning, which is quantitative, into one concept via the methods of category theory. It has already achieved, as well as conceptual lucidity, experimental successes in automated synonymity-related judgement tasks (such as disambiguation)." For a brief introduction see Jacob Aron (2010), "Quantum links let computers understand language", New Scientist, December 2010.
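To make the text-classification idea concrete, the sketch below trains a minimal multinomial naive Bayes classifier over word counts in product descriptions. It is a deliberately simplified toy: the descriptions, categories and tokeniser are invented for this illustration, equal class priors are assumed, and the techniques named above (deep learning, reservoir computing, quantum linguistics) are far richer than this baseline.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case a raw product description and split it into word tokens."""
    return text.lower().replace("-", " ").split()

def train(examples):
    """Count tokens per category from (description, category) training pairs."""
    counts = defaultdict(Counter)
    for text, category in examples:
        counts[category].update(tokenize(text))
    return counts

def classify(counts, text):
    """Pick the category with the highest add-one-smoothed log-likelihood
    (equal class priors assumed, so priors cancel out of the comparison)."""
    vocab = {tok for c in counts.values() for tok in c}
    best, best_score = None, float("-inf")
    for category, c in counts.items():
        total = sum(c.values())
        score = sum(math.log((c[tok] + 1) / (total + len(vocab)))
                    for tok in tokenize(text))
        if score > best_score:
            best, best_score = category, score
    return best

# Invented toy training data -- a real system would learn from the
# partner's invoice lines and catalogue descriptions.
model = train([
    ("ballpoint pen blue medium", "stationery"),
    ("gel pen black fine tip", "stationery"),
    ("cordless drill 18v battery", "tools"),
    ("hammer drill masonry bit", "tools"),
])
print(classify(model, "blue ink pen pack of 10"))  # stationery
```

Even this crude model captures the key point: category evidence accumulates from overlapping vocabulary, so descriptions never seen verbatim can still be assigned a sensible class.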

Identification of similar suppliers and products

Previous work by the team has already demonstrated the need to build contextually sensitive ontologies for product descriptions. These can aid in the core classification of both products and suppliers. Improvements in this technology will lead to better identification of equivalent products; such improvements can be envisaged as applying in two distinct ways:

1. To [better] identify as the same a particular entity originally made by one manufacturer.

2. To [better] identify products that fulfil the same functional role but which are made by different manufacturers and subsequently sold on by different suppliers.

Clearly the abstract notion of equivalent functionality opens up further questions regarding the relative quality of one product as compared to another, etc. Access to our partner's huge database offers exciting new opportunities for state-of-the-art data mining and quantum linguistics to help make useful progress in this domain.

Different learning algorithms for classification based on clean data

Access to our partner's database opens up new opportunities to research state-of-the-art machine learning techniques (e.g. deep learning [2], reservoir computing, echo-state networks) which could also potentially offer a significant improvement in classification performance.

Automatic ontology generation

Ontologies are structural frameworks for organising information and are used in artificial intelligence as a form of knowledge representation about the world (or some part of it). An ontology formally represents knowledge as a set of concepts within a domain, using a shared vocabulary to denote the types, properties and interrelationships of those concepts. Automatically developing contextually sensitive ontologies will significantly improve the classification system.

[2] Deep learning is part of a broader family of machine learning methods based on learning representations. A field (e.g. a product) can be represented in many different ways (e.g. different sentences), but some representations make it easier to learn tasks of interest (e.g. "Is this drill the same as that one?") from examples.

Trend analysis: prediction of future price fluctuations

To explore the use of our partner's database to identify economic trends in purchasing via the application of advanced machine learning techniques; the expectation is that, with access to the huge database, new learning algorithms could be trained to make commercially useful time-series predictions (e.g. to highlight strategic opportunities for investment, etc.).

The Way Forward

Informed by the demonstration, it appears that there are two separate, but inter-linked, pathways to be developed:

1. Spend-analytics for buyers (SAB)

2. Global Spend Analytics (GSA)

GSA is an entirely new, yet-to-be-specified system. Perhaps a good way to summarise the concerns of SAB is: to provide a system for purchasing managers which allows them to find the most easily achievable savings from their data with the least effort spent, so maximising their return on time invested; the team highlight some of the general considerations that a SAB system might address in Appendix 1.

In the system demonstrated, the production of spend-analytics from the buyer's perspective only uses transactions between the buyer using the system and its suppliers (in each case, a tiny proportion of all the data). All of the rest of the vast data set is ignored in performing this analysis. We term the calculation of price variance at the individual supplier level "local spend analytics", as it pertains to a local subset of the total data set pertaining to a specific supplier.

Local spend-analytics

The system demonstrated by our partner at the meeting on April 17th computed local price variance on the same product supplied by the same supplier: a product-identity relationship. Because the data possessed by our partner is clean, product-identity is a relatively straightforward function to calculate, requiring no application of artificial intelligence methods [3]. It is clear, however, that if the current system is extended to include more general analysis, the application of AI-based techniques cannot be avoided. For example, as different suppliers may use [subtly] different text to describe the same product, a simple identity relationship between text descriptors may no longer hold; in this case we need to class as the same text strings that are [by a suitable metric] similar. We note that even the relatively simple-sounding task of comparing prices of the same product supplied to the buyer by different suppliers defines a problem whose solution is considerably more difficult than the example demonstrated.

[3] Because of the clean nature of the data, it is likely that local product-identity can be established by a simple comparison.

Global spend-analytics

Global spend analytics, on the other hand, will take advantage of the whole dataset and other external data to allow us, for example, to observe general trends, and to predict strategic risks and opportunities. Although the use of artificial intelligence can improve local spend analytics (in the example we highlighted, by allowing the application of a similarity metric for product identification), for global spend analytics the use of advanced AI will be essential.

Proposed improvements to spend-analytics

Although the system demonstrated on April 17th highlights an immediate and exciting potential revenue stream for our partner, we are concerned that [future] competitors could realise similar functionality relatively easily. In this context the team have identified the following broad research pathways by which the Analytics might significantly, and non-trivially, be improved (we expand and appropriately outline these ideas in Appendix 2); furthermore, we suggest that the use of appropriate advanced AI techniques could offer more clearly delineated intellectual property rights to our partner:

1. Real time processing

2. Better price variance analysis

3. Adding reporting dimensions

4. Knowledge enrichment from external sources

5. Clustering or classification of products

6. Improved search functionality

7. User behaviour to improve results

8. Modelling the market for better statistics

9. Trend analysis for predictive forecasting

Concluding remarks

If our partner seeks to fully monetise their data assets, more effective, deeper analytics will inevitably be required. In order to achieve this, some or all of the nine areas identified above need to be investigated (not least to develop and delineate long-term intellectual property across the domain). It is in these areas that the team would seek to apply powerful new AI methodologies to leverage strategic advantage in the medium and long term.

Appendix 1: Some considerations of spend analysis

In our experience, in the context of spend analytics, the following kinds of issues are often of concern to purchasing managers:

Price Variance: The same product bought from the same supplier at various prices. This suggests areas where contracted prices may be considered.

Supplier Consolidation: The same product bought from differing suppliers at various prices. This suggests where preferred suppliers may help to reduce overall costs.

Product Consolidation: Differing products with the same functional role bought at various prices. This has many difficulties, as it raises questions of quality and cost of utility, but could result in overall reductions in expenditure.

Order Consolidation: Products bought frequently in small amounts, where savings could be achieved by placing fewer bulk orders.

Contract Adherence: Products bought off contract, when one exists, at a higher price. This would require comparing invoice lines against a database of contracted pricing.

Order Adherence: Products supplied which were not requested. This requires matching invoices with the relevant orders, where they exist, and raising concerns with the supplier in good time.

Peripheral Cost Savings: Reducing peripheral charges such as VAT, invoicing, delivery and credit.

Internal Cost Savings: Reducing internal expenses such as storage, stock control, cost of accounting, and delivery to point of use.

Spend Forecasting: Given current market trends, what is the likely future expenditure for the various parts of the business?

NB: It is unlikely that any spend analysis system can fully resolve these problems (as this will always require the application of problem-specific knowledge and experience from purchasing managers); however, by appropriately analysing current and past data, a strong spend analysis system can provide appropriate information to purchasing managers, from which they can make good purchasing decisions more easily.

Appendix 2: Potential improvement pathways for the Analytics

Real time processing

There is a commercial advantage in reporting in real time for spend analysis: this will allow purchasing managers to raise concerns on particular invoices before payment is made. With appropriate consideration of the information architecture this can be achieved, allowing incremental improvements to be reflected in the reporting as they arise.
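As a sketch of what incremental (rather than batch) reporting could look like, the class below maintains running spend statistics per (supplier, product) pair, folding in each invoice line as it arrives. The class, field names and invoice data are illustrative assumptions, not a description of the demonstrated system; the naive minimum-price saving figure is included only as the simplest possible report.

```python
from collections import defaultdict

class SpendTracker:
    """Incrementally maintained spend statistics per (supplier, product),
    updated line-by-line so reports can reflect each invoice as it arrives
    rather than waiting for a batch recomputation."""

    def __init__(self):
        self.stats = defaultdict(
            lambda: {"qty": 0, "spend": 0.0, "min_price": float("inf")})

    def add_line(self, supplier, product, qty, unit_price):
        """Fold one invoice line into the running statistics (O(1) work)."""
        s = self.stats[(supplier, product)]
        s["qty"] += qty
        s["spend"] += qty * unit_price
        s["min_price"] = min(s["min_price"], unit_price)

    def potential_saving(self, supplier, product):
        """Naive saving versus the minimum observed price -- the simplest
        possible report, and illustrative only."""
        s = self.stats[(supplier, product)]
        return s["spend"] - s["qty"] * s["min_price"]

tracker = SpendTracker()
tracker.add_line("Acme Ltd", "A4 paper", 10, 2.50)  # hypothetical invoice lines
tracker.add_line("Acme Ltd", "A4 paper", 5, 2.00)
print(tracker.potential_saving("Acme Ltd", "A4 paper"))  # 35.0 - 30.0 = 5.0
```

Because each update is constant-time, the same structure scales to a stream of invoices without ever reprocessing history.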

Better price variance analysis

As an example of the system's functionality, we were shown how it reports potential savings to buyers on the same product supplied by the same supplier. At the prototype demonstration it appeared that the system effectively estimates potential savings by computing how much would have been paid if all products had been bought at the minimum price (and then subtracting that amount from what was actually paid). The team remain concerned that such an approach may [at least occasionally] give rise to an exaggerated view of potential savings to buyers: for example, the data set may contain a single outlier among many transactions (representing, say, a special offer at a much cheaper price), and it will normally be unreasonable to use this singleton as a basis for comparing all other purchases. Furthermore, the price of items may fluctuate seasonally, and it would be unreasonable to expect to pay the summer price for tomatoes in the winter. We suspect that if a SAB system merely highlighted variations from the minimum price, this feature might eventually be ignored by its users.

We suggest that for customers to take price variation seriously a more sophisticated approach is required: one that can take all of the above factors into account. In much the same way that the Google PageRank algorithm gave rise to a better reflection of the importance of specific web pages (and hence prompted the long-term shift of web search services from AltaVista to Google), we believe that a similarly clever algorithm for ordering possible savings could offer a much better reflection of the importance of individual price variations to the user.

As soon as the SAB system is extended to include less specific analysis (e.g. the task of comparing prices of an identical product supplied to the buyer by different suppliers), the application of advanced artificial intelligence techniques (from areas such as quantum linguistics, data mining, machine learning and clustering) cannot easily be avoided [4].

[4] For example, instead of simply reporting variances from minimum prices, more sophisticated algorithms could inform buyers which products were most likely to yield the largest savings (taking into account seasonal fluctuations, etc.) and offer the user the chance to ignore outliers in performing the analysis. In addition, we suggest that inflation and other market forces should also be taken into account in presenting more accurate estimated potential savings to buyers.
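One hedged illustration of the outlier problem described above: instead of measuring every purchase against the outright minimum price, a reference price could be taken at a lower quantile of observed prices, so that a lone special-offer transaction cannot dominate the estimate. The quantile choice and the data below are arbitrary assumptions for illustration, not a proposed production algorithm.

```python
def robust_saving(lines, pct=0.25):
    """Estimate potential saving against a lower-quartile reference price
    rather than the outright minimum, so that a single outlier (e.g. a
    one-off special offer) cannot dominate the estimate.

    lines: list of (unit_price, quantity) pairs for one product.
    pct:   quantile used as the reference price (an arbitrary choice here).
    """
    prices = sorted(price for price, qty in lines)
    ref = prices[min(int(pct * len(prices)), len(prices) - 1)]  # nearest rank
    # Only purchases above the reference price count as potential savings.
    return sum(max(price - ref, 0.0) * qty for price, qty in lines)

# Regular prices around 2.00 plus one outlier special offer at 0.50.
lines = [(2.00, 10), (2.10, 10), (1.95, 10), (0.50, 1), (2.05, 10)]
print(round(robust_saving(lines), 2))           # 3.0 against the quartile price
print(round(robust_saving(lines, pct=0.0), 2))  # 61.0 against the raw minimum
```

The two figures make the concern tangible: benchmarking against the raw minimum inflates the "saving" twenty-fold on this toy data, exactly the exaggeration the team warns about.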

Allowing buyers to add reporting dimensions

It would also seem natural to allow users to influence the overall reporting of possible savings. This could include, for example, the ability to upload cost centre codes, accounting codes, their own product classifications, or contract data (agreed pricing of products from various suppliers). In this context the team suggest investigating the extent to which AI technology could be used in a predictive manner to help reduce the burden of maintaining such dimensions as new product items are supplied or new suppliers are engaged.

Knowledge enrichment from external sources

The @UKplc SpendInsight system was designed to incorporate noisy data from many different sources; for example, order lines, supplier catalogues, contract databases, accounting systems, and the web; in this respect the clean database of invoices is now the base starting point. Where there is information to support the underlying data, this could also be linked. AI technologies (as deployed in the SpendInsight system) can offer mechanisms to do this in a way that keeps the data sources distinct and thus enables complete control over what is shown to specific users.

Clustering or classification of products

Clustering is essential in useful spend-analytics. The task of moving from identical to similar items is very difficult and requires a variety of techniques, many of which fall under the general heading of artificial intelligence. For example, a SAB system may be required to perform a more general analysis about pens. In order to do this, we need to find all products in our system which come under that category. This is a very hard task and can never be performed to 100% accuracy except with very small data sets. In fact, in order to help solve the problem we may have to look outside our local dataset, possibly even resorting to spidering the web in search of hints about how to classify products whose internal descriptions are not sufficiently helpful.

Without clustering, the same product supplied by a different supplier will be regarded as a different product; identifying them as the same is a very difficult problem. When are two products produced by different suppliers in fact the same? This sort of question is solved using algorithms from artificial intelligence and can only be answered probabilistically. Furthermore, clustering is essential whenever we want to ask questions in a more general way. Without clustering, we may be able to ask questions like "How is supplier X performing this month?", but if we want to ask questions like "How is supplier X performing this month compared to other similar suppliers?" things become much more complex: we need to be able to find ways of clustering similar suppliers. Presumably, inter alia, similar suppliers sell similar products. Deciding whether two different products are similar, however, is an even more difficult problem than deciding whether they are identical. This problem may require the use of external data produced as the result of spidering, and state-of-the-art semantic text analysis such as quantum linguistics.

Improved search functionality

A nice feature demonstrated was the search functionality when filtering the result set by product. However, this relied purely on selecting products containing the search terms in their invoice descriptions. The system has many possibilities for improvement, but these require a degree of semantic understanding (e.g. that the word "transit" should be treated synonymously with "carriage"). The search functionality would also be improved by using enriched data from external sources, such as fuller product descriptions from supplier catalogues. Similarly, a fine-grained clustering or classification of products could be used to broaden searches over specific types of product. We see this as a series of incremental steps to provide buyers with the search functionality that they require.

User behaviour to improve results

We can also add knowledge by analysing user-supplied data and user behaviour. User-supplied data allows for a more tailored interface for the user, but also, when aggregated across all users, gives semantic information from the human perspective which may be leveraged in many ways. Similarly, user behaviour can also be mined, providing an important feedback loop for the relevant learning algorithms. For example, noting which products are most frequently grouped together for comparison gives an additional mechanism for assessing which products are similar. This can then be used to adjust the parameters for the ranking of products in a search.
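A minimal sketch of the probabilistic "same product?" judgement discussed above, using Jaccard overlap of token sets as the similarity metric. The descriptions and threshold are invented for illustration; a real system would also need normalisation of units, abbreviations and supplier-specific codes, and far richer semantics than bare token overlap.

```python
def tokens(description):
    """Normalise a free-text product description into a set of word tokens."""
    return set(description.lower().replace("-", " ").split())

def jaccard(a, b):
    """Jaccard similarity of two descriptions' token sets, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

def probably_same(a, b, threshold=0.5):
    """Judge two descriptions to denote the same product when their overlap
    exceeds a threshold -- inherently a probabilistic, tunable decision."""
    return jaccard(a, b) >= threshold

# Hypothetical descriptions of one product from two different suppliers.
a = "bic cristal ballpoint pen blue medium"
b = "ballpoint pen medium blue bic"
print(round(jaccard(a, b), 2))  # 5 shared tokens / 6 distinct -> 0.83
print(probably_same(a, b))      # True
print(probably_same(a, "cordless drill 18v battery"))  # False
```

The threshold makes the trade-off explicit: raising it reduces false merges of genuinely different products at the cost of missing true matches, which is precisely why such judgements can only be answered probabilistically.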

Modelling the market for better statistics

The data for one product item from one supplier to one buyer is generally so sparse that accurate analysis and predictions are not possible (a problem statisticians might describe in terms of over-fitting). Of course, something is probably better than nothing, but the value will be limited, and without care expectations could be artificially raised. The notion of similarity, as discussed under "Clustering or classification of products", allows for hierarchical modelling of products. High-level groupings with many members have lots of data and smoother behaviour, giving rise to better models. Lower-level groupings have fewer members and less data, but their models should be influenced (and smoothed) by the models of the higher-level groups to which they belong. This enables better predictions to be made at these lower levels by allowing influence from above.

Trend analysis for predictive forecasting

The market modelling will enable comparative analysis of the various products and product groupings, thus building up a network of correlations over the marketplace, with strong correlations between closely related parts of the market but also some which are more distant (and perhaps unexpected). Temporal properties may also be examined; for example, where growth in one is usually followed by growth in another. This, together with standard time-series techniques, should provide a rich toolbox for trend analysis and predictive forecasting.

Contact Us

If you have follow-up questions, contact Andrew Martin at a.martin@gold.ac.uk, who will answer your question directly or pass it on to the other members of the team. We look forward to receiving your enquiries.