Predictive Analytics, Data Mining and Big Data

Predictive Analytics, Data Mining and Big Data: Myths, Misconceptions and Methods

Steven Finlay

© Steven Finlay 2014

All rights reserved. No reproduction, copy or transmission of this publication may be made without written permission.

No portion of this publication may be reproduced, copied or transmitted save with written permission or in accordance with the provisions of the Copyright, Designs and Patents Act 1988, or under the terms of any licence permitting limited copying issued by the Copyright Licensing Agency, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

Any person who does any unauthorized act in relation to this publication may be liable to criminal prosecution and civil claims for damages.

The author has asserted his right to be identified as the author of this work in accordance with the Copyright, Designs and Patents Act 1988.

First published 2014 by PALGRAVE MACMILLAN

Palgrave Macmillan in the UK is an imprint of Macmillan Publishers Limited, registered in England, company number 785998, of Houndmills, Basingstoke, Hampshire RG21 6XS.

Palgrave Macmillan in the US is a division of St Martin's Press LLC, 175 Fifth Avenue, New York, NY 10010.

Palgrave Macmillan is the global academic imprint of the above companies and has companies and representatives throughout the world.

Palgrave and Macmillan are registered trademarks in the United States, the United Kingdom, Europe and other countries.

ISBN 978-1-137-37927-6

This book is printed on paper suitable for recycling and made from fully managed and sustained forest sources. Logging, pulping and manufacturing processes are expected to conform to the environmental regulations of the country of origin.

A catalogue record for this book is available from the British Library.

A catalog record for this book is available from the Library of Congress.

Typeset by MPS Limited, Chennai, India.

To Ruby and Samantha


Contents

Figures and Tables
Acknowledgments

1 Introduction
  1.1 What are data mining and predictive analytics?
  1.2 How good are models at predicting behavior?
  1.3 What are the benefits of predictive models?
  1.4 Applications of predictive analytics
  1.5 Reaping the benefits, avoiding the pitfalls
  1.6 What is Big Data?
  1.7 How much value does Big Data add?
  1.8 The rest of the book

2 Using Predictive Models
  2.1 What are your objectives?
  2.2 Decision making
  2.3 The next challenge
  2.4 Discussion
  2.5 Override rules (business rules)

3 Analytics, Organization and Culture
  3.1 Embedded analytics
  3.2 Learning from failure
  3.3 A lack of motivation
  3.4 A slight misunderstanding
  3.5 Predictive, but not precise
  3.6 Great expectations
  3.7 Understanding cultural resistance to predictive analytics
  3.8 The impact of predictive analytics
  3.9 Combining model-based predictions and human judgment

4 The Value of Data
  4.1 What type of data is predictive of behavior?
  4.2 Added value is what's important
  4.3 Where does the data to build predictive models come from?
  4.4 The right data at the right time
  4.5 How much data do I need to build a predictive model?

5 Ethics and Legislation
  5.1 A brief introduction to ethics
  5.2 Ethics in practice
  5.3 The relevance of ethics in a Big Data world
  5.4 Privacy and data ownership
  5.5 Data security
  5.6 Anonymity
  5.7 Decision making

6 Types of Predictive Models
  6.1 Linear models
  6.2 Decision trees (classification and regression trees)
  6.3 (Artificial) neural networks
  6.4 Support vector machines (SVMs)
  6.5 Clustering
  6.6 Expert systems (knowledge-based systems)
  6.7 What type of model is best?
  6.8 Ensemble (fusion or combination) systems
  6.9 How much benefit can I expect to get from using an ensemble?
  6.10 The prospects for better types of predictive models in the future

7 The Predictive Analytics Process
  7.1 Project initiation
  7.2 Project requirements
  7.3 Is predictive analytics the right tool for the job?
  7.4 Model building and business evaluation
  7.5 Implementation
  7.6 Monitoring and redevelopment
  7.7 How long should a predictive analytics project take?

8 How to Build a Predictive Model
  8.1 Exploring the data landscape
  8.2 Sampling and shaping the development sample
  8.3 Data preparation (data cleaning)
  8.4 Creating derived data
  8.5 Understanding the data
  8.6 Preliminary variable selection (data reduction)
  8.7 Pre-processing (data transformation)
  8.8 Model construction (modeling)
  8.9 Validation
  8.10 Selling models into the business
  8.11 The rise of the regulator

9 Text Mining and Social Network Analysis
  9.1 Text mining
  9.2 Using text analytics to create predictor variables
  9.3 Within document predictors
  9.4 Sentiment analysis
  9.5 Across document predictors
  9.6 Social network analysis
  9.7 Mapping a social network

10 Hardware, Software and All that Jazz
  10.1 Relational databases
  10.2 Hadoop
  10.3 The limitations of Hadoop
  10.4 Do I need a Big Data solution to do predictive analytics?
  10.5 Software for predictive analytics

Appendix A. Glossary of Terms
Appendix B. Further Sources of Information
Appendix C. Lift Charts and Gain Charts
Notes
Index