EMERGING FRONTIERS AND FUTURE DIRECTIONS FOR PREDICTIVE ANALYTICS, VERSION 4.0




ELINOR L. VELASQUEZ

Dedicated to the children and the young people.

Abstract. This is an outline of a new field in predictive analytics: topological-geometrical-analytic-algebraic predictive analytics, a cellular-based data analytics field. The underlying notion is an old one: "cellular" is meant in the biological sense, invoking the essential biological theme that biologists refer to as form and function. Mathematics intertwines with bio-mimicry to reform the foundation of what has long been a simple yet elegant prediction theory, the theory encapsulated by the deceptively modest Central Limit Theorem. Later papers will focus on specifics and applications of this rather simple notion.

Date: June 2, 2015.

Thank you to all the library staff and administrative personnel of the University of California, the San Francisco State University, the City College of San Francisco, the Public Libraries of San Francisco, and the civil workers of San Francisco for providing such a welcoming and pleasant environment to create such innovative technical research.

Thank you to all the members of the Bioinformatics Department, University of California, Santa Cruz, for providing such ongoing stimulating conversation and encouragement.

1. Introduction

Predictive analytics, a subfield of data analytics, estimates outcomes of future events using tools based on probability and statistics. While algorithm designs are developing rapidly in information science as an outcome of machine learning, mathematical modeling, and data mining, the actual foundation for making a prediction remains the same: the premise of prediction rests essentially on the Central Limit Theorem.

What is typically predicted using the Central Limit Theorem? Here is a toy example. Consider predicting whether Patient X has the flu, given their body temperature. The standard method is to obtain a list of body temperatures via a sampling of flu patients' temperatures, to estimate a sample mean $T$ of body temperature, and to measure Patient X's temperature, $T_X$. The Central Limit Theorem gives us a confidence or probability as to how likely it is that Patient X has the flu:

$P(\mu - \mu_0 < T_X < \mu + \mu_0) = P(a < T < b) = p$,

meaning there exists a probability, $p$, that Patient X has the flu, given temperature $T_X$. To make this prediction, one needs a dataset of flu patients, each with their own unique health pathology, as well as a population mean, $\mu_0$, of flu patient body temperature, derived from another dataset. Even if a quantity of data is bootstrapped in order to create a theoretical population dataset of flu patient temperatures, the fact remains that the prediction really rests on the sample mean, $T$, which is simply a single numerical value.

In fact, the whole population of flu patients may be viewed as a large universe equipped with a single data invariant, namely the canonical mean, defined as the flu temperature. If the set of data invariants is enlarged to more than one value or estimate, the range and quality of possible prediction methods ought to increase. The key message here is to examine a given dataset, a subspace of the universe of all information, denoted $\Omega$, and to look for other useful data invariants in order to improve predictive estimates. Thus, one roadmap for creating novel predictive analytics methodologies is quite clear:

Step 1. Consider novel data invariants for a given dataset.
Step 2. Consider how to use these data invariants, both individually and collectively, to create new predictions for a scenario.
Step 3. Consider novel ways to estimate the accuracy of the predictions made via this novel collection of data invariants for any given subset.

2. Introduction of a New Field: Topological-Geometrical-Analytical-Algebraic Predictive Analytics

While the Central Limit Theorem has worked reasonably well for most applications using small data sets, the current and future needs of data analytics applications that apply information theory concepts require more complex rationales because of Big Data constraints. Big Data has unusual data landscapes, often not well understood until supercomputer algorithms have been slowed or halted by nuances in the data that were not previously apparent, or even considered, prior to the data processing. Big Data has hills, mountain ranges, gullies, and valleys, not just the data meadows common to small data; this is a well-recognized fact. What is not well understood is how to manage such unusual geographical territory.

In the days of small data and static or deterministic dynamics assumptions, the computer processing power and capability required for standard technical applications were indeed trivial. Supercomputing algorithms and Big Data together create a new world: the Big World. It is necessary to focus significant attention on expanding the long-accepted mathematical foundations of data analysis. That means creating novel themes in statistical and mathematical modeling, in addition to enlarging current work in well-established areas such as nonlinear (both deterministic and stochastic) dynamics.

For instance, stochastic dynamics has been applied innumerable times in areas such as climate theory to study climate data. What is most needed is a better understanding of how Big Data will affect not just the possible outcomes arising from standard stochastic dynamics modeling, but also the novel ways of thinking that naturally result when predictive analytics is applied to what is really an old problem: what happens when the climate moves in completely chaotic directions? Simple regression applied to practically any historical climate data predicts that the earth is warming and global change is happening, but neither simple nor nonlinear regression can realistically predict how global water movement will change over time, especially when regression is confronted with Big Data. For such practical applications, innovative, artful ways of creating non-standard models are simply de rigueur.
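As a reference point for the standard approach, the toy flu example of Section 1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the temperatures are synthetic (drawn from a made-up normal distribution), the 95% level and Patient X's temperature are arbitrary choices, and the function name `clt_interval` is hypothetical. It reads the probability statement as a normal-approximation interval around the sample mean, which is one conventional interpretation and not necessarily the author's.

```python
import random
from statistics import NormalDist, mean, stdev


def clt_interval(sample, confidence=0.95):
    """Normal-approximation (CLT) confidence interval for the population mean."""
    n = len(sample)
    t_bar = mean(sample)                 # the single invariant: the sample mean
    std_err = stdev(sample) / n ** 0.5   # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return t_bar - z * std_err, t_bar + z * std_err


# Synthetic flu-patient temperatures in degrees Celsius (illustration only).
rng = random.Random(0)
flu_temps = [rng.gauss(38.5, 0.6) for _ in range(200)]

low, high = clt_interval(flu_temps)
t_x = 38.4  # hypothetical temperature of Patient X

print(f"sample mean: {mean(flu_temps):.2f}")
print(f"95% interval for the mean: ({low:.2f}, {high:.2f})")
print("T_X inside interval:", low < t_x < high)
```

Note that everything downstream of `flu_temps` depends on just two summary numbers, the sample mean and the standard error, which is the limitation the roadmap in Section 1 proposes to address by enlarging the set of data invariants.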
Thus, it is necessary to reconsider all the standard ways of prediction, in order to extrapolate and move beyond what used to be simply called forecasting. It is necessary to reconsider what exactly the Central Limit Theorem is and why it in particular became the foundation for basic hypothesis testing. Certainly there are other ways of arriving at a prediction. Approached with this mindset, the opening up of an entirely new world in prediction theory becomes apparent.

Cellular predictive analytics, at first glance, may be modeled using a topological-geometrical-analytical-algebraic predictive analytics approach for envisioning new ways of predicting outcomes, in hopes of deriving more optimal solutions to the most complicated of problems. Ideally, though, the journey does not stay on this particular path only, but easily segues in directions not in any presently imagined universe. This offering of a prediction theory based on a theme so common in bio-mimicry, that of modeling how Nature behaves, can feed forward and feed back to other more desirable, sophisticated ways of thinking, e.g. what is presently considered the basis of knowledge.

2.1. Rationale for This New Theory: Topological-Geometrical-Analytical-Algebraic Predictive Analytics Equals Cellular Predictive Analytics - Form and Function.

2.2. Justification for Why Such a New Theory Is Needed: Precision Medicine Predictive Analytics.

3. The Foundational Theory

Recall that $\Omega$ is the universe of all information. Equip the universe, $\Omega$, with the laws of physics, allowing that certain laws will be discovered as time evolves, once time begins. Note that the universe could easily have been conceived of, or equipped, using other laws or other concepts. This is just one representation.

Definition 1. The Physics Laws Representation of the Universe of All Information, $\Omega$. Let time be unidirectional, so that time evolves only in the forward direction. Denote the Canonical Ghost-child map, or GC map, by $Q$; the Present Theory of Everything Entropy, or TOE Entropy, by $S$; and the canonical universal measure, or UM, by $\mu$. The universe, $\Omega$, as represented by the laws of physics, has the following definition:

$[Q, \hat{Q}] = iS$,

with $\hat{Q}$ denoting the Fourier transform of $Q$, and the laws of physics encoded by

$\int_C [Q, \hat{Q}] \, d\mu = 0$.

What is key here is that information can be parsed into components, and it is the components and their interactions that are to be studied in this theory. Each component has an exceptionally well-defined structure, which allows the data to be reformulated in a manner more amenable to tractable predictions. Each of the canonical ingredients is then reformulated in an explicit way in each specific component.

Remark: At present, the foundational theory is completely esoteric. Later versions of this outline will provide a discussion that makes the foundational theory explicit regarding the structure of these components and how and when they are used.

4. Description of Cellular Predictive Analytics: The Cellular Components

Currently, two fields helpful for data analysis are topological data analysis and information geometry. These two fields are also being adapted to work on predictive analytics problems.

4.1. Current field: Topological Data Analysis.

4.2. Emerging field: Topological Predictive Analytics.

4.2.1. Current Topological Invariant for Topological Data Analysis: The Persistence Diagram.

4.2.2. Proposal for Canonical Topological Invariant for Predictive Analytics: The Canonical Knot Polynomial. Consider the Kauffman bracket as the best canonical knot polynomial at present. The question is: can we do better?

4.3. Current field: Geometric Information.

4.4. Emerging field: Geometric Predictive Analytics.

4.4.1. Proposal for Canonical Geometric Invariant for Geometric Predictive Analytics: The Canonical Curvature. Open question: Is there a coordinate-free definition for the canonical curvature, or are coordinates needed for the best representation?

4.5. New Field: Analytic Predictive Analytics.

4.5.1. Proposal for Canonical Analytic Invariant for Analytic Predictive Analytics: The Canonical Automorphic Form.

4.6. New Field: Algebraic Predictive Analytics.

4.6.1. Proposal for Canonical Algebraic Invariant for Algebraic Predictive Analytics: The Canonical Determinant.
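Of the invariants listed above, the persistence diagram of Section 4.2.1 is the one already in standard use, and its simplest case can be illustrated directly. The sketch below computes only the 0-dimensional persistence pairs of a Vietoris-Rips filtration on a small point cloud: every point is a connected component born at scale 0, and a component dies at the edge length that merges it into another. The outline itself gives no construction, so this is a textbook-style illustration rather than the author's method; the function name `h0_persistence` and the sample points are hypothetical.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """0-dimensional persistence pairs (birth, death) of a Vietoris-Rips filtration."""
    n = len(points)
    parent = list(range(n))  # union-find forest over the points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, processed in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j       # two components merge here
            pairs.append((0.0, length))   # one of them dies at this scale
    if n:
        pairs.append((0.0, math.inf))     # the last component never dies
    return pairs


# Two well-separated clusters (made-up coordinates).
cloud = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
for birth, death in h0_persistence(cloud):
    print(birth, death)
```

In the printed pairs, short-lived components correspond to points within a cluster, while the one long-lived finite pair reflects the gap between the two clusters. Higher-dimensional features (loops, voids) require a full simplicial-complex computation that this sketch does not attempt.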

4.7. Open Questions.

1. Does a discrete version of surgery theory from geometric topology exist yet?
2. Let $D_G : G \to SG$ be the deformation map from the group $G$ to the semigroup $SG$; this map is quite advantageous. The open question is: what representation of $D_G$ is optimal for predictive analytics?