EMERGING FRONTIERS AND FUTURE DIRECTIONS FOR PREDICTIVE ANALYTICS, VERSION 4.0


ELINOR L. VELASQUEZ

Dedicated to the children and the young people.

Abstract. This is an outline of a new field in predictive analytics: topological-geometrical-analytical-algebraic predictive analytics, a cellular-based data analytics field. The notion here is an old one: cellular in the biological sense, referring to the essential biological theme that biologists often emphasize as form and function. Mathematics intertwines with biomimicry to reform the foundation of what has long been a simple, yet elegant prediction theory, the theory encapsulated by the deceptively modest Central Limit Theorem. Later papers will focus on specifics and applications of this rather simple notion.

Date: June 2,

Thank you to all the library staff and administrative personnel of the University of California, the San Francisco State University, the City College of San Francisco, the Public Libraries of San Francisco, and the civil workers of San Francisco for providing such a welcoming and pleasant environment in which to create such innovative technical research. Thank you to all the members of the Bioinformatics Department, University of California, Santa Cruz, for providing such ongoing stimulating conversation and encouragement.

1. Introduction

Predictive analytics, a subfield of data analytics, estimates the outcomes of future events using tools based on probability and statistics. While algorithm designs in information science are developing rapidly as an outcome of machine learning, mathematical modeling, and data mining, the actual foundation for making a prediction remains the same: the premise of prediction rests essentially on the Central Limit Theorem.

What is typically predicted using the Central Limit Theorem? Here is a toy example. Consider predicting whether Patient X will have the flu given their body temperature. The standard method is to obtain a list of patient body temperatures via a sampling of flu patients' temperatures, to estimate a sample mean $T$ of body temperature, and to measure Patient X's temperature, $T_X$. The Central Limit Theorem gives us a confidence or probability as to how likely it is that Patient X has the flu:

$$P(\mu - \mu_0 < T_X < \mu + \mu_0) = P(a < T < b) = p,$$

meaning there exists a probability, $p$, that Patient X will have the flu, given temperature $T_X$. To make this prediction, one needs a dataset of flu patients, each with their own unique health pathology, as well as a population mean, $\mu_0$, of flu patient body temperature, derived from another dataset. Even if a quantity of data is bootstrapped in order to create a theoretical population flu patient temperature dataset, the fact remains that the prediction really rests on the sample mean, $T$, simply a single numerical value.

In fact, the whole population of flu patients may be viewed as a large universe equipped with a data invariant, namely the canonical mean defined as the flu temperature. If the set of data invariants is enlarged to more than one value or estimate, an increase and improvement in the possible methods of prediction ought to occur. The key message here is to examine a given dataset, viewed as a subspace of the universe of all information, denoted as $\Omega$, and to look for other useful data invariants, to improve predictive estimates. Thus, one roadmap for creating novel predictive analytics methodologies is quite clear:

Step 1. Consider novel data invariants for a given dataset.
Step 2. Consider how we use these data invariants, both individually and collectively, to create new predictions for a scenario.
Step 3. Consider novel ways to estimate the accuracy of the predictions that we have made via our novel collection of data invariants for any given subset.

2. Introduction Of A New Field: Topological-Geometrical-Analytical-Algebraic Predictive Analytics

While the Central Limit Theorem has worked reasonably well for most applications using small data sets, the current and future needs of data analytics applications that apply information theory concepts require more complex rationales because of Big Data constraints. Big Data has unusual data landscapes that are often not well understood until supercomputer algorithms have been slowed or halted by nuances in the data that were not previously apparent, nor even considered, prior to the data processing. Big Data has hills, mountain ranges, gullies and valleys, not just the data meadows common to small data; this is a well recognized fact. What is not well understood is how to manage such unusual geographical territory. In the days of small data and static or deterministic dynamics assumptions, the demands on computer processing power and capability for standard technical applications were indeed trivial. Supercomputing algorithms and Big Data together create a new world: The Big World.

It is necessary to focus significant attention on expanding what have long been accepted mathematical foundations in data analysis, meaning it is important to create novel themes in statistical and mathematical modeling, in addition to enlarging current work in quite well established areas, such as nonlinear (both deterministic and stochastic) dynamics. For instance, applying stochastic dynamics in areas such as climate theory to study climate data has been done innumerable times. What is most needed, however, is a better understanding of how Big Data will affect not just the possible outcomes arising via standard stochastic dynamics modeling, but also the novel ways of thinking which naturally result when predictive analytics is applied to what is really an old problem: What happens when the climate moves in completely chaotic directions? Simple regression applied to practically any historical data predicts that the earth is warming and global change is happening, but neither simple nor even nonlinear regression can realistically predict how global water movement will change as time increases, especially when regression is confronted with Big Data. For such practical applications, innovative, artful ways of creating nonstandard modeling are simply de rigueur.
Thus, it is necessary to reconsider all the standard ways of prediction, in order to extrapolate and move beyond what used to be simply called forecasting. It is necessary to reconsider what exactly the Central Limit Theorem is, and why it in particular became the foundation for basic hypothesis testing. Certainly there are other ways of arriving at a prediction. Approached with this mindset, the opening up of an entirely new world in prediction theory becomes apparent. Cellular predictive analytics, at first glance, may be modeled using a topological-geometrical-analytical-algebraic predictive analytics approach, for envisioning new ways of predicting outcomes, in hopes of deriving more optimal solutions to the most complicated of problems. Ideally, though, the journey does not stay on this particular path only, but easily segues in directions not in any presently imagined universe. This offering of a prediction theory based on a theme so common in biomimicry, that of modeling how Nature behaves, can feed forward and feed back to other more desirable, sophisticated ways of thinking, e.g. what is presently considered the basis of knowledge.
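As a baseline for the alternatives called for above, the Central Limit Theorem calculation behind the flu example in the Introduction can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's method: the temperature values are made up for illustration, the function names are hypothetical, and the sampling distribution of the sample mean is approximated by a normal distribution centered at the observed mean with standard error s/sqrt(n). It only mirrors the expression P(a < T < b) = p; it does not model the decision about Patient X.

```python
import math
import statistics


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def sample_mean_and_se(sample):
    """Return the sample mean and its standard error s / sqrt(n)."""
    t_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return t_bar, se


def interval_probability(sample, a, b):
    """Approximate P(a < T < b) for the sample mean T.

    Assumption: by the Central Limit Theorem, T is treated as normal
    with mean t_bar and standard deviation equal to the standard error.
    """
    t_bar, se = sample_mean_and_se(sample)
    return normal_cdf(b, t_bar, se) - normal_cdf(a, t_bar, se)


# Hypothetical flu-patient temperatures in degrees Celsius (illustrative only).
flu_temps_c = [38.1, 38.6, 39.0, 38.4, 38.9, 39.2, 38.7, 38.3]

t_bar, se = sample_mean_and_se(flu_temps_c)
p = interval_probability(flu_temps_c, t_bar - 1.96 * se, t_bar + 1.96 * se)
print(f"sample mean = {t_bar:.2f}, standard error = {se:.3f}, p = {p:.3f}")
```

Note that the whole calculation depends on two numbers derived from the data, the mean and its standard error, which is exactly the dependence on a single invariant that the text argues should be enlarged.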
Rationale for This New Theory: Topological-Geometrical-Analytical-Algebraic Predictive Analytics Equals Cellular Predictive Analytics: Form and Function.

Justification for Why Such a New Theory Is Needed: Precision Medicine Predictive Analytics.

3. The Foundational Theory

Recall that $\Omega$ is the universe of all information. Equip the universe, $\Omega$, with the laws of physics, allowing for the concept that certain laws will be discovered as time evolves, once time begins. Note that the universe could easily have been conceived or equipped using other laws or other concepts. This is just one representation.

Definition 1. The Physics Laws Representation of the Universe of All Information, $\Omega$. Let time be unidirectional, so that time evolves only in the forward direction. Denote the Canonical Ghostchild map, or GC map, by $Q$; the Present Theory of Everything Entropy, or TOE Entropy, by $S$; and the canonical universal measure, or UM, by $\mu$. The universe, $\Omega$, as represented by the laws of physics, has the following definition:

$$[Q, \hat{Q}] = iS,$$

with $\hat{Q}$ denoting the Fourier transform of $Q$, and the laws of physics encoded by

$$\int_C [Q, \hat{Q}] \, d\mu = 0.$$

What is key here is that information can be parsed into components, and it is the components and their interactions that are to be studied in this theory. Each component has an exceptionally well defined structure, which allows the data to be reformulated in a manner that is more tractable and more amenable to prediction. Each of the canonical ingredients is then reformulated in an explicit way in each specific component.

Remark: In later versions of this outline, a discussion will be provided so that the foundational theory becomes quite explicit regarding the structure of these components and how and when they are used. At present, the foundational theory is completely esoteric, but it will become quite explicit in later versions.
Description of Cellular Predictive Analytics: The Cellular Components

Currently, two fields helpful for data analysis are topological data analysis and information geometry. These two fields are also being adapted to work on predictive analytics problems.

Current field: Topological Data Analysis

Emerging field: Topological Predictive Analytics

Current Topological Invariant for Topological Data Analysis: The Persistence Diagram.

Proposal for Canonical Topological Invariant for Predictive Analytics: The Canonical Knot Polynomial. Consider the Kauffman bracket as the best canonical knot polynomial at present. The question is: Can we do better?

4.3. Current field: Geometric Information

Emerging field: Geometric Predictive Analytics

Proposal for Canonical Geometric Invariant for Geometric Predictive Analytics: The Canonical Curvature. Open question: Is there a coordinate-free definition for the canonical curvature, or are coordinates needed for the best representation?

4.5. New Field: Analytic Predictive Analytics

Proposal for Canonical Analytic Invariant for Analytic Predictive Analytics: The Canonical Automorphic Form.

New Field: Algebraic Predictive Analytics

Proposal for Canonical Algebraic Invariant for Algebraic Predictive Analytics: The Canonical Determinant.
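To give a concrete sense of the persistence diagram, named above as the current invariant of topological data analysis, the following sketch computes its simplest part: the 0-dimensional diagram of a point cloud, which records the distance scale at which connected components merge. This is a standard union-find construction and not something specified in this outline; the function name `h0_persistence` and the sample points are hypothetical. Higher-dimensional features (loops, voids) would need a dedicated library and are not attempted here.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs of connected components.

    Every point is born at scale 0. As the distance threshold grows,
    an edge joining two separate components kills one of them at that
    edge length. One component never dies (death = infinity).
    """
    n = len(points)
    if n == 0:
        return []

    parent = list(range(n))

    def find(i):
        # Follow parent links to the component root, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    diagram = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j          # merge the two components
            diagram.append((0.0, length))    # one component dies here
    diagram.append((0.0, math.inf))          # the surviving component
    return diagram


# Hypothetical point cloud: two nearby points and one distant point.
cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
for birth, death in h0_persistence(cloud):
    print(birth, death)
```

Long-lived pairs indicate well separated clusters, while short-lived pairs are usually read as noise; in the language of this outline, the diagram is a multi-valued data invariant rather than a single number such as the mean.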
Open Questions.

1. Does there yet exist a discrete version of the surgery theory of geometric topology?
2. Let $DG : G \to SG$ be the deformation map from the group $G$ to the semigroup $SG$; this map is quite advantageous. The open question is: what representation of $DG$ is optimal for predictive analytics?