Modelling and Big Data. Leslie Smith ITNPBD4, October 10 2015. Updated 9 October 2015



Big data and models: content
- What is a model in this context (and why the context matters)
- Explicit models
  - Mathematical models
  - Statistical models
- Implicit models
  - Neural networks
- Data models
- Models and parameters
  - Constraining models
- Creating models
  - Directly from the data, or using explicit knowledge?
  - Using neural networks
ITNPD4: Applications of Big Data

Models
- "Model" means many different things in different scientific contexts, and has even more meanings in Computing (never mind elsewhere).
- In Biology: a model organism.
- Also in Biology: a simplified version of a complex system that can be used to make predictions.
- In Physics: a set of equations (etc.) that explains (up to a point) the behaviour of a system, again often for making predictions.
- In data analysis: a set of equations, or a set of computer code, that describes a complex set of data.
- Plus different meanings in a Computing/data processing context.
- One of the most used words in science, with many confusingly different meanings.

Different types of model in experimental/empirical science
- Explicit model: a model that can be described precisely.
  - For example, a set of coupled differential equations describing how different aspects of a dataset interact with each other.
- Implicit model: a model that is described in a set of computer code.
  - Generally created from a set of data.
  - Implicit in the sense that, although an explicit description may be possible, the model is generally used to make predictions directly from a set of data, rather than being examined directly.
- Note that models may or may not be deterministic.

Models in Computing
- In Computing: a data model.
- "A data model organizes data elements and standardizes how the data elements relate to one another. Since data elements document real life people, places and things and the events between them, the data model represents reality, for example a house has many windows or a cat has two eyes" (Wikipedia)
- (Note: even though this is a Computing Science Department, Computing is generally not an experimental or empirical subject.)

Data models
- See the Big Databases and NoSQL course, ITNPD3.
- Data models provide a framework for storing data.
  - At one end, one has an SQL database: structured data.
  - At the other end, one has completely unstructured data (actually, even unstructured data usually has some structure: without structural metadata, data is not usable at all).
- In fact, data modelling has many forms.
  - Try the Wikipedia page on data models!

Data driven business models (DDBM)
- A DDBM is a model of how the business uses data, and what the business uses data for.
- Useful for an overview of the whole Big Data system in an organisation.

Explicit and implicit models
- We saw that we needed models to allow us to understand causation.
  - Without a model we can only have correlations: causation implies mechanism.
- We use models to make sense of data.
- Such models can take many forms.
- Simple linear models: $y = ax + b$
  - With a and b constants, this is a model connecting y and x.
  - Like most models it has parameters: a and b.
  - And we can use existing data to set these.
  - This is clearly an explicit model.
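As a minimal sketch of using existing data to set a and b, the snippet below fits the linear model by ordinary least squares. The function name `fit_line` and the data values are illustrative assumptions, not taken from the slides.

```python
def fit_line(xs, ys):
    """Return (a, b) minimising the sum of squared errors for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up observations that roughly follow y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
a, b = fit_line(xs, ys)
print(f"y = {a:.3f} x + {b:.3f}")
```

The two parameters are all that is kept of the data: once a and b are set, predictions come from the equation alone.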

More explicit mathematical models
- Or a polynomial model of degree n, which has n+1 parameters:
  $y = a_n x^n + a_{n-1} x^{n-1} + \dots + a_0$
- Explicit models are often expressed in differential equation terms:
  $\frac{dy}{dx} = \frac{1}{y} + s(t)$
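The polynomial case can be sketched the same way: a degree-n model has n+1 coefficients, and a least-squares fit amounts to solving n+1 linear equations (the normal equations). The helper names and the test data below are assumptions made for illustration, and this plain elimination is only suitable for small, well-scaled problems.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_polynomial(xs, ys, degree):
    """Return [a_0, ..., a_degree], the least-squares polynomial coefficients."""
    size = degree + 1  # a degree-n polynomial has n+1 parameters
    gram = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    moments = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(gram, moments)


# Illustrative data generated from y = 2x^2 - 3x + 1.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2 * x ** 2 - 3 * x + 1 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])
```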

Using explicit models
- We often want to make predictions from models.
- For explicit models this means constraining the parameters of the model: giving them values.
- The quality of the prediction depends on:
  - the appropriateness of the model
  - the accuracy of the parameters
- One can argue that model selection is itself a parameter selection problem: which functions to use, how many to use, etc.
- In general, one uses a mixture of the actual data available and knowledge about the system to choose the model.
- The parameters are then set using the data, sometimes after being initialised to ballpark-correct values using domain knowledge.
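One way to picture "initialise to ballpark values, then set the parameters using the data" is an iterative fit that starts from a rough guess. The sketch below uses plain gradient descent on the mean squared error of a linear model. The starting values, learning rate and step count are hypothetical choices, and gradient descent is just one of several techniques that could be used here.

```python
def fit_by_gradient_descent(xs, ys, a, b, rate=0.05, steps=2000):
    """Refine initial guesses (a, b) for y = a*x + b by descending the mean squared error."""
    n = len(xs)
    for _ in range(steps):
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to a and b.
        grad_a = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        a -= rate * grad_a
        b -= rate * grad_b
    return a, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
# Hypothetical ballpark starting values, standing in for domain knowledge.
a, b = fit_by_gradient_descent(xs, ys, a=1.5, b=0.5)
print(f"a = {a:.4f}, b = {b:.4f}")
```

For a linear model a closed-form solution exists, so the iteration is only for illustration. The same pattern matters more for models with no closed-form fit, where a sensible starting point can decide whether the fit converges at all.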

Simple linear interpolation
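A minimal sketch of simple linear interpolation: between two neighbouring data points, the predicted value lies on the straight line joining them. The function name and the sample points are assumptions. The sketch expects points sorted by strictly increasing x and refuses to extrapolate outside the data range.

```python
def interpolate(points, x):
    """Linearly interpolate y at x from (x, y) points sorted by strictly increasing x."""
    if not points[0][0] <= x <= points[-1][0]:
        raise ValueError("x lies outside the data range; this would be extrapolation")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            # Fraction of the way from the left point to the right point.
            fraction = (x - x0) / (x1 - x0)
            return y0 + fraction * (y1 - y0)


# Made-up measurements.
points = [(0.0, 0.0), (1.0, 2.0), (3.0, 6.0), (4.0, 5.0)]
print(interpolate(points, 2.0))
print(interpolate(points, 3.5))
```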

Implicit models
- Implicit models (generally) learn from the data.
  - The idea is that the model learns directly, and is unbiased by the designer of the model.
- Neural networks are the best known type of implicit model.
  - These generally need to be used in conjunction with some kind of (possibly informal) model of the system.
- Idea: use existing data to train the network, then use the trained network to make predictions.

Neural network
[Figure: network diagram with an input layer (Input #1 to Input #4), a hidden layer, and an output layer (Output)]

Training a neural network
1. Initialise network architecture.
2. Initialise weights.
3. For each training input:output pair, adjust the weights.
4. If the overall error exceeds some delta, go to step 3.
5. Test on validation set. If the result is not good enough, go to step 1.
6. Finished (i.e. use the trained neural network).
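The six steps can be sketched in code as follows. This is an illustration under stated assumptions rather than the course's implementation: a single hidden layer of sigmoid units trained by backpropagation, a made-up two-input classification task, and arbitrary choices for the learning rate, the error threshold delta, the epoch cap and the validation target.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))


def forward(net, inputs):
    """Return (hidden activations, output) for one input vector."""
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, inputs)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, output


def total_error(net, data):
    """Sum of squared errors over a list of (inputs, target) pairs."""
    return sum((target - forward(net, x)[1]) ** 2 for x, target in data)


def accuracy(net, data):
    """Fraction of pairs whose thresholded output matches the target."""
    hits = sum((forward(net, x)[1] > 0.5) == (target > 0.5) for x, target in data)
    return hits / len(data)


def train(train_set, n_inputs, n_hidden, rng, rate=0.5, delta=0.5, max_epochs=500):
    # Step 1 (architecture) is fixed by n_inputs and n_hidden.
    # Step 2: initialise the weights to small random values.
    net = {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_out": 0.0,
    }
    errors = [total_error(net, train_set)]
    for _ in range(max_epochs):
        # Step 3: adjust the weights after each input:output pair.
        for inputs, target in train_set:
            hidden, output = forward(net, inputs)
            grad_out = (output - target) * output * (1.0 - output)
            grad_hidden = [grad_out * w * h * (1.0 - h)
                           for w, h in zip(net["w_out"], hidden)]
            for j, h in enumerate(hidden):
                net["w_out"][j] -= rate * grad_out * h
                for i, v in enumerate(inputs):
                    net["w_hidden"][j][i] -= rate * grad_hidden[j] * v
                net["b_hidden"][j] -= rate * grad_hidden[j]
            net["b_out"] -= rate * grad_out
        errors.append(total_error(net, train_set))
        # Step 4: repeat step 3 only while the overall error exceeds delta.
        if errors[-1] <= delta:
            break
    return net, errors


# Made-up task: label a point 1 when its two coordinates sum to more than 1.
rng = random.Random(0)
points = [[rng.random(), rng.random()] for _ in range(60)]
data = [(p, 1.0 if p[0] + p[1] > 1.0 else 0.0) for p in points]
train_set, validation_set = data[:40], data[40:]

# Step 5: validate; if not good enough, go back to step 1 with a larger hidden layer.
for n_hidden in (2, 4, 8):
    net, errors = train(train_set, n_inputs=2, n_hidden=n_hidden, rng=rng)
    score = accuracy(net, validation_set)
    if score >= 0.9:
        break

# Step 6: net is the network that would now be used for predictions.
print(n_hidden, round(errors[-1], 3), score)
```

The epoch cap is an addition to the listed steps: without it, step 4 could loop forever if the error never falls below delta.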

Neural networks for prediction
- What are the dangers here?
- "However, there are specific aspects of appraisal work which pose specific problems for the utilization of MRA in these types of contexts. In this regard, small sample size as well as the difficulty in obtaining sales information due to Texas being a non-disclosure state where tax payers are not required by law to reveal what they paid for their property are major obstacles to the typical larger samples needed for MRA. I have heard that ANN (artificial neural networks) are not encumbered by these factors."
- Quote is from an email I received asking for my advice.

Prediction and NNs
- Neural networks will always make a prediction, and the prediction may look quite sensible.
- But:
  - Is it the right answer?
  - Has the NN been appropriately trained?
  - Is it the right NN?
  - Is it the right type of NN?
- Generally, one breaks up the training data into three disjoint sets:
  - a training set
  - a cross-validation set
  - a test set
- One trains up the system repeatedly, and checks each network with the cross-validation set.
- Then one tests all the networks with the test set.
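Splitting the data into three disjoint sets can be sketched as below. The 60/20/20 proportions, the function name and the fixed seed are assumptions made for the example, and the slides do not prescribe any particular ratio.

```python
import random


def three_way_split(records, train_fraction=0.6, validation_fraction=0.2, seed=0):
    """Shuffle a copy of records and cut it into three disjoint lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test


records = list(range(100))  # stand-in for 100 labelled examples
train, validation, test = three_way_split(records)
print(len(train), len(validation), len(test))
```

Shuffling before cutting matters when the records arrive in some order (by date, by customer, by class). Otherwise the three sets may not resemble each other.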

Big Data and Models
- Data sets are used to constrain models.
- For explicit mathematical models, this means adjusting parameters so that the model fits the data.
  - The fit will never be exact, so some form of approximation or error minimisation is required, e.g. minimising the sum of the squares of the error.
- For other types of model, there may be specific techniques.
  - Error correction in neural networks is a good example of this.
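As a worked example of minimising the sum of the squares of the error, take the linear model $y = ax + b$ and data points $(x_i, y_i)$, $i = 1, \dots, N$. The notation ($E$, $N$, $\bar{x}$, $\bar{y}$) is introduced here for illustration and is not from the slides.

```latex
% Sum of squared errors between the data and the model
E(a, b) = \sum_{i=1}^{N} \bigl( y_i - (a x_i + b) \bigr)^2

% Setting both partial derivatives to zero gives the conditions for a minimum
\frac{\partial E}{\partial a} = -2 \sum_{i=1}^{N} x_i \bigl( y_i - a x_i - b \bigr) = 0
\qquad
\frac{\partial E}{\partial b} = -2 \sum_{i=1}^{N} \bigl( y_i - a x_i - b \bigr) = 0

% Solving the two equations, with \bar{x} and \bar{y} the sample means
a = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}
\qquad
b = \bar{y} - a \bar{x}
```

The residual value of $E$ at the minimum is generally non-zero, which is the sense in which the fit is never exact.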

In Conclusion
- Be careful when using the word model, because it has many meanings.
- Data models describe the structure of data in general, and in Big Data applications this can be quite complex.
- Implicit and explicit models describe systems, and can be constrained (adapted, trained) by data.
- Getting the model right (or at least not too wrong) can make a big difference to predictions from data.