Volume 143 – No. 11, June 2016

Sequence to Sequence Weather Forecasting with Long Short-Term Memory Recurrent Neural Networks

Mohamed Akram Zaytar, Research Student, Department of Computer Engineering, Faculty of Science and Technology, Tangier, Route Ziaten, Tangier, 90000, Morocco

Chaker El Amrani, Associate Professor, Department of Computer Engineering, Faculty of Science and Technology, Tangier, Route Ziaten, PO. Box 416, Tangier, 90000, Morocco

ABSTRACT
The aim of this paper is to present a deep neural network architecture and use it in time series weather prediction. It uses multi-stacked LSTMs to map input sequences of weather values to output sequences of the same length. The final goal is to produce two types of models per city (for 9 cities in Morocco) to forecast 24 and 72 hours worth of weather data (Temperature, Humidity and Wind Speed). Approximately 15 years (2000–2015) of hourly meteorological data was used to train the models. The results show that LSTM-based neural networks are competitive with the traditional methods and can be considered a better alternative for forecasting general weather conditions.

General Terms
Machine Learning, Weather Forecasting, Pattern Recognition, Time Series

Keywords
Deep Learning, Sequence to Sequence Learning, Artificial Neural Networks, Recurrent Neural Networks, Long Short-Term Memory, Forecasting, Weather

1. INTRODUCTION
Weather forecasting began with early civilizations and was based on observing recurring astronomical and meteorological events. Nowadays, weather forecasts are made by collecting data about the current state of the atmosphere and using scientific systems to predict how the atmosphere will evolve. The chaotic nature of the atmosphere and the massive computational power required to solve all of the equations that describe it mean that forecasts become less accurate and more expensive as their range increases. This motivates the search for new ways to forecast weather that are more efficient and/or less expensive.

In Machine Learning, Artificial Neural Networks (ANNs) are classes of models inspired by biological neural networks, which comprise interconnected adaptive processing nodes or units. What makes ANNs important is their adaptive nature, which makes them a well-suited tool for approximating highly nonlinear and multivariate functions. Because the final neural network model predicts time series values, it uses LSTM layers in its architecture to counter time-related problems like the vanishing gradient problem. The difference between LSTMs and other traditional Recurrent Neural Networks (RNNs) is their ability to process and predict time series sequences without forgetting important information. LSTMs achieve state-of-the-art results in sequence-related problems like handwriting recognition [4, 3], speech recognition [6, 1], music composition [2] and grammar learning [8] (in natural language processing).

2. RECURRENT NEURAL NETWORKS AND LSTMS
The Recurrent Neural Network architecture is a natural generalization of feedforward neural networks to sequences. RNNs are networks with loops in them, which results in information persistence.

Fig. 1. The relation between simple RNNs and feed-forward ANNs.

Given a sequence of inputs $(x_1, x_2, \ldots, x_N)$, a standard RNN computes a sequence of outputs $(y_1, y_2, \ldots, y_N)$ by iterating over the following equations:

$$h_t = \mathrm{sigm}(W^{hx} x_t + W^{hh} h_{t-1})$$
$$y_t = W^{yh} h_t$$

The RNN can map sequences to sequences whenever the alignment between the inputs and the outputs is known ahead of time.
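To make the recurrence concrete, the following is a minimal NumPy sketch of a vanilla RNN forward pass over one input sequence. The dimensions (3 input features, 8 hidden units) and the random weights are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def sigm(z):
    # Logistic sigmoid: S(t) = 1 / (1 + e^(-t))
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(xs, W_hx, W_hh, W_yh, h0):
    """Run a vanilla RNN over a sequence.

    xs : array of shape (N, input_dim), one row per timestep.
    Returns the list of outputs y_1 .. y_N.
    """
    h = h0
    ys = []
    for x_t in xs:
        h = sigm(W_hx @ x_t + W_hh @ h)  # h_t = sigm(W^hx x_t + W^hh h_{t-1})
        ys.append(W_yh @ h)              # y_t = W^yh h_t
    return ys

# Illustrative sizes: 3 input features (T, H, WS), 8 hidden units, 24 steps.
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, N = 3, 8, 3, 24
xs = rng.normal(size=(N, input_dim))
W_hx = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_yh = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
ys = rnn_forward(xs, W_hx, W_hh, W_yh, h0=np.zeros(hidden_dim))
print(len(ys), ys[0].shape)  # 24 outputs, each of shape (3,)
```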

In the context of predicting future weather values, sequence lengths of 24 and 72 values were set.

2.1 The Problem of Long-Term Dependencies
When thinking about weather data, one gets the impression that recent weather data is more important for forecasting the next 24 hours than data from the previous month. In this context, one might only need to look at recent information to perform the forecasting. However, there might be cases where older data can help the model recognize general trends and movements that recent data fails to show. Unfortunately, as the gap between the present and the past data grows, general RNNs fail to learn to connect the inputs; this is called the problem of long-term dependencies.

2.2 Long Short-Term Memory Neural Networks
LSTMs are a type of Recurrent Neural Network capable of learning long-term dependencies. They were introduced by Hochreiter and Schmidhuber [7]. LSTMs remember information for long periods of time thanks to their inner cells, which can carry information unchanged at will. The network has complete control over the cell state; it can add, edit or remove information in the cell using special structures called gates.

Fig. 2. A simple LSTM cell with only input, output, and forget gates.

To put it in simple terms, what gives the network control over the flowing information is largely its sigmoid layers, which output numbers between zero and one ($S(t) = \frac{1}{1 + e^{-t}}$). These play a role similar to a faucet in controlling how much of each component should be let through: a value of zero means let nothing through, while a value of one means let everything through. With this type of system, a model can, for example, make a far-distant value in the past significant in its prediction for the next hour. This is what makes LSTMs useful for the model in use.

Mathematically speaking, the goal of the LSTM is to estimate the conditional probability $p(y_1, \ldots, y_N \mid x_1, \ldots, x_N)$, where $(x_1, \ldots, x_N)$ is an input sequence and $(y_1, \ldots, y_N)$ is its corresponding output sequence of the same length. The LSTM computes this conditional probability by first obtaining the fixed-dimensional representation $v$ of the input sequence $(x_1, \ldots, x_N)$, given by the last hidden state of the LSTM, and then computing the probability of $(y_1, \ldots, y_N)$ with a standard LSTM-LM formulation whose initial hidden state is set to the representation $v$ of $(x_1, \ldots, x_N)$:

$$p(y_1, \ldots, y_N \mid x_1, \ldots, x_N) = \prod_{t=1}^{N} p(y_t \mid v, y_1, \ldots, y_{t-1})$$

3. THE MODEL
A multi-layer model consisting of two LSTM layers and a fully connected hidden layer of 100 neurons was implemented to form the base architecture. The general layers and their I/O shapes are shown in the following figure:

Fig. 3. The model's layers with I/O shapes.

Dense: a fully connected layer of neurons; a size of 100 hidden neurons was chosen based on multiple experiments.
Activation: the Rectifier (ReLU) function was chosen as the activation function.
Repeat Vector: this layer repeats the final output vector from the encoding layer as a constant input to each timestep of the decoder.
Time Distributed Dense: applies the same Dense (fully connected) operation to every timestep of a 3D tensor.
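For concreteness, here is a minimal Keras sketch of the stack in Fig. 3, written against the modern Keras API as an assumption (the paper's own Theano-era code is not shown). With these layer sizes, the summary reports 132,403 parameters, matching the count given in Section 4.4:

```python
# A sketch of the described architecture, assuming a Keras backend;
# layer order follows Fig. 3. The 24-step, 3-feature shapes match the
# 24-hour model; the 72-hour model would use timesteps = 72.
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, RepeatVector, TimeDistributed

timesteps, features, hidden = 24, 3, 100

model = Sequential()
# Encoder LSTM: reads the past 24 [T, H, WS] triples into a 100-d state.
model.add(LSTM(hidden, input_shape=(timesteps, features)))
# Fully connected hidden layer of 100 neurons with a ReLU activation.
model.add(Dense(hidden))
model.add(Activation('relu'))
# Repeat the encoding once per output timestep for the decoder.
model.add(RepeatVector(timesteps))
# Decoder LSTM: returns a sequence, one hidden vector per future hour.
model.add(LSTM(hidden, return_sequences=True))
# The same Dense map applied at every timestep yields [T, H, WS] per hour.
model.add(TimeDistributed(Dense(features)))

model.summary()  # 132,403 parameters with these sizes
```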

As an optimizer for the network, RMSprop was used, an unpublished adaptive learning rate method proposed by Geoff Hinton in Lecture 6e of his Coursera class [5]. It keeps a moving average of the squared gradient for each weight:

$$E[g^2]_t = 0.9\, E[g^2]_{t-1} + 0.1\, g_t^2$$

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}}\, g_t$$

RMSprop thus divides the learning rate by an exponentially decaying average of squared gradients. Hinton suggests setting the decay rate $\gamma$ to 0.9, while a recommended value for the learning rate $\eta$ is 0.001.

As a loss estimator, the network used the mean squared error, one of the most widely adopted estimators:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i^{p} - Y_i^{r} \right)^2$$

Viewed from the outside, the model's architecture is simple: it takes the previous 24 values as input and gives back the next 24 hours of predictions. The same model was constructed for 72 hours (3 days). The figure below is a graphical explanation (24 hours):

Fig. 4. Input/output data flow. $t_i$, $h_i$, $w_i$ signify Temperature, Humidity and Wind Speed at time $i$. The result is a deep neural network based on LSTMs that can predict a fixed-length sequence from the previous sequence of the same length.

4. EXPERIMENTS
Fifteen years' worth of hourly data from nine cities in Morocco was used for training, with the following variables:

Table 1. METEOROLOGICAL VARIABLES
No.  Meteorological Variable (hourly)  Unit
1    Temperature                       Deg. C
2    Relative Humidity                 %Rh
3    Wind Speed                        km/h

4.1 Dataset Details
Weather data from nine cities was collected using Wunderground's API (Wunderground collects most of its data in Morocco from weather stations at airports). The final data set consists of Temperature, Humidity and Wind Speed values for every hour. The figure below shows the Temperature values over fifteen years in the city of Tangier:

Fig. 5. Fifteen years of Temperature values in Tangier, Morocco (TemperatureC over DateUTC).

4.2 Data Preprocessing
The data was often not consistent; missing values or values out of range were common. However, there were no long continuous noisy segments (lasting days, for example). The cleaning method used was to replace missing or noisy values by forward filling them using previous points in time (e.g., filling a missing temperature value with the last recorded temperature value). The following figure demonstrates the procedure used for data preprocessing:

Fig. 6. The data preprocessing graphical model.

After cleaning the data, all of the values were normalized to points in $[-1, 1]$ to avoid training problems (for example, the local optima problem); weight decay and Bayesian estimation can also be done more conveniently with standardized inputs [9]. The following assignment was used:

$$X_i \leftarrow \frac{X_i - \mathrm{Mean}(X)}{\mathrm{MAX}(X) - \mathrm{MIN}(X)}$$

4.3 Validation
At first, all of the values were reshaped into a series of sequences to feed into the neural network; each input consists of 24 (or 72) triples $[T, H, WS]$ of values.
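A short pandas/NumPy sketch of this pipeline (forward fill, the scaling above, and reshaping into paired input/target windows) might look as follows; the column names and the random stand-in data are illustrative assumptions:

```python
import numpy as np
import pandas as pd

def preprocess(df, cols=('TemperatureC', 'Humidity', 'WindSpeedKmh')):
    """Forward-fill gaps, then scale each variable as in Section 4.2."""
    df = df[list(cols)].ffill()  # replace missing values with the last record
    scaled = (df - df.mean()) / (df.max() - df.min())
    return scaled.values         # shape (hours, 3), roughly in [-1, 1]

def make_windows(values, steps=24):
    """Pair each `steps`-hour input window with the following `steps` hours."""
    X, Y = [], []
    for i in range(len(values) - 2 * steps + 1):
        X.append(values[i : i + steps])
        Y.append(values[i + steps : i + 2 * steps])
    return np.array(X), np.array(Y)

# Illustrative usage with random stand-in data for one city.
hours = 24 * 365
df = pd.DataFrame(np.random.rand(hours, 3),
                  columns=['TemperatureC', 'Humidity', 'WindSpeedKmh'])
X, Y = make_windows(preprocess(df), steps=24)
print(X.shape, Y.shape)  # (n_sequences, 24, 3) inputs and targets
```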

The chosen weather data was divided into three groups of sequences: the training group, corresponding to 80% of the sequences; the test group (untrained data), used to evaluate the models after the training phase and corresponding to 20% of the sequences; and finally the local validation data for the local mini-batch training. The Mean Squared Error (MSE) was used as the measure of the error made by the neural network model.

4.4 Training Details
It is known that LSTM models are fairly easy to train once the data is prepared. A model consisting of two LSTM layers, with a fully connected hidden layer of 100 neurons in between, was used. The resulting neural network had 132,403 parameters, of which 122,000 are pure recurrent connections (41,600 for the encoder LSTM and 80,400 for the decoder LSTM). The training details are given below:

- All of the LSTM's parameters were initialized with the uniform distribution between -0.05 and 0.05.
- Mini-batch gradient descent was used with a fixed learning rate of 0.001.
- For the gradient method, batches of 512 sequences were used.

4.5 Parallelization
A Python implementation of deep neural networks with the configuration from the previous section was used to train the models. The Theano library was used to build and compile the model on a single NVIDIA GRID K520 GPU device, with 1536 CUDA cores, 800 MHz core clocks and 4 GB of memory. Average training times per model are given below:

Table 2. AVERAGE TRAINING TIME PER MODEL
Model   Training Time
LSTM24  3.2h
LSTM72  6.7h

4.6 Experimental Results
The Mean Squared Error was used as a score function to evaluate the quality of the predictions. The scores were calculated using the untrained test data, with the following results:

Table 3. MEAN SQUARED ERROR ON TEST DATA FROM NINE CITIES IN MOROCCO
City        24 hours MSE  72 hours MSE
Tangier     0.00636       0.00825
Agadir      0.00775       0.01046
Al Hoceima  0.00827       0.00985
Fes-Sais    0.00744       0.00997
Marrakech   0.00839       0.01053
Nouasseur   0.00516       0.00698
Oujda       0.00536       0.00987
Rabat-Sale  0.00550       0.00675
Tetouan     0.00695       0.00863

Examples of graphical predictions on the test data are shown in the figures below.
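As an illustration, a training-and-scoring step consistent with the settings above might look like the following sketch; `model`, `X` and `Y` refer to the objects from the earlier sketches, the API follows modern Keras as an assumption, and the epoch count is a placeholder since the original value is not recoverable from the text:

```python
import numpy as np
from keras.optimizers import RMSprop

# An 80/20 train/test split over the windowed sequences, as in Section 4.3.
split = int(0.8 * len(X))
X_train, Y_train = X[:split], Y[:split]
X_test, Y_test = X[split:], Y[split:]

# RMSprop with the recommended learning rate of 0.001 and an MSE loss.
model.compile(optimizer=RMSprop(lr=0.001), loss='mse')
# Batch size follows the paper; the epoch count here is only a placeholder.
model.fit(X_train, Y_train, batch_size=512, epochs=10, verbose=1)

# Test-set MSE, comparable to the per-city scores in Table 3.
preds = model.predict(X_test)
mse = np.mean((preds - Y_test) ** 2)
print('test MSE: %.5f' % mse)
```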

Fig. 7. Comparison of real and predicted Temperature values for 24 hours.

Fig. 8. Comparison of real and predicted Humidity values for 24 hours.

Fig. 9. Comparison of real and predicted Wind Speed values for 24 hours.

Fig. 10. Comparison of real and predicted Temperature values for 72 hours.

Fig. 11. Comparison of real and predicted Humidity values for 72 hours.

Fig. 12. Comparison of real and predicted Wind Speed values for 72 hours.

5. CONCLUSION
The results of this paper show that a deep LSTM network can forecast general weather variables with good accuracy. The success of the model suggests that it could be used on other weather-related problems. While Theano provides an excellent environment to compile and train models, it also gives the ability to carry any model onto a production server and integrate it into pre-existing applications (as an example, one could perform real-time predictions on top of an existing web application). Our vision is for this model to represent the cornerstone of an artificial-intelligence-based system that can replace humans and traditional methods in weather forecasting in the future. Combining numerical models with image recognition ones (on satellite images, for example) might form the basis of a new weather forecasting system that can outperform the traditional, expensive ones and become the new standard in weather prediction.

6. ACKNOWLEDGMENTS
We would like to express our gratitude to Weather Underground for providing the data required to do this research, and we also want to thank the teams behind the Python libraries Theano and Keras for making it easy for us to implement our model and run it.

7. REFERENCES
[1] George E. Dahl, Dong Yu, Li Deng, and Alex Acero. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1):30–42, 2012.
[2] Douglas Eck and Jürgen Schmidhuber. Learning the long-term structure of the blues. In Artificial Neural Networks – ICANN 2002, pages 284–289. Springer, 2002.
[3] Alex Graves, Marcus Liwicki, Horst Bunke, Jürgen Schmidhuber, and Santiago Fernández. Unconstrained on-line handwriting recognition with recurrent neural networks. In Advances in Neural Information Processing Systems, pages 577–584, 2008.
[4] Alex Graves and Jürgen Schmidhuber. Offline handwriting recognition with multidimensional recurrent neural networks. In Advances in Neural Information Processing Systems, pages 545–552, 2009.
[5] Geoffrey Hinton. Overview of mini-batch gradient descent. http://goo.gl/a9lkpi, 2014. [Online; accessed 09-June-2016].
[6] Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
[7] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[8] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, 2014.
[9] Warren S. Sarle, Cary, NC. FAQ, Part 2 of 7: Learning. http://goo.gl/gzjgse, 2002. [Online; accessed 09-June-2016].