International Journal of Computer Engineering and Technology (IJCET)
Volume 6, Issue 7, July 2015, pp. 19-26, Article ID: 50120150607003
Available online at http://www.iaeme.com/currentissue.asp?jtype=ijcet&vtype=6&itype=7
ISSN Print: 0976-6367 and ISSN Online: 0976-6375
IAEME Publication
http://www.iaeme.com/ijcet.asp  editor@iaeme.com

APPLICATIONS OF DATA MINING TO PREDICT MESOSCALE WEATHER EVENTS (TORNADOES AND CLOUDBURSTS)

Miss Gurbrinder Kaur
Assistant Professor, M.C.A Department, BCIIT, Delhi

ABSTRACT

Over the last decade or so, predicting the weather and climate has emerged as one of the most important areas of scientific research. This is partly because the increasing skill of current weather forecasts has made society more and more dependent on them day to day for a whole range of decision making, and partly because climate change is now widely accepted and the realization is growing rapidly that it will affect every person in the world, either directly or indirectly.

Keywords: False Alarm Ratio (FAR), Mesocyclone Detection Algorithm (MDA), Numerical Weather Prediction (NWP), Receiver Operating Characteristic (ROC), Probability of Detection (POD).

Cite This Article: Miss Gurbrinder Kaur, Applications of Data Mining To Predict Mesoscale Weather Events (Tornadoes and Cloudbursts). International Journal of Computer Engineering and Technology, 6(7), 2015, pp. 19-26. http://www.iaeme.com/currentissue.asp?jtype=ijcet&vtype=6&itype=7

1. INTRODUCTION

Although considerable progress has been made in the observation, modeling and understanding of tornadoes, warning and forecasting ahead of time remain a considerable challenge for forecasters. The statistics clearly show that warning probability of detection (POD) and lead time have remained at the same level in recent years, with the false alarm ratio (FAR) remaining relatively constant.
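For reference, the two verification scores named above can be computed from simple warning counts. The following is a minimal sketch using the standard definitions POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms). The class name, function names and the counts in the usage lines are hypothetical and are not taken from the National Weather Service statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarningCounts:
    """Hypothetical container for warning verification counts."""
    hits: int          # tornado occurred and a warning was issued
    misses: int        # tornado occurred but no warning was issued
    false_alarms: int  # warning was issued but no tornado occurred


def probability_of_detection(c: WarningCounts) -> float:
    """POD: fraction of observed events that were warned."""
    observed = c.hits + c.misses
    if observed == 0:
        raise ValueError("POD is undefined when no events were observed")
    return c.hits / observed


def false_alarm_ratio(c: WarningCounts) -> float:
    """FAR: fraction of issued warnings that did not verify."""
    warned = c.hits + c.false_alarms
    if warned == 0:
        raise ValueError("FAR is undefined when no warnings were issued")
    return c.false_alarms / warned


# Illustrative counts only (assumed, not real verification data).
counts = WarningCounts(hits=30, misses=10, false_alarms=90)
pod = probability_of_detection(counts)  # 30 / 40 = 0.75
far = false_alarm_ratio(counts)         # 90 / 120 = 0.75
```

Note that the two scores use different denominators: POD is conditioned on observed events, FAR on issued warnings, so a warning system can raise its POD simply by warning more often, at the cost of a higher FAR.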
This stagnation arises principally because existing radars and weather detection methodologies suffer from limitations that allow meteorological quantities and associated features to go undetected. New advances in this area are needed if substantial improvements in warning and forecasting accuracy are to take place. The improvements in POD and lead time over the past decades are evident from Figure 1, but FAR has remained relatively constant over the past 20 years at approximately 75%. Because FAR has shown no improvement over the past 20 years, it is not expected to improve in the near future unless new technology is developed.

Figure 1. Nationwide tornado warning verification statistics from 1986-2007 as well as NWS goals for new storm-based warnings beginning in 2008: probability of detection (black line with circles), false alarm ratio (red line with squares) and lead time (blue line), with future goals (same with dotted lines). [Data courtesy of B. MacAloney II, National Weather Service Performance Branch, 2008]

2. RELATED WORK

A. Predicting Tornadoes by Applying Data Mining Techniques

The goal of much of Amy McGovern's research [1], as an associate professor in the School of Computer Science at the University of Oklahoma, has been to revolutionize the prediction of tornadoes and other forms of severe weather. The author has pursued this using artificial intelligence techniques, data mining, machine learning, and storm simulations. The research shows that radars provide an incomplete picture of the atmosphere. Although they can sense the intensity of the precipitation and a single dimension of the wind vector, there are many other variables, such as the full three-dimensional wind field, pressure and temperature, that are important to prediction [2]. The author has developed a unique set of simulations of supercell thunderstorms, which are the most severe type of thunderstorm and cause the most destructive tornadoes. McGovern's models provide the ability to identify spatiotemporal relationships between these regions that can be used to predict severe weather events. Novel data mining models have been developed that make use of the spatiotemporal nature of the data, because neither space nor time can be ignored for weather prediction. Weather is three-dimensional, and the models can identify arbitrary shapes and relationships between the shapes. In [3] McGovern et al. developed spatiotemporal models and applied these models to severe weather data.
These models addressed both the spatial and spatiotemporal changes in data using a relational approach. In their work they also developed a set of high-resolution simulations capable of resolving tornadoes. In [4] V. Lakshmanan, Gregory J. Stumpf and Arthur Witt developed a Mesocyclone Detection Algorithm (MDA) and a near-storm environment (NSE) algorithm at the National Severe Storms Laboratory. The MDA identified those storm-scale circulations which are precursors to tornadoes. Marzban and Stumpf in [5] and [6]
developed a neural network based on the MDA parameters to identify which of the circulations would be tornadic, using a small set of data cases [5]. That work was extended to cover 43 storm days in [7] using a more robust methodology. The neural networks developed in that work (both for MDA and MDA+NSE inputs) achieve similar Heidke skill scores on the training, validation and independent data sets. The low variability of the Receiver Operating Characteristic (ROC) plots also suggests that these neural networks are robust and not overtrained. In [8] Indra Adrianto, Theodore B. Trafalis and Valliappa Lakshmanan made use of Support Vector Machines for predicting the location and time of tornadoes. They extended the work of Lakshmanan et al. [7] to use a set of 33 storm days and introduced some variations to the above results. The objective of the research was to estimate the probability of a tornado event at a particular location within a given time window. They presented a least-squares methodology to estimate shear, quality control of radar reflectivity, morphological image processing to estimate gradients, fuzzy logic to generate compact measures of tornado possibility, and support vector machine classification to generate the final spatiotemporal probability field. The results suggested that the approach might increase tornado warning lead time, since it estimates the probability that there will be a tornado at a particular spatial location in the next 30 minutes, whereas the average lead time of a tornado warning issued by the National Weather Service is currently 18 minutes. The results were therefore promising. More spatial inputs can be considered, and other classification methods such as Bayesian SVMs and Bayesian neural networks may improve the results.

B. Application of Data Mining in Predicting Cloudburst Formation

There is no satisfactory technique for anticipating the occurrence of cloudbursts because of their small scale. A very fine network of radars is required to be able to detect the likelihood of a cloudburst, and this would be prohibitively expensive. Only the areas likely to receive heavy rainfall can be identified on a short-range scale. A real-life case of a cloudburst has been discussed by Kavita in [9] using the data mining (DM) k-means clustering technique. It was observed that a very large region of relative humidity is an early signal of the formation of a cloudburst. In the research, the derivation of sub-grid scale weather systems from NWP model output products is demonstrated. Such signals are not possible through the normal Model Output Statistics (MOS) technique. The study has demonstrated that intelligent systems can be a good alternative to unstable MOS. Data mining, especially clustering, when applied to divergence and relative humidity can provide an early indication of the formation of a cloudburst. This study is an effort towards providing timely and actionable information on these events using data mining techniques to supplement NWP models, which can be of great benefit to society.

3. PRINCIPLE AND METHODOLOGY OF WEATHER FORECASTING

A. Ensemble Forecasting

A forecast is an estimate of the future state of the atmosphere. It is created by estimating the current state of the atmosphere using observations, and then calculating how this state will evolve in time using a numerical weather prediction computer model. As the atmosphere is a chaotic system, very small errors in its initial state can
lead to large errors in the forecast. This means that we can never create a perfect forecast system, because we can never observe each detail of the atmosphere's initial state. Tiny errors in the initial state will be amplified, so there is always a limit to how far ahead we can predict any detail. To test how these small differences in the initial conditions may affect the outcome of the forecast, an ensemble system can be used to produce many forecasts. Instead of running just a single forecast, the computer model is run a number of times from slightly different starting conditions. The complete set of forecasts is referred to as the ensemble, and individual forecasts within it as ensemble members.

Figure 2. Schematic of how the ensemble samples the uncertainty in the forecast.

The notion of ensemble forecasting was first introduced in the studies of Lorenz [10], where he examined initial state uncertainties and the well-known butterfly effect. The study of Lorenz showed that no matter how good the observations are, or how good the forecasting techniques, there is almost certainly an insurmountable limit as to how far into the future one can forecast. In ensemble forecasting the major issue relates to the removal of the collective errors of multimodels. The major drawback of the straight average approach, which assigns an equal weight of 1.0 to each model, is that it may include several poor models. The average of these poor models degrades the overall results. To address this problem of ensemble forecasting, Krishnamurti in [11] and [12] introduced a multimodel superensemble technique that shows a major improvement in prediction skill.

B. Observation and Assimilation of Observational Data

Observations are important to the process of creating forecasts. A huge number of observations recording the atmospheric conditions around the world is received every day. The current main sources of observations are surface and marine data, satellites, weather balloons (radiosondes) and aircraft. To use these observations in an operational weather forecasting system, their availability has to be monitored, and they have to be quality controlled and processed into a form that can be used by the computer models and forecasters. Even with the many observations received, we do not have enough information to tell us what the atmosphere is doing at all
points on and above the Earth's surface. There are large areas of ocean, inaccessible regions on land and remote levels in the atmosphere where we have very few, or no, observations. To fill in the 'gaps' we can combine what observations we do have with forecasts of what we expect the conditions in the atmosphere to be. This process is called data assimilation and gives us our best estimate of the current state of the atmosphere, which is the first step in producing a weather forecast. Without data assimilation, any attempt to produce reliable forecasts is almost certain to end in failure. Data assimilation research is focused on making the best use of observations using advanced variational and ensemble data assimilation techniques.

C. Numerical Weather Prediction Model

The numerical weather prediction (NWP) process involves assimilation of observations to provide the starting conditions for a numerical weather forecast model. The model is essentially a computer simulation of the processes in the Earth's atmosphere, land surface and oceans which affect the weather. Once current weather conditions are known, the changes in the weather are predicted by the model. Even tiny changes in the atmospheric conditions can lead to drastically different weather patterns after only a short time, so it is vital that the current state of the atmosphere is represented as accurately as possible. This process is highly mathematical, and it takes the supercomputer longer to accurately estimate the current atmospheric state than it does to actually make the forecast. Weather forecasting entails predicting how the present state of the atmosphere will change. Present weather conditions are obtained by ground observations and by observations from satellites, ships, aircraft, buoys, balloons and weather stations covering the entire planet.
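As a minimal illustration of the data assimilation idea described in Section 3.B, the sketch below blends one prior forecast value (the "background") with one observation of the same quantity, weighting each by its error variance. This is a scalar, single-observation simplification and not the variational or ensemble schemes used operationally; the function name, argument names and the temperature values are assumptions chosen for illustration.

```python
def assimilate(background, background_var, observation, observation_var):
    """Combine a background forecast and an observation of the same quantity.

    Each input is weighted by the inverse of its error variance, which is the
    minimum-variance linear estimate when the two errors are unbiased and
    uncorrelated (assumed here). Returns (analysis, analysis_variance).
    """
    if background_var < 0 or observation_var < 0:
        raise ValueError("error variances must be non-negative")
    total_var = background_var + observation_var
    if total_var == 0:
        raise ValueError("at least one error variance must be positive")

    # Gain near 1 trusts the observation; gain near 0 trusts the background.
    gain = background_var / total_var
    analysis = background + gain * (observation - background)
    analysis_var = (1.0 - gain) * background_var
    return analysis, analysis_var


# Hypothetical example: a 20.0 degC forecast with error variance 4.0 and a
# 22.0 degC observation with error variance 1.0.
analysis, analysis_var = assimilate(20.0, 4.0, 22.0, 1.0)
# The gain is 0.8, so the analysis is about 21.6 degC with variance about 0.8,
# i.e. closer to the more accurate observation and more certain than either input.
```

Where no observation is available, the background forecast alone supplies the estimate, which is how the 'gaps' over oceans and remote regions are filled.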
These observations include information from over the oceans, from the surface (ships and buoys), from high in the atmosphere (satellites) and from below the ocean surface (a network of special floats called Argo). Creating forecasts is a complex process which is constantly being updated. Weather forecasts made for 12 and 24 hours are typically quite accurate. Forecasts made for two and three days are usually good. But beyond about five days, forecast accuracy falls off rapidly. The rate of data generation and storage far exceeds the rate of data analysis. This represents lost opportunities in terms of scientific insights not gained and impacts or adaptation strategies not adequately informed.

D. The Synoptic and Mesoscale Weather Phenomenon

The synoptic scale in meteorology describes large-scale weather systems with horizontal dimensions of the order of 1000 kilometres or more. This corresponds to weather events that occur around low-pressure areas, e.g. extratropical cyclones. The term mesoscale is believed to have been introduced by Ligda in [13], reviewing the use of weather radar, in order to describe phenomena smaller than the synoptic scale but larger than the microscale, a term that was widely used at the time (and still is) in reference to phenomena having a scale of a few kilometers or less. Several weather events associated with small-scale disturbances, regarded as noise in daily weather analyses, became the focal point of storm researchers, as in the micro study by Fujita [14]. Meanwhile, the U.S. Weather Bureau defined the mesoscale to be centered between 10 and 100 mi, leading to the publication of a mesometeorological study of squall lines by Fujita [15]. Further, Fujita in [16] found that the diameter of tornadoes rarely exceeds 1000 m, i.e. it rarely reaches the mesoscale.
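The sensitivity to initial conditions that underlies Sections 3.A and 3.C can be illustrated with a toy ensemble. The sketch below is not an NWP model: it swaps in the three-variable Lorenz (1963) system [10] with its classical parameters, integrated with a fourth-order Runge-Kutta step. The ensemble size, perturbation amplitude, time step, initial state and all function names are assumptions chosen for illustration.

```python
import random


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz (1963) system, classical parameters."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance one state by dt with the classical fourth-order Runge-Kutta scheme."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def ensemble_spread(members):
    """Standard deviation of the x variable across ensemble members."""
    xs = [m[0] for m in members]
    mean = sum(xs) / len(xs)
    return (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5


def run_ensemble(n_members=20, n_steps=3000, dt=0.01, perturbation=1e-6, seed=0):
    """Run members from slightly different starting conditions; return spread per step."""
    rng = random.Random(seed)
    base = (1.0, 1.0, 1.0)  # assumed initial state
    members = [
        tuple(v + rng.uniform(-perturbation, perturbation) for v in base)
        for _ in range(n_members)
    ]
    spreads = [ensemble_spread(members)]
    for _ in range(n_steps):
        members = [rk4_step(m, dt) for m in members]
        spreads.append(ensemble_spread(members))
    return spreads


spreads = run_ensemble()
```

Comparing `spreads[0]` with `spreads[-1]` shows how far the members drift apart even though they start almost identically. In an operational setting a small spread would indicate a more predictable situation and a large spread a less predictable one.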
Figure 3. Typical time and space scale of atmospheric motion (Source: DTU university of Denmark)

Figure 4. From large scale to small scale forecast (Source: Mesoscale meteorological modeling, university of Denmark)

4. CONCLUSION

While forecasters can identify conditions favorable for major tornado outbreaks several days in advance, short-term forecasting of individual storms, providing additional advance notice, and predicting probable tornado paths remain a challenge. Because of these limitations, weather forecasters strongly need to incorporate additional information to develop a better understanding of the formation of tornadoes.

5. ACKNOWLEDGEMENT

The author would like to express her deepest sense of gratitude to her guide Dr. Rattan K. Datta, Former Advisor, Department of Science & Technology, Government of India
and currently Director, Mohyal Educational Research Institute of Technology, for his encouragement, guidance and mentoring. Without his support, it would not have been possible to take up research in this challenging field.

REFERENCES

[1] McGovern, Amy and Barto, Andrew G. (2002) Autonomous Discovery of Temporal Abstractions from Interaction with an Environment. Poster presentation at the Symposium on Abstraction, Reformulation, and Approximation (SARA 2002), Volume 2371/2002, pages 338-339.
[2] McGovern, Amy and Hiers, Nathan and Collier, Matthew and Gagne II, David J. and Brown, Rodger A. (2008) Spatiotemporal Relational Probability Trees. Proceedings of the 2008 IEEE International Conference on Data Mining, pages 935-940. Pisa, Italy, 15-19 December 2008.
[3] McGovern, Amy and Gagne II, David John and Troutman, Nathaniel and Brown, Rodger A. and Basara, Jeffrey and Williams, John (2011) Using Spatiotemporal Relational Random Forests to Improve our Understanding of Severe Weather Processes. Statistical Analysis and Data Mining, special issue on the best of the 2010 NASA Conference on Intelligent Data Understanding, Vol. 4, Issue 4, pages 407-429.
[4] Lakshmanan, V., Rabin, R. and DeBrunner, V. (2003a) Multiscale storm identification and forecast, Atmospheric Research, 67-68, 367-380.
[5] Lakshmanan, V., Hondl, K., Stumpf, G., and Smith, T. (2003b) Quality control of weather radar data using texture features and a neural network, in 5th International Conference on Advances in Pattern Recognition (Kolkata, India), IEEE.
[6] Lakshmanan, V., Adrianto, I., Smith, T., and Stumpf, G. (2005a) A spatiotemporal approach to tornado prediction, in Proceedings of 2005 IEEE International Joint Conference on Neural Networks (Montreal, Canada), 3, 1642-1647.
[7] Lakshmanan, V., Stumpf, G., and Witt, A. (2005b) A neural network for detecting and diagnosing tornadic circulations using the mesocyclone detection and near storm environment algorithms, in 21st International Conference on Information Processing Systems (San Diego, CA), American Meteorological Society, CD-ROM, J5.2.
[8] Adrianto, I., Trafalis, T. B., and Lakshmanan, V. (2009) Support vector machines for spatiotemporal tornado prediction, International Journal of General Systems, Volume 38, Issue 7, pages 759-776.
[9] Pabreja, Kavita and Datta, Rattan K. (2012) A data warehousing and data mining approach for analysis and forecast of cloudburst events using OLAP-based data hypercube, Int. J. of Data Analysis Techniques and Strategies, Vol. 4, No. 1, pp. 57-82.
[10] Lorenz, E. N. (1963) Deterministic nonperiodic flow, J. Atmos. Sci., 20, 130-141.
[11] Krishnamurti, T. N., C. M. Kishtawal, T. LaRow, D. Bachiochi, Z. Zhang, C. E. Williford, S. Gadgil, and S. Surendran (1999) Improved weather and seasonal climate forecasts from multimodel superensemble, Science, 285, 1548-1550, doi:10.1126/science.285.5433.1548.
[12] Krishnamurti, T. N., C. M. Kishtawal, Z. Zhang, T. LaRow, D. Bachiochi, C. E. Williford, S. Gadgil, and S. Surendran (2000) Multimodel ensemble forecasts for weather and seasonal climate, J. Clim., 13, 4196-4216, doi:10.1175/1520-0442(2000)0132.0.co.
[13] Ligda, M. G. H., 1951: Radar storm observation. Compendium of Meteorology, T. F. Malone, Ed., Amer. Meteor. Soc., 1265-1282.
[14] Fujita, T. T., 1973: Proposed mechanism of tornado formation from rotating thunderstorms.
[15] Climatological Data, National Summary, 4, 6, 1953, p. 181. Fujita, T., 1950: Microanalytical study of thundernose, Geoph. Mag. of Japan, 22, 2, pp. 71-88.
[16] Fujita, T. T., 1963: Analytical mesometeorology: A review, Meteor. Monogr., 5, No. 27, Amer. Meteor. Soc., 77-125.