Identification of radon anomalies in soil gas using decision trees and neural networks




NUKLEONIKA 2010;55(4):501-505
ORIGINAL PAPER

Boris Zmazek, Sašo Džeroski, Drago Torkar, Janja Vaupotič, Ivan Kobal

Abstract. The time series of radon (²²²Rn) concentration in soil gas at a fault, together with the environmental parameters, has been analysed by applying two machine learning techniques: (i) decision trees and (ii) neural networks, with the aim of identifying radon anomalies caused by seismic events and not simply ascribed to the effect of the environmental parameters. By applying neural networks, 10 radon anomalies were observed for 12 earthquakes, while with decision trees, an anomaly was found for every earthquake, but, undesirably, some anomalies also appeared during periods without earthquakes.

Key words: radon, soil gas, anomalies, decision trees, artificial neural network, earthquakes

B. Zmazek, S. Džeroski, D. Torkar, J. Vaupotič, I. Kobal
Jožef Stefan Institute, 39 Jamova Str., 1000 Ljubljana, Slovenia, Tel.: +386 1 477 3213, Fax: +386 1 477 3811, E-mail: janja.vaupotic@ijs.si

Received: 13 August 2009
Accepted: 30 December 2009

Introduction

The transport of radon (²²²Rn) from the ground towards the surface is influenced by a number of geophysical and geological parameters, among them seismicity. Prior to an earthquake, the build-up of stress causes changes in the strain field. The displacement of rock mass within the Earth's crust before an earthquake leads to changes in gas transport from deep layers in the Earth to the surface [5]. As a result, larger quantities of radon are released from the pores and fractures of the rocks towards the surface. This may be considered as an anomaly in the concentration of radon. Changes in underground fluid flow due to seismicity may account for anomalous changes in the concentration of radon and its progeny [8].
A small change in the velocity of gas flow [6] into or out of the ground causes a significant change in radon concentration at shallow soil depth, because changes in gas flow disturb the strong radon concentration gradient existing between the soil and the atmosphere. Thus, monitoring of radon in soil gas is a means of detecting changes related to an earthquake. For small earthquakes, it is often impossible to tell whether an anomaly was caused by a seismic event rather than by meteorological or hydrological events. Therefore, the implementation of more advanced statistical methods in data evaluation appears to be essential [1, 3, 7]. In this contribution, the 32-month time series of radon concentration together with the environmental parameters (air and soil temperature, barometric pressure, rainfall) has been analysed in order to find radon anomalies possibly caused by earthquakes.

Fig. 1. The map of Slovenia with the measurement site at the town of Krško (black circle) and the epicentres of the earthquakes (open circles, with radii corresponding to their magnitudes) which occurred from April 1999 to October 2001 and were taken into account in our calculations.

Methodology

In a 480 cm deep borehole at the Orlica fault in the Krško basin in SE Slovenia (Fig. 1), radon concentration in soil gas was measured continuously (once an hour) using a Barasol radon probe (MC-450, ALGADE, France). In addition to radon, the probe also measures barometric pressure and temperature. The borehole wall was protected with a plastic tube and isolated from the atmospheric air with a plastic cap and soil cover to reduce hydro-meteorological influence on the measurement.

Decision trees and artificial neural networks have been applied to predict radon concentration from meteorological data. As is common practice, Dobrovolsky's equation [4] was used to calculate $R_D$ for each earthquake, i.e., $R_D = 10^{0.43M}$, where $M$ is the earthquake magnitude and $R_D$ is the radius of the zone within which precursory phenomena may be manifested (the so-called Dobrovolsky's radius, in km). Only earthquakes for which the ratio between $R_E$ (the distance between the epicentre and our measuring site) and $R_D$ is less than 2 have been used in the interpretation.

Experimental details are described elsewhere [12]. Air temperature and rainfall were measured at the meteorological station Bizeljsko, approximately 14 km from the boreholes. Data recorded are shown in Fig. 2.

Fig. 2. Time run of daily average radon concentration in soil gas and of soil temperature recorded with Barasol probes in 480 cm deep boreholes at the Krško-1 station at the Orlica fault in the Krško basin during the period from June 2000 to January 2002. Local earthquakes with $R_E/R_D$ equal to or less than 2 [4], barometric pressure, air temperature and rainfall at the nearby meteorological station Bizeljsko are also shown.

The time series was divided into two subsets: (i) the seismic activity (SA) subset, possibly affected by seismic activity, comprising the data recorded during periods (called "seismic windows") lasting from n days before to n days after the occurrence of a seismic event (n varying from 1 to 10 days), and (ii) the non-SA subset without seismic events, comprising the data remaining after subtracting the SA subset from the entire database. Both decision trees and neural networks were trained to predict the radon activity concentration from the environmental data of the non-SA subset. The entire series was then subjected to analysis, and the prediction appeared to fail during the SA periods.

Under appropriately chosen analytical conditions, a statistically significant difference between the measured and predicted values of radon concentration was observed before earthquakes (called correct anomalies, CA). Unfortunately, anomalies were also observed during non-SA periods (called false anomalies, FA). On the other hand, for some earthquakes no anomaly was found (called no anomalies, NA).

Results and discussion

Decision trees

Since radon concentration is a numeric variable, we have approached the task of predicting radon concentration from meteorological data using regression (or function approximation) methods. We used regression trees [2], as implemented in the WEKA data mining suite [9]. In an analysis in which the length of the SA periods was varied from 1 to 7 days [11], the best agreement between the measured and calculated concentrations during the non-SA periods was found with seismic windows of 7 days. The largest drop in the correlation coefficient was observed between 6 and 7 days before an earthquake. For every earthquake, a CA was found, but, undesirably, we have not been able to decrease the number of FA below 6. Figure 3 shows the measured radon concentration (m-$c_{Rn}$; dotted line) and that predicted by decision trees (p-$c_{Rn}$; full line). While the upper plot shows good agreement between the measured and predicted values in a non-SA period (from February 2, 2000 to March 3, 2000), the lower one shows a significant difference between the two in the SA periods (from September 9, 2000 to December 31, 2000). Next to the bars showing the magnitudes of the earthquakes, the values of the $R_E/R_D$ ratios are also given.

Neural networks

Artificial neural networks (ANNs) are a well-established tool for forecasting problems in different areas, such as weather, econometrics, finance, stock prices and materials science, with over 40 years of tradition [10].
Because they are universal function approximators, ANNs also appear to be an appropriate choice for modelling the nonlinear dependency of radon concentration on multiple variables. An enormous number of topologies, training algorithms and architectures exist, each applicable to a class of modelling problems. It is difficult to tell in advance which training rule is the most suitable for a certain problem or which topology would produce the best results. After extensive experimentation with the collected data, we chose the traditional MLP (multilayer perceptron) with the conjugate gradient learning algorithm for further use.

Fig. 3. Comparison of the measured radon concentration (m-$c_{Rn}$; dotted line) and the radon concentration predicted by decision trees (p-$c_{Rn}$; full line) for the non-SA period from February 2, 2000 to April 29, 2000 (upper graph) and for the SA period from September 10, 2000 to December 31, 2000. Earthquakes are also drawn as bars, and the $R_E/R_D$ ratio is given.

The non-SA datasets contained from 816 samples (seismic window 0 days) to 550 samples (seismic window 10 days). Each dataset was first randomized and then divided into three sets: (i) the training set (60%), (ii) the cross-validation set (15%) and (iii) the test set (25%). The training and the cross-validation sets were used to train the ANN, while the test set was used to verify its performance. The number of hidden layers and the training parameters (the learning step, the momentum) were selected by a well-known genetic algorithm, which creates a population of solutions and applies genetic operators such as mutation and crossover to evolve the solutions in order to find the best one. tanh() was used as the transfer function in all four layers. For each of the 11 datasets, an ANN with two hidden layers was generated: 5 neurons in the input layer, fed by the five environmental variables, 8 neurons in the first hidden layer, 7 in the second hidden layer and one neuron in the output layer. The network was trained twice for a predefined 45 000 epochs, with the weights reset after the first training, although the training and cross-validation errors converged much sooner in most cases.

Fig. 4. Comparison of the measured radon concentration (m-$c_{Rn}$; dotted line) and the radon concentration predicted by ANN (p-$c_{Rn}$; full line) for the non-SA period from February 2, 2000 to April 29, 2000 (upper graph) and for the SA period from September 10, 2000 to December 31, 2000. Earthquakes are also drawn as bars with the values of the $R_E/R_D$ ratios attached.

We also investigated the possibility of automatically defining the anomaly detection parameters for the available dataset. The difference in correlation coefficients between the ANN prediction of the non-SA data and that of the SA + non-SA data indicated that SA affects radon concentration mostly within ±7 days of the seismic event. From 13 earthquakes, 10 CA and no FA or NA were found. Figure 4 shows the measured radon concentration (m-$c_{Rn}$; dotted line) and the radon concentration predicted by ANN (p-$c_{Rn}$; full line). While the upper plot again shows good agreement between the measured and predicted values in a non-SA period (from February 2, 2000 to March 3, 2000), the lower one shows a significant difference between them in the SA periods (from September 9, 2000 to December 31, 2000). Earthquakes are shown as bars with the $R_E/R_D$ ratio.
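The earthquake selection rule, the seismic-window split and the comparison of measured with predicted radon concentration can be illustrated with a small sketch. It is not the authors' implementation. All function names are hypothetical, and the predicted series is assumed to come from any model (regression tree or MLP) trained on the non-SA data. The k-sigma threshold is an assumption standing in for the "statistically significant difference", which the text does not specify. The selection uses $R_E/R_D \le 2$, following the Fig. 2 caption.

```python
from statistics import pstdev


def dobrovolsky_radius_km(magnitude):
    """Radius R_D (km) of the precursor zone, R_D = 10**(0.43 * M) [4]."""
    return 10.0 ** (0.43 * magnitude)


def is_relevant(distance_km, magnitude, max_ratio=2.0):
    """Keep an earthquake if R_E / R_D does not exceed max_ratio."""
    return distance_km / dobrovolsky_radius_km(magnitude) <= max_ratio


def seismic_window_mask(n_days, quake_days, half_width):
    """True for every day within +/- half_width days of an event (SA subset)."""
    mask = [False] * n_days
    for q in quake_days:
        lo = max(0, q - half_width)
        hi = min(n_days - 1, q + half_width)
        for d in range(lo, hi + 1):
            mask[d] = True
    return mask


def flag_anomalies(measured, predicted, sa_mask, k=2.0):
    """Flag days whose residual exceeds k standard deviations of the
    non-SA residuals. The k-sigma rule is an assumption of this sketch."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    baseline = [r for r, sa in zip(residuals, sa_mask) if not sa]
    sigma = pstdev(baseline)
    return [abs(r) > k * sigma for r in residuals]


def count_flags(flags, sa_mask):
    """Count flagged days inside (CA-like) and outside (FA-like) seismic windows."""
    ca = sum(1 for f, sa in zip(flags, sa_mask) if f and sa)
    fa = sum(1 for f, sa in zip(flags, sa_mask) if f and not sa)
    return ca, fa


if __name__ == "__main__":
    # Synthetic 20-day series with one event on day 10 and a 2-day window.
    sa = seismic_window_mask(20, quake_days=[10], half_width=2)
    predicted = [100.0] * 20
    measured = [100.0 + (1 if d % 2 else -1) for d in range(20)]
    measured[9] = 130.0  # synthetic excursion one day before the event
    flags = flag_anomalies(measured, predicted, sa)
    print(count_flags(flags, sa))
```

Counting flagged days rather than flagged earthquakes keeps the sketch short. Turning these counts into the CA, FA and NA of the text would require grouping the flagged days by seismic window.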
Conclusion

The results of applying decision trees and neural networks to identify radon anomalies, possibly caused by seismic events and not solely ascribed to the effects of environmental parameters, are encouraging. We shall direct our efforts towards improvement in applying both methods, in order to have the number of CA anomalies equal to the number of earthquakes, and to reduce the number of FA and NA anomalies to a minimum, desirably to zero.

Acknowledgment. The study was financed by the Slovenian Research Agency.

References

1. Belyaev AA (2001) Specific features of radon earthquake precursors. Geochem Int 12:1245-1250
2. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont
3. Di Bello G, Ragosta M, Heinicke J et al. (1998) Time dynamics of background noise in geoelectrical and geochemical signals: an application in a seismic area of Southern Italy. Il Nuovo Cimento 6:609-629
4. Dobrovolsky IP, Zubkov SI, Miachkin VI (1979) Estimation of the size of earthquake preparation zones. Pure Appl Geophys 117:1025-1044
5. Fleischer RL (1997) Radon in earthquake prediction: radon measurements by etched track detectors: applications in radiation protection. In: Durrani SA, Ilic R (eds) Earth sciences and the environment. World Scientific, Singapore, pp 285-299
6. Grammakov AG (1936) On the influence of some factors in the spreading of radioactive emanations under natural conditions. Zh Geofiz 6:123-148
7. Negarestani A, Setayeshi S, Ghannadi-Maragheh M, Akashe B (2001) Layered neural networks based analysis of radon concentration and environmental parameters in earthquake prediction. J Environ Radioact 62:225-233
8. Steinitz G, Begin ZB, Gazit-Yaari N (2003) Statistically significant relation between radon flux and weak earthquakes in the Dead Sea rift valley. Geology 31:505-508
9. Witten IH, Frank E (1999) Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco
10. Zhang G, Patuwo BE, Hu MY (1998) Forecasting with artificial neural networks: the state of the art. Int J Forecasting 14:35-62
11. Zmazek B, Todorovski L, Džeroski S, Vaupotič J, Kobal I (2003) Application of decision trees to the analysis of soil radon data for earthquake prediction. Appl Radiat Isot 58;6:697-706
12. Zmazek B, Živčić M, Vaupotič J, Bidovec M, Poljak M, Kobal I (2002) Soil radon monitoring in the Krško basin, Slovenia. Appl Radiat Isot 56:649-657