Essential Components of an Integrated Data Mining Tool for the Oil & Gas Industry, With an Example Application in the DJ Basin
Petroleum & Natural Gas Engineering, West Virginia University
SPE Annual Technical Conference, Denver, Colorado, October 2003

OUTLINE
- Introduction
- Data Mining Classifications
- Descriptive Data Mining
- Predictive Data Mining
- Application in DJ Basin
- Conclusions

Introduction
- Not a new process
- Addition of machine learning:
  - Artificial Neural Networks
  - Genetic Optimization
  - Fuzzy Logic
- A 1.5 billion dollar business by 2005
- If your company is not doing it now, it will.

Introduction
- A significant increase in the volume of digital data.
- DATA -> INFORMATION -> KNOWLEDGE
- Data mining lets you be proactive.

Introduction
An integrated process.

Data Mining Classification
Definition: the nontrivial extraction of implicit, previously unknown, and potentially useful information from data.
- Exploratory data mining: what can the data tell me?
- Guided data mining: looking for specific answers to questions.

Components of a Data Mining Tool
- Dataset
- Fuzzy Combinatorial Analysis
- Neural Model Building
- Statistical Module
- Cluster Analysis
- Automatic Cluster Optimization
- Genetic Optimization
- Data Cleansing Module
- Neural Network Data Preparation
- Training, Calibration & Verification Datasets
- Fuzzy Decision Support System

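The training, calibration and verification datasets named among the components can be produced by a seeded random three-way partition. The sketch below is only an illustration: the function name `split_dataset`, the 80/10/10 fractions and the integer well identifiers are assumptions, not details from the presentation.

```python
import random


def split_dataset(records, train_frac=0.8, calib_frac=0.1, seed=0):
    """Randomly partition records into training, calibration and verification sets.

    The 80/10/10 default is an assumption for illustration, not a value
    taken from the presentation.
    """
    if train_frac <= 0 or calib_frac <= 0 or train_frac + calib_frac >= 1:
        raise ValueError("fractions must be positive and leave room for verification")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_calib = round(len(shuffled) * calib_frac)
    training = shuffled[:n_train]
    calibration = shuffled[n_train:n_train + n_calib]
    verification = shuffled[n_train + n_calib:]
    return training, calibration, verification


# Hypothetical well identifiers, for illustration only.
wells = list(range(100))
training, calibration, verification = split_dataset(wells)
print(len(training), len(calibration), len(verification))
```

Keeping the verification set out of model building is what makes its error figures a fair test of the model.
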
Quality Control & Preprocessing
Probably the most important and time-consuming components of the data mining process. They may take as much as 50-65% of the effort in the entire project.

Quality Control & Preprocessing: Sources of error
- Human (erratic, hard to resolve)
  - Data collection
  - Data entry
  - Data manipulation
- Equipment (systematic, shifts)

Quality Control & Preprocessing
It must include the following components:
- Dealing with missing data
- Dealing with outliers
- Dealing with contaminated data
- Statistical analysis
- Feature selection
- Feature reduction
- Pseudo data generation

Quality Control & Preprocessing: Statistical Analysis
Examine the major statistics of the data set.

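A minimal sketch of what examining the major statistics of one feature might look like, using only the Python standard library. The choice of statistics, the helper name `summarize` and the sample values are assumptions for illustration.

```python
import statistics


def summarize(values):
    """Return basic summary statistics for one numeric feature."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }


# Hypothetical feature values, for illustration only.
avg_rate_bpm = [12.0, 15.0, 14.0, 13.0, 16.0]
print(summarize(avg_rate_bpm))
```
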
Quality Control & Preprocessing: Statistical Analysis
Perform regression analysis on all the features in the data set.

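The slide does not say which kind of regression is meant. As one possible reading, the sketch below fits an ordinary least-squares line of the outcome against a single feature; the function name `linear_fit` and the sample numbers are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("feature is constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical feature and outcome values, for illustration only.
feature = [0.0, 1.0, 2.0, 3.0]
outcome = [1.0, 3.0, 5.0, 7.0]
print(linear_fit(feature, outcome))
```

Repeating the fit for every feature gives a quick first view of which features trend with the outcome, although it cannot capture interactions between features.
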
Quality Control & Preprocessing: Statistical Analysis
- The distribution of the data can reveal important information.
- Study the probability density function of every feature.

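One simple way to approximate a feature's probability density function is a normalized histogram. The sketch below assumes equal-width bins; the helper name `histogram_density` and the sample values are illustrative only.

```python
def histogram_density(values, bins=5):
    """Estimate a density as (left_bin_edge, density) pairs using equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    if width == 0:
        raise ValueError("all values are identical; no distribution to estimate")
    counts = [0] * bins
    for v in values:
        # The maximum value falls on the right edge, so clamp it into the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    n = len(values)
    # Dividing by n * width makes the bar areas sum to 1.
    return [(low + i * width, c / (n * width)) for i, c in enumerate(counts)]


# Hypothetical feature values, for illustration only.
for edge, density in histogram_density([0.0, 1.0, 2.0, 3.0, 4.0], bins=2):
    print(edge, density)
```
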
Quality Control & Preprocessing: Missing data
- Identify the location of the missing data.
- Patch the holes in the data set.

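The two steps above, locating missing values and patching them, can be sketched as follows. The presentation does not say how the holes are patched; median imputation is used here purely as an assumption, and the record layout (a list of dictionaries with `None` for missing values) and the feature names are hypothetical.

```python
import statistics


def find_missing(records, features):
    """Return (row_index, feature) pairs where a value is missing (None)."""
    return [
        (i, f)
        for i, row in enumerate(records)
        for f in features
        if row.get(f) is None
    ]


def patch_with_median(records, features):
    """Return a copy of records with missing values replaced by the feature median.

    Median imputation is an assumption, not the method used in the study.
    """
    patched = [dict(row) for row in records]
    for f in features:
        known = [row[f] for row in records if row.get(f) is not None]
        if not known:
            continue  # nothing to impute from; leave the gap as it is
        fill = statistics.median(known)
        for row in patched:
            if row.get(f) is None:
                row[f] = fill
    return patched


# Hypothetical records, for illustration only.
wells = [
    {"avg_psi": 3000.0, "avg_rate": None},
    {"avg_psi": None, "avg_rate": 14.0},
    {"avg_psi": 3400.0, "avg_rate": 16.0},
]
print(find_missing(wells, ["avg_psi", "avg_rate"]))
print(patch_with_median(wells, ["avg_psi", "avg_rate"]))
```
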
Quality Control & Preprocessing: Outliers
- Identify the outlier records in the data set.
- Identify a remedy for each outlier.

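The presentation does not specify how outliers are detected or remedied. As one common possibility, the sketch below flags values outside the interquartile-range fences and clips them to the nearest fence. The 1.5 multiplier, the function names and the sample data are assumptions.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (low, high) fences at k interquartile ranges beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def flag_outliers(values, k=1.5):
    """Return the indices of values that fall outside the fences."""
    low, high = iqr_fences(values, k)
    return [i for i, v in enumerate(values) if v < low or v > high]


def clip_outliers(values, k=1.5):
    """One possible remedy: pull each outlier back to the nearest fence."""
    low, high = iqr_fences(values, k)
    return [min(max(v, low), high) for v in values]


# Hypothetical feature values with one suspicious record, for illustration only.
data = [10, 11, 12, 13, 14, 15, 16, 17, 100]
print(flag_outliers(data))
print(clip_outliers(data))
```

Clipping is only one remedy; removing the record or treating the value as missing are alternatives, and the right choice depends on whether the value is a measurement error or a genuine extreme.
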
Descriptive Data Mining
- Feature Selection
- Rule Induction
- Cluster Analysis
  - Hierarchical
  - K-means
  - Fuzzy c-means
  - Self-Organizing Neural Networks

Descriptive Data Mining
- Exploratory in nature.
- An absolutely essential component of the data mining process.
- May reveal interesting and relevant information.

Descriptive Data Mining: Feature Selection
- A feature selection module is capable of identifying the most influential parameters in a dataset.
- It is hard to see which feature influences the outcome more than the others.
- Each feature influences the outcome to a degree.
- Features influence one another as well as the outcome.

Descriptive Data Mining
Clustering describes a collection of unsupervised methods whose aim is to partition a data set into a significantly smaller number of "clusters". These methods generally require some kind of distance measure among the data entities in order to group them together and assign each data entity to one cluster.

Cluster Analysis
- K-means clustering
- Fuzzy c-means clustering
Identifying the optimum number of clusters and the optimum features to involve in clustering is very important and quite challenging.

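To make the distance-based grouping concrete, here is a minimal k-means sketch in plain Python. It assumes Euclidean distance and a random choice of starting centers, and it takes the number of clusters `k` as given; it does not attempt the automatic cluster optimization mentioned in the presentation. The function name and sample points are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain k-means: returns (labels, centers) for a list of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k data points chosen at random as starting centers
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


# Two hypothetical, well-separated groups of records, for illustration only.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

Because the result depends on `k` and on which features enter the distance, running the sketch for several values of `k` and comparing the within-cluster distances is one simple way to explore the choice the slide calls challenging.
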
Predictive Data Mining
- Can identify patterns already in the dataset.
- Has the potential to identify patterns that do not yet exist in the dataset but may develop.
- Can fill the gaps in the dataset.

Predictive Data Mining
A highly supervised process that includes:
- Decision Tree Analysis
- Artificial Neural Networks
- Genetic Algorithms
- Fuzzy Logic

Decision Tree Analysis
Appropriate for solving problems that can be dissected into a logical progression of events.

Neural Networks
Information processing technique is a function of architecture:
- Humans: parallel, distributed
- Computers: sequential, pointwise

Neural Networks
_ P_NN_ S_V_D _S _ P_NN _RN_D

Fuzzy Logic
- Probably one of the most important tools for data mining.
- Data = an instant of reality and nature.
- Reality and nature are too complex to be fully explained by the binary system of belief.
- Fuzzy logic is an absolute necessity.

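The contrast with a binary system of belief can be illustrated with a membership function that returns a degree between 0 and 1 instead of a yes or no. The triangular shape, the function names and the viscosity breakpoints below are assumptions for illustration; the presentation does not describe its membership functions.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0.0 to 1.0) of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def crisp_medium(x, low=200.0, high=600.0):
    """Binary alternative: a value is either 'medium' or it is not."""
    return 1.0 if low <= x <= high else 0.0


# Hypothetical breakpoints for a 'medium viscosity' set, for illustration only.
for viscosity in (150.0, 250.0, 400.0, 550.0):
    print(viscosity, crisp_medium(viscosity), triangular(viscosity, 200.0, 400.0, 600.0))
```

With the crisp rule, 250 and 400 are equally "medium"; with the fuzzy set, 250 is medium only to degree 0.25, which is the kind of gradation the slide argues for.
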
Predictive Data Mining: THE KEY COMPONENT
- Neural model building module:
  - Data preparation module
  - Kohonen Self-Organizing Network
  - Back Propagation Network
  - Radial Basis Network
  - General Regression Network
- Genetic optimization module
- Fuzzy logic module
INTEGRATION: HYBRID INTELLIGENT SYSTEMS

Application in DJ Basin
Data mining was applied to the stimulation and restimulation database for the DJ Basin. The original database for stimulation of Codell wells in the DJ Basin needed considerable preprocessing:
- Removal of contaminated data.
- Identification and management of outliers.
- Identification and management of missing data.

Application in DJ Basin: Fuzzy Combinatorial Analysis
The analysis was performed for combinations of up to five features.

Rank  Feature              FCA Value  Rank  Feature              FCA Value
1     Flowback Volbbl      0          22    Frac Type            2.2848
2     CO-Phi-H             0.5811     23    No-CO-Perfs          2.303
3     Bicarbonate ppm      0.6666     24    Chloride ppm         2.3298
4     Peak Visc            0.7486     25    NI-Perfed-H          2.3302
5     Lat                  0.7734     26    Water phlab          2.3665
6     Orig20/40 Sand-Mlbs  0.9214     27    Pre-Refrac Mcfd      2.3956
7     Long                 1.1        28    Cum MMcf             2.4009
8     Refrac Date          1.1934     29    Water Source         2.4018
9     ViscShear 100-30Min  1.3324     30    Iron ppm             2.4351
10    TotHardness ppm      1.518      31    MGAL                 2.496
11    Calcium ppm          1.6692     32    TotalPerfs           2.5045
12    AvgRate BPM          1.7415     33    Sulfate ppm          2.5164
13    Est-Ult-GOR          1.7706     34    New Perfs            2.552
14    No-NI-Perfs          1.7863     35    Sodium ppm           2.6039
15    AvgPsi               1.8438     36    Magnesium ppm        2.6108
16    ViscShear 100-5Min   1.9401     37    ViscShear 100-0Min   2.6649
17    Top CO Perf          1.9819     38    Pre-FracISDP         2.7127
18    TDSolid ppm          2.0084     39    TestedPH             2.8066
19    MMcf                 2.0777     40    Post-FracISDP        2.8256
20    Orig Fluid-Mgal      2.0855     41    Mlb20-40             2.8907
21    DOFP                 2.2451     42    Communication        2.9554

Application in DJ Basin
No well logs or reservoir characteristics were present in the database.

Application in DJ Basin
A predictive model was developed based on the available data.

                         Training  Calibration  Verification
R-squared                0.783     0.821        0.516
Correlation coefficient  0.901     0.907        0.809

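The presentation does not define how the two reported metrics were computed. The reported R-squared values are not simply the squares of the reported correlation coefficients (for the verification set, 0.809 squared is about 0.654, not 0.516), so the sketch below assumes that R-squared is the coefficient of determination, 1 - SS_res/SS_tot, and that the correlation coefficient is Pearson's r between actual and predicted values. The function names and sample numbers are hypothetical.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical values: predictions that track the trend but carry a constant bias.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(r_squared(actual, predicted), correlation(actual, predicted))
```

Under these definitions a biased model can keep a high correlation while its R-squared drops, which is one way the two rows of the table can diverge, most visibly on the verification set.
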
Application in DJ Basin
Sensitivity analysis was performed on all wells based on the predictive model.

Variable                                 Distribution  Minimum  Maximum
No. of perforations in Codell (number)   Uniform       4        80
Original 20-40 sand pumped (Mlbs)        Uniform       85.5     600
Original fluid pumped (Mgal)             Uniform       44.5     147.6

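The uniform distributions and ranges in the table suggest a Monte Carlo style analysis, in which the three inputs are sampled repeatedly and passed through the predictive model. That reading is an assumption. In the sketch below the ranges come from the table, while the key names, `placeholder_model` and its coefficients are hypothetical stand-ins for the trained network, which is not available in the source.

```python
import random


def sample_inputs(rng):
    """Draw one set of inputs from the uniform ranges listed in the table."""
    return {
        "codell_perforations": rng.randint(4, 80),    # integer count, uniform
        "orig_sand_mlbs": rng.uniform(85.5, 600.0),   # Mlbs
        "orig_fluid_mgal": rng.uniform(44.5, 147.6),  # Mgal
    }


def monte_carlo(model, n_runs=1000, seed=0):
    """Evaluate the model on n_runs random input sets; returns the outputs."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [model(sample_inputs(rng)) for _ in range(n_runs)]


def placeholder_model(inputs):
    """Hypothetical stand-in for the predictive model; NOT from the presentation."""
    return (
        0.5 * inputs["codell_perforations"]
        + 0.1 * inputs["orig_sand_mlbs"]
        - 0.2 * inputs["orig_fluid_mgal"]
    )


outputs = monte_carlo(placeholder_model, n_runs=1000)
print(min(outputs), sum(outputs) / len(outputs), max(outputs))
```

In practice the spread of the outputs for each well, rather than a single prediction, is what indicates how sensitive that well is to the design parameters.
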
Application in DJ Basin
The dominant trend implies that low-viscosity frac fluids are preferred to higher-viscosity fluids. This agrees with the trends identified in the analysis of the amount of proppant: for low proppant concentrations there is no need for high-viscosity fluids.

Conclusions
- Data mining is gaining momentum in our industry.
- Commercial products that include all the integrated components mentioned here are not on the market at present.
- Their development is essential for our industry's future profitability.