Kernel methods with Imbalanced Data and Applications to Weather Prediction


1 Kernel Methods with Imbalanced Data and Applications to Weather Prediction
Theodore B. Trafalis
Laboratory of Optimization and Intelligent Systems, School of Industrial and Systems Engineering, University of Oklahoma
INNS Conference on Big Data, San Francisco Airport Marriott Waterfront Congress Center, San Francisco, CA, 8-10 August 2015

2 Part I: Kernel Methods for Imbalanced Data and Application to Tornado Prediction

3 Outline
1 Imbalanced Data
   What is an Imbalanced Data Problem?
   Impact of Imbalanced Data on Learning Machines
   State of the Art Techniques for Imbalanced Learning
2 Kernel Methods
   General Description of Kernel Methods
   Support Vector Machines Applied to Imbalanced Data
3 Application to Tornado Prediction
   Description of the Experiments
   Results for the Tornado Data Set

4 Imbalanced Data: What is an Imbalanced Data Problem?
Imbalanced Data Problems and their Importance
The problem of learning from an imbalanced data set occurs when the number of samples in one class is significantly greater than in the other class. Imbalanced data sets are central to data mining and data classification. Examples include:
- Fraudulent credit card transactions.
- Telecommunication equipment failures.
- Oil spills in satellite images.
- Tornado, earthquake and landslide occurrences.
- Cancer and health science data.

5 Imbalanced Data: What is an Imbalanced Data Problem?
Example of the Classification of Imbalanced Data
Source: Tang et al. SVMs Modeling for Highly Imbalanced Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(1).

6 Imbalanced Data: What is an Imbalanced Data Problem?
Between-Class and Within-Class Imbalances
Source: He and Garcia. Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 21(9).

7 Imbalanced Data: Impact of Imbalanced Data on Learning Machines

8 Imbalanced Data: Impact of Imbalanced Data on Learning Machines
Impact of Imbalanced Data Problems in Classification
Classifiers tend to provide an imbalanced degree of accuracy, with the majority class reaching close to 100 percent accuracy and the minority class only 0-10 percent. In the tornado data set, for example, a 10 percent accuracy for the minority class means that only about 72 of the tornadoes would be detected, with the remainder classified as nontornadoes.
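The arithmetic behind this impact is easy to reproduce. As a sketch (the counts are chosen to mirror the tornado test set described later in this deck, roughly 6.7% positives), a classifier that always predicts the majority class still reports a high overall accuracy while detecting nothing:

```python
# Illustration: on imbalanced data, overall accuracy hides minority-class failure.
# Hypothetical counts mirroring the tornado test set (about 6.7% positives).
n_pos, n_neg = 360, 5047

# A classifier that always predicts the majority ("no tornado") class:
correct = n_neg                     # every negative is right, every positive is wrong
accuracy = correct / (n_pos + n_neg)
minority_recall = 0 / n_pos         # no tornado is ever detected

print(f"overall accuracy: {accuracy:.1%}")        # about 93%
print(f"minority recall : {minority_recall:.1%}") # 0%
```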

9 Imbalanced Data: Impact of Imbalanced Data on Learning Machines
Illustration of the Impact of Imbalanced Classification
On the left, the accuracy of the minority class is zero percent. On the right, the accuracy for the minority class is 80 percent.

10 Imbalanced Data: State of the Art Techniques for Imbalanced Learning

11 Imbalanced Data: State of the Art Techniques for Imbalanced Learning
Current Approaches for Imbalanced Learning I
Algorithm level:
- Threshold method.
- Learning only the minority class.
- Cost-sensitive approaches.
Data level:
- Random under-sampling and over-sampling.
- Informed under-sampling (EasyEnsemble, BalanceCascade).
- Synthetic sampling with data generation (SMOTE).
- Adaptive synthetic sampling (Borderline-SMOTE, ADASYN).
- Sampling with data cleaning (OSS, CNN + Tomek links integration, Neighborhood Cleaning Rule, SMOTE+ENN, SMOTE+Tomek).
- Cluster-based sampling (CBO).
- Integration of sampling and boosting (SMOTEBoost).
Kernel-based approaches:
- Variations of Support Vector Machines (SVMs).
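The two simplest data-level techniques, random under-sampling and random over-sampling, can be sketched in a few lines; the sample names, labels and class sizes below are invented for illustration:

```python
import random

# Illustrative binary data set: 95 majority (label 0) and 5 minority (label 1) samples.
random.seed(0)
majority = [(f"maj_{i}", 0) for i in range(95)]
minority = [(f"min_{i}", 1) for i in range(5)]

# Random under-sampling: shrink the majority class to the minority-class size.
under = random.sample(majority, len(minority)) + minority

# Random over-sampling: duplicate minority samples until the classes balance.
extra = [random.choice(minority) for _ in range(len(majority) - len(minority))]
over = majority + minority + extra
```

Both resampled sets are balanced, at the cost of discarding majority information (under-sampling) or repeating minority points exactly (over-sampling), which is what motivates the synthetic-sampling variants such as SMOTE.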

12 Imbalanced Data: State of the Art Techniques for Imbalanced Learning
Current Approaches for Imbalanced Learning II
- Kernel logistic regression.
Evaluation metrics used to assess accuracy:
- Receiver Operating Characteristic (ROC), Precision-Recall (PR) and cost curves.
- Singular assessment metrics based on the confusion or multi-class cost matrix (F-measure, G-mean, etc.).
Source: He and Garcia. Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 21(9).
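For reference, the singular assessment metrics mentioned above follow directly from a binary confusion matrix; the counts below are invented for illustration:

```python
import math

# Singular assessment metrics from a binary confusion matrix (illustrative counts).
tp, fn, fp, tn = 40, 10, 30, 920

precision   = tp / (tp + fp)
recall      = tp / (tp + fn)   # sensitivity, true positive rate
specificity = tn / (tn + fp)   # true negative rate

f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
g_mean    = math.sqrt(recall * specificity)                # balances both classes
```

Unlike plain accuracy, the G-mean collapses to zero as soon as either class is fully misclassified, which is why it is favored for imbalanced problems.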

13 Imbalanced Data: State of the Art Techniques for Imbalanced Learning
Problems with Imbalanced Data Prediction
- Improper evaluation metrics.
- Lack of data (absolute rarity): the number of observations is small in an absolute sense.
- Relative lack of data (relative rarity): rare relative to other events.
- Data fragmentation: absolute lack of data within a single partition.
- Inappropriate inductive bias, such as an assumption of linearity.

14 Kernel Methods: General Description of Kernel Methods

15 Kernel Methods: General Description of Kernel Methods
Historical Perspective
Efficient algorithms for detecting linear relations were used in the 1950s and 1960s (perceptron algorithm). Handling nonlinear relationships was seen as a major research goal at that time, but developing nonlinear algorithms with the same efficiency and stability proved elusive. In the mid 1980s, the field of pattern analysis underwent a nonlinear revolution with backpropagation neural networks (NNs) and decision trees (based on heuristics and lacking a firm theoretical foundation, with local minima problems and nonconvexity). In the mid 1990s, kernel-based methods were developed that handle nonlinearity while retaining the guarantees and understanding developed for linear algorithms.

16 Kernel Methods: General Description of Kernel Methods
Overview I
Kernel methods are a class of machine learning algorithms that can operate on very general types of data and can detect very general types of relations (e.g., the potential function method; Aizerman et al., 1964; Vapnik, 1982, 1995). Correlation, factor, cluster and discriminant analysis are some of the machine learning tasks that can be performed, using kernels, on data as diverse as sequences, text, images, graphs and vectors. Kernel methods also provide a natural way to merge and integrate different types of data.

17 Kernel Methods: General Description of Kernel Methods
Overview II
Kernel methods offer a modular framework. In a first step, a dataset is processed into a kernel matrix. Data can be of various types and also of heterogeneous types. In a second step, a variety of kernel algorithms can be used to analyze the data, using only the information contained in the kernel matrix.

18 Kernel Methods: General Description of Kernel Methods
Modular Framework
Source: J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis.

19 Kernel Methods: General Description of Kernel Methods
Basic Idea of Kernel Methods
Kernel methods work by:
- Embedding data in a vector space, called the feature space, using a kernel function.
- Looking for linear relations in that space.
Much of the geometry of the data in the embedding space (relative positions) is contained in all pairwise inner products (information bottleneck). We can work in the feature space by specifying an inner product function k between points in it. In many cases, the inner product in the embedding (feature) space is very cheap to compute.
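A small sanity check of this "cheap inner product" claim, using the homogeneous polynomial kernel of degree 2 in two dimensions; the explicit feature map below is the standard one for this kernel, shown only to verify the equality:

```python
import math

# Kernel trick: k(x, y) = <phi(x), phi(y)> without building phi explicitly.
# For k(x, y) = (x . y)^2 in 2-D, the feature map is phi(x) = (x1^2, x2^2, sqrt(2) x1 x2).
def k(x, y):
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2) * x[0] * x[1])

x, y = (1.0, 2.0), (3.0, 0.5)
lhs = k(x, y)                                     # one multiply-add and a square
rhs = sum(a * b for a, b in zip(phi(x), phi(y)))  # explicit 3-D inner product
```

Both sides agree, but the kernel evaluation never materializes the feature space, which is what makes very high (even infinite) dimensional embeddings tractable.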

20 Kernel Methods: General Description of Kernel Methods
Properties of Kernels I
Definition (Mercer kernel). Let $E$ be any set. A function $k : E \times E \to \mathbb{R}$ that is continuous, symmetric and finitely positive semi-definite is called here a Mercer kernel.
Definition (Finite positive semi-definiteness). A function $k : E \times E \to \mathbb{R}$, where $E$ is any set, is a finitely positive semi-definite kernel if
$$\sum_{i=1}^{m}\sum_{j=1}^{m} k(x_i, x_j)\,\lambda_i \lambda_j \ge 0$$
for any $m \in \mathbb{N}$, $\lambda_i \in \mathbb{R}$, $x_i \in E$ and $i \in [1, m]$. This can be seen as the generalization of a positive semi-definite matrix.
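This definition can be checked numerically on any finite sample: the kernel matrix built from the points must have only nonnegative eigenvalues (up to round-off). The Gaussian RBF kernel, sample size and bandwidth below are illustrative:

```python
import numpy as np

# Numerical check of finite positive semi-definiteness for the Gaussian RBF kernel
# K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) on randomly sampled points.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
sigma = 1.5

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-sq_dists / (2 * sigma ** 2))

eigvals = np.linalg.eigvalsh(K)   # eigvalsh: K is symmetric
assert eigvals.min() > -1e-10     # nonnegative up to floating-point error
```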

21 Kernel Methods: General Description of Kernel Methods
Properties of Kernels II
Definition (RKHS). A Reproducing Kernel Hilbert Space (RKHS) $F$ is a Hilbert space of complex-valued functions on a set $E$ for which there exists a function $k : E \times E \to \mathbb{C}$ (the reproducing kernel) such that $k(\cdot, x) \in F$ for any $x \in E$ and such that $\langle f, k(\cdot, x)\rangle = f(x)$ for all $f \in F$ (reproducing property).
If $k$ is a symmetric positive definite kernel then, by the Moore-Aronszajn theorem, there is a unique RKHS with $k$ as the reproducing kernel. A symmetric positive definite kernel $k$ can be expressed as a dot product $k : (x, y) \mapsto \langle \varphi(x), \varphi(y)\rangle$, where $\varphi$ is a map from $\mathbb{R}^n$ to an RKHS $H$ (kernel trick).

22 Kernel Methods: General Description of Kernel Methods
Properties of Kernels III
Properties. For any $x_1, \ldots, x_l$, the $l \times l$ matrix $K$ with entries $K_{ij} = k(x_i, x_j)$ is symmetric and positive semi-definite; $K$ is called the kernel matrix. A kernel $k$ can be expressed as $k : (x, y) \mapsto \langle \varphi(x), \varphi(y)\rangle$, where $\varphi$ is a map from $\mathbb{R}^n$ to a Hilbert space $H$ (kernel trick). The space $H$ is called the feature space.

23 Kernel Methods: General Description of Kernel Methods
Properties of Kernels IV
The image of $\mathbb{R}^d$ under $\varphi$ is a manifold $S$ in $H$. Kernels can be interpreted as measures of distance and of angle on $S$. Simple geometric relations between $S$ and hyperplanes of $H$ can yield complex forms in $\mathbb{R}^d$.

24 Kernel Methods: Support Vector Machines Applied to Imbalanced Data

25 Kernel Methods: Support Vector Machines Applied to Imbalanced Data
Support Vector Machines
Definition (SVM). Support Vector Machines are a family of learning algorithms that use kernel methods to solve supervised learning problems. Common supervised learning tasks are classification and regression. SVMs work by solving quadratic programming problems that aim to minimize the generalization error.
Given a set $S$ of $l$ points $x_i \in \mathbb{R}^n$, where each $x_i$ belongs to one of two classes labeled $y_i \in \{-1, +1\}$, the objective is to find a hyperplane that divides $S$, leaving all points of the same class on the same side while maximizing the minimum distance between either of the two classes and the hyperplane [Vapnik 1995].

26 Kernel Methods: Support Vector Machines Applied to Imbalanced Data
Optimal Separating Hyperplane
Source: Microsoft Research, Vision, Graphics, and Visualization Group.

27 Kernel Methods: Support Vector Machines Applied to Imbalanced Data
Dual Problem in the Nonlinear Case
The optimal hyperplane is obtained by solving the following quadratic programming (QP) problem:
$$\min_{\alpha \in \mathbb{R}^l} \left\{ \tfrac{1}{2}\alpha^{t} K \alpha - \langle \mathbf{1}, \alpha\rangle \;:\; \langle \alpha, y\rangle = 0 \ \text{and}\ 0 \le \alpha \le C \right\}.$$
This QP problem is the dual formulation of a QP problem that maximizes the margin of separation between the sets of points in the feature space. Given a solution $\alpha$, the optimal hyperplane is expressed as
$$\left\{ x \in \mathbb{R}^n : \sum_{i=1}^{l} \alpha_i y_i k(x_i, x) + b = 0 \right\},$$
where $b$ is computed using the complementary slackness conditions of the primal formulation.

28 Kernel Methods: Support Vector Machines Applied to Imbalanced Data
Binary Classification of Imbalanced Data with SVMs
Binary classification of imbalanced data requires rewriting the primal SVM problem, namely:
$$\min_{w, b, \xi}\ \tfrac{1}{2}\langle w, w\rangle_{H} + C_{+1}\sum_{y_i = +1}\xi_i + C_{-1}\sum_{y_i = -1}\xi_i$$
subject to
$$y_i\left(\langle w, \varphi(x_i)\rangle_{H} + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i \in [1, l].$$
$C_{+1}$ is the trade-off coefficient for the minority class and $C_{-1}$ is the trade-off coefficient for the majority class. For imbalanced data, we wish to have $C_{-1} < C_{+1}$, i.e. the penalty for outliers in the minority class is greater than the one for the majority class. This approach is strongly related to robust classification.
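The effect of the two trade-off coefficients can be sketched with a toy stand-in for the kernelized QP above: a linear model trained by subgradient descent on the class-weighted hinge loss, where the per-class costs play the same role as the two coefficients. The data, cost values and learning rate below are all invented for illustration:

```python
import numpy as np

# Toy illustration of asymmetric penalties: linear soft-margin classifier trained by
# subgradient descent on the class-weighted hinge loss (stand-in for the QP above).
rng = np.random.default_rng(1)
X_maj = rng.normal(loc=[-1.0, -1.0], size=(200, 2))   # majority class, y = -1
X_min = rng.normal(loc=[+1.0, +1.0], size=(20, 2))    # minority class, y = +1
X = np.vstack([X_maj, X_min])
y = np.concatenate([-np.ones(200), np.ones(20)])

C_plus, C_minus = 10.0, 1.0                 # heavier penalty on minority-class errors
cost = np.where(y > 0, C_plus, C_minus)

w, b, lr = np.zeros(2), 0.0, 0.01
for _ in range(500):
    margins = y * (X @ w + b)
    active = margins < 1                    # points violating the margin
    grad_w = w - (cost[active] * y[active]) @ X[active]
    grad_b = -(cost[active] * y[active]).sum()
    w, b = w - lr * grad_w, b - lr * grad_b

minority_recall = np.mean((X_min @ w + b) > 0)
```

With equal costs, the 10:1 class ratio would pull the separator toward the minority cloud; the larger penalty on minority violations counteracts that pull.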

29 Kernel Methods: Support Vector Machines Applied to Imbalanced Data
Illustration of SVM Training with Imbalanced Data
Source: Tang et al. SVMs Modeling for Highly Imbalanced Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(1).

30 Kernel Methods: Support Vector Machines Applied to Imbalanced Data
One-Class SVM for Anomaly Detection
Anomaly detection amounts to building an enclosure around a cloud of points encoding non-anomalous objects, in order to separate them from the outliers that represent anomalies. This problem is known as the soft minimal hypersphere problem and, for points mapped into the feature space $H$, it is expressed as
$$\min_{c, r, \xi}\ r^2 + C\sum_{i=1}^{l}\xi_i$$
subject to
$$\|\varphi(x_i) - c\|_{H}^{2} \le r^2 + \xi_i, \qquad \xi_i \ge 0, \qquad i \in [1, l].$$

31 Kernel Methods: Support Vector Machines Applied to Imbalanced Data
Example of a Soft Minimal Enclosing Hypersphere
Sublevel sets for different values of the constant γ.

32 Kernel Methods: Support Vector Machines Applied to Imbalanced Data
Drawbacks of SVMs
- The soft-margin maximization paradigm minimizes the total error, which introduces a bias toward the majority class.
- Offline calculations: unsuitable for processing data streams.
- Inadequate for large problems (except when using heuristics such as Platt's Sequential Minimal Optimization).

33 Application to Tornado Prediction: Description of the Experiments

34 Application to Tornado Prediction: Description of the Experiments
Tornado Experiments I
The data were randomly divided into two sets: training/validation and independent testing. In the complete training/validation set, there are 361 tornadic observations and 5048 non-tornadic observations from 59 storm days. In the independent testing set, there are 360 tornadic observations and 5047 non-tornadic observations from 52 storm days. The percentage of tornadic observations in each data set is 6.7%.

35 Application to Tornado Prediction: Description of the Experiments
Tornado Experiments II
Cross-validation was applied with different combinations of kernel functions (linear, polynomial and Gaussian radial basis function) and parameter values on the training/validation set. Each classifier is tested on test observations drawn randomly with replacement, using bootstrap resampling (Efron and Tibshirani, 1993) with 1000 replications on the independent testing set, to establish confidence intervals. The support vector solution whose classifier has the highest mean Critical Success Index (Hits/(Hits + Misses + False Alarms)) on the validation set is chosen as best. The best classifier uses the Gaussian radial basis function kernel. We apply these optimal parameters to predict the outcomes of the testing set.

36 Application to Tornado Prediction: Description of the Experiments

37 Application to Tornado Prediction: Results for the Tornado Data Set

38 Application to Tornado Prediction: Results for the Tornado Data Set
Results for the Tornado Data Set
Results computed from the binary confusion matrix with a 95% confidence interval.

Measure | Validation | Test
POD     | 57% ± 13%  | 57% ± 13%
FAR     | 18% ± 10%  | 31% ± 14%
CSI     | 50% ± 10%  | 45% ± 12%
Bias    |     ± 21%  | 83% ± 20%
HSS     | 62% ± 9%   | 60% ± 11%

POD: probability of detection (hits/(hits + misses)); FAR: false alarm ratio (false alarms/(hits + false alarms)); CSI: critical success index (hits/(hits + false alarms + misses)); Bias: (hits + false alarms)/(hits + misses); HSS: Heidke skill score.
Source: I. Adrianto, T. B. Trafalis, and V. Lakshmanan. Support vector machines for spatiotemporal tornado prediction. International Journal of General Systems, 38(7), 2009.
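The verification measures in the table all follow from the four contingency-table counts (hits, misses, false alarms, correct nulls). The counts below are hypothetical, but chosen so the resulting measures roughly match the test column above:

```python
# Forecast verification measures from binary contingency counts.
# Hypothetical counts chosen to roughly reproduce the deck's test-column values.
hit, miss, fa, cn = 205, 155, 90, 4957

pod  = hit / (hit + miss)                     # probability of detection
far  = fa / (hit + fa)                        # false alarm ratio
csi  = hit / (hit + fa + miss)                # critical success index
bias = (hit + fa) / (hit + miss)              # frequency bias

# Heidke skill score: fraction correct relative to random-chance agreement.
n = hit + miss + fa + cn
expected = ((hit + miss) * (hit + fa) + (cn + miss) * (cn + fa)) / n
hss = (hit + cn - expected) / (n - expected)
```

Note that CSI ignores correct nulls entirely, which is why it is preferred over accuracy when, as here, non-events outnumber events by roughly 14 to 1.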

39 Part II: Dynamic Forecasting Using Kernel Methods

40 Outline
4 Filtering with Kernel Methods
   Key Notions
   Approach Outline
   Assimilation with Kernel Methods
5 Applications to Meteorology
   Experimental Setup
   Lorenz 96 Model
   Quasi-Geostrophic Model

41 Filtering with Kernel Methods: Key Notions
Objectives of Dynamic Forecasting Using Kernel Methods
Dynamical systems: physical systems are mathematically represented by states in some abstract space. Transitions between states are modeled using transition functions over the state space in order to simulate the system dynamics.
Objectives: to provide an alternative to Kalman filtering for predicting the future states of nonlinear dynamical systems, and to use machine learning techniques and kernel methods to build nonlinear state predictors.

42 Filtering with Kernel Methods: Key Notions
Kalman Filtering
Definition (Kalman filter). Given a sequence of perturbed measurements, a Kalman filter is a process that estimates the states of a dynamical system.
We will consider only differentiable real-time nonlinear dynamical systems (to which correspond nonlinear Kalman filters). The state transition and observation models of the nonlinear dynamical system are
$$\partial_t x(t) = f(x, u, t) + w(t) \qquad \text{and} \qquad z(t) = h(x, t) + v(t),$$
where $x$ is the state of the system, $z$ is the observation, $f$ is the state transition function, $h$ is the observation model, $u$ is the control and $(w, v)$ are the (Gaussian) noises.
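For intuition, here is the predict/update cycle in the simplest possible setting, a scalar linear system with Gaussian noise; all parameter values are illustrative, and the slides are concerned with the nonlinear variants, which share the same structure:

```python
import random

# Minimal scalar *linear* Kalman filter (the nonlinear variants follow the same
# predict/update cycle). All parameter values are illustrative.
random.seed(0)
a, q, r = 1.0, 0.01, 0.25            # transition coeff., process and measurement noise variances
x_true, x_est, p = 0.0, 0.0, 1.0     # true state, estimate, estimate variance
err_raw, err_kf = [], []

for _ in range(200):
    x_true = a * x_true + random.gauss(0, q ** 0.5)   # true (hidden) dynamics
    z = x_true + random.gauss(0, r ** 0.5)            # noisy measurement

    x_pred, p_pred = a * x_est, a * p * a + q         # predict step
    k_gain = p_pred / (p_pred + r)                    # Kalman gain
    x_est = x_pred + k_gain * (z - x_pred)            # update step
    p = (1 - k_gain) * p_pred

    err_raw.append((z - x_true) ** 2)
    err_kf.append((x_est - x_true) ** 2)

mse_raw = sum(err_raw) / len(err_raw)
mse_kf = sum(err_kf) / len(err_kf)
```

Because the process noise here is much smaller than the measurement noise, the filtered estimate tracks the true state far better than the raw measurements do.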

43 Filtering with Kernel Methods: Key Notions
Example: Radar Tracking
From Pattern Recognition and Machine Learning by C. M. Bishop. Blue points: true positions; green points: noisy observations; red crosses: forecasts.

44 Filtering with Kernel Methods: Key Notions
Kalman Filter and Kernel Methods: Comparison of Assumptions
Unlike the linear Kalman filter, the nonlinear variants do not necessarily give an optimal state estimator. The filter may also diverge if the initial estimate is wrong or if the model is incorrect. For Kalman filters, the process must be Markovian, and the perturbations must be independent and follow a Gaussian distribution. Implementations must deal with matrix storage, matrix inversion and/or matrix factorization.
Kernel methods need no statistical assumptions on the process noise, and they work with both Markovian and non-Markovian processes. Their storage and computational requirements are modest.

45 Filtering with Kernel Methods: Approach Outline

46 Filtering with Kernel Methods: Approach Outline
Assimilation and Forecasting with Kernel Methods
1. Assimilation. The assimilation step attempts to recover the unperturbed system states from the current and past observations using kernel-based regression techniques. Kernel methods remove noise from the state trajectories and update them from the previous forecasts.
2. Forecasting. The last assimilated state can be used as an initial estimate for one iteration of a nonlinear Kalman filter. A polynomial predictive analysis on the last recorded state trajectories, using Lagrange interpolation with Chebyshev nodes, can provide reliable extrapolations. The generalization property of the SVM regression function can be used to estimate the next future state.

47 Filtering with Kernel Methods: Approach Outline
Advantages and Shortcomings of Kernel Methods
Advantages:
- Low memory requirements (some kernels require only O(n) elements to be stored in memory).
- Acceptable computational complexity (of the order of O(n^2); data thinning can reduce the size of the input data set).
- Massive parallelization (can be applied separately to each trajectory).
- No statistical assumptions; no state transition model necessary.
- Can be combined with a Kalman filter if necessary.
Shortcomings:
- Estimation of some kernel parameters.

48 Filtering with Kernel Methods: Assimilation with Kernel Methods

49 Filtering with Kernel Methods: Assimilation with Kernel Methods
Interpolating State Trajectories without a Model
Main idea: find a non-trivial function $f$ such that for every given sample $(t_i, x_i) \in \mathbb{R}^2$ we have $f(t_i) = x_i$ (or $f(t_i)$ lies in an interval centered at $x_i$ with half-width $\varepsilon \ge 0$). The interpolation function is expressed as an affine combination of kernel-based functions $k(t, \cdot)$. The positive semi-definite matrix $K$ with entries $K_{ij} = k(t_i, t_j)$ is called the kernel matrix.
Kalman filtering works differently: contrary to this approach, no state trajectory interpolation takes place with KFs. Furthermore, KFs absolutely need a model.

50 Filtering with Kernel Methods: Assimilation with Kernel Methods
Fitting Functions
The non-trivial function $f$ such that $f(t_i) = x_i$ is chosen to belong to the function class
$$\mathcal{F} = \left\{ t \mapsto \sum_{i=1}^{l} \alpha_i k(t_i, t) + b \;:\; \alpha^{t} K \alpha \le B^2 \right\}.$$
The Rademacher complexity of $\mathcal{F}$ measures the capability of the functions of $\mathcal{F}$ to fit random data with respect to a probability distribution generating these data. The empirical Rademacher complexity of $\mathcal{F}$, denoted $\hat{R}(\mathcal{F})$, satisfies
$$\hat{R}(\mathcal{F}) \le 2B\sqrt{\operatorname{tr}(K)}/l + 2|b|/\sqrt{l}.$$

51 Filtering with Kernel Methods: Assimilation with Kernel Methods
Minimizing the Generalization and the Empirical Errors
We use $\hat{R}(\mathcal{F})$ to control the upper bound on the generalization error of the interpolation function. Small empirical errors and a small $\hat{R}(\mathcal{F})$ both contribute to decreasing this bound, therefore:
- We need to minimize the absolute value of $b$. Also, $\alpha^{t} K \alpha \le \|K\| \|\alpha\|^2$; thus minimizing $\alpha^{t}\alpha + b^2$ contributes to a smaller $\hat{R}(\mathcal{F})$.
- The empirical error, defined by $\left(\sum_{i=1}^{l} \xi_i^2\right)/l$, where the $\xi_i$ are the differences between targets and desired outputs, should be minimized.
Aim: minimize the quantity $\alpha^{t}\alpha + b^2 + C\xi^{t}\xi$ with $C > 0$.

52 Filtering with Kernel Methods: Assimilation with Kernel Methods
Optimization Problem
Introducing tolerances, the empirical errors $\xi_i$ are equal to $f(t_i) - x_i$; these are the only constraints associated with the previous objective function. Hence the previous calculations lead to:
Optimization Problem for Data Assimilation (Gilbert et al., 2010)
$$\min_{(\alpha, \xi, b) \in \mathbb{R}^{2l+1}} \left\{ \alpha^{t}\alpha + b^2 + C\xi^{t}\xi \;:\; K\alpha + b\mathbf{1}_l - x = \xi \right\}.$$
The solution of this optimization problem is:
Analytical Solution
$$\left(K^2 + I_l/C + \mathbf{1}_l\mathbf{1}_l^{t}\right) d = x, \qquad \alpha = K d, \qquad b = \mathbf{1}_l^{t} d.$$
The solutions $\alpha$ and $b$ describe the regression function (belonging to the function class $\mathcal{F}$) that interpolates state trajectories during assimilation.

53 Filtering with Kernel Methods: Assimilation with Kernel Methods
Computational Details
To compute the solution $d$ of the linear system $\left(K^2 + I_l/C + \mathbf{1}_l\mathbf{1}_l^{t}\right) d = x$, we define $\sigma = 1/\sqrt{C}$ and the matrix $A = K + \sigma I_l$ ($A$ is symmetric and positive definite), together with the following sequences:
$$A\tilde{u}_0 = x, \quad A u_0 = \tilde{u}_0, \quad A\tilde{u}_n = u_n, \quad A u_{n+1} = 2\sigma(u_n - \sigma\tilde{u}_n);$$
$$A\tilde{v}_0 = \mathbf{1}_l, \quad A v_0 = \tilde{v}_0, \quad A\tilde{v}_n = v_n, \quad A v_{n+1} = 2\sigma(v_n - \sigma\tilde{v}_n).$$
We set $u = \sum_{n \ge 0} u_n$ and $v = \sum_{n \ge 0} v_n$. Both series converge rapidly and are truncated at a step $m > 0$ in practical problems. Once $m$ is determined, we then have
$$d = u - \frac{\langle \mathbf{1}_l, u\rangle}{1 + \langle \mathbf{1}_l, v\rangle}\, v.$$
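A quick numerical check of this scheme, on an invented small problem: summing the truncated series for u and v and applying the rank-one correction reproduces the direct solve of the full system.

```python
import numpy as np

# Iterative scheme: sum the series u_n, v_n (each term costs two solves with A),
# then combine with the rank-one correction; compare against a direct solve.
rng = np.random.default_rng(0)
l, C = 30, 100.0
t = np.linspace(0.0, 1.0, l)
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 0.1 ** 2))
x = np.sin(2 * np.pi * t) + rng.normal(scale=0.1, size=l)

sigma = 1.0 / np.sqrt(C)
A = K + sigma * np.eye(l)          # symmetric positive definite
ones = np.ones(l)

def series_sum(rhs, m=60):
    """u = sum of u_n with A u~_0 = rhs, A u_0 = u~_0,
    A u~_n = u_n, A u_{n+1} = 2 sigma (u_n - sigma u~_n)."""
    u_t = np.linalg.solve(A, rhs)
    u_n = np.linalg.solve(A, u_t)
    total = u_n.copy()
    for _ in range(m):
        u_t = np.linalg.solve(A, u_n)
        u_n = np.linalg.solve(A, 2 * sigma * (u_n - sigma * u_t))
        total += u_n
    return total

u = series_sum(x)
v = series_sum(ones)
d_iter = u - (ones @ u / (1.0 + ones @ v)) * v

d_direct = np.linalg.solve(K @ K + np.eye(l) / C + np.outer(ones, ones), x)
```

The series converges because each term shrinks by a factor of at most 1/2 per step when K is positive semi-definite, so a modest truncation level m already gives near machine-precision agreement.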

54 Filtering with Kernel Methods: Assimilation with Kernel Methods
Approach Summary
We have chosen a class of functions in order to interpolate state trajectories without a model. We defined a fitness measure for this class of functions and linked it to the parameters describing a function in that class. We defined an optimization problem that aims to minimize empirical errors and maximize the fitness of the interpolating function. An analytical solution of the optimization problem was derived, and its computation was reduced to solving a sequence of linear systems.

55 Applications to Meteorology: Experimental Setup

56 Applications to Meteorology: Experimental Setup
Experimental Setup
Machine and software: all codes were implemented in MATLAB 7.9 on a 2002 DELL Precision Workstation 530 with two 2.4 GHz Intel Xeon processors and 2 GiB of RAM. EnKF forecasts were generated with the EnKF Matlab toolbox version 0.23 by Pavel Sakov (available at enkf.nersc.no).
Experimental models: the kernel approach was tested on the Lorenz 96 model and the quasi-geostrophic 1.5-layer reduced-gravity model. Forecasts were obtained using a combination of polynomial predictive analysis and kernel-based extrapolation during the assimilation stage.

57 Applications to Meteorology: Lorenz 96 Model

58 Applications to Meteorology: Lorenz 96 Model
Lorenz 96 Model: Description
Introduced by Lorenz and Emanuel (1998). The state transition model represents the values of atmospheric quantities at discrete locations spaced equally on a latitude circle (a 1D problem). The state transition model at a location $i$ on the latitude circle is
$$\partial_t x_i = \underbrace{(x_{i+1} - x_{i-2})\, x_{i-1}}_{\text{advection}} - \underbrace{x_i}_{\text{dissipation}} + \underbrace{F}_{\text{external forcing}}.$$
The states represent an unspecified scalar meteorological quantity, e.g. vorticity or temperature (Lorenz and Emanuel). This model was introduced in order to select the locations on a latitude circle that are most effective in improving weather assimilation and forecasts. Observations were generated with an error variance of 1. The external forcing was set to F = 8. The system showed chaotic behavior.
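The model is only a few lines of code. The sketch below integrates it with a standard fourth-order Runge-Kutta step; 40 variables, F = 8 and dt = 0.05 are common choices in the literature, used here for illustration:

```python
import numpy as np

# Lorenz 96 right-hand side with cyclic indices, plus one RK4 integration step.
def lorenz96(x, F=8.0):
    # dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F, indices taken cyclically
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    k1 = lorenz96(x, F)
    k2 = lorenz96(x + 0.5 * dt * k1, F)
    k3 = lorenz96(x + 0.5 * dt * k2, F)
    k4 = lorenz96(x + dt * k3, F)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

x = 8.0 * np.ones(40)     # the uniform state x_i = F is a fixed point...
x[19] += 0.01             # ...so perturb one variable to trigger the chaos
for _ in range(100):
    x = rk4_step(x)
```

Starting from the perturbed fixed point, the tiny perturbation grows until the trajectory fills the model's chaotic attractor, which is exactly the error-growth behavior that makes the model a standard assimilation testbed.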

59 Applications to Meteorology: Lorenz 96 Model
Lorenz 96 Model: Assimilation
The following figures illustrate how kernel methods remove the observational noise and interpolate state trajectories during assimilation. The kernel used is a Gaussian RBF kernel with σ = 3δt.

60 Applications to Meteorology: Lorenz 96 Model
Lorenz 96 Model: Forecast Errors

61 Applications to Meteorology: Quasi-Geostrophic Model

62 Applications to Meteorology: Quasi-Geostrophic Model
Quasi-Geostrophic Model: Description
It is an atmospheric dynamical model involving an approximation of actual winds, used in the analysis of large-scale extratropical weather systems. System states are scalar quantities representing the air flow. Horizontal winds are replaced by their geostrophic values in the horizontal acceleration terms of the momentum equations, and horizontal advection in the thermodynamic equation is approximated by geostrophic advection. Furthermore, vertical advection of momentum is neglected. It is a 2D problem: the atmosphere has a single level in the vertical and was represented by a square grid, with each state located on a node of that grid. Observations were generated with an error variance of 1.

63 Applications to Meteorology: Quasi-Geostrophic Model
Quasi-Geostrophic Model: Assimilation Example
Despite the absence of a dynamical model, the kernel interpolation obtained from the noisy observations closely matches the true states.

64 Applications to Meteorology: Quasi-Geostrophic Model
Quasi-Geostrophic Model: Forecast Example
Kernel and EnKF forecasts at time step 51. The EnKF forecast is 8 units away from the true state while the kernel forecast is 0.5 units away.

65 Applications to Meteorology: Quasi-Geostrophic Model
Quasi-Geostrophic Model: Forecast Errors

66 Applications to Meteorology: Quasi-Geostrophic Model
Conclusions I: What was Achieved?
A viable kernel-based approach for data assimilation and forecasting has been introduced for nonlinear dynamical systems. On the meteorological models its performance ranged from equivalent to that of EnKF with 50 ensemble members to exceeding that of EnKF with 100 ensemble members, with error varying inversely with the amount of chaos present in the dynamical system. Encouraging results were obtained in removing observational noise and interpolating state trajectories; they represent an improvement over standard EnKF based on fewer than 20 ensemble members.

67 Applications to Meteorology: Quasi-Geostrophic Model
Conclusions II: Future Work
We are currently applying these techniques to financial and petroleum engineering problems with the same type of multi-dimensional time series. We are developing approaches that identify independent factors influencing the shape of multi-dimensional time series in a nonlinear fashion. The same tools will be used for the prediction of rare events and their magnitude.

68 Questions?

69 For Further Reading
R. C. Gilbert, M. B. Richman, L. M. Leslie, and T. B. Trafalis. Kernel Methods for Data Driven Numerical Modeling. Monthly Weather Review, submitted.
E. N. Lorenz and K. A. Emanuel. Optimal sites for supplementary weather observations: simulation with a small model. Journal of the Atmospheric Sciences, 55(3).
P. Sarma and W. H. Chen. Generalization of the Ensemble Kalman Filter Using Kernels for Nongaussian Random Fields. In SPE Reservoir Simulation Symposium Proceedings, Society of Petroleum Engineers.
F. Steinke and B. Schölkopf. Kernels, regularization and differential equations. Pattern Recognition, 41(11).
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK.
V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, NY, USA.


More information

STock prices on electronic exchanges are determined at each tick by a matching algorithm which matches buyers with

STock prices on electronic exchanges are determined at each tick by a matching algorithm which matches buyers with JOURNAL OF STELLAR MS&E 444 REPORTS 1 Adaptive Strategies for High Frequency Trading Erik Anderson and Paul Merolla and Alexis Pribula STock prices on electronic exchanges are determined at each tick by

More information

Applications to Data Smoothing and Image Processing I

Applications to Data Smoothing and Image Processing I Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is

More information

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information