GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L1-NORM




BARRIOT Jean-Pierre, SARRAILH Michel
BGI/CNES, 18 av. E. Belin, 31401 TOULOUSE Cedex 4 (France)
Email: jean-pierre.barriot@cnes.fr

1/ Introduction
The Bureau Gravimétrique International is managing a worldwide gravity database. These data have different origins and must be checked to detect and eliminate outliers. Up to now, we used a prediction technique based on the L2-norm (collocation) method. We have developed a new method using the L1-norm. Here we briefly outline this method and compare it with the L2 method on several test cases.

2/ Theory of the L1 prediction method
Self-validation is the detection of outliers in a survey by cross-comparing all the values of the survey. Let $g = (g_1, g_2, \dots, g_N)^T$ be the N-vector of observed gravity values over a survey. The N-vector $\tilde{g}$ of "true" (unknown) values is related to the N-vector $g$ of observed values by

$$g = I_N \tilde{g} + \varepsilon \qquad (1)$$

where $I_N$ is the identity matrix of order N and $\varepsilon$ is the N-vector of errors. In a perfect world, $\varepsilon = 0$ and then $\tilde{g} = g$. In an imperfect world (ours), $\varepsilon \neq 0$, and we have to solve Eq. (1) contaminated by errors.

L2-norm solution: $\varepsilon$ and $\tilde{g}$ are considered as random variables with a priori zero means and covariances $\sigma_\varepsilon^2 I_N$ and $\mathrm{Cov}(\tilde{g})$ respectively. The L2 a posteriori estimate of $\tilde{g}$ then has mean

$$\hat{g} = \mathrm{Cov}(\tilde{g}) \left[ \mathrm{Cov}(\tilde{g}) + \sigma_\varepsilon^2 I_N \right]^{-1} g$$

and covariance

$$\left[ \mathrm{Cov}(\tilde{g})^{-1} + \sigma_\varepsilon^{-2} I_N \right]^{-1}.$$

This is the usual least-squares collocation solution.

L1-norm solution: from an L1-norm point of view, we select the particular $\tilde{g}^{(i)}$ which realizes

$$\min_i \sum_{j=1}^{M} \sum_{l=1}^{N} \left| \tilde{g}_l^{(i)} - g_l^{(j)} \right|$$

over a set of M realizations of the N-vectors $\tilde{g}$ and $\varepsilon$, with $g^{(j)} = \tilde{g}^{(j)} + \varepsilon^{(j)}$. Of course, in the real world we have to cope with a unique realization of $\tilde{g}$ and $\varepsilon$ (and we know only their sum $g$), so we:
2-1/ select from the observed $g$ M subvectors $\gamma^{(i)}$ (i = 1, ..., M) of dimension K;
2-2/ complete, through a given interpolation-extrapolation procedure, the missing N - K values in order to get M vectors $\Gamma^{(i)}$ of dimension N;
2-3/ select as the best estimate of $\tilde{g}$ the $\Gamma^{(i)}$ which realizes $\min_i \sum_{l=1}^{N} \left| \Gamma_l^{(i)} - g_l \right|$.

Fig. 1: Fitting a line through 3 data points. The L2 solution (a) is fitted to the 3 data points $(x_i, y_i)$ by realizing $\min \sum_i (y_i - (a x_i + b))^2$. The L1 solution fulfills $\min \sum_i |y_i - (a x_i + b)|$, and $(a, b)$ corresponds to one of the lines (b1, b2, b3) joining the points of the three two-point subsets.

For the L1 norm there is no equivalent of covariance matrices, so if we want some indication of the robustness of the solution, we can only construct Monte-Carlo error estimates: we add to the observed $g$ values a random vector $\zeta$ of zero mean and known variance $\sigma_\zeta^2$, and infer from this perturbation the corresponding mean and variance of $\Gamma^{(i)}$.

3/ L1 prediction method algorithm
For a given gravity station where we have to predict the gravity value:
3-1: search for all the neighbouring points, up to a given radius;
3-2: determination of the "best" plane (Fig. 2) or paraboloid (Fig. 3) approximation of the local gravity around the station, in the sense of the L1 norm, using the gravity values of a subset of selected neighbours. The gravity value at the predicted point is excluded. As we consider only a limited number of neighbouring points, we examine all the subsets of neighbours (subsets of 3 points for the "best" plane, subsets of 6 points for the "best" paraboloid) instead of using the simplex method;
3-3: computation of the difference between the observed anomaly and the predicted anomaly, interpolated from the "best" L1 surface at the location of the predicted value;
3-4: comparison with a given threshold;
3-5: rejection or validation of the gravity value.

4/ Pros and cons of the L1 norm method
4-1 Pro: no "contamination" of the neighbouring points by "bad" points (contamination being the case where a "good" point is flagged as false because it is compared with erroneous ("bad") neighbouring points);
4-2 Pro: no need to use residual anomalies;
4-3 Con: systematic rejection of extrema;
4-4 Con: rejection of points near the edges of the map (only with paraboloid prediction);
4-5 Con: rejection based only on a threshold on the difference between observed and predicted anomaly;
4-6 Con: no error estimate of the predicted anomaly.

5/ Pros and cons of the L2 norm method
5-1 Pro: rejection based on thresholds both for the difference between observed and predicted anomaly and for the standard deviation of the predicted anomaly;
5-2 Con: non-robust solution: a "good" point can be flagged as "false" if compared to "bad" neighbouring points (see 4-1);
5-3 Con: residual anomalies must be computed before prediction;
5-4 Con: rejection of extrema.
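The two-point construction of Fig. 1 can be checked with a short Python sketch (standard library only). For a straight line there is always an L1-optimal solution passing through two of the data points, so trying every two-point subset, in the spirit of steps 2-1 to 2-3 with K = 2, finds the L1 fit. The function names and sample points are illustrative, not taken from the paper.

```python
from itertools import combinations

def l1_cost(a, b, pts):
    """Sum of absolute residuals |y - (a*x + b)| over all points."""
    return sum(abs(y - (a * x + b)) for x, y in pts)

def l1_line_by_pairs(pts):
    """Best L1 line among the lines joining two of the points.

    Returns (a, b, cost). Pairs with equal x are skipped, since a vertical
    line cannot be written as y = a*x + b.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        if x1 == x2:
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        cost = l1_cost(a, b, pts)
        if best is None or cost < best[2]:  # keep the first of tied candidates
            best = (a, b, cost)
    return best

# Three points, as in Fig. 1 (values made up for illustration)
print(l1_line_by_pairs([(0.0, 0.0), (1.0, 1.0), (2.0, 5.0)]))  # → (2.5, 0.0, 1.5)
# Four aligned points plus one outlier
print(l1_line_by_pairs([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 20.0)]))  # → (1.0, 0.0, 16.0)
```

The second call illustrates the robustness argued in 4-1: the outlying fifth point does not tilt the L1 line, whereas a least-squares fit would be pulled towards it.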
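Steps 3-1 to 3-5 with the plane approximation can be sketched as below. This is a minimal sketch under our own assumptions: coordinates are planar (x, y) in consistent units, the "best" plane is the one among planes through 3-point subsets with the smallest sum of absolute residuals over all selected neighbours (mirroring step 2-3), and the names `plane_through`, `predict_l1_plane` and `validate` are hypothetical.

```python
import math
from itertools import combinations

def plane_through(p1, p2, p3):
    """Coefficients (c0, c1, c2) of the plane g = c0 + c1*dx + c2*dy through 3 points.

    Points are (dx, dy, g) in station-centred coordinates, so c0 is the plane's
    value at the station. Returns None for (near-)collinear points.
    """
    (x1, y1, g1), (x2, y2, g2), (x3, y3, g3) = p1, p2, p3
    a, b, e = x2 - x1, y2 - y1, g2 - g1
    c, d, f = x3 - x1, y3 - y1, g3 - g1
    det = a * d - b * c
    if abs(det) < 1e-12:
        return None
    c1 = (e * d - b * f) / det  # Cramer's rule on the two difference equations
    c2 = (a * f - c * e) / det
    return g1 - c1 * x1 - c2 * y1, c1, c2

def predict_l1_plane(station, points, radius):
    """Value at the station of the L1-best plane through a 3-point subset of neighbours."""
    x0, y0, _ = station
    # 3-1: neighbours within the radius, excluding the station's own value
    nbrs = [(x - x0, y - y0, g) for x, y, g in points
            if (x, y) != (x0, y0) and math.hypot(x - x0, y - y0) <= radius]
    best = None
    # 3-2: try every 3-point subset; score each plane by the sum of absolute
    # residuals over all neighbours and keep the smallest
    for tri in combinations(nbrs, 3):
        coef = plane_through(*tri)
        if coef is None:
            continue
        c0, c1, c2 = coef
        cost = sum(abs(g - (c0 + c1 * dx + c2 * dy)) for dx, dy, g in nbrs)
        if best is None or cost < best[0]:
            best = (cost, c0)
    return None if best is None else best[1]

def validate(station, points, radius, threshold):
    """Return (observed - predicted, accepted), or None if no plane can be fitted."""
    predicted = predict_l1_plane(station, points, radius)
    if predicted is None:
        return None
    diff = station[2] - predicted          # 3-3: observed minus predicted
    return diff, abs(diff) <= threshold    # 3-4 / 3-5: True means validated

# Hypothetical neighbours on the plane g = 10 + 2x + 3y, except a wrong value at (-1, -1)
pts = [(1, 0, 12), (0, 1, 13), (-1, 0, 8), (0, -1, 7), (1, 1, 15), (-1, -1, 40)]
print(validate((0, 0, 10), pts, radius=2.0, threshold=7.0))  # → (0.0, True)
print(validate((0, 0, 30), pts, radius=2.0, threshold=7.0))  # → (20.0, False)
```

In this made-up example the erroneous neighbour at (-1, -1) does not affect the prediction at the station, so the correct value 10 is validated and the wrong value 30 is flagged; the threshold of 7 matches the value used for Figs. 6 and 7.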

Fig. 2: Dark triangle: «best» approximating plane going through the neighbouring gravity values. Black bar: difference between the observed and the predicted anomaly at the selected point (dot).
Fig. 3: Light grey: «best» approximating paraboloid going through the neighbouring gravity values. Black bar: difference between the observed and the predicted anomaly at the selected point (dot).

6/ Future improvements of the L1 method
6-1: estimation of the error on the predicted anomaly by the Monte-Carlo method (see point 2);
6-2: replacing the planar or paraboloidal approximation by collocation prediction, to take into account the covariance function of the anomalies. This will realize a "mix" between the L1 and L2 methods.
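Improvement 6-1, the Monte-Carlo error estimate outlined in section 2, could look like the sketch below. It is generic in the predictor; as a stand-in we use the median of the neighbour values, which is the constant minimizing the sum of absolute deviations (a zeroth-order L1 "surface"), not the plane or paraboloid of section 3. The Gaussian perturbation, the number of runs, the seed and the function name are assumptions.

```python
import random
import statistics

def monte_carlo_error(predict, values, sigma_zeta, n_runs=500, seed=0):
    """Monte-Carlo mean and standard deviation of a prediction.

    predict:    function mapping a list of neighbour values to a predicted value
    values:     observed neighbour values
    sigma_zeta: standard deviation of the zero-mean perturbation zeta
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    preds = []
    for _ in range(n_runs):
        # perturb every observed value with zeta ~ N(0, sigma_zeta^2)
        perturbed = [v + rng.gauss(0.0, sigma_zeta) for v in values]
        preds.append(predict(perturbed))
    return statistics.mean(preds), statistics.stdev(preds)

# Stand-in L1 predictor: the median minimises the sum of absolute deviations
neighbours = [12.0, 13.0, 8.0, 7.0, 15.0, 40.0]
mean, std = monte_carlo_error(statistics.median, neighbours, sigma_zeta=1.0)
print(f"prediction {mean:.2f} +/- {std:.2f}")
```

The mean stays close to 12.5, the median of the unperturbed values, and the spread gives the kind of error indication that 4-6 notes is otherwise missing from the L1 method.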

7/ Example of data validation
Fig. 4: Bouguer anomaly map: good points (cross marker) and doubtful points (circle marker) are identified by prediction with a collocation technique that takes into account the local covariance function.
Fig. 5: With the collocation technique, "bad" points can "contaminate" neighbouring points. Such points must be re-predicted after flagging the erroneous ("bad") points with the largest differences between observed and predicted anomaly (see points 4-1 and 5-2).
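The collocation prediction behind Fig. 4, and the contamination shown in Fig. 5, can be illustrated with a small least-squares collocation sketch. It extends the mean and covariance formulas of section 2 to prediction at a new point P in the standard way: $\hat{g}_P = C_{Pn}(C_{nn} + \sigma_\varepsilon^2 I)^{-1} g$ with error variance $C_{PP} - C_{Pn}(C_{nn} + \sigma_\varepsilon^2 I)^{-1} C_{nP}$. The Gaussian covariance function and its parameters are assumptions, since the local covariance function used by the authors is not given; inputs are residual anomalies, as required by 5-3.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def cov(d, c0, corr_len):
    """Assumed Gaussian covariance function C(d) = c0 * exp(-(d / corr_len)**2)."""
    return c0 * math.exp(-(d / corr_len) ** 2)

def collocation_predict(p, pts, res, sigma_eps, c0, corr_len):
    """Predict the residual anomaly at p from residuals res observed at pts.

    Returns (prediction, error standard deviation).
    """
    n = len(pts)
    # C_nn + sigma_eps^2 I: covariance matrix of the observations
    A = [[cov(math.dist(pts[i], pts[j]), c0, corr_len) + (sigma_eps ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k = [cov(math.dist(p, q), c0, corr_len) for q in pts]  # C_Pn
    w = solve(A, res)  # (C_nn + sigma_eps^2 I)^-1 g
    v = solve(A, k)    # (C_nn + sigma_eps^2 I)^-1 C_nP
    pred = sum(ki * wi for ki, wi in zip(k, w))
    var = cov(0.0, c0, corr_len) - sum(ki * vi for ki, vi in zip(k, v))
    return pred, math.sqrt(max(var, 0.0))

# Hypothetical residual anomalies (mGal) around a station at the origin
pts = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
clean, s_clean = collocation_predict((0.0, 0.0), pts, [2.0, 3.0, -2.0, -3.0], 1.0, 100.0, 2.0)
bad, s_bad = collocation_predict((0.0, 0.0), pts, [2.0, 3.0, -2.0, 30.0], 1.0, 100.0, 2.0)
print(f"clean: {clean:.2f}  with one bad value: {bad:.2f}  error std: {s_clean:.2f}")
```

Replacing one residual of -3 by an erroneous 30 mGal shifts the prediction at the origin by about 10 mGal, enough to flag a good station against a 7 mGal threshold: this is the contamination of 4-1, 5-2 and Fig. 5. The error standard deviation does not change, because it depends only on the point geometry and the covariance model.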

Fig. 6: Prediction using the L1 norm and plane approximation (Fig. 2). Seven neighbouring points are selected per predicted point. A predicted point is considered doubtful if the difference between its observed and predicted anomaly is larger than 7 mGal.
Fig. 7: Prediction using the L1 norm and paraboloid approximation (Fig. 3). Ten neighbouring points are selected per predicted point. A predicted point is considered doubtful if the difference between its observed and predicted anomaly is larger than 7 mGal.
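The paraboloid variant of step 3-2 used for Fig. 7 can be sketched the same way. With ten neighbours there are C(10, 6) = 210 candidate 6-point subsets per predicted point (C(7, 3) = 35 planes for Fig. 6), small enough for the exhaustive search the authors prefer to the simplex method. The quadratic form of the paraboloid, the selection criterion (sum of absolute residuals over all neighbours), the test surface and the names are our assumptions.

```python
from itertools import combinations

def solve(A, b, eps=1e-12):
    """Gaussian elimination with partial pivoting; None if A is (near-)singular."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        if abs(M[p][k]) < eps:
            return None
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def basis(dx, dy):
    """Terms of the assumed paraboloid c0 + c1 dx + c2 dy + c3 dx^2 + c4 dx dy + c5 dy^2."""
    return [1.0, dx, dy, dx * dx, dx * dy, dy * dy]

def predict_l1_paraboloid(station, neighbours):
    """Value at the station of the L1-best paraboloid through a 6-point subset."""
    x0, y0, _ = station
    local = [(x - x0, y - y0, g) for x, y, g in neighbours]  # station-centred coordinates
    best = None
    for subset in combinations(local, 6):
        coef = solve([basis(dx, dy) for dx, dy, _ in subset], [g for _, _, g in subset])
        if coef is None:
            continue  # these 6 points do not determine a unique paraboloid
        cost = sum(abs(g - sum(c * t for c, t in zip(coef, basis(dx, dy))))
                   for dx, dy, g in local)
        if best is None or cost < best[0]:
            best = (cost, coef[0])  # coef[0] is the surface value at the station
    return None if best is None else best[1]

# Ten hypothetical neighbours on a known quadratic surface
def surface(x, y):
    return 5.0 + x - y + 0.5 * x * x + 0.2 * x * y - 0.3 * y * y

coords = [(1.0, 0.2), (0.3, 1.1), (-0.9, 0.4), (-0.2, -1.0), (1.2, 1.0),
          (-1.1, 0.9), (0.8, -1.2), (-1.0, -0.8), (2.0, 0.5), (0.4, 2.1)]
nbrs = [(x, y, surface(x, y)) for x, y in coords]
station = (0.0, 0.0, 25.0)  # observed value 20 mGal above the surface
diff = station[2] - predict_l1_paraboloid(station, nbrs)
print(f"difference {diff:.2f} mGal, doubtful: {abs(diff) > 7.0}")
```

Because the paraboloid has six coefficients instead of three, each candidate depends on more points and extrapolates more strongly, which is one plausible reading of why point 4-4 restricts edge rejections to paraboloid prediction.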