GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L1-NORM

BARRIOT Jean-Pierre, SARRAILH Michel
BGI/CNES, 18 av. E. Belin, 31401 TOULOUSE Cedex 4 (France)
Email: jean-pierre.barriot@cnes.fr

1/ Introduction

The Bureau Gravimétrique International manages a worldwide gravity database. These data have different origins and must be checked to detect and eliminate outliers. Up to now, we have used a prediction technique based on the L2-norm (collocation) method. We have developed a new method, using the L1-norm. We briefly present the outline of this method here, and compare it with the L2 method on different test cases.

2/ Theory of the L1 prediction method

Self-validation is the detection of outliers in a survey from the cross-comparison of all the values of the survey.

Let $g$ be the N-vector of the gravity values observed over a survey: $g = (g_1, g_2, \ldots, g_N)^T$. The N-vector $\bar{g}$ of the "true" (unknown) values is related to the N-vector $g$ of observed values by

$$g = I_N\,\bar{g} + \varepsilon \qquad (1)$$

where $I_N$ is the identity matrix of order N and $\varepsilon$ is the N-vector of errors. In a perfect world, $\varepsilon = 0$ and then $\bar{g} = g$. In an imperfect (our) world, $\varepsilon \neq 0$. We then have to solve Eq. (1) contaminated by errors.

L2-norm solution: $\varepsilon$ and $\bar{g}$ are considered as random variables with a priori zero means and covariances $\sigma_\varepsilon^2 I_N$ and $\mathrm{Cov}(\bar{g})$ respectively. The L2 a posteriori estimate of $\bar{g}$ then has mean

$$g^* = \mathrm{Cov}(\bar{g})\,\left[\mathrm{Cov}(\bar{g}) + \sigma_\varepsilon^2 I_N\right]^{-1} g$$

and covariance

$$\left[\sigma_\varepsilon^{-2} I_N + \mathrm{Cov}(\bar{g})^{-1}\right]^{-1}.$$

This is the usual least-squares collocation solution.

L1-norm solution: From an L1-norm point of view, we select the particular $\bar{g}^{(i)}$ which realizes

$$\min_{\bar{g}^{(i)}} \sum_{j=1}^{M} \sum_{l=1}^{N} \left| g^{(j)}_l - \bar{g}^{(i)}_l \right|$$

over a set of M realizations of the N-vectors $\bar{g}$ and $\varepsilon$, with $g^{(i)} = \bar{g}^{(i)} + \varepsilon^{(i)}$. Of course, in the real world, we have to cope with a unique realization of $\bar{g}$ and $\varepsilon$ (and we know only their sum $g$), so we
2-1/ select from the observed $g$ M subvectors $\gamma^{(i)}$ ($i = 1, \ldots, M$) of dimension K,
2-2/ complete, through a given interpolation-extrapolation procedure, the missing $N - K$ values in order to get M vectors $\Gamma^{(i)}$ of dimension N,
2-3/ select the best estimate $\Gamma^{(i)}$ of $\bar{g}$ as the one which realizes $\min_i \sum_{l=1}^{N} \left| \Gamma^{(i)}_l - g_l \right|$.

Fig. 1: Fitting a line through 3 data points. The L2 solution (a) passes among the 3 data points $(x_i, y_i)$ by realizing $\min \sum_i (y_i - (a x_i + b))^2$. The L1 solution fulfills $\min \sum_i |y_i - (a x_i + b)|$, and $(a, b)$ corresponds to one of the lines (b1, b2, b3) that join the 3 two-point subsets.

For the L1 norm, there is no equivalent of covariance matrices, so if we want some indication of the robustness of the solution, we can only construct Monte-Carlo estimates of the errors on $(a, b)$, by adding to the observed $g$ values a random vector $\zeta$ of zero mean and known variance $\sigma_\zeta^2$, and infer from this perturbation the corresponding mean and variance of $\Gamma^{(i)}$.

3/ L1 prediction method algorithm

For a given gravity station where we have to predict the gravity value:
3-1: search for all the neighbouring points, up to a given radius;
3-2: determination of the "best" plane (Fig. 2) or paraboloid (Fig. 3) approximation of the local gravity around the station, in the sense of the L1 norm, using the gravity values of a subset of selected neighbours. The gravity value at the predicted point is excluded. As we consider only a limited number of neighbouring points, we study all the subsets of neighbours (subsets of 3 points for the "best" plane, subsets of 6 points for the "best" paraboloid), instead of using the simplex method;
3-3: computation of the difference between the observed anomaly and the predicted anomaly, the latter being interpolated from the "best" L1 surface at the location of the predicted value;
3-4: comparison with a given threshold;
3-5: rejection or validation of the gravity value.

4/ Pros and cons of the L1 norm method

4-1 Pro: no "contamination" of the neighbouring points by "bad" points (contamination meaning that a "good" point is flagged as false because it is compared with erroneous ("bad") neighbouring points);
4-2 Pro: no need to use residual anomalies;
4-3 Con: systematic rejection of extrema;
4-4 Con: rejection of points near the edges of the map (only with paraboloid prediction);
4-5 Con: rejection based only on a threshold on the difference between observed and predicted anomaly;
4-6 Con: no error estimate of the predicted anomaly.

5/ Pros and cons of the L2 norm method

5-1 Pro: rejection based on thresholds both for the difference between observed and predicted anomaly and for the standard deviation error of the predicted anomaly;
5-2 Con: non-robust solution: a "good" point can be flagged as "false" if compared with "bad" neighbouring points (see 4-1);
5-3 Con: need to compute residual anomalies before prediction;
5-4 Con: rejection of extrema.
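Steps 3-1 to 3-5 of the algorithm in section 3 can be sketched in Python for the plane case. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `l1_plane_prediction` and `validate_station`, the planar (x, y) coordinates, and the rank test used to skip collinear triples are all hypothetical choices made for the example.

```python
from itertools import combinations

import numpy as np


def l1_plane_prediction(xy, g, station_xy):
    """Predict gravity at station_xy from the plane through 3 neighbours
    that minimises the L1 misfit to all neighbours (step 3-2)."""
    xy = np.asarray(xy, dtype=float)
    g = np.asarray(g, dtype=float)
    design = np.column_stack([xy, np.ones(len(g))])  # rows are [x, y, 1]
    best_cost, best_coef = np.inf, None
    for triple in combinations(range(len(g)), 3):
        rows = design[list(triple)]
        if np.linalg.matrix_rank(rows) < 3:
            continue  # collinear points do not define a plane
        coef = np.linalg.solve(rows, g[list(triple)])  # plane g = a*x + b*y + c
        cost = np.abs(design @ coef - g).sum()  # L1 misfit over all neighbours
        if cost < best_cost:
            best_cost, best_coef = cost, coef
    if best_coef is None:
        raise ValueError("no non-collinear triple of neighbours")
    return float(np.array([station_xy[0], station_xy[1], 1.0]) @ best_coef)


def validate_station(xy, g, k, radius, threshold):
    """Steps 3-1 to 3-5 for station k of a survey (hypothetical helper)."""
    xy = np.asarray(xy, dtype=float)
    g = np.asarray(g, dtype=float)
    # 3-1: neighbours within the search radius, the station itself excluded
    dist = np.hypot(xy[:, 0] - xy[k, 0], xy[:, 1] - xy[k, 1])
    near = dist <= radius
    near[k] = False
    # 3-2: best L1 plane evaluated at the station
    predicted = l1_plane_prediction(xy[near], g[near], xy[k])
    # 3-3 to 3-5: difference, comparison with the threshold, decision
    difference = float(g[k] - predicted)
    return predicted, difference, bool(abs(difference) <= threshold)
```

With N neighbours the loop visits N(N-1)(N-2)/6 triples, which stays small for the handful of neighbours considered here; this is why the exhaustive search can replace the simplex method. A paraboloid version would follow the same pattern with 6-point subsets and a six-column design matrix.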
Fig. 2: Dark triangle: «best» approximating plane going through the neighbouring gravity values. Black bar: difference between the observed and the predicted anomaly at the selected point (dot).

Fig. 3: Light grey: «best» approximating paraboloid going through the neighbouring gravity values. Black bar: difference between the observed and the predicted anomaly at the selected point (dot).

6/ Future improvements of the L1 method

6-1: estimation of the error on the predicted anomaly by the Monte-Carlo method (see section 2);
6-2: replacement of the planar or paraboloidal approximation by collocation prediction, to take into account the covariance function of the anomalies. This will yield a "mix" of the L1 and L2 methods.
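The Monte-Carlo error estimate of item 6-1 (described in section 2) can be sketched as follows. This is an illustrative sketch only: the names `l1_line_value` and `monte_carlo_error` are hypothetical, the perturbation is assumed Gaussian (the text only specifies zero mean and known variance), and the predictor used for the demonstration is the simple line fit of Fig. 1 rather than the plane or paraboloid of section 3.

```python
from itertools import combinations

import numpy as np


def l1_line_value(x, y, x0):
    """Value at x0 of the L1-best line among the lines joining two data
    points (the construction of Fig. 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_cost, best_value = np.inf, None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not of the form y = a*x + b
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        cost = np.abs(y - (a * x + b)).sum()  # L1 misfit over all points
        if cost < best_cost:
            best_cost, best_value = cost, a * x0 + b
    return best_value


def monte_carlo_error(predict, g, sigma_zeta, n_draws=500, seed=0):
    """Mean and variance of predict(g + zeta) over random perturbations zeta
    of zero mean and standard deviation sigma_zeta (Gaussian by assumption)."""
    rng = np.random.default_rng(seed)
    g = np.asarray(g, dtype=float)
    draws = np.array([
        predict(g + rng.normal(0.0, sigma_zeta, size=g.shape))
        for _ in range(n_draws)
    ])
    return draws.mean(), draws.var(ddof=1)


# Usage: five points on y = 2x + 1, the third one replaced by an outlier.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 40.0, 7.0, 9.0])
mean, variance = monte_carlo_error(lambda v: l1_line_value(x, v, 2.5), y, 0.1)
```

Because `monte_carlo_error` takes the predictor as an argument, the same wrapper could be applied to a plane or paraboloid predictor without change.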
7/ Example of data validation

Fig. 4: Bouguer anomaly map: good points (cross marker) and doubtful points (circle marker) are identified by prediction using a collocation technique, taking into account the local covariance function.

Fig. 5: With the collocation technique, "bad" points can "contaminate" neighbouring points. Such points must be re-predicted, after flagging the erroneous ("bad") points with the largest differences between observed and predicted anomaly (see points 4-1 and 5-2).
Fig. 6: Prediction using the L1 norm and plane approximation (Fig. 2). Seven neighbouring points are selected per predicted point. A predicted point is considered doubtful if the difference between the observed anomaly and the predicted one is larger than 7 mGal.

Fig. 7: Prediction using the L1 norm and paraboloid approximation (Fig. 3). Ten neighbouring points are selected per predicted point. A predicted point is considered doubtful if the difference between the observed anomaly and the predicted one is larger than 7 mGal.