Section on Survey Research Methods, JSM 2008

RELEASING MICRODATA: DISCLOSURE RISK ESTIMATION, DATA MASKING AND ASSESSING UTILITY

Natalie Shlomo
Southampton Statistical Sciences Research Institute, University of Southampton, SO17 1BJ, UK
N.Shlomo@soton.ac.uk

Abstract

Statistical Agencies need to make informed decisions when releasing sample microdata from social surveys with respect to the level of protection required in the data according to the mode of access. These decisions should be based on objective quantitative measures of disclosure risk and data utility. We assume microdata that contain individuals investigated in a survey and the population is unknown. Disclosure risk is a function of both the population and the sample counts in cells of a contingency table spanned by identifying discrete key variables, e.g. place of residence, sex, age, occupation, etc. Disclosure risk measures are estimated using probabilistic modeling. Based on the disclosure risk assessment, appropriate Statistical Disclosure Limitation (SDL) methods are chosen depending on the access mode, user requirements and the contents of the data. Information loss measures are defined to quantify the effects of SDL methods on statistical analysis. We demonstrate a Disclosure Risk-Data Utility assessment on a sample drawn from a Census where the population is known and can be used to validate procedures.

Key Words: Log-linear models; Goodness of fit; Measurement error; Additive noise; Microaggregation; Random rounding; PRAM; Information loss

1. Introduction

Statistical Agencies release sample microdata from social surveys under different modes of access. Access methods range from Public Use Files (PUF) in the form of tables or highly perturbed datasets to Microdata Under Contract (MUC) for researchers and licensed institutions where levels of protection are less severe. Statistical Agencies also often have on-site datalabs where registered researchers can access unperturbed statistical data.
Microdata Review Panels (MRP) need to make informed decisions when releasing microdata based on objective disclosure risk measures, and set tolerable risk thresholds according to the access mode. They also provide quality guidelines and initial rules for data masking based on recoding variables. We assume that the microdata contain individuals investigated in a survey and the population is unknown (or only partially known through some marginal distributions). The disclosure risk is a function of both the population and the sample, and in particular the cell counts of a contingency table defined by combinations of identifying discrete key variables, e.g. place of residence, sex, age, occupation, etc. Using probabilistic models, we estimate per-record disclosure risk measures which can be used to target high-risk records for Statistical Disclosure Limitation (SDL) techniques. Consistent global file-level disclosure risk measures are aggregated from per-record risk measures. Global risk measures are used by MRPs to inform decisions on the release of microdata. Section 2 provides an overview of disclosure risk assessment for sample microdata using probabilistic modeling.

Based on the disclosure risk assessment, Statistical Agencies must choose appropriate SDL methods either by perturbing, modifying, or summarizing the data. The choice of the SDL method depends on the access mode, requirements of the users and the impact on quality and information loss. Choosing an optimal SDL method is an iterative process where a balance must be found between managing disclosure risk and preserving the utility in the microdata. SDL methods for microdata include perturbative methods that alter the data and non-perturbative methods which limit the amount of information released. Each SDL method impacts differently on information loss, and the methods should be combined and optimized to preserve the consistency and integrity of the perturbed microdata. In Section 3, we present improvements to some standard perturbative SDL methods for sample microdata.
In Section 4, we define information loss measures to quantify the effects of SDL methods on bias and variance and other statistical analysis tools. In Section 5, we demonstrate the Disclosure Risk-Data Utility assessment on sample data drawn from a Census (where the population is known). Section 6 concludes with a discussion.

2. Disclosure Risk Assessment

Identifying key variables for disclosure risk assessment are determined by a disclosure risk scenario, i.e. assumptions about available external files and IT tools that can be used by intruders to identify individuals in released microdata. For example, key variables may be chosen which allow linking the released microdata to a publicly available file containing names and addresses. Disclosure risk is assessed on the contingency table of counts spanned by these identifying key variables. The other variables in the file are sensitive variables.

Some methods for assessing disclosure risk rely on heuristics to identify special uniques on a set of cross-classified key variables, i.e. sample uniques that are likely to be population uniques (see Elliot, et al., 2005, Skinner and Elliot, 2002 and references therein), or on probabilistic record linkage (see Yancey, Winkler, and Creecy, 2002, Domingo-Ferrer and Torra, 2003 and references therein). A drawback of these methods is that they do not take into account the protection afforded by the sampling, and that they yield inconsistent record-level and global-level disclosure risk measures.

We assess disclosure risk using probabilistic modeling. We consider individual per-record risk measures in the form of a probability of re-identification. These per-record risk measures are aggregated to obtain global risk measures for the entire file. Let $F_k$ be the population size in cell $k$ of a table spanned by key variables having $K$ cells and $f_k$ the sample size. Also, $\sum_{k=1}^{K} F_k = N$ and $\sum_{k=1}^{K} f_k = n$. We focus our attention on the set of sample uniques, $SU = \{k : f_k = 1\}$, since these are potential high-risk records, i.e. population uniques. Two global disclosure risk measures (where $I$ is the indicator function) are the following:

1. Number of sample uniques that are population uniques: $\tau_1 = \sum_k I(f_k = 1, F_k = 1)$.
2. Expected number of correct matches for sample uniques (i.e., a matching probability): $\tau_2 = \sum_k I(f_k = 1)\, 1/F_k$.

The individual risk measure for $\tau_2$ is $1/F_k$. This is the probability that a match between a record in the microdata and a record in the population having the same values of key variables will be correct. If, for example, there are two records in the population with the same values of key variables, the probability is 0.5 that the match will be correct. Adding up these probabilities over the sample uniques gives the expected number (on average) of correctly matching a record in the microdata to the population when we allow guessing.

We assume that population frequencies $F_k$ are unknown and estimate from a probabilistic model the risk measures by:

$$\hat{\tau}_1 = \sum_k I(f_k = 1)\, \hat{P}(F_k = 1 \mid f_k = 1) \quad \text{and} \quad \hat{\tau}_2 = \sum_k I(f_k = 1)\, \hat{E}(1/F_k \mid f_k = 1) \qquad (1)$$

Skinner and Holmes (1998) and Elamir and Skinner (2006) propose a Poisson Model to estimate disclosure risk measures. In this model, they make the natural assumption in the contingency table literature: $F_k \sim \text{Poisson}(\lambda_k)$ for each cell $k$. A sample is drawn by Poisson or Bernoulli sampling with a sampling fraction $\pi_k$ in cell $k$: $f_k \mid F_k \sim \text{Bin}(F_k, \pi_k)$. It follows that:

$$f_k \sim \text{Poisson}(\pi_k \lambda_k) \quad \text{and} \quad F_k \mid f_k \sim f_k + \text{Poisson}(\lambda_k (1 - \pi_k)) \qquad (2)$$

where the $F_k \mid f_k$ are conditionally independent.

The parameters $\{\lambda_k\}$ are estimated using log-linear modeling. The sample frequencies $f_k$ are independent Poisson distributed with a mean of $\mu_k = \pi_k \lambda_k$. A log-linear model for the $\mu_k$ is expressed as: $\log(\mu_k) = x_k' \beta$ where $x_k$ is a design vector which denotes the main effects and interactions of the model for the key variables. The maximum likelihood estimator (MLE) $\hat{\beta}$ may be obtained by solving the score equations:

$$\sum_k [f_k - \pi_k \exp(x_k' \beta)]\, x_k = 0 \qquad (3)$$

The fitted values are calculated by: $\hat{u}_k = \exp(x_k' \hat{\beta})$ and $\hat{\lambda}_k = \hat{u}_k / \pi_k$. Individual disclosure risk measures for cell $k$ are:

$$P(F_k = 1 \mid f_k = 1) = \exp(-\lambda_k (1 - \pi_k)), \qquad E(1/F_k \mid f_k = 1) = \frac{1 - \exp(-\lambda_k (1 - \pi_k))}{\lambda_k (1 - \pi_k)} \qquad (4)$$

Plugging $\hat{\lambda}_k$ for $\lambda_k$ in (4) leads to the estimates $\hat{P}(F_k = 1 \mid f_k = 1)$ and $\hat{E}[1/F_k \mid f_k = 1]$ and then to $\hat{\tau}_1$ and $\hat{\tau}_2$ of (1). Rinott and Shlomo (2007b) consider confidence intervals for these global risk measures.

Skinner and Shlomo (2008) develop a method for selecting the log-linear model based on estimating and (approximately) minimizing the bias of the risk estimates $\hat{\tau}_1$ and $\hat{\tau}_2$. Defining $h(\lambda_k) = P(F_k = 1 \mid f_k = 1)$ for $\tau_1$ and $h(\lambda_k) = E(1/F_k \mid f_k = 1)$ for $\tau_2$, they consider the expression:

$$B = \sum_k E[I(f_k = 1)]\,[h(\hat{\lambda}_k) - h(\lambda_k)]$$

A Taylor expansion of $h$ leads to the approximation

$$B \approx \sum_k \pi_k \lambda_k \exp(-\pi_k \lambda_k)\,[h'(\lambda_k)(\hat{\lambda}_k - \lambda_k) + h''(\lambda_k)(\hat{\lambda}_k - \lambda_k)^2 / 2]$$

and the relations $E f_k = \pi_k \lambda_k$ and $E[(f_k - \pi_k \hat{\lambda}_k)^2 - f_k] = \pi_k^2 E(\hat{\lambda}_k - \lambda_k)^2$ under the hypothesis of a Poisson fit lead to a further approximation of $B$ of the form

$$\hat{B} = \sum_k \hat{\lambda}_k \exp(-\pi_k \hat{\lambda}_k)\,\{-h'(\hat{\lambda}_k)(f_k - \pi_k \hat{\lambda}_k) + h''(\hat{\lambda}_k)[(f_k - \pi_k \hat{\lambda}_k)^2 - f_k]/(2\pi_k)\} \qquad (5)$$

The method selects the model using a forward search algorithm which minimizes the standardized bias estimate $\hat{B}_i / \sqrt{\hat{v}_i}$ for $\hat{\tau}_i$, $i = 1, 2$, where $\hat{v}_i$ are variance estimates of $\hat{B}_i$.

In the simple case where the sample is drawn under simple random sampling, $\pi_k = \pi = n/N$. Skinner and Shlomo (2008) address the estimation of disclosure risk measures under complex survey designs with stratification, clustering and survey weights. In this case, while the method assumes that all individuals within cell $k$ are selected independently using Bernoulli sampling, i.e. $P(f_k = 1 \mid F_k) = F_k \pi_k (1 - \pi_k)^{F_k - 1}$, this may not be the case when sampling clusters (households).
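The per-record measures in (4) and their aggregation in (1) can be illustrated with a short numerical sketch. The code below is only an illustration under stated assumptions: the function names `per_record_risk` and `global_risk` are hypothetical, and the fitted values `lam_hat` are made-up numbers standing in for the output of a log-linear model fit, which is not shown.

```python
import numpy as np

def per_record_risk(lam, pi):
    """Evaluate the two expressions in (4) for fitted lam and sampling fraction pi."""
    lam = np.asarray(lam, dtype=float)
    pi = np.asarray(pi, dtype=float)
    a = lam * (1.0 - pi)                 # Poisson mean of the unsampled part of the cell
    p_unique = np.exp(-a)                # P(F_k = 1 | f_k = 1)
    safe_a = np.where(a > 0, a, 1.0)     # avoid 0/0; the limit as a -> 0 is 1
    e_inv = np.where(a > 0, -np.expm1(-a) / safe_a, 1.0)  # E(1/F_k | f_k = 1)
    return p_unique, e_inv

def global_risk(f, lam, pi):
    """Sum the per-record measures over sample uniques (f_k == 1), as in (1)."""
    f = np.asarray(f)
    p_unique, e_inv = per_record_risk(lam, pi)
    su = f == 1                          # indicator I(f_k = 1)
    return p_unique[su].sum(), e_inv[su].sum()

# Hypothetical sample counts and fitted lambdas for five cells, with pi = n/N = 0.01.
f = np.array([1, 0, 1, 3, 1])
lam_hat = np.array([2.0, 0.5, 50.0, 300.0, 1.0])
tau1_hat, tau2_hat = global_risk(f, lam_hat, pi=0.01)
print(tau1_hat, tau2_hat)
```

Since $1/F_k \ge I(F_k = 1)$, the second estimate is never smaller than the first, which gives a quick sanity check on any implementation.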
In practice, key variables typically include variables such as age, sex and occupation, and tend to cut across clusters. Therefore the above assumption holds in practice in most household surveys and does not cause bias in the estimation of the risk measures. Inclusion probabilities may vary across strata; the most common stratification is on geographies. Strata indicators should always be included in the key variables to take into account differential inclusion probabilities. Under complex sampling, the $\{\lambda_k\}$ can be estimated consistently using pseudo-maximum likelihood estimation (Rao and Thomas, 2003), where the estimating equation in (3) is modified as:

$$\sum_k [\hat{F}_k - \exp(x_k' \beta)]\, x_k = 0 \qquad (6)$$

and $\hat{F}_k$ is obtained by summing the survey weights in cell $k$: $\hat{F}_k = \sum_{i \in k} w_i$. The resulting estimates $\{\hat{\lambda}_k\}$ are plugged into expressions in (4) and $\pi_k$ is replaced by the estimate $\hat{\pi}_k = f_k / \hat{F}_k$. Note that the risk measures in (4) only depend on sample uniques and the value of $\hat{\pi}_k$ in this case is simply the reciprocal of the survey weight. The test criterion $\hat{B}$ is also adapted to the pseudo-maximum likelihood method.

The probabilistic model presented as well as other probabilistic methods (see Bethlehem, Keller, and Pannekoek, 1990, Benedetti, Capobianchi, and Franconi, 1998, Rinott and Shlomo 2006, 2007a) assume that there is no measurement error in the way the data is recorded. Besides typical errors in data capture, key variables can also purposely be misclassified as a means of masking the data, for example through record swapping or PRAM. Skinner and Shlomo (2007) adapt the estimation of risk measures to take into account measurement errors. Denote the cross-classified key variables by $X$ and assume that $X$ in the microdata has undergone some misclassification or perturbation error denoted by the value $\tilde{X}$. Assume that the values of $X$ in the population are fixed and suppose the values of $\tilde{X}$ for the records in the microdata are determined independently by a misclassification matrix $M$,

$$M_{kj} = P(\tilde{X} = k \mid X = j) \qquad (7)$$

The per-record disclosure risk measure of a match with a sample unique under measurement error is:

$$\frac{M_{kk} / (1 - \pi M_{kk})}{\sum_j F_j M_{kj} / (1 - \pi M_{kj})} \qquad (8)$$

Under assumptions of small sampling fractions and small misclassification errors, the measure can be approximated by $M_{kk} / \sum_j F_j M_{kj}$ or $M_{kk} / \tilde{F}_k$, where $\tilde{F}_k$ is the population count with $\tilde{X} = k$. Aggregating the per-record disclosure risk measures, the global risk measure is:

$$\tau_2 = \sum_k I(\tilde{f}_k = 1)\, M_{kk} / \tilde{F}_k \qquad (9)$$

Note that to calculate the measure only the diagonal of the misclassification matrix needs to be known, i.e. the probabilities of not being perturbed. Population counts are generally not known so the estimate in (9) can be obtained by probabilistic modeling on the misclassified sample:

$$\hat{\tau}_2 = \sum_k I(\tilde{f}_k = 1)\, M_{kk}\, \hat{E}(1/\tilde{F}_k \mid \tilde{f}_k) \qquad (10)$$

3. Statistical Disclosure Limitation Methods for Sample Microdata

Depending on the outcome of the individual and global risk measures, SDL methods may need to be applied. Thresholds are set for releasing the microdata depending on the mode of access.
SDL techniques for microdata include perturbative methods which alter the data and non-perturbative methods which limit the amount of information released without actually altering the data. Examples of non-perturbative SDL techniques are global recoding, suppression of values or variables and subsampling records (see Willenborg and De Waal, 2001). Perturbative methods for continuous variables include adding random noise (Kim, 1986, Fuller, 1993, Brand, 2002, Yancey, Winkler and Creecy, 2002), micro-aggregation (replacing values with their average within groups of records) (Defays and Nanopoulos, 1992, Anwar 1993, Domingo-Ferrer and Mateo-Sanz, 2002), rounding to a pre-selected rounding base, and rank swapping (swapping values between pairs of records within small groups) (Dalenius and Reiss, 1982, Fienberg and McIntyre, 2005). Perturbative methods for categorical variables include record swapping (typically swapping geography variables) and a more general post-randomization probability mechanism (PRAM) where categories of variables are changed or not changed according to a prescribed probability matrix and a stochastic selection process (Gouweleeuw, et al. 1998). For more information on these methods see also: Willenborg and De Waal, 2001, Gomatam and Karr, 2003, Domingo-Ferrer, Mateo-Sanz, and Torra, 2001, and references therein.

With non-perturbative SDL methods, the logical consistency of the records remains unchanged. Perturbative methods, however, alter the data, and therefore we can expect consistent records to start failing edit rules due to the perturbation. Edit rules, or edits for short, describe either logical relationships that have to hold true, such as "a two-year old person cannot be married" or "the profit and the costs of an enterprise should sum up to its turnover", or relationships that have to hold true in most cases, such as "a 12-year old girl cannot be a mother". Shlomo and De Waal, 2008 discuss methods for perturbing sample microdata which preserve the logical consistencies and minimize information loss.
The following is a brief summary of some of the methods:

3.1 Additive noise

Additive noise is an SDL method that is carried out on continuous variables. In its basic form random noise is generated independently and identically distributed with a positive variance and a mean of zero. The random noise is then added to the original variable. Adding random noise will not change the mean of the variable for large datasets but will introduce more variance. This will impact on the ability to make statistical inferences. Researchers may have suitable methodology to correct for this type of measurement error but it is good practice to minimize these errors through better implementation of the method.

Additive noise should be generated within small homogeneous sub-groups (for example, percentiles of the continuous variable) in order to use a different initiating perturbation variance for each sub-group. Generating noise in sub-groups also causes fewer edit failures with respect to relationships in the data. A better technique is to add correlated random noise to the continuous variable, thereby ensuring that not only the mean is preserved but also the exact variance. A simple method for generating correlated random noise for a continuous variable $z$ is as follows:

Procedure 1 (univariate): Define a parameter $\delta$ which takes a value greater than 0 and less than or equal to 1. When $\delta = 1$ we obtain the case of fully modeled synthetic data. The parameter $\delta$ controls the amount of random noise added to the variable $z$. After selecting a $\delta$, calculate: $d_1 = \sqrt{1 - \delta^2}$ and $d_2 = \sqrt{\delta^2} = \delta$. Now, generate random noise $\varepsilon$ independently for each record with a mean of $\mu' = \frac{1 - d_1}{d_2}\mu$ and the original variance of the variable $\sigma^2$. Typically, a Normal Distribution is used to generate the random noise. Calculate the perturbed variable $z'_i$ for each record $i$ in the sample microdata ($i = 1, \ldots, n$) as a linear combination: $z'_i = d_1 z_i + d_2 \varepsilon_i$.
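A minimal sketch of Procedure 1 (univariate) is given here for illustration. It assumes Normal noise and uses the sample mean and standard deviation of $z$ in place of $\mu$ and $\sigma$; the function name `correlated_noise` and the simulated income-like variable are hypothetical and not part of the procedure itself.

```python
import numpy as np

def correlated_noise(z, delta, rng=None):
    """Perturb z as d1*z + d2*eps, with eps drawn as in Procedure 1 (univariate)."""
    if not 0 < delta <= 1:
        raise ValueError("delta must satisfy 0 < delta <= 1")
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(z, dtype=float)
    d1 = np.sqrt(1.0 - delta**2)
    d2 = delta                           # sqrt(delta^2)
    mu, sigma = z.mean(), z.std()        # assumption: sample moments stand in for mu, sigma
    # Noise has mean (1 - d1)/d2 * mu and the original variance sigma^2.
    eps = rng.normal(loc=(1.0 - d1) / d2 * mu, scale=sigma, size=z.shape)
    return d1 * z + d2 * eps

# Hypothetical continuous variable; delta = 0.3 gives a moderate amount of noise.
rng = np.random.default_rng(0)
income = rng.gamma(shape=2.0, scale=1000.0, size=10_000)
income_pert = correlated_noise(income, delta=0.3, rng=rng)
print(income.mean(), income_pert.mean())
print(income.var(), income_pert.var())
```

In a finite sample the mean and variance are preserved in expectation rather than exactly, so the printed values should agree closely but not perfectly. Applying the function separately within percentile groups, as recommended above, only requires calling it on each sub-group.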
Note that $E(z') = d_1 E(z) + d_2 \left[\frac{1 - d_1}{d_2} E(z)\right] = E(z)$ and $Var(z') = (1 - \delta^2) Var(z) + \delta^2 Var(z) = Var(z)$ since the random noise is generated independently of the original variable $z$.

An additional problem when adding random noise is that there may be several variables to perturb at once, and these variables may be connected through an edit constraint of additivity. If we were to perturb each variable separately, this edit constraint would not be guaranteed. One procedure to preserve additivity would be to perturb two of the variables and obtain the third from aggregating the perturbed variables. However, this method will not preserve the total, mean and variance of the aggregated variable and in general, it is not good practice to compound effects of perturbation (i.e., aggregate perturbed variables) since this causes unnecessary information loss. We propose Procedure 1 in a multivariate setting where we add correlated noise to the variables simultaneously. The method not only preserves the means of each of the three variables and their covariance matrix, but also preserves the edit constraint of additivity.

Procedure 1 (multivariate): Consider three variables $x$, $y$ and $z$ where $x + y = z$. This procedure generates random noise that a priori preserves additivity and therefore combining the random noise with the original variables will also ensure additivity. In addition, means and the covariance structure are preserved. The technique is as follows: Generate multivariate random noise: $(\varepsilon_x, \varepsilon_y, \varepsilon_z)^T \sim N(\mu', \Sigma)$, where the superscript $T$ denotes the transpose. In order to preserve sub-totals and limit the amount of noise, the random noise should be generated within percentiles (note that we drop the index for percentiles). The vector $\mu'$ contains the corrected means of each of the three variables $x$, $y$ and $z$ based on the noise parameter $\delta$: $\mu'^T = (\mu'_x, \mu'_y, \mu'_z) = \left(\frac{1 - d_1}{d_2}\mu_x, \frac{1 - d_1}{d_2}\mu_y, \frac{1 - d_1}{d_2}\mu_z\right)$. The matrix $\Sigma$ is the original covariance matrix.
For each separate variable, calculate the linear combination of the original variable and the random noise as previously described. For example, for record $i$: $z'_i = d_1 z_i + d_2 \varepsilon_{z_i}$. The mean vector and the covariance matrix remain the same before and after the perturbation, and the additivity is exactly preserved.

3.2 Micro-aggregation

Micro-aggregation is another SDL technique for continuous variables. Records are grouped together in small groupings of size $p$. For each individual in a group, the value of the variable is replaced with the group average. This method can be carried out in either a univariate or a multivariate setting, where the latter can be implemented through sophisticated computer algorithms.

Replacing values of variables with their average in a small group will not generally introduce inconsistencies in the data, such as in the relationship between variables, although there may be problems at the boundaries of such edits. When carrying out micro-aggregation simultaneously on several variables within a group, additivity constraints will also be preserved since the sum of the means of two variables will equal the mean of the total variable in a grouping. The focus therefore for minimizing information loss is on the preservation of variances.

Micro-aggregation preserves the mean (and the overall total) of a variable $z$ but will lead to a decrease in the variance. This is because the total variance can be decomposed into a within group variance and a between group variance. When implementing micro-aggregation and replacing values by the average of their group, only the between variance remains. In practice, there may be little decrease in the variance since the size of the groups is small. In order to minimize information loss due to a decrease in the variance, we generate random noise according to the magnitude of the difference between the total variance and the between variance, and add it to the micro-aggregated variable.
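The variance decomposition described above can be sketched in a few lines. This is an illustrative univariate version only: it assumes groups are formed from consecutive sorted values (the last group may be smaller than $p$), and it assumes independent Normal noise with variance equal to the total variance minus the between variance. The function names are hypothetical.

```python
import numpy as np

def microaggregate(z, p):
    """Sort z, form consecutive groups of size p, replace each value by its group mean."""
    z = np.asarray(z, dtype=float)
    order = np.argsort(z, kind="stable")
    out = np.empty_like(z)
    for start in range(0, len(z), p):
        idx = order[start:start + p]     # last group may hold fewer than p records
        out[idx] = z[idx].mean()
    return out

def microaggregate_with_noise(z, p, rng=None):
    """Micro-aggregate, then add zero-mean noise to restore the lost within-group variance."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(z, dtype=float)
    agg = microaggregate(z, p)
    # Total variance minus between-group variance equals the within-group variance.
    within_var = max(z.var() - agg.var(), 0.0)
    noise = rng.normal(0.0, np.sqrt(within_var), size=z.shape)
    return agg + noise

rng = np.random.default_rng(1)
z = rng.normal(50.0, 10.0, size=9_000)   # hypothetical continuous variable
agg = microaggregate(z, p=3)
restored = microaggregate_with_noise(z, p=3, rng=rng)
print(z.var(), agg.var(), restored.var())
```

The mean is unchanged by the aggregation step, the variance of `agg` is the between variance, and the variance of `restored` is close to the original total variance in large samples.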
Besides raising the variance back to its original level, this method will also result in extra protection against the risk of re-identification since micro-aggregation in some cases can easily be deciphered (see Winkler, 2002). The combination of micro-aggregation and additive random noise is discussed in Oganian and Karr, 2006. When adding random noise to several micro-aggregated variables that are connected through an additivity constraint, we can apply a straightforward linear programming technique to preserve the additivity.

3.3 Unbiased Random Rounding

Rounding to a predefined base is a form of adding noise, although in this case the exact value of the noise is known a priori and is controlled via the rounding base. As in micro-aggregation, it is unlikely that inconsistencies will result when rounding the data. However, rounding continuous variables separately may cause additivity edit failures since the sum of rounded variables will not necessarily equal their rounded total. In addition, the sum of the rounded values across records will generally not equal the rounded overall total, and large discrepancies can occur. We demonstrate a method for preserving totals when carrying out an unbiased random rounding procedure on a continuous variable.

Rounding procedures are relatively easy to implement. In this example, we describe a one-dimensional random rounding procedure for a variable which not only has the property that it is stochastic and unbiased, but also preserves the overall total (and hence the mean) of the variable being rounded. Moreover, the strategy that we propose reduces the extra variance induced by the rounding. The algorithm is as follows: Let $m$ be the value to be rounded and let $Floor(m)$ be the largest multiple $kb$ of the base $b$ such that $kb < m$. In addition, define the residual of $m$ according to the rounding base $b$ by $res(m) = m - Floor(m)$. For an unbiased random rounding procedure, $m$ is rounded up to $Floor(m) + b$ with probability $res(m)/b$ and rounded down to $Floor(m)$ with probability $1 - res(m)/b$.
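The rounding rule just described can be sketched as follows for the case where each value is rounded independently. The function name `unbiased_random_round` is hypothetical, and the sketch assumes a rounding base for which multiples are represented exactly in floating point (for example an integer base). Here the floor is taken as the largest multiple not exceeding $m$, so exact multiples of $b$ have a zero residual and are left unchanged.

```python
import numpy as np

def unbiased_random_round(m, b, rng=None):
    """Round each value up with probability res(m)/b, otherwise down to Floor(m)."""
    rng = np.random.default_rng() if rng is None else rng
    m = np.asarray(m, dtype=float)
    floor = np.floor(m / b) * b          # largest multiple of b not exceeding m
    res = m - floor                      # residual with respect to the base
    u = rng.random(m.shape)              # one independent uniform draw per value
    return np.where(u < res / b, floor + b, floor)

rng = np.random.default_rng(2)
values = np.array([12.0, 37.5, 40.0, 99.9])   # hypothetical values, base b = 5
print(unbiased_random_round(values, 5, rng))
```

Because the expected rounded value is `floor + b * res / b = m`, the procedure is unbiased value by value; the total is preserved only in expectation under this independent scheme.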
If $m$ is already a multiple of $b$, it remains unchanged. The expected value of the rounded value is the original value. The rounding is usually implemented with replacement in the sense that each value is rounded independently, i.e. a random uniform number $u$ between 0 and 1 is generated for each value. If $u < res(m)/b$ then the entry is rounded up, otherwise it is rounded down.

In order to preserve the exact total of the variable being rounded, we define a simple algorithm for selecting without replacement the values that are rounded up and the values that are rounded down: for those entries having a given residual $res(m)$, randomly select a fraction $res(m)/b$ of the values and round them upwards; the rest of the values are rounded downwards. Repeat this process for all values of $res(m)$. As mentioned, similar to the case of simple random sampling with and without replacement, this selection strategy reduces the additional variance caused by the rounding.

The rounding procedure should be carried out within sub-groups in order to benchmark important totals. This may, however, distort the overall total across the entire dataset. Users are typically more interested in smaller sub-groups for analysis and therefore preserving totals for sub-groups is generally more desirable than the overall total. Reshuffling algorithms can be applied for changing the direction of the rounding for some of the values across the records in order to preserve additivity constraints and the overall totals.

3.4 Protecting Categorical Variables by PRAM

As presented in Shlomo and De Waal (2008), we examine the use of a technique called the Post-randomization Method (PRAM) (Gouweleeuw, et al., 1998) to perturb categorical variables. This can be seen as a general case of a more common technique based on record swapping. Willenborg and De Waal (2001) describe the process as follows: Let $P$ be an $L \times L$ transition matrix containing conditional probabilities $p_{ij} = p(\text{perturbed category is } j \mid \text{original category is } i)$ for a categorical variable with $L$ categories, $t$ the vector of frequencies and $v$ the vector of relative frequencies: $v = t/n$, where $n$ is the number of records in the micro-data set. In each record of the data set, the category of the variable is changed or not changed according to the prescribed transition probabilities in the matrix $P$ and the result of a draw of a random multinomial variate $u$ with parameters $p_{ij}$ ($j = 1, \ldots, L$). If the $j$-th category is selected, category $i$ is moved to category $j$. When $i = j$, no change occurs. Let $t^*$ be the vector of the perturbed frequencies. $t^*$ is a random variable and $E(t^* \mid t) = tP$.
Assuming that the transition probability matrix $P$ has an inverse $P^{-1}$, this can be used to obtain an unbiased moment estimator of the original data: $\hat{t} = t^* P^{-1}$. In order to ensure that the transition probability matrix has an inverse and to control the amount of perturbation, the matrix $P$ is chosen to be dominant on the main diagonal, i.e. each entry on the main diagonal is over 0.5.

We can place the condition of invariance on the transition matrix $P$, i.e. $tP = t$. This relieves the users of the perturbed file of the extra effort to obtain unbiased moment estimates of the original data, since $t^*$ itself will be an unbiased estimate of $t$. To obtain an invariant transition matrix, we calculate a matrix $Q$ by transposing matrix $P$, multiplying each column $j$ by $v_j$ and then normalizing its rows so that the sum of each row equals one. The invariant matrix is obtained by $R = PQ$. The property of invariance means that the vector of the original frequencies $v$ is an eigenvector of $R$. The invariant matrix $R$ may distort the desired probabilities on the diagonal, so we define a parameter $\alpha$ and calculate $R^* = \alpha R + (1 - \alpha) I$ where $I$ is the identity matrix. $R^*$ will also be invariant and the amount of perturbation is controlled by the value of $\alpha$.

The property of invariance means that the expected values of the marginal distribution of the variable being perturbed are preserved. In order to obtain the exact marginal distribution and reduce the additional variance caused by the perturbation, we propose using a without replacement selection strategy to choose values to perturb based on the expectations calculated from the transition probabilities (see Section 3.3 for the case of random rounding). This method was used to perturb the Sample of Anonymized Records (SARs) of the 2001 UK Census (Gross, Guiblin and Merrett, 2004).

As in most perturbative SDL methods, joint distributions between perturbed and unperturbed variables are distorted, in particular for variables that are highly correlated with each other.
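The construction of the invariant matrix can be checked numerically. The sketch below follows the steps above ($Q$ from the transpose of $P$ scaled by $v$ and row-normalized, $R = PQ$, then $R^* = \alpha R + (1 - \alpha) I$) and applies the result with independent draws per record, i.e. the with-replacement variant rather than the without-replacement strategy proposed in the text. The function names, the example matrix and the frequencies are hypothetical, and all entries of $v$ are assumed positive.

```python
import numpy as np

def invariant_pram_matrix(P, v, alpha=1.0):
    """Build R* = alpha * (P @ Q) + (1 - alpha) * I from transition matrix P and frequencies v."""
    P = np.asarray(P, dtype=float)
    v = np.asarray(v, dtype=float)
    Q = P.T * v                               # multiply column j of P^T by v_j
    Q = Q / Q.sum(axis=1, keepdims=True)      # normalise rows to sum to one
    R = P @ Q
    return alpha * R + (1.0 - alpha) * np.eye(len(v))

def apply_pram(categories, R, rng=None):
    """Draw a perturbed category for each record from the row of R for its original category."""
    rng = np.random.default_rng() if rng is None else rng
    categories = np.asarray(categories)
    n_cat = R.shape[0]
    return np.array([rng.choice(n_cat, p=R[c]) for c in categories])

# Hypothetical 3-category variable with a diagonally dominant transition matrix.
P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
v = np.array([0.5, 0.3, 0.2])
R_star = invariant_pram_matrix(P, v, alpha=0.5)
print(np.allclose(v @ R_star, v))             # invariance check: v R* = v

rng = np.random.default_rng(3)
original = rng.choice(3, size=1000, p=v)
perturbed = apply_pram(original, R_star, rng)
```

The invariance check relies on $Q$ being the matrix of reverse conditional probabilities, so that $vPQ = v$; mixing with the identity keeps $v$ fixed while moving more probability onto the diagonal.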
If no controls are taken into account in the perturbation process, edit failures may occur, resulting in inconsistent and implausible combinations. Controlling the perturbation can be carried out as follows:

1. Before applying PRAM, the variable to be perturbed is divided into subgroups, $g = 1, \ldots, G$. The transition (and invariant) probability matrix is developed for each subgroup $g$, $R_g$. The transition matrices for each subgroup are placed on the main diagonal of the overall final transition matrix where the off-diagonal probabilities are all zero, i.e. the variable is only perturbed within the subgroup and the difference in the variable between the original value and the perturbed value will not exceed a specified level. An example of this is perturbing age within broad age bands.

2. The variable to be perturbed may be highly correlated with other variables. Those variables should be compounded into one single variable. PRAM should be carried out on the compounded variable. Alternatively, the perturbation is carried out within subgroups defined by the second, highly correlated variable. An example of this is when age is perturbed within groupings defined by marital status.

The control variables in the perturbation process will minimize the amount of edit failures, but they will not eliminate all edit failures, especially edit failures that are out of scope of the variables that are being perturbed. Remaining edit failures need to be manually or automatically corrected through edit and imputation processes depending on the amount and types of edit failures.

4. Information Loss Measures

The utility of microdata that has undergone SDL techniques is based on whether statistical inference can be carried out and the same analysis and conclusions drawn on the perturbed data compared to the original data. This depends on user requirements and the types of analysis. In general, microdata is multi-purposed and used by many different users. Therefore, we use proxy measures to assess the utility based on assessing distortions to distributions and the impact on bias, variance and other statistical analysis tools (Chi-squared statistic, $R^2$ goodness of fit, rankings, etc.). Shlomo, 2007 and Shlomo and Young, 2006 describe the use of such measures for assessing information loss in perturbed statistical data.
Some useful proxy measures are summarized below.

4.1 Distance Metrics

Distance metrics are used to measure distortions to distributions as a result of applying SDL methods. We apply these measures on distributions calculated from the perturbed microdata. Some useful metrics for aggregated data were presented in Gomatam and Karr, 2003. Let $D$ represent a frequency distribution produced from the microdata and let $D(c)$ be the frequency in cell $c$. Two useful distance metrics are:

Average Absolute Distance per Cell:

$$AAD(D_{orig}, D_{pert}) = \sum_c |D_{pert}(c) - D_{orig}(c)| / n_c \qquad (11)$$

where $n_c$ is the number of cells in the distribution.

Kolmogorov-Smirnov Two-Sample Test Statistic: For unweighted data, the empirical distribution of the original values is defined as $D_{orig}(t) = \sum_c I(c \le t) / n$ and similarly $D_{pert}(t)$, where $I$ is the indicator function. The KS statistic is defined as:

$$KS(D_{orig}, D_{pert}) = \max_j |D_{pert}(t_j) - D_{orig}(t_j)| \qquad (12)$$

where the $\{t_j\}$ values are the $2n$ jointly ordered original and perturbed values of $D$.

The AAD is intuitive and describes the average absolute difference per cell of the distribution. The KS two-sample test assumes independence of the two samples and therefore the test itself is invalid here. However, the KS statistic is still useful as a distance metric.

4.2 Impact on Measures of Association

Tests for independence are often carried out on joint frequency distributions between categorical variables that span a table calculated from the microdata. The test for independence for a two-way table is based on a Pearson Chi-Squared Statistic $\chi^2 = \sum_i \sum_j (o_{ij} - e_{ij})^2 / e_{ij}$ where $o_{ij}$ is the observed count and $e_{ij} = (n_{i.} \times n_{.j}) / n$ is the expected count for row $i$ and column $j$. If the row and column are independent then $\chi^2$ has an asymptotic chi-square distribution with $(R-1)(C-1)$ degrees of freedom, and for large values the test rejects the null hypothesis in favor of the alternative hypothesis of association. We use the measure of association, Cramer's V: $CV = \sqrt{\chi^2 / (n \min(R-1, C-1))}$, and define the information loss measure by the percent relative difference between the original and perturbed table:

$$RCV(D_{pert}, D_{orig}) = 100 \times \frac{CV(D_{pert}) - CV(D_{orig})}{CV(D_{orig})} \qquad (13)$$

For multiple dimensions, log-linear modeling is often used to examine associations. A similar measure to (13) can be calculated taking the relative difference in the deviance obtained from a model based on the original and perturbed microdata.

4.3 Impact on a Regression Analysis

For continuous variables, we assess the impact on the correlation and in particular the $R^2$ of a regression (or ANOVA) analysis. For example, in an ANOVA, we test whether a continuous dependent variable has the same means within groupings defined by categorical explanatory variables. The goodness of fit criterion $R^2$ is based on a decomposition of the variance of the mean of the dependent variable. By perturbing the statistical data, the groupings may lose their homogeneity, the between variance becomes smaller, and the within variance becomes larger. In other words, the proportions within each of the groupings shrink towards the overall mean. On the other hand, the between variance may become artificially larger, showing more association than in the original distributions.

We define information loss based on the between variance of a proportion: Let $P^k_{orig}(c)$ be a target proportion for a cell $c$ within grouping $k$, i.e. $P^k_{orig}(c) = D^k_{orig}(c) / \sum_c D^k_{orig}(c)$, and let $P_{orig}(c) = \sum_k D^k_{orig}(c) / \sum_k \sum_c D^k_{orig}(c)$ be the overall proportion.
The between variance is defined as:

$$BV(P_{orig}) = \frac{1}{n_c - 1} \sum_{c} \left( P_{orig}(c) - \bar{P}_{orig} \right)^2$$

and the information loss measure is:

$$BVR(P_{pert}, P_{orig}) = 100 \times \frac{BV(P_{pert}) - BV(P_{orig})}{BV(P_{orig})} \tag{14}$$

In addition, we can assess the impact on the regression coefficient for a model based on a continuous variable where the independent variable is also continuous and has undergone different methods of perturbation such as additive noise, micro-aggregation and rounding.

5. Example

We present an example of how a Statistical Agency might assess disclosure limitation strategies through a disclosure risk-data utility analysis. We use a population from the 1995 Israel Census sample composed of all individuals aged 15 and over living in 20% of the households in Israel at the time of the Census, N=753,711. We draw a $\pi = 1/100$ sample of individuals, n=7,537. The MRP needs to assess the disclosure risk and consider SDL techniques. Initial recoding of key variables is carried out. The key variables for assessing disclosure risk are the following: Locality Code (single codes for large localities above 10,000 inhabitants and a single combined code for smaller localities) - 85 categories; Sex - 2 categories; Age groups - 15 categories; Occupation - 11 categories; Income groups - 17 categories (K=476,850).

In addition to the initial key, the MRP might consider further perturbation of the geography variable. We examine two techniques: recoding and collapsing the large locality codes according to a larger geographical area and locality size (30 categories), and invariant PRAM (a general case of record swapping) on the large locality codes with 0.70 on the diagonal of the misclassification matrix. Table 1 presents a comparison of these two techniques. The true risk, based on $\tau = \sum_{k} I(f_k = 1)\, 1/F_k$ where $f_k$ and $F_k$ are the sample and population counts in cell $k$ of the key, is given in the column headings in parentheses. The true disclosure risk for PRAM is calculated by summing $1/F_k$ across sample uniques that were not perturbed. The estimates $\hat{\tau}$ in Table 1 are similar to the true values. The asymptotically normal test statistic based on (5) is given in parentheses. Note that to estimate the disclosure risk for PRAM we used the formula in (10). The distance metrics AAD and KS for the recoded localities are calculated by imputing the average across the recoded cells. For example, if 10 localities were recoded into a single cell, each locality would receive 1/10 of the total in the cell.

Table 1: Comparison of SDL techniques: Recoding and PRAM

| | Original Key ($\tau$ = 105.7) | Recoded localities (30 categories) ($\tau$ = 571.5) | PRAM (70% on the diagonal) ($\tau$ = 714.7) |
|---|---|---|---|
| Disclosure Risk | | | |
| $\hat{\tau}$ (test statistic) | (1.94) | (1.3) | 79.5 (1.4) |
| Sample uniques | | | |
| $\hat{\tau}$ / SU | % | % | % |
| Utility | | | |
| AAD across 85 localities | | | |
| KS across 85 localities | | | |
| RCV for localities (85) $\times$ occupation (11) (true=0.1370) | | | |
| BVR for average income between localities (85) (true=$3.08 \times 10^{9}$) | | | |

As can be seen in Table 1, recoding causes significantly more information loss than PRAM, even with 30% of the localities misclassified. The disclosure risk, however, is reduced more effectively with recoding than with PRAM. The MRP might consider reducing the disclosure risk further by combining the techniques, for example by identifying those records that remain unique after the recoding and implementing PRAM on the high-risk records only. Note that both methods give negative values for RCV and BVR, which reflect a loss of association and more heterogeneity as a result of the SDL techniques.

After deciding on key variables, MRPs might consider taking further action by perturbing sensitive variables, such as income.
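Because the population is known in this validation setting, the true risk $\tau$ reported in the column headings of Table 1 can be computed directly from the key values. The Python sketch below illustrates the calculation on a made-up population of ten records. It is a minimal sketch and not the code used for the paper: the function name and the toy keys are hypothetical, and every sampled key is assumed to occur in the population. The first line checks the size of the key quoted above.

```python
from collections import Counter

# Number of cells spanned by the key variables:
# locality (85) x sex (2) x age (15) x occupation (11) x income (17).
K = 85 * 2 * 15 * 11 * 17

def true_risk(population_keys, sample_keys):
    """Return (tau, number of sample uniques) for the given key values.

    tau is the sum of 1/F_k over cells k that are unique in the sample,
    where F_k is the population count in cell k.
    """
    pop_counts = Counter(population_keys)     # F_k
    sample_counts = Counter(sample_keys)      # f_k
    uniques = [k for k, f in sample_counts.items() if f == 1]
    tau = sum(1.0 / pop_counts[k] for k in uniques)
    return tau, len(uniques)

# Toy data (hypothetical): keys are (locality, sex) pairs.
population = ([("A", "M")] * 1 + [("A", "F")] * 4
              + [("B", "M")] * 2 + [("B", "F")] * 3)
sample = [("A", "M"), ("A", "F"), ("B", "F"), ("B", "F")]

tau, n_uniques = true_risk(population, sample)
print(K, tau, n_uniques)
```

In the toy data, two cells are unique in the sample, with population counts 1 and 4, so each contributes its reciprocal to $\tau$. For PRAM, the same sum would be restricted to sample uniques that were not perturbed.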
In our example, income was also used as a key variable, so disclosure risk would need to be reassessed if perturbation is carried out on the income variable. We applied three improved techniques for perturbing income from wages for those records with non-zero income (3,49 out of the 7,537 individuals in the sample): correlated additive noise, controlled random rounding to base 10, and micro-aggregation (size of groups=10) with additive noise. All three techniques preserve the mean of the original variable and its variance. Results are given in Table 2.

Table 2 shows conflicting results for the two distance metrics. While micro-aggregation with additive noise has more perturbation per cell than the other methods, correlated noise has more distance between the empirical distributions based on the original and perturbed variable. Controlled random rounding has the smallest distance metrics and, not surprisingly, the lowest number of records that switch out of their original income group. Table 2 also shows that the utility in the data with respect to some common statistical analysis tools is preserved. This is due to the improvements in the implementation of the SDL techniques, which aim to preserve sufficient statistics.
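To make the last row of Table 2 concrete, the following Python sketch applies plain micro-aggregation with groups of 10 to a made-up income vector and reports the percentage of records that leave their original income group. It is a simplified sketch under stated assumptions and not the improved technique evaluated in this paper: it omits the additive-noise step, so it preserves the overall mean but reduces the variance, and the function names, group boundaries and data are hypothetical. The last group may contain fewer than 10 records.

```python
import numpy as np

def microaggregate(values, group_size=10):
    """Replace each value by the mean of its group of sorted neighbours."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    masked = np.empty_like(values)
    for start in range(0, values.size, group_size):
        idx = order[start:start + group_size]
        masked[idx] = values[idx].mean()
    return masked

def percent_switching(original, perturbed, bin_edges):
    """Percentage of records whose income group changes after perturbation."""
    group_orig = np.digitize(original, bin_edges)
    group_pert = np.digitize(perturbed, bin_edges)
    return 100.0 * np.mean(group_orig != group_pert)

# Toy data (hypothetical): 20 incomes and three income-group boundaries.
income = np.arange(1, 21)
edges = [6, 11, 16]

masked = microaggregate(income, group_size=10)
print(income.mean(), masked.mean())
print(percent_switching(income, masked, edges))
```

With coarse income groups relative to the micro-aggregation groups, few records switch; in the toy data the boundaries cut through each aggregated group, so the switching rate is high.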

Table 2: Information loss measures for income from wages after perturbation for individuals with non-zero income

| | Correlated Noise | Controlled Random Rounding Base 10 | Micro-aggregation and Additive Noise |
|---|---|---|---|
| Utility | | | |
| AAD across 17 income groups | | | |
| KS across 17 income groups | | | |
| RCV for income groups (17) $\times$ occupation (11) (true=0.1736) | | | |
| BVR for average income between localities (85) (true=$3.08 \times 10^{9}$) | | | |
| Percentage of records switching income groups | % | 0.8% | 1.5% |

6. Discussion

In this paper, we focus on how a Statistical Agency might carry out a disclosure risk-data utility analysis to inform decisions about the release of sample microdata. The main conclusions of the paper are: (1) a reliable method is needed for objectively assessing disclosure risk; (2) SDL techniques should be optimized and combined to ensure utility in the perturbed microdata.

Statistical Agencies generally release the same sets of microdata on a yearly basis, but the disclosure risk-data utility analysis need not be repeated every year if no significant changes are applied to the microdata. Therefore, it is recommended that time and resources be spent at least once on an in-depth analysis to ensure high quality microdata with tolerable risk thresholds for each mode of access.

Distributing different sets of the same microdata may be a cause for concern, since different versions of the microdata can be linked and the original data disclosed. MRPs must ensure strict licensing rules and guidelines so that this does not occur. In the future, it is likely that microdata will be distributed via remote access and Statistical Agencies will have more control over who receives the microdata.

References

Anwar, N. (1993) Micro-Aggregation - The Small Aggregates Method. Informe Intern, Luxembourg, Eurostat.

Benedetti, R., Capobianchi, A., and Franconi, L. (1998) Individual Risk of Disclosure Using Sampling Design. Contributi Istat.

Bethlehem, J., Keller, W., and Pannekoek, J. (1990) Disclosure limitation of Microdata. Journal of the American Statistical Association, 85.

Brand, R. (2002) Micro-data Protection Through Noise Addition. In: Inference Control in Statistical Databases (ed. J. Domingo-Ferrer), New York: Springer.

Dalenius, T. and Reiss, S.P. (1982) Data Swapping: A Technique for Disclosure limitation. Journal of Statistical Planning and Inference, 7.

Defays, D. and Nanopoulos, P. (1992) Panels of Enterprises and Confidentiality: The Small Aggregates Method. Proceedings of Statistics Canada Symposium 92, Design and Analysis of Longitudinal Surveys.

Domingo-Ferrer, J., Mateo-Sanz, J. and Torra, V. (2001) Comparing SDC Methods for Micro-Data on the Basis of Information Loss and Disclosure Risk. ETK-NTTS Pre-Proceedings of the Conference, Crete, June 2001.

Domingo-Ferrer, J. and Mateo-Sanz, J. (2002) Practical Data-Oriented Micro-aggregation for Statistical Disclosure limitation. IEEE Transactions on Knowledge and Data Engineering, Vol. 14, Issue 1.

Domingo-Ferrer, J. and Torra, V. (2003) Disclosure Risk Assessment in Statistical Microdata Protection via Advanced Record Linkage. Statistics and Computing, Vol. 13, No. 4.

Elamir, E. and Skinner, C.J. (2006) Record-Level Measures of Disclosure Risk for Survey Micro-data. Journal of Official Statistics, 22.

Elliot, M., Manning, A., Mayes, K., Gurd, J. and Bane, M. (2005) SUDA: A Program for Detecting Special Uniques. In: Proceedings of the Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, Geneva.

Fienberg, S.E. and McIntyre, J. (2005) Data Swapping: Variations on a Theme by Dalenius and Reiss. Journal of Official Statistics, 9.

Fuller, W.A. (1993) Masking Procedures for Micro-data Disclosure Limitation. Journal of Official Statistics, 9.

Gomatam, S. and Karr, A. (2003) Distortion Measures for Categorical Data Swapping. Technical Report Number 131, National Institute of Statistical Sciences.

Gouweleeuw, J., Kooiman, P., Willenborg, L.C.R.J., and De Wolf, P.P. (1998) Post Randomisation for Statistical Disclosure limitation: Theory and Implementation. Journal of Official Statistics, 14.

Gross, B., Guiblin, P. and Merrett, K. (2004) Implementing the Post-Randomisation Method to the Individual Sample of Anonymised Records (SAR) from the 2001 Census.

Kim, J.J. (1986) A Method for Limiting Disclosure in Micro-data Based on Random Noise and Transformation. American Statistical Association, Proceedings of the Section on Survey Research Methods.

Oganian, A. and Karr, A. (2006) Combinations of SDC Methods for Micro-data Protection. In: Privacy in Statistical Databases - PSD 2006 (eds. J. Domingo-Ferrer and L. Franconi), Springer LNCS 4302.

Rao, J.N.K. and Thomas, D.R. (2003) Analysis of Categorical Response Data from Complex Surveys: an Appraisal and Update. In: Analysis of Survey Data (eds. R.L. Chambers and C.J. Skinner), Chichester: Wiley.

Rinott, Y. and Shlomo, N. (2006) A Generalized Negative Binomial Smoothing Model for Sample Disclosure Risk Estimation. In: PSD 2006 Privacy in Statistical Databases (eds. J. Domingo-Ferrer and L. Franconi), Springer LNCS 4302.

Rinott, Y. and Shlomo, N. (2007a) A Smoothing Model for Sample Disclosure Risk Estimation. In: Complex Datasets and Inverse Problems: Tomography, Networks and Beyond, IMS Lecture Notes Monograph Series, Vol. 54.

Rinott, Y. and Shlomo, N. (2007b) Variances and Confidence Intervals for Sample Disclosure Risk Measures. 56th Session of the International Statistical Institute, Invited Paper, Lisbon 2007 (to appear).

Shlomo, N. (2007) Statistical Disclosure Limitation Methods for Census Frequency Tables. International Statistical Review, Vol. 75, Number 2.

Shlomo, N. and De Waal, T. (2008) Protection of Micro-data Subject to Edit Constraints Against Statistical Disclosure. Journal of Official Statistics, 24, No. 2, 1-6.

Shlomo, N. and Young, C. (2006) Statistical Disclosure Limitation Methods Through a Risk-Utility Framework. In: PSD 2006 Privacy in Statistical Databases (eds. J. Domingo-Ferrer and L. Franconi), Springer LNCS 4302.

Skinner, C.J. and Elliot, M.J. (2002) A Measure of Disclosure Risk for Microdata. Journal of the Royal Statistical Society, Ser. B, 64.

Skinner, C.J. and Holmes, D. (1998) Estimating the Re-identification Risk Per Record in Microdata. Journal of Official Statistics, 14.

Skinner, C.J. and Shlomo, N. (2008) Assessing Identification Risk in Survey Micro-data Using Log-linear Models. JASA Applications and Case Studies (forthcoming).

Skinner, C.J. and Shlomo, N. (2007) Assessing the Disclosure Protection Provided by Misclassification and Record Swapping. 56th Session of the International Statistical Institute, Invited Paper, Lisbon 2007 (to appear).

Willenborg, L. and De Waal, T. (2001) Elements of Statistical Disclosure limitation in Practice. Lecture Notes in Statistics, 155. New York: Springer-Verlag.

Winkler, W.E. (2002) Single Ranking Micro-aggregation and Re-identification. Statistical Research Division report RR 2002/08.

Yancey, W.E., Winkler, W.E., and Creecy, R.H. (2002) Disclosure Risk Assessment in Perturbative Micro-data Protection. In: Inference Control in Statistical Databases (ed. J. Domingo-Ferrer), New York: Springer.


More information

An Efficient Network Traffic Classification Based on Unknown and Anomaly Flow Detection Mechanism

An Efficient Network Traffic Classification Based on Unknown and Anomaly Flow Detection Mechanism An Effiient Network Traffi Classifiation Based on Unknown and Anomaly Flow Detetion Mehanism G.Suganya.M.s.,B.Ed 1 1 Mphil.Sholar, Department of Computer Siene, KG College of Arts and Siene,Coimbatore.

More information

MATE: MPLS Adaptive Traffic Engineering

MATE: MPLS Adaptive Traffic Engineering MATE: MPLS Adaptive Traffi Engineering Anwar Elwalid Cheng Jin Steven Low Indra Widjaja Bell Labs EECS Dept EE Dept Fujitsu Network Communiations Luent Tehnologies Univ. of Mihigan Calteh Pearl River,

More information

Computer Networks Framing

Computer Networks Framing Computer Networks Framing Saad Mneimneh Computer Siene Hunter College of CUNY New York Introdution Who framed Roger rabbit? A detetive, a woman, and a rabbit in a network of trouble We will skip the physial

More information

Ranking Community Answers by Modeling Question-Answer Relationships via Analogical Reasoning

Ranking Community Answers by Modeling Question-Answer Relationships via Analogical Reasoning Ranking Community Answers by Modeling Question-Answer Relationships via Analogial Reasoning Xin-Jing Wang Mirosoft Researh Asia 4F Sigma, 49 Zhihun Road Beijing, P.R.China Xudong Tu,Dan

More information


SCHEME FOR FINANCING SCHOOLS SCHEME FOR FINANCING SCHOOLS UNDER SECTION 48 OF THE SCHOOL STANDARDS AND FRAMEWORK ACT 1998 DfE Approved - Marh 1999 With amendments Marh 2001, Marh 2002, April 2003, July 2004, Marh 2005, February 2007,

More information



More information
