A Practical Framework for Privacy-Preserving Data Analytics


Liyue Fan
Integrated Media Systems Center, University of Southern California, Los Angeles, CA, USA

Hongxia Jin
Samsung R&D Research Center, San Jose, CA, USA

Work done while interning with Samsung.

ABSTRACT

The availability of an increasing amount of user generated data is transformative to our society. We enjoy the benefits of analyzing big data for public interest, such as disease outbreak detection and traffic control, as well as for commercial interests, such as smart grid and product recommendation. However, the large collection of user generated data contains unique patterns and can be used to re-identify individuals, which has been exemplified by the AOL search log release incident. In this paper, we propose a practical framework for data analytics, while providing differential privacy guarantees to individual data contributors. Our framework generates differentially private aggregates which can be used to perform data mining and recommendation tasks. To alleviate the high perturbation errors introduced by the differential privacy mechanism, we present two methods with different sampling techniques to draw a subset of individual data for analysis. Empirical studies with real-world data sets show that our solutions enable accurate data analytics on a small fraction of the input data, reducing user privacy risk and data storage requirement without compromising the analysis results.

Categories and Subject Descriptors
H.2.7 [Database Management]: Database Administration - Security, integrity, and protection; H.2.8 [Database Management]: Database Applications - Data mining

Keywords
Data Analytics, Differential Privacy, Sampling

Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media. WWW 2015, May 18-22, 2015, Florence, Italy. ACM /5/5.

1. INTRODUCTION

We live in the age of big data. With an increasing number of people, devices, and sensors connected with digital networks, individual data now can be largely collected and analyzed to understand important phenomena. One example is Google Flu Trends, a service that estimates flu activity by aggregating individual search queries. In the retail market, individual purchase histories are used by recommendation tools to learn trends and patterns. Performing analytics on private data is clearly beneficial, such as early detection of disease and recommendation services. However, user concerns rise from a privacy perspective, with sharing an increasing amount of information regarding their health, location, service usage, and online activities. As a matter of fact, the uniqueness of each user is increased by the big collection of individual data. The AOL data release in 2006 is an unfortunate example of privacy catastrophe [], in which the search logs of an innocent citizen were quickly identified by a newspaper journalist. A recent study by de Montjoye et al. [9] concludes that human mobility patterns are highly unique and four spatio-temporal points are enough to uniquely identify 95% of the individuals.

In order to protect users from re-identification attacks, their private data must be transformed prior to release for analysis. The current state-of-the-art paradigm for privacy-preserving data analysis is differential privacy [], which allows untrusted parties to access private data through aggregate queries. The aggregate statistics are perturbed by a randomized algorithm, such that the output remains roughly the same even if any user is added or removed in the input data. Differential privacy provides a strong guarantee: given the output statistics, an adversary will not be able to infer whether any user is present in the input database. However, this indistinguishability can only be achieved at high perturbation cost.

Figure 1: Record Distribution of Netflix Users (axes: Users, Records; annotations: Data Loss, Privacy Surplus, cutoff)
Intuitively, the more data a user contributes to the analysis process, the more perturbation noise is needed to hide his/her presence. In some cases, a user could generate an unbounded amount of data, such as purchase or check-in history, the addition or removal of which may result in unlimited impact on the output. The challenge of enforcing differential privacy is that it incurs a surplus of privacy cost, i.e. high perturbation error, being designed to protect each user according to the highest possible data contribution. In reality, only a very small number of users generate a large amount of personal data, while the rest contribute little data each. As shown in Figure 1, out of 5 users from the Netflix Prize competition [2], only one user generated around 7 data records, while the majority of users generated much less personal data, less than 2 data records each. If an upper bound is imposed on individual user data contribution, the surplus of privacy, e.g. high perturbation noise, can be reduced at the cost of data loss, i.e. part of the data from those users who contributed more than the threshold.

To limit individual data contribution, some strategies have been adopted by several works [6][25]. The authors of [6] used the first $d$ search queries submitted by each user, and the work in [25] reduced the number of items contained in each transaction to $l$ with smart truncation. However, there has been no discussion on the choice of the bounds, i.e., $d$ and $l$. Furthermore, the choice of actual user records (or items in a single transaction) remains non-trivial for generic applications.

With a rigorous privacy notion, we consider how to analyze individually contributed data to gain a deep understanding of service usage and behavior patterns, for various application domains. We would like to understand the impacts of privacy and data loss on the resulting data analytics, and design algorithms to draw private data accordingly. Example data analytical questions are: "Which places do people visit on Thursdays?" and "What are the most popular movies with female watchers under age 25?" We formally define the tasks as database queries and details are provided in Section 3.

Contributions. In this paper, we address the problem of differentially private data analytics, where each user could contribute a large number of records. We propose a generic framework to generate analysis results on a sampled database, and study two sampling methods as well as the sampling factor in order to achieve a balance between data loss and privacy surplus. We summarize the contributions of this paper as follows:

(1) We propose a generic, sampling-based framework for an important class of data analytical tasks: top-k mining and context-aware recommendation.
We consider the problem of releasing a set of count queries regarding the domain-specific items of interest as well as customizable predicates to answer deep, analytical questions. The count queries are perturbed prior to release such that they satisfy differential privacy.

(2) We design two algorithms that draw a sample of user records from the raw database and generate analysis results on the sampled data. The SRA algorithm randomly samples up to $l$ records per user. The HPA algorithm selects up to $l$ records from each user that are most useful for the specific analytical tasks. The utility of each record can be customized based on the actual application domain. We outline each sampling method and provide pseudocode for easy implementation.

(3) We provide analysis on the accuracy of random sampling, i.e. the Mean Squared Error of released counts, with respect to the sampling factor $l$. We conclude that the optimal $l$ value is positively correlated with the privacy constraint. We show that performing record sampling on an individual user's data does not inflict extra privacy leakage. We formally prove that both sampling algorithms satisfy differential privacy.

(4) We conduct extensive empirical studies with various real-world data sets. We compare our approaches with existing differentially private mechanisms and evaluate the accuracy of released count data with three utility metrics. The experimental results show that, although performed on a small sampled database, our methods provide comparable performance to the best existing approaches in MSE and KL-divergence, and superior performance in top-k discovery and context-aware recommendation tasks. The HPA algorithm yields higher precision, while the SRA algorithm preserves well the distributional properties in released data.

We believe that our privacy-preserving framework will enable data analytics for a variety of services, reducing user privacy cost and data storage requirement without compromising output utility.
The rest of the paper is organized as follows: Section 2 briefly surveys the related works on privacy-preserving data publishing and analytics. Section 3 defines the problem and privacy notion. Section 4 presents the technical details of the proposed framework and two sampling algorithms. Theoretical results of privacy guarantees are provided in Section 5. Section 6 describes the data sets and presents a set of empirical studies. Finally, Section 7 concludes the paper and states possible directions for future work.

2. RELATED WORKS

A plethora of differentially private techniques have been developed since the introduction of $\epsilon$-differential privacy in [2]. Here we briefly review the most recent works relevant to our problem.

Differential Privacy. Dwork et al. [2] first proposed $\epsilon$-differential privacy and established the Laplace mechanism to perturb aggregate queries to guarantee differential privacy. Since then, two variants have been proposed and adopted by many works as relaxations of $\epsilon$-differential privacy. The $(\epsilon, \delta)$-probabilistic differential privacy [9] achieves $\epsilon$-differential privacy with high probability, i.e. $(1 - \delta)$. The $(\epsilon, \delta)$-indistinguishability [, 2] relaxes the bound of $\epsilon$-differential privacy by introducing an additive term $\delta$. Our work adopts the strict definition of $\epsilon$-differential privacy and the Laplace mechanism to release numeric data for analysis.

Data Publication Techniques. A plethora of works have been proposed to publish sanitized data with differential privacy. To list a few representatives among them, there is histogram publication for range queries [7], for a given workload [24], and for sparse data [8]. The majority of data publication methods consider settings where each user contributes only one record, or affects only one released count. In contrast, we focus on those services where each individual may contribute a large number of records and could even have unbounded influence on the released count queries.

Bounding Individual Contribution. Here we review works established in a similar problem setting, i.e.
where individual data contribution is high, i.e. high global sensitivity. The work of Nissim et al. [2] proposed smooth sensitivity, which measures individual impact on the output statistics in the neighborhood of the database instance. They showed that smooth sensitivity allows a smaller amount of perturbation noise to be injected into released statistics. However, it does not guarantee $\epsilon$-differential privacy. Proserpio et al. [22] recently proposed to generalize the $\epsilon$-DP definition to weighted datasets, and to scale down the weights of data records to reduce sensitivity. Rastogi and Nath [23], Fan and Xiong [3] and Chan et al. [5] studied the problem of sharing time series of counts with differential privacy, where the maximum individual contribution is $T$, the number of time points. The authors of [23] proposed to preserve only $k$ discrete Fourier coefficients of the original count series. The FAST framework in [3] reduced the sensitivity by sampling $M$ points in a count series and predicting at other points. The work [5] proposed the notion of p-sum to ensure each item in the stream only affects a small number of p-sums. Two works by Korolova et al. [6] and Hong et al. [4] addressed the differentially private publication of search logs, where each user could contribute a large search history. The work of [6] keeps the first $d$ queries of each user, while the work of [4] explicitly removes those users whose data change the optimal output by more than a certain threshold. Zeng et al. [25] studied frequent itemset mining with differential privacy and truncated each individual transaction to contain up to $l$ items. Recently, Kellaris and Papadopoulos [5] proposed to release non-overlapping count data by grouping similar columns, i.e. items in our definition. In their work, each user is allowed to contribute no more than one record to each column, thus the maximum individual contribution is bounded by the number of columns. However, the binary representation of user data may not truly convey information about each column, i.e. place of interest or product. For example, when the bit for a user and a location is set, we cannot distinguish whether it was an accidental check-in or the user went there many times due to personal preference.

Sampling Differential Privacy. There have been a few works which studied the relationship between sampling and differential privacy. Chaudhuri and Mishra [6] first showed that the combination of k-anonymity and random sampling can achieve differential privacy with high probability. Li et al. [7] proposed to sample each record with a calibrated probability $\beta$ and then perform k-anonymity on the sampled data, to achieve $(\epsilon, \delta)$-indistinguishability. Both works adopt the random sampling technique which samples a data record with a certain probability. However, when applied in our setting, no guarantee is provided on bounding the individual data in the sampled database.

Our Competitors. After reviewing existing differentially private techniques, we identify three works that allow high individual contribution, release aggregate statistics, and satisfy $\epsilon$-differential privacy. The first is a straightforward application of the Laplace perturbation mechanism [2] to each released count, denoted as LPA. The second is the Fourier transform based algorithm from [23], which can be adapted to share count vectors, denoted as DFT. The third is GS, which is the best method proposed in [5].

3. PRELIMINARIES

3.1 Problem Formulation

Suppose a database $D$ contains data records contributed by a set of $n$ users about $m$ items of interest. Each item could represent in reality a place or a product. Each record in dataset $D$ is a tuple (rid, uid, vid, attra), where rid is the record ID, uid corresponds to the user who contributed this record, vid is the item which the record is about, and attra represents contextual/additional information regarding this record. In reality, various information is often available in the actual database, such as transaction time, user ratings and reviews, and user demographic information. In our problem setting, attra can be an attribute, e.g. dayofweek, or a set of attributes, e.g. (Gender, Age), which can be customized to offer deep insight in a specific application domain. Let $h$ denote the number of possible attra values.

To be more concrete, we select two analytic tasks, i.e. top-k discovery and context-aware recommendation, to illustrate the usability of our solutions. The first task answers questions such as "What are the most popular places in city X?", while the second task aims more specifically at "Recommend good places to visit on a Tuesday!". Moreover, we consider the problem of performing the above tasks on released count data. As in Figure 2, each $V_i$ represents an item of interest, e.g. a restaurant, and each $A_j$ represents a value of the context, e.g. Monday. For each $V_i$, the number of records containing $V_i$ is released. For each edge connecting $V_i$ and $A_j$, the number of records containing $V_i$ and $A_j$ is released. As a result, top-k discovery can be performed on the item counts and context-aware recommendation on the edge counts connected to any context $A_j$. We formally state the problem to investigate below.

Figure 2: Releasing Counts for Data Analytics

DEFINITION 1 (ITEM COUNTS). For each item $V_i$ in database $D$, release

$$c_i(D):\ \text{select count(*) from } D \text{ where } vid = V_i \qquad (1)$$

DEFINITION 2 (EDGE COUNTS). For each edge connecting item $V_i$ and attribute $A_j$, release

$$c_{i,j}(D):\ \text{select count(*) from } D \text{ where } vid = V_i \text{ and } attra = A_j \qquad (2)$$

PROBLEM 1 (PRIVATE DATA ANALYTICS).
Given database $D$ and privacy parameter $\epsilon$, release a sanitized version of item counts and edge counts, such that the released data satisfies $\epsilon$-differential privacy.

Note that the problem definition, i.e. the counting queries to release, can be customized according to the analytical task to perform. For instance, to understand the correlation between items, the bipartite graph in Figure 2 can be adapted as follows: $A$ nodes will be replaced by items, i.e. $V$ nodes, and each edge $(V_i, V_j)$ represents the number of times that $V_j$ is purchased/watched/visited by users who also purchase/watch/visit $V_i$. Similarly, those counts can be released privately with slight adaptation of our proposed solutions below.

3.2 Privacy Definition

The privacy guarantee provided by our work is differential privacy [4]. Simply put, a mechanism is differentially private if its outcome is not significantly affected by the removal or addition of any user. An adversary thus learns approximately the same information about any individual, irrespective of his/her presence or absence in the original database.

DEFINITION 3 ($\epsilon$-DIFFERENTIAL PRIVACY). A non-interactive privacy mechanism $A : \mathcal{D} \rightarrow \mathcal{T}$ satisfies $\epsilon$-differential privacy if for any neighboring databases $D_1$ and $D_2$, and for any set $\tilde{D} \in \mathcal{T}$,

$$\Pr[A(D_1) = \tilde{D}] \le e^{\epsilon} \Pr[A(D_2) = \tilde{D}] \qquad (3)$$

where the probability is taken over the randomness of $A$.

The privacy parameter $\epsilon$, also called the privacy budget [2], specifies the degree of privacy offered. Intuitively, a lower value of $\epsilon$ implies a stronger privacy guarantee and a larger perturbation noise, and a higher value of $\epsilon$ implies a weaker guarantee while possibly achieving higher accuracy. The neighboring databases $D_1$ and $D_2$ differ on at most one user.

Laplace Mechanism. Dwork et al. [2] show that $\epsilon$-differential privacy can be achieved by adding i.i.d. noise to the query result $q(D)$:

$$\tilde{q}(D) = q(D) + (\tilde{N}_1, \ldots, \tilde{N}_z) \qquad (4)$$

$$\tilde{N}_i \sim Lap\left(0, \frac{GS(q)}{\epsilon}\right) \quad \text{for } i = 1, \ldots, z \qquad (5)$$
where $z$ represents the dimension of $q(D)$. The magnitude of $\tilde{N}$ conforms to a Laplace distribution with mean 0 and scale $GS(q)/\epsilon$, where $GS(q)$ represents the global sensitivity [2] of the query $q$. The global sensitivity is the maximum $L_1$ distance between the results of $q$ from any two neighboring databases. Formally, it is defined as follows:

$$GS(q) = \max_{D_1, D_2} \lVert q(D_1) - q(D_2) \rVert_1. \qquad (6)$$

Table 1: Summary of notations

Symbol                        Description
$D$ / $\mathcal{D}$           Input database / Domain of all databases
$T_k$                         Set of records contributed by user $u_k$ in $D$
$D_R$ / $\mathcal{D}_R$       SRA sampled database / Domain of $D_R$
$D_G$ / $\mathcal{D}_G$       HPA sampled database / Domain of $D_G$
$D_E$ / $\mathcal{D}_E$       HPA sampled database / Domain of $D_E$
$q_1$ / $\tilde{q}_1$         Query of all item counts / Noisy output of $q_1$
$q_2$ / $\tilde{q}_2$         Query of all edge counts / Noisy output of $q_2$
$p$ / $\tilde{p}$             Popularity vector for all items / Estimation of $p$
$M$                           Max records per user allowed in $D$
$l$                           Max records per user allowed in $D_R$ and $D_G$
$d$                           Max records per user allowed in $D_E$

Sensitivity Analysis. Let $M$ denote the maximum number of records any user could contribute and $\mathcal{D}$ denote the domain of database $D$. Let $q_1 = \{c_1, \ldots, c_m\}$ output the item counts for every $V_i$. Let $q_2 = \{c_{1,1}, c_{1,2}, \ldots, c_{m,h}\}$ output the edge counts for every $V_i$ and $A_j$. The following lemmas establish the global sensitivity of $q_1$ and $q_2$, in order to protect the privacy of each individual user. The proofs are straightforward and thus omitted here for brevity.

LEMMA 1 (ITEM COUNTS SENSITIVITY). The global sensitivity of $q_1 : \mathcal{D} \rightarrow \mathbb{R}^m$ is $M$, i.e.

$$GS(q_1) = M. \qquad (7)$$

LEMMA 2 (EDGE COUNTS SENSITIVITY). The global sensitivity of $q_2 : \mathcal{D} \rightarrow \mathbb{R}^{mh}$ is $M$, i.e.

$$GS(q_2) = M. \qquad (8)$$

Composition. The composition properties of differential privacy provide privacy guarantees for a sequence of computations, which can be applied to mechanisms that require multiple steps.

THEOREM 1 (SEQUENTIAL COMPOSITION [2]). Let $A_i$ each provide $\epsilon_i$-differential privacy. A sequence of $A_i(D)$ over the dataset $D$ provides $(\sum_i \epsilon_i)$-differential privacy.

4. PROPOSED SOLUTIONS

Below we describe two sampling-based solutions to privacy-preserving data analytics.
The notations used in the problem definition and our proposed solutions are summarized in Table 1.

4.1 Simple Random Algorithm (SRA)

Our first solution has been inspired by the fact that the maximum number of records contributed by each user, i.e. $M$, could be rather large in real applications. For example, the Netflix user who contributed the most data submitted 7, reviews, as shown in Table 4. In fact, a user could contribute as many records as the domain size, i.e. $m$, as in the total number of movies on Netflix. As a result of the large magnitude of $M$, a very high perturbation noise is required to provide differential privacy, according to the Laplace mechanism. Furthermore, the number of records contributed by each user can be unbounded for many applications, as a user could repeatedly check in at the same location or purchase the same product. In that case, $M$ may not be known without breaching individual user privacy.

Figure 3: Outline of SRA Algorithm

Algorithm 1 Simple Random Algorithm (SRA)
Input: raw dataset $D$, sampling factor $l$, privacy budget $\epsilon$
Output: sanitized answer $\tilde{q}_1$ and $\tilde{q}_2$
/* Simple Random Sampling */
  $D_R \leftarrow \emptyset$
  for $k = 1, \ldots, n$ do
    $T_k \leftarrow \sigma_{uid=u_k}(D)$   /* $T_k$: records of user $u_k$ */
    if $|T_k| \le l$ do
      $D_R \leftarrow D_R \cup T_k$
    else do
      $T'_k \leftarrow$ randomly sample $l$ records from $T_k$
      $D_R \leftarrow D_R \cup T'_k$
/* Generate Private Item Counts */
  $q_1(D_R) \leftarrow$ compute count $c_i(D_R)$ for every $i$
  Output $\tilde{q}_1(D_R) = q_1(D_R) + \langle Lap(l/\epsilon_1) \rangle^m$
/* Generate Private Edge Counts */
  $q_2(D_R) \leftarrow$ compute count $c_{i,j}(D_R)$ for every $i, j$
  Output $\tilde{q}_2(D_R) = q_2(D_R) + \langle Lap(l/\epsilon_2) \rangle^{mh}$

In order to mitigate the effect of very large or unbounded individual data contribution, we propose to sample the raw input dataset $D$ and allow up to $l$ records per user in the sampled database. Therefore, the individual contribution to the sampled database is bounded by the fixed constant $l$. The aggregate statistics will be generated from the sampled data and then perturbed correspondingly in order to guarantee differential privacy.
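The bounding step described above can be sketched in Python. This is a minimal illustration rather than the authors' implementation (which the paper reports was written in Java): the function name `sample_per_user`, the `rng` argument, and the representation of each record as a `(rid, uid, vid, attra)` tuple are assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_per_user(records, l, rng=None):
    """Keep at most l records per user, drawn uniformly without replacement.

    records: iterable of (rid, uid, vid, attra) tuples (layout assumed here).
    l: sampling factor, i.e. the cap on records kept for any one user.
    """
    rng = rng or random.Random()
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec[1]].append(rec)  # group records by uid

    sampled = []
    for recs in by_user.values():
        if len(recs) <= l:
            sampled.extend(recs)  # small contributors are kept in full
        else:
            sampled.extend(rng.sample(recs, l))  # heavy contributors are capped
    return sampled
```

Whatever sample is drawn, no user holds more than `l` records in the result, which is the property the sensitivity argument relies on.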
The sampling technique used in our solution is simple random sampling without replacement, after which our solution is named. An outline of the SRA algorithm is provided in Figure 3. Given the input database $D$ and a pre-defined sampling factor $l$, the SRA method generates a sampled database $D_R$ by randomly sampling without replacement at most $l$ records for each user in the input database $D$. The sampled database $D_R$ could be different every time the algorithm is run, due to the randomness of sampling. However, it is guaranteed that for every possible sample $D_R$, any user could have no more than $l$ records. The following lemma establishes the sensitivity of $q_1$ and $q_2$ under such a constraint.

LEMMA 3 (SAMPLE SENSITIVITY). In the domain of $D_R$, it holds that $GS(q_1) = l$ and $GS(q_2) = l$.

Subsequently, the SRA method computes the query answers to $q_1$ and $q_2$ from the sampled database $D_R$, where all individual count queries $c_i$ and $c_{i,j}$ are evaluated based on the data records in $D_R$. According to the Laplace mechanism, it is sufficient to add perturbation noise from $Lap(l/\epsilon_1)$ to each item count $c_i(D_R)$ to guarantee $\epsilon_1$-differential privacy. Similarly, adding perturbation noise from $Lap(l/\epsilon_2)$ to each edge count $c_{i,j}(D_R)$ guarantees $\epsilon_2$-differential privacy. The pseudocode of the SRA method is provided in Algorithm 1.
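The perturbation step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the names `laplace` and `noisy_counts` are hypothetical, records are assumed to be `(rid, uid, vid, attra)` tuples, and the input `sampled` is assumed to already hold at most `l` records per user. Laplace noise is drawn as the difference of two exponential variates, a standard identity, so only the Python standard library is needed.

```python
import random
from collections import Counter


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(sampled, items, contexts, l, eps1, eps2, rng=None):
    """Release item counts and edge counts perturbed with Lap(l/eps) noise.

    sampled: (rid, uid, vid, attra) tuples, at most l per user (assumed).
    items, contexts: the full domains of vid and attra values.
    """
    rng = rng or random.Random()
    item_counts = Counter(vid for _, _, vid, _ in sampled)
    edge_counts = Counter((vid, attra) for _, _, vid, attra in sampled)

    # Noise is added to every cell of the domain, including zero counts;
    # releasing only the observed cells would reveal which ones are empty.
    q1 = {v: item_counts[v] + laplace(l / eps1, rng) for v in items}
    q2 = {(v, a): edge_counts[(v, a)] + laplace(l / eps2, rng)
          for v in items for a in contexts}
    return q1, q2
```

The noise scale depends only on `l` and the budget, not on the largest contribution in the raw data, which is why a small sampling factor keeps the perturbation low.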
Table 2: Example Check-in Records

rid     uid     vid                  DayOfWeek
$r_1$   Alice   Gym                  Monday
$r_2$   Alice   Mary's house         Tuesday
$r_3$   Alice   de Young Museum      Friday
$r_4$   Alice   Golden Gate Bridge   Saturday

To sum up, SRA injects low Laplace noise into released query results, due to reduced sensitivity in the sampled database. However, the accuracy of released query results is affected by only using $D_R$, a subset of the input data $D$. Intuitively, the more we sample from each user, the closer $q_1(D_R)$ and $q_2(D_R)$ are to the true results $q_1(D)$ and $q_2(D)$, respectively, at the cost of a higher Laplace perturbation error to achieve differential privacy. Below we formally analyze the trade-off between accuracy and privacy for query $q_1$ to study the optimal choice of $l$. Similar analysis can be conducted for query $q_2$ and is thus omitted here for brevity.

DEFINITION 4 (MEAN SQUARED ERROR). Let $\tilde{c}_i$ denote the noisy count released by $\tilde{q}_1(D_R)$ and $c_i$ denote the real count computed by $q_1(D)$, for each item $V_i$. The Mean Squared Error of the noisy count $\tilde{c}_i$ is defined as follows:

$$MSE(\tilde{c}_i) = Var(\tilde{c}_i) + (Bias(\tilde{c}_i, c_i))^2. \qquad (9)$$

THEOREM 2. Given that $D_R$ is a simple random sample of $D$ and $\tilde{q}_1(D_R) = q_1(D_R) + \langle Lap(l/\epsilon_1) \rangle^m$, the value of $l$ that minimizes MSE is a monotonically increasing function of $\epsilon_1^2$.

PROOF. See Appendix A.

The above theorem provides a guideline to choose the $l$ value given the privacy budget $\epsilon_1$: when the privacy budget is higher, we can afford to use more private data to overcome the error due to data loss; when the privacy budget is limited, a small number of data records should be taken from each user to reduce the perturbation error introduced by the differential privacy mechanism.

4.2 Hand-Picked Algorithm (HPA)

Observing that a majority of data analytical tasks depend on popular places or products, such as in traffic analysis and recommendation services, data related to popular items should preferably be preserved in the sampled database. In other words, some records generated by one user might be more useful for data analytics than the rest.
The following example illustrates the concept of record usefulness.

EXAMPLE 1. Table 2 illustrates Alice's check-in records in the raw database. Among the 4 places Alice has been, de Young Museum and the Golden Gate Bridge are places of interest and attract a large number of visitors. On the other hand, the gym and Mary's house are local and personal to Alice and may not interest other people. Therefore we consider $r_3$ and $r_4$ more useful than $r_1$ and $r_2$ for data analytics. However, $r_1$ and $r_2$ may be chosen by SRA over $r_3$ and $r_4$, due to the simple random sampling procedure.

From Example 1, it can be seen that $r_3$ and $r_4$ should be picked by the sampling procedure over $r_1$ and $r_2$, in order to generate meaningful recommendation results. Therefore, we define the following popularity-based utility score for each private data record and propose to preserve the records with highest scores for each user.

DEFINITION 5 (UTILITY SCORE). Given record $r$ and $r.vid = V_i$, the utility score of $r$ is defined as follows:

$$score(r) = p_i \qquad (10)$$

where $p_i$ represents the underlying popularity of item $V_i$.

Figure 4: Outline of HPA Algorithm

Table 3: Example Check-in Records Sorted by Utility Score

rid     uid     vid                  DayOfWeek   Utility
$r_4$   Alice   Golden Gate Bridge   Saturday    .2
$r_3$   Alice   de Young Museum      Friday      .
$r_1$   Alice   Gym                  Monday      .
$r_2$   Alice   Mary's house         Tuesday     .

Note that the record utility can be defined in other ways according to the target analytical questions. Our choice of the popularity-based measure is motivated by the tasks of discovering popular places or products, as well as the fact that popular items are less personal/sensitive to individual users. In order to maximize the utility of the sampled database, we propose to greedily pick up to $l$ records with highest utility scores for each user. Note that a user's records with the same score will have equal chance to be picked. The outline of HPA is provided in Figure 4. Below we describe (1) private estimation of record utility and (2) the greedy sampling procedure.

Popularity Estimation.
For each item $V_i$, the popularity $p_i$ represents the probability of any record $r$ having $r.vid = V_i$, which is often estimated by the relative frequency of such records. However, the estimation of the $p_i$'s from the private user data must not violate the privacy guarantee. We present our privacy-preserving utility estimation in Algorithm 2 from Line 1 to Line 7, which is outlined in the upper half of Figure 4. The utility estimation is also conducted on a sampled database $D_E$ with sampling factor $d$. $D_E$ is obtained by randomly choosing up to $d$ records per user from the raw database $D$. We adopt random sampling here because we do not have prior knowledge about the database at this point. The query $q_1$ is computed based on $D_E$ and each count is perturbed with Laplace noise from $Lap(d/\epsilon_0)$. The perturbed counts $\{\tilde{c}_i(D_E)\}$ are used to estimate the popularity for each item $V_i$ by the following normalization:

$$\tilde{p}_i = \frac{\max(\tilde{c}_i(D_E), 0)}{\sum_{i=1}^{m} \max(\tilde{c}_i(D_E), 0)}. \qquad (11)$$

Since the Laplace perturbation noise is a random variable and therefore could be negative, we replace the negative counts with 0s in computing item popularity. The resulting $\tilde{p}_i$ is used to estimate the utility score of each record $r$ with $r.vid = V_i$. The following lemma establishes the sensitivity of $q_1$ and $q_2$ where each user can contribute up to $d$ records. The proof is straightforward and is thus omitted.

LEMMA 4. In the domain of $D_E$, it holds that $GS(q_1) = d$.

Greedy Sampling. The greedy sampling procedure hand-picks up to $l$ records with highest utility scores among each user's data in
$D$. The pseudocode is provided in Algorithm 2 from Line 8 to Line 17. Table 3 illustrates Alice's records sorted by utility score. Since "Gym" and "Mary's house" do not interest the greater public, their scores are likely to be much lower than "Golden Gate Bridge" and "de Young Museum". Then the top $l$ records on the sorted list will be put in the sampled database $D_G$. This step is performed on every user's data in the raw database $D$.

Algorithm 2 Hand-Picked Algorithm (HPA)
Input: raw dataset $D$, sampling factor $l$
Output: sanitized answer $\tilde{q}_1$ and $\tilde{q}_2$
/* Popularity Estimation */
/* Random Sample */
 1: $D_E \leftarrow \emptyset$
 2: for $k = 1, \ldots, n$ do
 3:   $T_k \leftarrow \sigma_{uid=u_k}(D)$   /* $T_k$: records of user $u_k$ */
 4:   Randomly sample $d$ records from $T_k$, add to $D_E$
/* Generate Private Item Counts */
 5: $q_1(D_E) \leftarrow$ compute count $c_i(D_E)$ for every $i$
 6: $\tilde{q}_1(D_E) = q_1(D_E) + \langle Lap(d/\epsilon_0) \rangle^m$
/* Estimate Popularity */
 7: $\tilde{p} \leftarrow$ normalize histogram $\tilde{q}_1(D_E)$
/* Greedy Sampling */
 8: $D_G \leftarrow \emptyset$
 9: for $k = 1, \ldots, n$ do
10:   $T_k \leftarrow \sigma_{uid=u_k}(D)$
11:   if $|T_k| \le l$ do
12:     $D_G \leftarrow D_G \cup T_k$
13:   else do
14:     for record $r$ in $T_k$ do
15:       assign $score(r) = \tilde{p}_i$ iff $r.vid = V_i$
16:     $T'_k \leftarrow$ pick $l$ records with highest scores from $T_k$
17:     $D_G \leftarrow D_G \cup T'_k$
/* Generate Private Item Counts */
18: $q_1(D_G) \leftarrow$ compute count $c_i(D_G)$ for every $i$
19: Output $\tilde{q}_1(D_G) = q_1(D_G) + \langle Lap(l/\epsilon_1) \rangle^m$
/* Generate Private Edge Counts */
20: $q_2(D_G) \leftarrow$ compute count $c_{i,j}(D_G)$ for every $i, j$
21: Output $\tilde{q}_2(D_G) = q_2(D_G) + \langle Lap(l/\epsilon_2) \rangle^{mh}$

LEMMA 5. In the domain of $D_G$, it holds that $GS(q_1) = l$ and $GS(q_2) = l$.

After the greedy sampling step, the results to $q_1$ and $q_2$ will be computed on the sampled database $D_G$. Each individual item count and edge count will be perturbed by Laplace noise from $Lap(l/\epsilon_1)$ and $Lap(l/\epsilon_2)$, respectively. We will provide proof of the privacy guarantee in the next section.

The advantage of HPA is that it greedily picks the most valuable data records from each user, without increasing the sample data size, i.e. $l$ records per user. The utility of each data record is estimated privately from the overall data distribution.
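The two HPA steps in Algorithm 2 (Lines 1-7 and Lines 8-17) can be sketched in Python as below. This is a hedged illustration, not the authors' code: the function names, the `rng` argument, the `(rid, uid, vid, attra)` tuple layout, and the uniform fallback when every clipped count is zero are assumptions introduced for the example. Ties are broken by shuffling before a stable sort, which is one way to give equally scored records an equal chance.

```python
import random
from collections import Counter, defaultdict


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def group_by_user(records):
    by_user = defaultdict(list)
    for rec in records:  # rec = (rid, uid, vid, attra), layout assumed
        by_user[rec[1]].append(rec)
    return by_user


def estimate_popularity(records, items, d, eps0, rng):
    """Lines 1-7: noisy item histogram on a d-per-user sample, normalized."""
    counts = Counter()
    for recs in group_by_user(records).values():
        for rec in rng.sample(recs, min(d, len(recs))):
            counts[rec[2]] += 1
    # Negative noisy counts are replaced by 0 before normalizing (Eq. 11).
    clipped = {v: max(counts[v] + laplace(d / eps0, rng), 0.0) for v in items}
    total = sum(clipped.values())
    if total == 0.0:  # assumption: fall back to uniform popularity
        return {v: 1.0 / len(items) for v in items}
    return {v: c / total for v, c in clipped.items()}


def greedy_sample(records, popularity, l, rng):
    """Lines 8-17: keep each user's l records with the highest scores."""
    sampled = []
    for recs in group_by_user(records).values():
        if len(recs) <= l:
            sampled.extend(recs)
            continue
        shuffled = recs[:]
        rng.shuffle(shuffled)  # random order, so equal scores tie fairly
        shuffled.sort(key=lambda r: popularity.get(r[2], 0.0), reverse=True)
        sampled.extend(shuffled[:l])
    return sampled
```

The greedy sample is then counted and perturbed with $Lap(l/\epsilon_1)$ and $Lap(l/\epsilon_2)$ noise exactly as in the random-sampling variant; only the choice of which records survive differs.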
Records with high utility have a higher chance to be picked by greedy sampling. Since the sampled data greatly depends on the relative usefulness among each user's records, it is difficult to analyze the accuracy of released counts. We will empirically evaluate the effectiveness of this approach in Section 6.

Table 4: Data Sets Statistics
Users          2,579 45,289 48,89 6,4
Items          5,2 7,967 7,7 3,76
$|D|$          739,6,276,988,48,57,,29
max $|T_k|$    4,38,33 7, 2,34
avg $|T_k|$
min $|T_k|$    2

5. PRIVACY GUARANTEE

In this section, we prove that both the SRA and HPA algorithms are differentially private. We begin with the following lemma, which states that record sampling on each user does not inflict a differential privacy breach.

LEMMA 6. Let $A$ be an $\epsilon$-differentially private algorithm and $S$ be a record sampling procedure which is performed on each user individually. $A \circ S$ is also $\epsilon$-differentially private.

PROOF. See Appendix B.

THEOREM 3. SRA satisfies $(\epsilon_1 + \epsilon_2)$-differential privacy.

PROOF. Let $S_{rand,l}$ denote the random sampling procedure in SRA. $S_{rand,l}$ is therefore a function that takes a raw database and outputs a sampled database, i.e. $S_{rand,l} : \mathcal{D} \rightarrow \mathcal{D}_R$. According to the Laplace mechanism and Lemma 3, $\tilde{q}_1 : \mathcal{D}_R \rightarrow \mathbb{R}^m$ is $\epsilon_1$-differentially private. By Lemma 6 above, the item counts by SRA, i.e. $\tilde{q}_1 \circ S_{rand,l} : \mathcal{D} \rightarrow \mathbb{R}^m$, are $\epsilon_1$-differentially private. Similarly, $\tilde{q}_2 : \mathcal{D}_R \rightarrow \mathbb{R}^{mh}$ is $\epsilon_2$-differentially private. The edge counts by SRA, i.e. $\tilde{q}_2 \circ S_{rand,l} : \mathcal{D} \rightarrow \mathbb{R}^{mh}$, are also $\epsilon_2$-differentially private. Therefore, the overall SRA computation satisfies $(\epsilon_1 + \epsilon_2)$-differential privacy by Theorem 1.

THEOREM 4. HPA satisfies $(\epsilon_0 + \epsilon_1 + \epsilon_2)$-differential privacy.

PROOF. Let $S_{rand,d}$ denote the random sampling procedure in HPA for popularity estimation, i.e. $S_{rand,d} : \mathcal{D} \rightarrow \mathcal{D}_E$. Let $S_{grd,l}$ denote the greedy sampling procedure, i.e. $S_{grd,l} : \mathcal{D} \rightarrow \mathcal{D}_G$. According to the Laplace mechanism and Lemma 4, $\tilde{q}_1 : \mathcal{D}_E \rightarrow \mathbb{R}^m$ is $\epsilon_0$-differentially private. By Lemma 6, the HPA popularity estimation step, i.e. $\tilde{q}_1 \circ S_{rand,d} : \mathcal{D} \rightarrow \mathbb{R}^m$, is $\epsilon_0$-differentially private.
Similarly, we can prove that the HPA item counts, i.e. $\tilde{q}_1 \circ S_{grd,l}: D \to \mathbb{R}^m$, are $\epsilon_1$-differentially private, and that the HPA edge counts, i.e. $\tilde{q}_2 \circ S_{grd,l}: D \to \mathbb{R}^{mh}$, are $\epsilon_2$-differentially private. Therefore, by Theorem 1, the overall HPA satisfies $(\epsilon_0 + \epsilon_1 + \epsilon_2)$-differential privacy.

6. EXPERIMENTS
Here we present a set of empirical studies. We compare our solutions, the random sampling algorithm and HPA, with three existing approaches: 1) LPA, the baseline method that injects Laplace perturbation noise into each count; 2) DFT, the Discrete Fourier Transform based algorithm proposed in [23], applied to a vector of counts; and 3) GS, the best method with grouping and smoothing proposed in [15], applied to count histograms. Given the overall privacy budget $\epsilon$, we set $\epsilon_1 = \epsilon_2 = \epsilon/2$ for the random sampling algorithm, and $\epsilon_0 = 0.1\epsilon$ and $\epsilon_1 = \epsilon_2 = 0.45\epsilon$ for HPA. Without speculating about the optimal privacy allocation, we set $\epsilon_0$ to a small fraction of $\epsilon$, because it is used to protect a small sample of private data for utility score estimation. To achieve the same privacy guarantee, we apply LPA, DFT, and GS to item counts and edge counts separately, with privacy budget $\epsilon/2$ for each application.

Data sets. We conducted our empirical studies with four real-world data sets referred to as Gowalla, Foursquare, Netflix, and MovieLens, each named after its data source. The first two data sets consist of location check-in records. Gowalla is collected among users based in Austin from the Gowalla location-based social network by
Berjani and Strufe [3] between June and October 2010. Similarly, Foursquare is collected from Foursquare by Long et al. [18] between February and July 2012. In these two data sets, each record contains a user, a location, and a check-in timestamp. Since a user can check in at one location many times, the check-in data sets can represent a class of services which value returning behavior, such as buying or browsing. The other two data sets consist of movie ratings, where a movie may not be rated more than once by a user. Netflix is the training data set for the Netflix Prize competition. MovieLens is collected from users of the MovieLens website. Each rating corresponds to a user, a movie, a rating score, and a timestamp. Moreover, MovieLens also provides user demographic information, such as gender, age, occupation, and zip code. The properties of the data sets are summarized in Table 4. Note that the minimum individual contribution in MovieLens is 20, as opposed to 1 for the other data sets. This is because MovieLens was initially collected for personalized recommendation, thus users with fewer than 20 records were excluded from the published data set.

Setup. We implemented the random sampling algorithm and HPA, as well as the baseline LPA and DFT, in Java. We obtained the Java code of GS from the authors of [15]. All experiments were run on a 2.9GHz Intel Core i7 PC with 8GB RAM. Each setting was run 2 times and the average result was reported. The default settings of parameters are summarized below: the overall privacy $\epsilon$ =, the sampling parameter $d$ for HPA popularity estimation $d = \min |T_k|$, the sampling parameter $l$ = for Gowalla, Foursquare, and Netflix and $l$ = 3 for MovieLens. Our choice of parameter settings is guided by analytical results and minimal knowledge about the data sets and thus might not be optimal. For LPA and DFT, we set $M$ to be equal to $\max |T_k|$. However, this value may not be known a priori. Strictly speaking, $M$ is unbounded for check-in applications. In this sense, we overestimate the performance of LPA and DFT.

6.1
HPA Private Popularity Estimation
We first examine the private popularity estimation step of the HPA method regarding its ability to discover the top popular items from the noisy counts $\tilde{q}_1(D_E)$. Recall that $D_E$ is generated by randomly sampling $d$ records per user, and the output of $q_1(D_E)$ is then perturbed with noise from $\mathrm{Lap}(d/\epsilon_0)$ to guarantee privacy. Given a small privacy budget $\epsilon_0$, it is only meaningful to choose a small $d$ value for accuracy, according to Theorem 2. Therefore, we set $d$ equal to the minimum individual contribution, i.e. $\min |T_k|$, in every data set. In this experiment, we sort all items according to the $\tilde{q}_1(D_E)$ output, and the $K$ items with the highest noisy counts are evaluated against the ground truth discovered from the raw data set. Figure 5 reports the precision results with various $K$ values on Foursquare and Netflix data. As can be seen, from the output of $\tilde{q}_1(D_E)$, we are able to discover more than 6% of top2 popular locations in Foursquare and 7% top2 popular movies on Netflix. When looking at $K$ =, the output of $\tilde{q}_1(D_E)$ captures 4% of the real popular locations and almost 8% popular movies. We conclude that HPA popularity estimation provides a solid stepping stone for subsequent greedy sampling, at a very small cost of individual data as well as privacy.

Figure 5: Estimation of Item Popularity by HPA. Panels: (a) Foursquare, (b) Netflix.

6.2 Impact of Sampling Factor $l$
Here we look at $l$, the upper bound of individual data contribution required by our solutions, and study its impact on the accuracy of the $q_1$ and $q_2$ output. Mean Squared Error (MSE) is adopted as the metric for accuracy and is calculated between the noisy output of our methods and the true results of $q_1$ and $q_2$ from the raw input data $D$. We ran the random sampling algorithm and HPA while varying the value of $l$, in order to generate sampled databases $D_R$ and $D_G$ of different sizes.

Figure 6: Impact of $l$ with Foursquare Data Set. Panels: (a) MSE of $q_1$, (b) MSE of $q_2$.
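The two accuracy measures used in these experiments, top-K precision and MSE, can be sketched as below. This is an illustrative reading, not the authors' evaluation code: counts are assumed to be dictionaries keyed by item, the function names are hypothetical, ties are broken by insertion order, and the MSE is assumed to be averaged over all items.

```python
def top_k(counts, k):
    """Return the set of k items with the highest counts."""
    return set(sorted(counts, key=counts.get, reverse=True)[:k])


def precision_at_k(noisy, true, k):
    """Fraction of the noisy top-k items that are in the true top-k."""
    return len(top_k(noisy, k) & top_k(true, k)) / k


def mean_squared_error(noisy, true):
    """Average squared difference between noisy and true counts."""
    return sum((noisy[i] - true[i]) ** 2 for i in true) / len(true)


true_counts = {"a": 10, "b": 8, "c": 5, "d": 1}
noisy_counts = {"a": 9.2, "b": 3.0, "c": 6.1, "d": 0.5}
print(precision_at_k(noisy_counts, true_counts, 2))
print(mean_squared_error(noisy_counts, true_counts))
```

In this toy input the noisy ranking swaps "b" and "c", so only one of the two true top items is recovered.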
Figure 6(a) summarizes the results from Foursquare data for item counts, i.e. $q_1$, and Figure 6(b) for edge counts, i.e. $q_2$. In both panels of Figure 6, as the value of $l$ increases, the MSE of the noisy output of our methods first drops as the sampled database gets larger. For example, we observe a decreasing trend of MSE as $l$ is raised to 3 in Figure 6(a) and as $l$ is raised to 5 in Figure 6(b). Beyond these two points, when $l$ is increased further, the MSE grows due to the perturbation noise from $\mathrm{Lap}(l/\epsilon)$. Clearly, there is a trade-off between sample data size and the perturbation error. The optimal value of $l$ depends on the actual data distribution and the privacy parameter $\epsilon$, according to Theorem 2. This set of results shows that both the random sampling algorithm and HPA achieve minimum MSE with relatively small $l$ values, i.e. $l$ = 3 for $q_1$ and $l$ = 5 for $q_2$. Our findings in Theorem 2 are confirmed, and we conclude that choosing a small upper bound on individual data contribution is beneficial, especially when the privacy budget is limited.

6.3 Comparison of Methods
Here we compare the random sampling algorithm and HPA with existing approaches, i.e. LPA, DFT, and GS, on all data sets. The utility of item counts and edge counts released by all private mechanisms is evaluated with three metrics. Note that for Gowalla, Foursquare, and Netflix data, each edge connects an item with a day-of-week, from Monday to Sunday. For the MovieLens data set, each edge connects a movie with a (Gender, Age) pair. The domain of Gender is {M, F} and the domain of Age is {Under 25, 25-34, Above 34}. Below we review the results regarding the released item counts and edge counts, for each utility metric.

Mean Squared Error (MSE). This metric provides a generic utility comparison of different methods on the released counts. Figure 7(a) and Figure 8(a) summarize the MSE results for item counts and edge counts, respectively. As can be seen, the baseline LPA yields the highest error in both item counts and edge counts. The
GS method, as studied in the original work [15], is no worse than DFT in every case except for MovieLens item counts. The random sampling algorithm and HPA provide the lowest MSE except in three cases, i.e. Netflix item counts and MovieLens item/edge counts. This can be explained by the high average user contribution in these two data sets, where our methods inflict more data loss by limiting individual data in the sampled database.

Figure 7: Utility of Released Item Counts. Panels: (a) Mean Squared Error, (b) KL-Divergence, (c) Top.

Figure 8: Utility of Released Edge Counts. Panels: (a) Mean Squared Error, (b) KL-Divergence, (c) Average Top.

KL-divergence. The KL-divergence is a common metric widely used to measure the distance between two probability distributions. In this set of experiments, we consider the item/edge counts as data record distributions over the domain of items/edges. Both the released counts and the original counts are normalized to simulate probability distributions. Note that prior to that, zero or negative counts are replaced with a small positive constant for continuity without generating many false positives. We compute the KL-divergence of the released distribution with respect to the original distribution for each query and present the results in Figure 7(b) and Figure 8(b). The distributions released by LPA are further from the original data distributions than those of other methods for every data set. As expected, DFT and GS preserve the count distributions well in general, because: 1) the DFT method is designed to capture major trends in data series, and 2) the GS method generates smooth distributions by grouping similar columns. However, in several cases, those two methods fail to provide similar distributions, e.g. on Gowalla and Netflix data. We believe that their performance depends on the actual data distribution, i.e. whether a significant trend or near-uniform grouping exists and can be well extracted.
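The KL-divergence computation described above can be sketched as follows. The replacement constant `floor=1e-3` is a placeholder chosen for illustration, not the value used in the experiments, and the direction of the divergence (original relative to released) is an assumption about how "with respect to the original distribution" is meant.

```python
import math


def to_distribution(counts, floor=1e-3):
    """Replace zero or negative counts with `floor`, then normalize to sum 1."""
    clipped = [c if c > 0 else floor for c in counts]
    total = sum(clipped)
    return [c / total for c in clipped]


def kl_divergence(original, released, floor=1e-3):
    """KL(P || Q) with P from the original counts and Q from the released ones."""
    p = to_distribution(original, floor)
    q = to_distribution(released, floor)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


original = [120, 45, 30, 5]
released = [131.4, 38.2, -2.7, 9.9]  # a negative noisy count is floored
print(kl_divergence(original, released))
```

Flooring before normalization keeps every term of the sum finite, which is the continuity concern the text mentions.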
On the other hand, our solutions, the random sampling algorithm and HPA, provide comparable performance to the best existing methods, although they are not optimized to preserve distributional similarities. Furthermore, the random sampling algorithm consistently outperforms HPA in approximating the true distributions, thanks to the nature of the simple random sampling technique.

Top-K Discovery. In this set of experiments, we examine the quality of top-K discovery retrieved by all privacy-preserving mechanisms. For item counts, the top popular items are evaluated. For edge counts, the top popular items associated with each attribute value are evaluated and the average precision is reported, to simulate discoveries for each day-of-week and each user demographic group. In Figure 7(c), we observe that existing methods fail to preserve the most popular items in any data set. The reason is that the baseline LPA suffers from high perturbation error, while DFT and GS yield over-smoothed released counts and thus cannot distinguish the most popular items from those ranked next to them. When $K$ is large enough, their performance in top-K discovery slowly recovers, as a subsequent experiment shows. On the other hand, the random sampling algorithm and HPA greatly outperform existing approaches, and HPA even achieves % precision for Netflix data. Similarly, our methods show superior performance in Figure 8(c), with the absolute precision slightly lower due to sparser data distributions. Overall, HPA outperforms the random sampling algorithm by preserving user records with high popularity scores. The only exception, where the random sampling algorithm is better than HPA, is in finding the top most popular movies on MovieLens. The reason is that users who contribute fewer than 20 records were excluded from the data set and no movies were preferred by the majority of the remaining users. As for finding top movies for each demographic group, HPA greatly improves over the random sampling algorithm, since users within a demographic group show similar interests. We further look at the top-K precision of the released item counts by all methods, with $K$ ranging from to. The results are provided in Figure 9.
We can see that the performance of our greedy approach HPA is % when $K$ = and drops as $K$ increases, since the sampling step only picks a small number of records, i.e. $l$ records, from each user with the highest utility score, i.e. item popularity. Our random approach also shows decreasing precision as $K$ increases, due to the data loss caused by sampling. However, its rate of decrease is much slower than that of HPA, because all records of a user have an equal chance of being selected by random sampling. On the contrary, LPA, DFT, and GS show % precision when $K$ = and higher precision as $K$ increases. We conclude that the random sampling algorithm and HPA can discover the most popular items, superior to existing approaches up to $K$ =, but do not distinguish less popular items due to lack of information in the sampled database.
The existing approaches fail to distinguish the most popular items because of perturbation or the smoothing effect of their methods, but might provide good precision for large $K$.

Figure 9: Comparison of Methods: Top-K Mining. Panels: (a) Gowalla, (b) Foursquare, (c) Netflix, (d) MovieLens.

6.4 Additional Benefits
Data Reduction. One beneficial side effect of limiting individual data contribution is the reduction of data storage space by generating analytics from a sampled database. Figure 10 shows the number of records in the sampled databases used by the random sampling algorithm and HPA compared to that of the raw input. As can be seen, the sampled data is much smaller than the raw input for every data set. For the Netflix data set, our methods perform privacy-preserving analytics and generate useful results on sampled databases with less than 5% of the original data, reducing the data storage requirement without compromising the utility of the output analytics.

Weekly Distribution. We also examine the sampled databases of the random sampling algorithm and HPA by the weekly distribution of data records. The percentage of Foursquare check-in records on each day of the week is plotted in Figure 11. As shown, the percentage of Friday, Saturday, and Sunday check-ins is higher in the sampled databases generated by our methods than in the original data set, while the percentage of Monday-Thursday check-ins is lower than in the original. Since the majority of the users are occasional users and contribute fewer than $l$ records, our methods preserve their data completely in the sampled databases. We may infer that occasional users are more likely to use the check-in service on Friday-Sunday. Moreover, the data sampled by the random sampling algorithm is consistently closer to the original data distribution than that sampled by HPA. We can further infer that users are more likely to check in at popular places on Friday-Sunday.

Movie Recommendation.
An example of context-aware, fine-grained recommendation is to suggest items based on the common interest demonstrated among a user group with similar demographics, such as age and gender. We illustrate the top movie recommendation to male users under the age of 25 with the edge counts released by our solutions on the MovieLens data set. The first column in Table 5 shows the top recommended movies using the original data, while the second and third columns list movies recommended by our privacy-preserving solutions. We observe that some movies recommended by the random sampling algorithm may not interest the target audience, such as "Marvin's Room" and "The Story of Xinghua". Furthermore, the top movie on its list, i.e. "Phantasm II", is a horror movie and not suitable for an underage audience. On the other hand, the movies recommended by HPA are quite consistent with the original top list except for two movies, i.e. "Men in Black" and "The Fugitive", which may interest the target audience as well. We believe that HPA captures more information by greedy sampling and thus can make better recommendations than the random sampling algorithm, especially when users have very diverse interests.

Figure 10: Data Reduction. Axis: Number of Records. Series: Sampled Data, Raw Data.

Figure 11: Weekly Distribution with Foursquare Data. Axes: Weekday (SUN to SAT), Percentage.

Table 5: Movie Recommendations to Male, Under 25.
Top Movies | Random Sampling Output | HPA Output
American Beauty | Phantasm II | American Beauty
Star Wars VI | Marvin's Room | Star Wars VI
Star Wars V | All Dogs Go to Heaven | Terminator 2
The Matrix | In the Line of Duty 2 | Star Wars V
Star Wars IV | Star Wars V | Jurassic Park
Terminator 2 | The Slumber Party Massacre III | The Matrix
Saving Private Ryan | The Story of Xinghua | Men in Black
Jurassic Park | American Beauty | The Fugitive
Star Wars I | Shaft | Braveheart
Braveheart | Star Wars I | Saving Private Ryan

7. CONCLUSION AND DISCUSSION
We have proposed a practical framework for privacy-preserving data analytics by sampling a fixed number of records from each user.
We have presented two solutions, i.e. the random sampling algorithm and HPA, which implement the framework with different sampling techniques. Our solutions do not require the input data to be preprocessed, such as by removing users with large or little data. The output analysis results are highly accurate for performing top-K discovery and context-aware recommendations, closing the utility gap between no privacy and existing differentially private techniques. Our solutions benefit from sampling techniques that reduce the individual data contribution to a small constant factor, $l$, thus reducing the perturbation error inflicted by differential privacy. We provided analysis results about the optimal sampling factor with respect to the privacy requirement. We formally proved that both mechanisms satisfy $\epsilon$-differential privacy. Empirical studies with real-world data sets confirm that our solutions enable accurate data analytics on a
small fraction of the input data, reducing user privacy cost and data storage requirement without compromising utility. Potential future work may include the design of a hybrid approach between the random sampling algorithm and HPA which could have the benefits of both. For real-time applications, we would like to consider how to dynamically sample user generated data, in order to further reduce the data storage requirement. Another direction is to apply the proposed sampling framework to solving more complex data analytical tasks, which might involve multiple, overlapping count queries or other statistical queries.

8. ACKNOWLEDGMENTS
We thank the anonymous reviewers for the detailed and helpful comments on the manuscript.

9. REFERENCES
[1] M. Barbaro and T. Zeller. A face is exposed for AOL searcher no. The New York Times, Aug. 26.
[2] J. Bennett and S. Lanning. The Netflix prize. In Proceedings of KDD cup and workshop, volume 27, page 35, 27.
[3] B. Berjani and T. Strufe. A recommendation system for spots in location-based online social networks. In Proceedings of the 4th Workshop on Social Network Systems, SNS, pages 4: 4:6, New York, NY, USA, 2. ACM.
[4] A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In Proceedings of the 4th annual ACM symposium on Theory of computing, pages 69 68, New York, 28. ACM.
[5] T.-H. H. Chan, E. Shi, and D. Song. Private and continual release of statistics. ACM Trans. Inf. Syst. Secur., 4(3):26: 26:24, Nov. 2.
[6] K. Chaudhuri and N. Mishra. When random sampling preserves privacy. In Proceedings of the 26th annual international conference on Advances in Cryptology, CRYPTO 6, pages 98 23, Berlin, Heidelberg, 26. Springer-Verlag.
[7] R. Chen, G. Acs, and C. Castelluccia. Differentially private sequential data publication via variable-length n-grams. In Proceedings of the 22 ACM conference on Computer and communications security, CCS 2, pages , 22.
[8] G. Cormode, C. Procopiuc, D. Srivastava, and T. T. L. Tran. Differentially private summaries for sparse data. In Proceedings of the 5th International Conference on Database Theory, ICDT 2, pages 299 3, New York, NY, USA, 22. ACM.
[9] Y.-A. de Montjoye, C. A. Hidalgo, M. Verleysen, and V. D. Blondel. Unique in the Crowd: The privacy bounds of human mobility. Scientific Reports, Mar.
[10] C. Dwork. Differential privacy. In M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, editors, Automata, Languages and Programming, volume 452 of Lecture Notes in Computer Science, pages 2. Springer Berlin Heidelberg, 26.
[11] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: privacy via distributed noise generation. In Proceedings of the 24th annual international conference on The Theory and Applications of Cryptographic Techniques, EUROCRYPT 6, pages , Berlin, Heidelberg, 26. Springer-Verlag.
[12] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference, pages , Heidelberg, 26. Springer-Verlag.
[13] L. Fan and L. Xiong. An adaptive approach to real-time aggregate monitoring with differential privacy. Knowledge and Data Engineering, IEEE Transactions on, 26(9):294 26, Sept 24.
[14] Y. Hong, J. Vaidya, H. Lu, and M. Wu. Differentially private search log sanitization with optimal output utility. In Proceedings of the 5th International Conference on Extending Database Technology, EDBT 2, pages 5 6, New York, NY, USA, 22. ACM.
[15] G. Kellaris and S. Papadopoulos. Practical differential privacy via grouping and smoothing. In Proceedings of the 39th international conference on Very Large Data Bases, PVLDB 3, pages 3 32, 23.
[16] A. Korolova, K. Kenthapadi, N. Mishra, and A. Ntoulas. Releasing search queries and clicks privately. In Proceedings of the 8th international conference on World wide web, WWW 9, pages 7 8, 29.
[17] N. Li, W. Qardaji, and D. Su. On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security, ASIACCS 2, pages 32 33, 22.
[18] X. Long, L. Jin, and J. Joshi. Towards understanding traveler behavior in location-based social networks. In Global Communications Conference (GLOBECOM), 23 IEEE, 23.
[19] A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber. Privacy: Theory meets practice on the map. In Data Engineering, 28. ICDE 28. IEEE 24th International Conference on, pages , 28.
[20] F. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. volume 53, pages 89 97, 2.
[21] K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, STOC 7, pages 75 84, New York, NY, USA, 27. ACM.
[22] D. Proserpio, S. Goldberg, and F. McSherry. Calibrating data to sensitivity in private data analysis: A platform for differentially-private analysis of weighted datasets. Proc. VLDB Endow., 7(8): , Apr. 24.
[23] V. Rastogi and S. Nath. Differentially private aggregation of distributed time-series with transformation and encryption. In Proceedings of the 2 ACM SIGMOD International Conference on Management of data, pages , 2.
[24] G. Yuan, Z. Zhang, M. Winslett, X. Xiao, Y. Yang, and Z. Hao. Low-rank mechanism: optimizing batch queries under differential privacy. Proc. VLDB Endow., 5(): , July 22.
[25] C. Zeng, J. F. Naughton, and J.-Y. Cai. On differentially private frequent itemset mining. Proc. VLDB Endow., 6():25 36, Nov.
APPENDIX
A. PROOF OF THEOREM 2
PROOF. For item $V_i$, let $c_i$ denote the true count computed by $q_1$ from the sample $D_R$. The noisy count $\tilde{c}_i$ is then derived by adding Laplace noise to $c_i$ as follows:

$\tilde{c}_i = c_i + \nu_i$, (12)
$\nu_i \sim \mathrm{Laplace}(0, l/\epsilon_1)$. (13)

Let $\bar{c}_i$ denote the count of $V_i$ in the raw data set $D$. The MSE of $\tilde{c}_i$ can be rewritten as:

$MSE(\tilde{c}_i) = Var(c_i + \nu_i) + (E(c_i + \nu_i - \bar{c}_i))^2 = Var(c_i) + Var(\nu_i) + (E(c_i) - E(\bar{c}_i))^2$. (14)

Note that $c_i$ and $\nu_i$ are mutually independent. Let $p_i$ denote the popularity of item $V_i$, i.e. the probability of any record having $vid = V_i$. For simplicity, we assume that users are mutually independent, records are mutually independent, and every user has $M$ records in the raw data set $D$. To obtain $D_R$, $l$ records out of $M$ are randomly chosen for each user in $D$. Thus for any item $V_i$, $c_i$ can be represented as the sum of independent random variables:

$c_i = \sum_{k=1}^{n} \sum_{r \in T_k} \delta_{r,i}$ (15)
$\delta_{r,i} = 1$ if $r.vid = V_i$ and $r \in D_R$, and $\delta_{r,i} = 0$ otherwise. (16)

The event $\delta_{r,i} = 1$ is equivalent to the event that record $r$ is about $V_i$ and $r$ is sampled in $D_R$ by chance:

$Pr[\delta_{r,i} = 1] = Pr[r.vid = V_i \ \&\ r \in D_R] = p_i \frac{l}{M}$. (17)

Therefore, we can obtain the following expectation and variance for $c_i$:

$E(c_i) = \sum_{k=1}^{n} \sum_{r \in T_k} E(\delta_{r,i}) = \sum_{k=1}^{n} \sum_{r \in T_k} p_i \frac{l}{M} = n l p_i$ (18)
$Var(c_i) = \sum_{k=1}^{n} \sum_{r \in T_k} Var(\delta_{r,i}) = \sum_{k=1}^{n} \sum_{r \in T_k} p_i \frac{l}{M} \left(1 - p_i \frac{l}{M}\right) = n l p_i \left(1 - p_i \frac{l}{M}\right)$ (19)

Similarly, we can obtain the expectation of $\bar{c}_i$:

$E(\bar{c}_i) = n M p_i$. (20)

From the above results, we can rewrite Equation 14 as follows:

$MSE(\tilde{c}_i) = n l p_i \left(1 - p_i \frac{l}{M}\right) + \frac{2 l^2}{\epsilon_1^2} + (n l p_i - n M p_i)^2$ (21)

and we can perform the standard least squares method to minimize the MSE. The optimal $l$ value is thus:

$l^* = \frac{2 n^2 p_i^2 M - n p_i}{4/\epsilon_1^2 - 2 n p_i^2 / M + 2 n^2 p_i^2}$ (22)

We conclude that the optimal $l$ value is a monotonically increasing function of $\epsilon_1^2$.

B. PROOF OF LEMMA 6
PROOF. By the definition of differential privacy, we are to prove that for any neighboring raw databases $D_1$ and $D_2$, $A \circ S$ satisfies the following inequality for any $\tilde{D} \in Range(A \circ S)$:

$Pr[A \circ S(D_1) = \tilde{D}] \le e^{\epsilon} Pr[A \circ S(D_2) = \tilde{D}]$. (23)

Without loss of generality, we assume $D_2$ contains one more user than $D_1$. Let $u$ denote the user that is contained in $D_2$ but not in $D_1$, and let $T$ be user $u$'s set of records in $D_2$. By the definition of neighboring databases, we can rewrite $D_2 = D_1 \sqcup T$, where $\sqcup$ denotes a coproduct, or disjoint union, of two databases. Let $\hat{D}_1$ denote any possible sampling output of $S(D_1)$. We have:

$Pr[A \circ S(D_1) = \tilde{D}] = \sum_{\hat{D}_1} Pr[A \circ S(D_1) = \tilde{D} \mid S(D_1) = \hat{D}_1] \, Pr[S(D_1) = \hat{D}_1] = \sum_{\hat{D}_1} Pr[A(\hat{D}_1) = \tilde{D}] \, Pr[S(D_1) = \hat{D}_1]$ (24)

Let $\hat{T}$ denote any possible sampling output of $S(T)$. We note that $\hat{T}$ can take values from the entire domain; in general:

$\sum_{\hat{T}} Pr[S(T) = \hat{T}] = 1$. (25)

Since $S$ is performed independently on each user, we can derive:

$Pr[S(D_1) = \hat{D}_1] = \sum_{\hat{T}} Pr[S(D_1) = \hat{D}_1] \, Pr[S(T) = \hat{T}] = \sum_{\hat{T}} Pr[S(D_1 \sqcup T) = \hat{D}_1 \sqcup \hat{T}]$. (26)

Note that since $D_1$ and $T$ are disjoint, the sampling outputs on $D_1$ and $T$ are also independent and disjoint. Therefore,

$Pr[A \circ S(D_1) = \tilde{D}] = \sum_{\hat{D}_1} Pr[A(\hat{D}_1) = \tilde{D}] \sum_{\hat{T}} Pr[S(D_1 \sqcup T) = \hat{D}_1 \sqcup \hat{T}]$
$= \sum_{\hat{D}_1, \hat{T}} Pr[A(\hat{D}_1) = \tilde{D}] \, Pr[S(D_1 \sqcup T) = \hat{D}_1 \sqcup \hat{T}]$
$\le \sum_{\hat{D}_1, \hat{T}} e^{\epsilon} Pr[A(\hat{D}_1 \sqcup \hat{T}) = \tilde{D}] \, Pr[S(D_1 \sqcup T) = \hat{D}_1 \sqcup \hat{T}]$ (27)
$= e^{\epsilon} \sum_{\hat{D}_2} Pr[A(\hat{D}_2) = \tilde{D}] \, Pr[S(D_2) = \hat{D}_2]$ (28)
$= e^{\epsilon} Pr[A \circ S(D_2) = \tilde{D}]$. (29)

Line 27 is due to the fact that $A$ is $\epsilon$-differentially private and $\hat{D}_1$ and $\hat{D}_1 \sqcup \hat{T}$ are neighboring databases. In line 28 we change notation and let $\hat{D}_2$ represent $\hat{D}_1 \sqcup \hat{T}$. The proof is hence complete.
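The closed form for the optimal sampling factor in the proof of Theorem 2 can be checked numerically. The sketch below treats $l$ as a real number and uses made-up parameter values purely for illustration; the function names are hypothetical, and in practice $l$ would presumably be rounded to an integer no larger than $M$.

```python
def mse(l, n, M, p, eps1):
    """MSE of a noisy item count: sampling variance + Laplace variance + squared bias."""
    sampling_var = n * l * p * (1 - p * l / M)
    noise_var = 2 * l ** 2 / eps1 ** 2
    bias_sq = (n * l * p - n * M * p) ** 2
    return sampling_var + noise_var + bias_sq


def optimal_l(n, M, p, eps1):
    """Stationary point of the quadratic mse(l), obtained by setting d(mse)/dl = 0."""
    numerator = 2 * n ** 2 * p ** 2 * M - n * p
    denominator = 4 / eps1 ** 2 - 2 * n * p ** 2 / M + 2 * n ** 2 * p ** 2
    return numerator / denominator


# Hypothetical setting: 1000 users, 50 records each, item popularity 1%.
n, M, p = 1000, 50, 0.01
for eps1 in (0.1, 0.5, 1.0):
    l_star = optimal_l(n, M, p, eps1)
    print(eps1, round(l_star, 2), round(mse(l_star, n, M, p, eps1), 1))
```

With these inputs the optimal $l$ grows as the privacy budget grows, which agrees with the monotonicity stated at the end of the proof.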
More information(David H T Lan) Secretary for Home Affairs
Message We sha make every effort to strengthen the community buiding programme which serves to foster among the peope of Hong Kong a sense of beonging and mutua care. We wi continue to impement the District
More informationRicoh Healthcare. Process Optimized. Healthcare Simplified.
Ricoh Heathcare Process Optimized. Heathcare Simpified. Rather than a destination that concudes with the eimination of a paper, the Paperess Maturity Roadmap is a continuous journey to strategicay remove
More informationLet s get usable! Usability studies for indexes. Susan C. Olason. Study plan
Let s get usabe! Usabiity studies for indexes Susan C. Oason The artice discusses a series of usabiity studies on indexes from a systems engineering and human factors perspective. The purpose of these
More informationA Description of the California Partnership for LongTerm Care Prepared by the California Department of Health Care Services
2012 Before You Buy A Description of the Caifornia Partnership for LongTerm Care Prepared by the Caifornia Department of Heath Care Services Page 1 of 13 Ony ongterm care insurance poicies bearing any
More informationVendor Performance Measurement Using Fuzzy Logic Controller
The Journa of Mathematics and Computer Science Avaiabe onine at http://www.tjmcs.com The Journa of Mathematics and Computer Science Vo.2 No.2 (2011) 311318 Performance Measurement Using Fuzzy Logic Controer
More informationDynamic Pricing Trade Market for Shared Resources in IIU Federated Cloud
Dynamic Pricing Trade Market or Shared Resources in IIU Federated Coud Tongrang Fan 1, Jian Liu 1, Feng Gao 1 1Schoo o Inormation Science and Technoogy, Shiiazhuang Tiedao University, Shiiazhuang, 543,