Efficient Auditing For Complex SQL Queries


Raghav Kaushik, Microsoft Research; Ravi Ramamurthy, Microsoft Research

ABSTRACT

We address the problem of data auditing that asks for an audit trail of all users and queries that potentially breached information about sensitive data. A lot of the previous work in data auditing has focused on providing strong privacy guarantees and studied the class of queries that can be audited efficiently while retaining the guarantees. In this paper, we approach data auditing from a different perspective. Our goal is to design an auditing system for arbitrary SQL queries containing constructs such as grouping, aggregation and correlated subqueries. Pivoted on the ability to feasibly address arbitrary queries, we study (1) what privacy guarantees we can expect, and (2) how we can efficiently perform auditing.

Categories and Subject Descriptors
H.2 [Database Management]: Systems

General Terms
Security, Performance

Keywords
Security, Auditing, Access Control, Privacy, Query Processing

1. INTRODUCTION

Database systems are used today as the primary repository of the most valuable information in any organization. As the volume of data stored in these repositories has increased, protecting the security of the data has gained increasing importance. Further, the responsible management of sensitive data is mandated through laws such as the Sarbanes-Oxley Act, the United States Fair Information Practices Act, the European Union Privacy Directive and the Health Insurance Portability and Accountability Act (HIPAA). One of the important components of the DBMS security infrastructure is an auditing system that can be used to a posteriori investigate potential security breaches. Accordingly, there has been an increase in database auditing products on the market [1, 2, 3], including from the major database vendors.
[Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMOD '11, June 12-16, 2011, Athens, Greece. Copyright 2011 ACM.]

As the database system is in production, these products monitor various operations such as user logins, queries, data updates and DDL statements to obtain an audit trail. The audit trail is analyzed offline, either periodically or when needed, to answer questions about access to schema objects such as: (1) find failed login attempts and (2) find queries and corresponding users that accessed columns containing PII (personally identifiable information). An important class of auditing is data auditing. A simple example is single-tuple auditing, where the goal is to find all queries and update statements that accessed a particular tuple, e.g., the PII of a specific individual. Such queries potentially reveal sensitive information. While we are not aware of any commercial database auditing system that supports this functionality, there has been prior research that studies this form of auditing. Prior work has proposed two fundamentally different semantics for data auditing, which we can classify broadly as (data) instance dependent and (data) instance independent. The basis for all data auditing semantics is to define what it means for a query to have accessed a particular tuple (that is, single-tuple auditing). In the instance-dependent approach [4], a query is said to access a tuple if deleting the tuple changes the query result on the database instance where the query was originally run; a tuple accessed by the query is said to be indispensable to the query.
Unfortunately, subsequent work has shown that the instance-dependent approach can lead to breaches of privacy [5]. In contrast, an instance-independent approach can be used to obtain strong privacy guarantees [5, 6, 7]. Under the instance-independent approach, a query is said to have accessed a tuple if there is some database instance where deleting it changes the query result. Previous work developing the instance-independent approach has focused on increasing the class of queries that can be audited efficiently while retaining strong privacy guarantees. Efficient auditing techniques have been developed for interesting subclasses of select-project-join (SPJ) queries. However, real-world queries such as the benchmark TPC-H queries are often complex, using constructs like grouping, aggregation and correlated subqueries. While it may be acceptable to consider a restricted class of audit expressions, an auditing system that restricts the class of audited queries is fundamentally incomplete. Therefore, in this paper we approach data auditing from a different perspective. Our goal is to design an auditing system for arbitrary SQL queries. Pivoted on the ability to feasibly address arbitrary queries, we ask (1) what privacy guarantees we can expect, and (2) how we can efficiently perform auditing. We first show that previously proposed instance-independent semantics are computationally incompatible with complex SQL: in the presence of subqueries, enforcing the semantics becomes undecidable (§3).

Therefore, we revisit the instance-dependent approach to define our auditing semantics. We begin with the special case of single-tuple auditing (§4). There are several real-world scenarios where a single tuple is a natural unit of analysis. For example, in an employee-department database, an employee tuple contains sensitive information such as the employee's salary. We define an auditing semantics by applying the informal notion of an indispensable tuple uniformly to all queries. In contrast, the approach proposed by Agrawal et al. [4] (1) is non-uniform, (2) does not cover arbitrary SQL and (3) becomes similar to the instance-independent approach in the presence of group-by and having clauses (§4.1). By adopting an instance-dependent semantics uniformly, we obtain a feasible implementation for an arbitrary query: it is possible to perform single-tuple auditing by running the query and a rewritten version that excludes the tuple, and checking if the results are equal. But in this process we also inherit the known privacy limitations of the instance-dependent approach [5]. We then ask whether there is at least a weaker privacy guarantee our semantics can provide. We answer in the affirmative. We introduce the notion of a risk-free attack and show that under our instance-dependent semantics, no attack is risk-free (§4.2). Intuitively, this means that an attacker may get access to sensitive information, but not without taking a risk of getting detected. While the above guarantee falls short of the stronger privacy guarantees yielded by the instance-independent approach, we believe it offers us an interesting way forward in addressing the full complexity of SQL. We then study how we can efficiently audit a workload of queries (§5). The straightforward implementation suggested by our definition can be inefficient (especially for a workload of complex queries). Even though auditing is typically an offline process, the efficiency of auditing is important.
For instance, auditing may be invoked as a response to an information breach in an attempt to narrow down the cause of the breach, in which case the efficiency of auditing is critical. In order to address the whole of SQL, we propose a novel rule-based optimization framework (§5.2) that attempts to audit a query without any query execution. The idea is to start with an algebraic plan for a query and find if we can reach a plan for the rewritten query by transforming the initial plan. The plan is transformed using equivalence rules in the usual way deployed by any rule-based query optimizer. The main difference is that in addition to the standard rules that hold for all database instances (for example, pushing a selection below a join, join commutativity), we also handle rules that are instance specific. Instance-specific rules are naturally derived from audit checks: a query that passes the audit is equivalent to the rewritten query. We refer to our framework as an audit optimizer. Just as the extensibility of rule-based query optimizers is key to ensuring the efficient compilation of arbitrary queries, our audit optimizer is naturally extensible, allowing us to add more rules over time. We also present a technique that considers reordering the queries in the workload to further improve efficiency (§5.4). In general, we would like to audit not only a single tuple but more complex audit expressions. Similar to previous work, we formulate our audit expression in the form of a forbidden view that captures the sensitive information. We discuss how our semantics, privacy guarantees and optimization techniques naturally extend to cover our class of forbidden views (§6). We then report the results of our initial empirical evaluation (§7), conducted over benchmark data in order to study the impact of our optimization techniques for complex queries.
Our evaluation shows that using our framework, we can reduce the time taken for auditing by up to an order of magnitude even when the queries are complex (containing correlated subqueries). We discuss related work in §8 and conclude in §9. To summarize, we propose an auditing semantics that can be feasibly implemented for all of SQL. We characterize its privacy guarantees and present novel techniques for efficient auditing, while not compromising the class of queries that we can support. We view our paper as an important first step in efficiently auditing complex SQL queries with privacy guarantees.

[Figure 1: Auditing Infrastructure. An offline auditing tool reads the audit log produced by the online monitoring infrastructure of the DBMS serving the application.]

2. PRELIMINARIES

In this section, we discuss the auditing infrastructure and overall auditing algorithm. We defer a discussion of the auditing semantics to §4.

2.1 Auditing Infrastructure

As with most systems that perform database auditing [1, 2, 3, 4, 5], our auditing system consists of two components, an online component and an offline component, shown in Figure 1. The online component is used in production to log the query and update statements issued to the database system. This is implemented using the monitoring infrastructure supported by all commercial database systems. Along with each query/update statement, we also log the corresponding user id. (The user id does not necessarily have to be a DBMS user id; many applications manage their users internally. We provide a call-back mechanism that can be used to ascertain the application user id.) We refer to the sequence of query and update statements with associated user ids as the workload. The workload is logged in a separate secure database called the audit log (we use previous work to ensure the audit log is tamper proof). The audit log is used to perform auditing in the offline component. In general, auditing is performed not only against the current database state but also over past database states.
Accordingly, the auditing component needs to be able to reconstruct past database states. We use the point-in-time recovery API provided by commercial database systems that lets us rewind the database state to any point in the past using the database transaction log.

2.2 Audit Expressions

In general, auditing is performed given an audit expression as input. Similar to previous work, our audit expression specifies sensitive data via a forbidden view (defined formally in §6). We define a notion of a query being safe with respect to the given view. We are given the workload W collected by the online component. Our goal is to find all query and update statements in W that are not safe, along with the corresponding user ids. We introduce the formal definition of safety for the special case of single-tuple auditing in §4, and the general case in §6. While we are not aware of any commercial database auditing system that supports such a

functionality, there has been prior research that studies this form of auditing [4, 5, 6, 7].

2.3 Overall Auditing Algorithm

In this paper, we primarily focus on the offline component of the auditing tool (see Figure 1). We now outline our overall algorithm in Algorithm 2.1.

Algorithm 2.1 Overall Auditing Algorithm
Input: Workload W of query/update statements with corresponding user ids; Audit Expression AE
Output: Subset of W that is unsafe
1: Use the database log to rewind to the appropriate state of the database
2: Split W into query-only workloads alternating with updates
3: Proceed in the same sequence as W
4: For each query-only workload QW
5:   Call QUERYAUDIT(QW, AE) to find all queries in QW that are unsafe
6: For each update statement U
7:   Call UPDATEAUDIT(U) to check if U is unsafe
8:   Update the current state of the database
9: Return all queries and updates that are unsafe along with corresponding user ids

Our auditing is performed on the state of the database when the query was originally run. Accordingly, we first use the transaction log to rewind to the state of the database when the first query in the workload was run (we can further optimize this step by taking periodic backups of the database). We then process the workload in order, splitting it into query-only portions alternating with data updates. We refer to the query-only portion of the workload as a query workload. The query workloads are audited by a query auditing algorithm QUERYAUDIT that is invoked over the corresponding state of the database, which we refer to as the current instance in the rest of the paper. The query auditing algorithm returns the unsafe queries. Each update statement is audited using an update auditing algorithm UPDATEAUDIT. The update is also replayed to obtain a new state of the database. Updates are handled in our approach by finding the query underlying the update and extending our semantics for queries. Updates can have a cascading effect.
For example, a sensitive record may be copied by an update statement and future queries could reference the copy. In this example, our auditing system will flag the original update statement that performs the copy. It is possible to track the copy in a subsequent auditing session. We could think of extending our auditing semantics to automatically include cascading effects, but we defer the extension to future work. For ease of exposition, in the rest of the paper we consider a fixed instance of the database and focus only on auditing queries. The details of the auditing algorithms are presented in §4, §5 and §6.

3. INSTANCE INDEPENDENT SEMANTICS

Previously proposed instance-independent semantics are pivoted on strong privacy guarantees. We have the notion of perfect privacy introduced by Miklau and Suciu [7] and the (weaker) notion of weak syntactic suspiciousness introduced by Motwani et al. [5]. For the special case of single-tuple auditing, both of the above definitions reduce to checking whether a tuple is critical to a query. A tuple drawn from the domain of the corresponding table is said to be critical to a query if there is some database instance where dropping the tuple changes the query result. We illustrate with an example.

Example 3.1 Consider the TPC-H benchmark database. Consider the query: select * from customer where c_custkey = 100 that asks for all details of the customer with id 100. Any customer tuple t with id 100 is critical, since we can construct a database with a customer with id 100 where deleting t makes the query result empty.

Previous work developing the instance-independent approach has focused on increasing the class of queries that can be audited efficiently while retaining strong privacy guarantees. Efficient techniques have been developed for checking whether a tuple is critical to a query for large subclasses of conjunctive queries [6].
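The criticality check of Example 3.1 can be made concrete. The following is a minimal, hypothetical sketch using Python and SQLite: we construct a witness instance, delete the tuple, and observe that the query result changes. The two-column schema is a pared-down stand-in for the TPC-H customer table, not the paper's implementation.

```python
import sqlite3

# A minimal, hypothetical sketch of the instance-independent criticality
# check behind Example 3.1: a tuple is critical if on SOME instance,
# deleting it changes the query result. We construct such a witness
# instance with sqlite3; the schema is a pared-down TPC-H customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (c_custkey INTEGER, c_name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(100, "Alice"), (101, "Bob")])  # witness instance

query = "SELECT * FROM customer WHERE c_custkey = 100"
before = conn.execute(query).fetchall()
conn.execute("DELETE FROM customer WHERE c_custkey = 100")  # drop tuple t
after = conn.execute(query).fetchall()

# Deleting t changed the result (one row -> empty) on this instance,
# which witnesses that t is critical to the query.
print(before != after and after == [])  # True
```

Note that a single witness instance suffices for criticality; deciding it in general requires reasoning over all instances, which is what becomes undecidable for complex queries.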
As stated in §1, our goal is different: we wish to be able to handle arbitrary SQL containing constructs such as grouping, aggregation and correlated subqueries. Even though the notion of a critical tuple applies to the whole of SQL, we show that checking whether a tuple is critical to a complex query involving subqueries is undecidable.

THEOREM 3.2. Checking if a tuple is critical to a query that is allowed to contain subqueries is undecidable.

PROOF. Recent work shows that the query containment problem under bag semantics in the presence of inequalities is undecidable [8]. We reduce this problem to checking criticality. Suppose we want to check if query Q1 is contained in query Q2 under bag semantics, denoted Q1 ⊆b Q2. We create a new query Q: select * from T where EXISTS (Q1 except all Q2), where T is a new table not referenced in either Q1 or Q2. We use the except all clause in SQL to compute the bag difference between Q1 and Q2. If Q1 ⊆b Q2, then Q is always empty and hence no tuple in the domain of T is critical. On the other hand, if Q1 ⊈b Q2, then every tuple in the domain of T is critical to Q.

It is known that checking perfect privacy and weak syntactic suspiciousness is at least as hard as checking whether a tuple is critical to a query [5, 7]. Thus, from Theorem 3.2, it follows that the previously proposed instance-independent semantics are computationally incompatible with the complexity of full SQL. We therefore revisit the instance-dependent semantics for auditing.

4. SINGLE TUPLE AUDITING

Motivated by the need to audit complex queries, we reconsider an instance-dependent semantics. We present our overall auditing semantics by beginning with the special case of single-tuple auditing. As noted in §1, there are several scenarios where a single tuple is a natural unit of auditing.
We present our extension to more general audit expressions in §6.

4.1 Query Differentials

We define the notion of query differentials to capture instance-dependent single-tuple auditing.

DEFINITION 4.1. Given a database instance D, a query Q and a tuple t specified by the value v of its primary key, we rewrite Q to exclude t from its table T by adding the predicate T.id ≠ v (we refer to this expression as T − t). The rewritten query is called the differential of Q with respect to t, denoted Q−t. We say that query Q accesses tuple t if Q(D) ≠ Q−t(D). If Q(D) = Q−t(D), then we say that Q is safe with respect to t.

We illustrate Definition 4.1 with an example.

Example 4.2 Consider the data and query in Example 3.1. Suppose there is a customer tuple t with id 100 in the current instance; then by Definition 4.1, tuple t is accessed by the query.

Our notion of a tuple being accessed by a query is identical to the informal notion of an indispensable tuple introduced by Agrawal et al. [4]. For the class of SPJ queries, the formal definitions are also identical. However:

1. The formal definition proposed by Agrawal et al. [4] is non-uniform. In fact, no definition of indispensability is offered for multi-block SQL queries that cannot be decorrelated.

2. For select-project-groupby-having queries, the approach proposed by Agrawal et al. [4] proceeds by dropping the having clause, which reduces it to the instance-independent semantics (we illustrate in Example 4.3).

In contrast, Definition 4.1 is uniformly instance dependent. The following example illustrates the difference between the instance-dependent and instance-independent semantics (and also the difference between our semantics and that of Agrawal et al. [4]).

Example 4.3 We continue with the database in Example 3.1. Consider the query Q: select o_custkey, count(*) from orders group by o_custkey having count(*) >= 10 that finds the number of orders placed per customer, but only for those customers that have at least 10 orders. Suppose that customer with id 100 has placed only 5 (< 10) orders, corresponding to tuples {o1, ..., o5} in the orders table. The output of Q on the current database would not have an entry corresponding to customer 100. Deleting any of the customer's orders does not change the output of Q. Thus, by Definition 4.1, none of the tuples in {o1, ..., o5} is accessed by Q. However, each of the tuples in {o1, ..., o5} is critical to Q; for example, we can construct an instance of the database where customer 100 has 10 orders including o1, where deleting o1 changes the output of Q. We note that the definition offered by Agrawal et al.
drops the having clause in the above query; thus their definition reduces to the instance-independent semantics.

4.2 Privacy Guarantees

As noted in §1, there are known privacy breaches implied by the instance-dependent semantics. One class of breaches is what are called negative disclosures. Consider a query Q that finds the subset of tuples satisfying a predicate P in a table T. Suppose an adversary is aware that a tuple t exists in the database. If the output of query Q does not include the tuple t, the adversary can infer that tuple t does not satisfy the predicate P. We now provide a different example of a privacy breach implied by the instance-dependent semantics.

Example 4.4 Suppose the customer table in the TPC-H database has a credit rating attribute. Suppose that in the current instance of the database, customer John Doe has a credit rating of 700. Consider the following queries, Q1: select sum(creditrating - 700) from customer and Q2: select sum(creditrating - 700) from customer where c_custname <> 'John Doe'. By checking if the results of the two queries are equal, an adversary can learn that John Doe's credit rating is 700. However, the tuple corresponding to John Doe is not accessed by either Q1 or Q2. Thus our auditing semantics would fail to detect the above attack.

The question arises whether we can precisely characterize the privacy guarantee that the instance-dependent approach yields. Let us re-examine Example 4.4. The attack in Example 4.4 essentially requires knowing the credit rating value a priori: if we change the credit rating of John Doe to, say, 600, query Q1 accesses the corresponding tuple and is therefore flagged as unsafe. Thus, if the adversary does not know the value of John Doe's credit rating upfront, then by issuing queries Q1 and Q2, he is taking a risk of being detected by the audit system. Based on the above intuition, we define the notion of a risk-free attack below.
As in the definition of perfect privacy [7], we assume that the adversary views the database as a probability distribution (over instances) obtained from independent tuple probabilities. We first formalize the notion of an attack. Fix a database instance D and a tuple t ∈ T of interest. For this discussion we assume (without loss of generality) that there is one sensitive attribute A in table T. Suppose the adversary has knowledge of all the data in the database except tuple t. Since we are assuming that the tuple probabilities for the adversary are independent, and since t is not critical to T − t, the adversary's knowledge preserves perfect privacy of the value of t.A [7]. The adversary would like to learn some non-trivial boolean property of t.A; we call a boolean property non-trivial if there is at least one value in the domain of t.A that makes it true and at least one value that makes it false. In Example 4.4, the attacker would like to know whether John Doe's credit rating is 700 (or not). We assume that the only interaction the adversary has with the data is running queries and examining their results. The adversary issues a set of queries {Q1, ..., Qn} to the database and computes a boolean function f on their results. Intuitively, the pair <{Q1, ..., Qn}, f> constitutes an attack if the overall computation reveals some non-trivial property of t.A. We formalize this intuition below.

DEFINITION 4.5. Consider the pair <{Q1, ..., Qn}, f> where {Q1, ..., Qn} is a set of queries and f is a boolean function. Define the following function h over the domain of t.A. Given value α in the domain of t.A:

1. Create database instance Dα, obtained from the instance D by changing the value of t.A to α
2. Compute h(α) = f(Q1(Dα), ..., Qn(Dα))

The pair <{Q1, ..., Qn}, f> is defined to be an attack if h is some non-trivial boolean property of t.A.

Example 4.4 illustrates an example of an attack. The goal of an attack is to cheat the auditing system.
In other words, it should infer a non-trivial boolean property of a tuple by running a set of queries that the auditing system deems to be safe.

DEFINITION 4.6. An attack <{Q1, ..., Qn}, f> is said to be:

1. Successful for a database instance Dα (as defined in Definition 4.5) if each Qi ∈ {Q1, ..., Qn} is safe with respect to t in Dα.
2. Risk-free with respect to an auditing system if it is successful for all database instances Dα.

Example 4.7 The attack described in Example 4.4 is successful in a database instance where the credit rating of John Doe is 700.

The above example illustrates a key difference between the instance-dependent semantics and the instance-independent semantics (based on perfect privacy and weak syntactic suspiciousness), which does not permit successful attacks. However, as noted above, the attack in Example 4.4 is not a risk-free attack. If the credit rating of John Doe is not 700, then query Q1 would be flagged as unsafe. In fact, we can show that for our auditing system that uses the instance-dependent semantics, no risk-free attacks are possible. As a special case, it follows that negative disclosures are not risk-free.

THEOREM 4.8. No attack is risk-free with respect to the single-tuple auditing semantics introduced in Definition 4.1.

PROOF. We provide a proof sketch. Suppose that the tuple of interest is t ∈ T. Suppose to the contrary that we have a risk-free attack <{Q1, ..., Qn}, f>. From Definition 4.6, it follows that each Qi could essentially be equivalently rewritten to only go over the view T − t (formally, each Qi is conditionally valid [9] with respect to the view T − t). This contradicts the fact that <{Q1, ..., Qn}, f> is an attack.

In this section, we present a novel characterization of the privacy guarantee that the instance-dependent approach yields. While it is a weaker guarantee than what the instance-independent approach yields, we think it is interesting given that it does not restrict the set of queries that can be feasibly audited. As mentioned in the introduction, an auditing system that cannot handle arbitrary queries is fundamentally incomplete. At the same time, there are stronger notions of privacy that detect all successful attacks, but it is not clear how we can support them efficiently for complex queries. Understanding the spectrum in between is an interesting direction for future work.
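The interplay between Example 4.4 and Definition 4.1 can be sketched end to end. The following hypothetical Python/SQLite snippet uses a simplified customer table (the column names, and the use of c_custname in place of a key for the differential, are illustrative only): the attack succeeds when the rating is 700, but becomes detectable when it is not.

```python
import sqlite3

# A sketch of Example 4.4 under the instance-dependent semantics of
# Definition 4.1, on a simplified customer table. A query is "safe" if it
# equals its differential: the same query with John Doe's tuple excluded.
def run(conn, sql):
    return conn.execute(sql).fetchall()

def is_safe(conn, q, q_diff):
    return run(conn, q) == run(conn, q_diff)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (c_custname TEXT, creditrating INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("John Doe", 700), ("Jane Roe", 650)])

q1 = "SELECT sum(creditrating - 700) FROM customer"
q2 = q1 + " WHERE c_custname <> 'John Doe'"
q1_diff = q1 + " WHERE c_custname <> 'John Doe'"      # Q1 over customer - t
q2_diff = q2 + " AND c_custname <> 'John Doe'"        # Q2 over customer - t

# The attack succeeds: both queries are safe (John Doe contributes
# 700 - 700 = 0 to Q1), yet Q1 == Q2 reveals the rating is exactly 700.
print(is_safe(conn, q1, q1_diff), is_safe(conn, q2, q2_diff),
      run(conn, q1) == run(conn, q2))  # True True True

# But the attack is not risk-free: had the rating been 600, Q1 would
# differ from its differential and be flagged unsafe.
conn.execute("UPDATE customer SET creditrating = 600"
             " WHERE c_custname = 'John Doe'")
print(is_safe(conn, q1, q1_diff))  # False
```

The final print illustrates Theorem 4.8 in miniature: on some instance in the family Dα, at least one of the attack queries fails the audit.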
4.3 Baseline Query Auditing Algorithm

We note that Definition 4.1 lends itself to a feasible implementation for arbitrary SQL queries: run the query and its differential and check if the results are the same. This baseline algorithm is illustrated in Algorithm 4.1. As described in §4.2, this algorithm disallows risk-free attacks.

Algorithm 4.1 Baseline Query Auditing Algorithm QUERYAUDIT
Input: (1) Query workload QW of query statements with corresponding user ids, (2) Current state of database D, (3) Tuple t
Output: Subset of QW that is unsafe
1: For each query Q ∈ QW
2:   Check if Q(D) and Q−t(D) are equal by executing both
3:   If (not equal) report that Q is unsafe

The baseline algorithm can be inefficient, especially given a workload of complex queries. As discussed in §1, even though auditing is typically an offline process, the efficiency of auditing is important. We next present a more efficient algorithm for query auditing.

5. AUDIT OPTIMIZER

As discussed in §1, even though auditing is typically an offline process, the efficiency of auditing is important. For instance, auditing may be invoked as a response to an information breach in an attempt to narrow down the cause of the breach, in which case the efficiency of auditing is critical. The baseline implementation outlined in Algorithm 4.1 involves the execution of the query and its differential and a check to examine if the results are identical, which can be quite expensive. Efficient auditing is the focus of this section. We first discuss optimizations for simple classes of queries such as select-project-join (SPJ) queries and then discuss an audit optimization framework for general queries.

5.1 Optimization Techniques For Simple Queries

Based on previous work on incremental view maintenance [10], we can design various optimizations for special classes of queries such as select-project-join (SPJ) queries.
We first discuss a technique, which we term substitution, based on previous work [4] for auditing SPJ queries without self-joins. Consider a single tuple t in a table T. Suppose we want to check if a self-join-free SPJ query Q accessed t. The substitution technique creates a new query Q' from Q by substituting table T with a table T' which has exactly the single tuple t in it. This is accomplished by adding a suitable predicate to Q. Checking whether Q accessed t is equivalent to checking if the output of Q' is empty. The following example illustrates this technique.

Example 5.1 Consider the following query Q on the TPC-H schema: select * from orders where o_orderdate > [date]. Suppose the tuple of interest is a specific sensitive order, say o_orderkey = 5000. The substitution approach would check if the output of the following query Q' is empty: select * from orders where o_orderdate > [date] and o_orderkey = 5000. This check can be very efficiently implemented if there is an index on the o_orderkey column. Sometimes, we can check if the modified query Q' is empty without executing it, for instance if Q contains the predicate o_orderkey < 3000 and we are interested in o_orderkey = 5000. This technique has been termed static pruning [11].

We can extend the above technique to handle simple grouping and aggregation. However, just as with previous work on incremental view maintenance, there is no simple extension to more complex queries.

5.2 Rule Based Framework

In this section, we present a novel rule-based framework that is extensible to the whole of SQL and that attempts to audit queries without any query execution. We motivate our framework with a simple example.

Example 5.2 Consider the following workload of two queries:

1. select * from orders where o_orderdate > [date1]
2. select * from orders where o_orderdate > [date2]

Suppose that the tuple of interest is a specific order tuple t. Suppose that we have already verified that the first query is safe.
Since the second query is subsumed by the first, we can infer that the second query is also safe without executing it.

Example 5.2 suggests that we can use the result of auditing one query in auditing future queries. We propose a rule-based framework, similar to what is used in rule-based query optimizers [12, 13], to capture the optimization

illustrated in Example 5.2. (We defer a discussion of related work on answering queries using views to §8.) The idea is that if a query Q does not access tuple t, then Q is equal to its differential on the current database instance. We incorporate this knowledge in the form of a rule. Unlike traditional optimizer rules (e.g., the join commutativity rule), which are valid for all database instances, the new rules are valid only on the current database instance. Hence, we term them instance equivalence rules. We represent the rules using the algebraic representation of a query in the form of a logical plan.

DEFINITION 5.3. An instance equivalence rule is an ordered pair of logical plans, with a left-hand side (LHS) and a right-hand side (RHS), whose results are equal on the current database instance.

While leveraging instance equivalence rules is reminiscent of semantic query optimization [14, 15], we note that there are important differences. We model the rules algebraically, which lets us easily integrate them with traditional rules as well as represent equivalences between complex query plans (in contrast to semantic query optimization, which typically relies on exposing soft constraints as semantic rules). Moreover, instance equivalences have not been previously leveraged for data auditing, where the previously audited queries provide a natural source from which to derive such rules.

Example 5.4 Consider the first query discussed in Example 5.2, which is safe with respect to the tuple t mentioned in the same example. This fact is represented as the instance equivalence rule shown in Figure 2. The rule shows that the selection predicate (denoted by σ1) on the original orders table produces the same result as on its differential; note that the table in the RHS of the rule is (orders) − t.

[Figure 2: Instance Equivalence Rule For Select Query]
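The rule of Example 5.4 can be checked concretely on a toy instance. The sketch below, using sqlite3 with a hypothetical two-column stand-in for the orders table (the key values and date literal are invented for illustration), shows the two sides of an instance equivalence rule agreeing on the current instance.

```python
import sqlite3

# A minimal sqlite3 sketch of the instance equivalence rule of Example 5.4:
# on the current instance, the selection sigma1 over orders equals the same
# selection over the differential table orders - t (because the earlier
# audit found the query safe). Schema, key values and the date are
# hypothetical stand-ins for the TPC-H orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (o_orderkey INTEGER, o_orderdate TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(5000, "1994-06-01"), (6000, "1996-03-15")])

t_key = 5000  # the audited tuple t, which does NOT satisfy sigma1 here
lhs = "SELECT * FROM orders WHERE o_orderdate > '1995-01-01'"
rhs = lhs + f" AND o_orderkey <> {t_key}"   # same selection over orders - t

lhs_rows = conn.execute(lhs).fetchall()
rhs_rows = conn.execute(rhs).fetchall()
# The two plans agree on this instance, so (lhs, rhs) is a valid instance
# equivalence rule and can be reused when auditing later queries.
print(lhs_rows == rhs_rows)  # True
```

In the paper's framework the rule is stored as a pair of logical plans rather than SQL strings; the SQL form here is only meant to make the instance-level equality tangible.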
Broadly, we have both the standard transformation rules (like those used in a query optimizer) and the instance equivalence rules on the current instance of the database. For a given query, we use these rules to try to reach its differential. If we succeed, we know that the query and its differential are equal on the current database instance, without performing any query execution. If we fail, we fall back to query execution (perhaps an optimized form, as discussed in Section 5.1). We illustrate this approach through an example.

Example 5.5 Consider the second query in Example 5.2. Using the instance equivalence rule in Figure 2, we can start from the logical plan corresponding to the second query and reach the logical plan of its differential. The sequence of transformations is shown in Figure 3. The selection predicate for the second query is denoted by σ2. The individual steps in the derivation are as follows.

1. Rewrite the original plan as a predicate over the LHS of the instance equivalence rule.
2. Replace the LHS of the instance equivalence rule with the RHS.
3. Collapse the predicates using a rule that eliminates redundant selection predicates, obtaining the target differential plan.

Figure 3: Inferring Equality of a Query and its Differential for a Select Query

We can see from Example 5.5 that we match against the LHS of a rule in order to use the rule. Once we find a match, we consider applying the rule by replacing the matching portion with the RHS. A plan P′ is said to be reachable from a plan P if we can transform P into P′ by matching and applying one or more rules. We now define the formal problem addressed by the transformation-based audit optimizer.
Reachability Problem: Given a set of transformation rules R, a set of instance equivalence rules R′, an input plan P, and the plan P′ of its differential, check whether P′ is reachable from P through a sequence of rules in R ∪ R′.

Our framework is extensible. Just as commercial optimizers can handle arbitrary SQL using transformation rules, we can easily extend our framework to handle complex SQL queries by adding suitable transformation and instance equivalence rules. Rule matching and application are performed in a manner similar to rule-based query optimizers, both for standard transformation rules and for instance equivalence rules. The LHS is treated as a pattern and matched exactly. Alternatively, we check whether any part of the plan under consideration is subsumed by the LHS. For subsumption, we use view-matching logic. For instance, in Example 5.5 the first step of the rewrite is based on view matching. View matching as implemented in commercial optimizers is restricted to the class of materializable views. In our implementation, we also match against more complex views, as we illustrate below.

Example 5.6 Consider the following two queries, similar to the previous example but including a nested subquery.

1. select * from orders where o_orderdate > <date1> and o_totalprice > (select avg(o_totalprice) from orders)
2. select * from orders where o_orderdate > <date2> and o_totalprice > (select avg(o_totalprice) from orders)

Let the tuple of interest t be a specific order tuple. Suppose that the first query is safe. The corresponding instance equivalence rule is shown in Figure 4. The LHS of the rule uses an Apply operator [16, 17] to represent the subquery. We normalize the plan by pulling the selection above the Apply (we identify the selection above the Apply with the Apply itself). Similar to the previous example, we can again establish the equality of the second query and its differential, as shown in Figure 5.
An interesting point to note is that the first step leverages a more sophisticated form of

Figure 4: Instance Equivalence rule for a nested subquery

Figure 5: Inferring Equality of a Query and its Differential for a Nested Subquery

view matching than in Example 5.5. We reason that an Apply operator is subsumed by another if its parent selection is subsumed and its children are identical. We note that this goes beyond existing view-matching techniques in commercial systems, which do not handle subqueries. The rest of the derivation is similar to the previous example. We finally illustrate an example of a rewriting that involves multiple applications of instance equivalence rules.

Example 5.7 The instance equivalence rules obtained from prior executions are shown in Figure 6. We observe that there are two of them. The rule involving the max aggregation corresponds to the fact that the tuple of interest does not have the maximum value in the table.

Figure 6: Instance Equivalence Rules

Figure 7 illustrates how we can leverage the above two rules to derive the safety of a complex query involving a nested subquery. Note that in Figure 7 we move away from the normalized form of the Apply operator by pushing the selection down. We then find that both the left and right children of the Apply operator are subsumed by the LHS of some instance equivalence rule. Applying both rules yields the target plan of the differential query.

5.3 Implementation

In this section we briefly discuss how the proposed rule-based framework can be implemented.
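Before turning to the architectural options, here is a hedged sketch of the core reachability check itself: a bounded breadth-first search over plans, applying both kinds of rules uniformly. Plans are encoded as nested tuples, and the rules below (a subsumption rewrite, the instance rule of Figure 2, and a redundant-selection collapse) reenact the derivation of Example 5.5; all names are illustrative assumptions, not the paper's code.

```python
from collections import deque

# Plans as hashable nested tuples: (op, arg, (child, child, ...)).
# Transformation rules are valid everywhere; instance equivalence rules
# only on the current instance. The search treats both uniformly.

def rewrites(plan, lhs, rhs):
    """Yield every plan obtained by replacing one occurrence of lhs."""
    if plan == lhs:
        yield rhs
    op, arg, kids = plan
    for i, child in enumerate(kids):
        for new_child in rewrites(child, lhs, rhs):
            yield (op, arg, kids[:i] + (new_child,) + kids[i + 1:])

def reachable(start, target, rules, limit=10_000):
    """Breadth-first search over plans; bounded to keep overhead in check."""
    seen, frontier = {start}, deque([start])
    while frontier and len(seen) < limit:
        plan = frontier.popleft()
        if plan == target:
            return True
        for lhs, rhs in rules:
            for nxt in rewrites(plan, lhs, rhs):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return False

orders = ("table", "orders", ())
diff = ("diff", "t", (orders,))
# Instance rule r1 (Figure 2): s1(orders) == s1(orders - t).
r1 = (("select", "s1", (orders,)), ("select", "s1", (diff,)))
# Subsumption rewrite (step 1 of Example 5.5): s2 == s2 over s1.
subsume = (("select", "s2", (orders,)),
           ("select", "s2", (("select", "s1", (orders,)),)))
# Redundant-selection collapse (step 3): s2 over s1 collapses to s2.
collapse = (("select", "s2", (("select", "s1", (diff,)),)),
            ("select", "s2", (diff,)))

q2 = ("select", "s2", (orders,))
target = ("select", "s2", (diff,))
print(reachable(q2, target, [subsume, r1, collapse]))  # True
```

The `limit` parameter plays the role of the timeout discussed later: if the search budget is exhausted without reaching the differential plan, the auditor falls back to query execution.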
If we modify the DBMS code, we can consider the framework to be a special mode of the query optimizer (the audit optimizer mode), in which we modify a traditional rule-based query optimizer to accept as additional input the instance equivalence rules and the original logical plan, and to check whether the differential plan is reachable using its rule engine. We would in addition need to augment the view-matching rules to match more complex operator trees, as discussed above.

Figure 7: Inferring equality using multiple instance equivalence rules

Alternatively, we can build a client-based solution in which we implement a rule engine to manage the set of original transformation rules as well as the instance equivalence rules. Note that we only need the rule application and matching logic of a query optimizer; in particular, we do not need the cardinality estimation and cost estimation modules. In addition, we can leverage the database server for performing predicate subsumption and view matching using the hypothetical views API provided by most commercial database systems [18]. To check whether a view V1 is subsumed by V2, we create a hypothetical view (which is just a metadata entry in the catalog and does not contain any data), optimize the query corresponding to V1, and check whether it can be answered using the hypothetical view V2.
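For intuition only, a drastically simplified stand-in for the subsumption test: the hypothetical-view API handles rich view definitions, but for the single-column range predicates used in this paper's examples the containment check reduces to comparing constants. This toy function is an assumption-laden illustration, not the server's actual logic.

```python
# Subsumption for views of the form "col > constant": view v2 contains
# view v1 iff every row with col > c1 also has col > c2, i.e. c1 >= c2.
# ISO date strings compare correctly as plain strings.

def subsumes(v1, v2):
    """Return True if view v1 = (col, c1) is subsumed by v2 = (col, c2)."""
    col1, c1 = v1
    col2, c2 = v2
    return col1 == col2 and c1 >= c2

# The query with the later ship date is contained in the earlier one.
print(subsumes(("l_shipdate", "1995-06-01"), ("l_shipdate", "1995-01-01")))  # True
print(subsumes(("l_shipdate", "1995-01-01"), ("l_shipdate", "1995-06-01")))  # False
```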
Algorithm 5.1 OPTQUERYAUDIT
Input: Query-only workload QW of query statements with corresponding user ids; tuple t in table T
Output: Subset of QW that is unsafe
1: Let D be the database state corresponding to QW
2: Let I be the set of instance equivalence rules
3: For each query Q in QW:
4:   Let P and P′ denote the logical plans for Q(D) and Q−t(D)
5:   Check if P′ is reachable from P using I
6:   If not, check if Q(D) and Q−t(D) are equal
7:   If equal, augment I with a new instance equivalence rule
8:   If not equal, report that Q is unsafe

As the number of instance equivalence rules increases, the overhead of checking reachability using the rule engine also increases. We can leverage techniques that are typically used to control the overheads of query optimization, such as timeouts: we run the rule-based framework until a particular timeout expires, and if we have not yet reached the target plan, we resort to query execution. We defer a more thorough study of how the overheads of the rule-based framework can be tuned to future work. Another issue to consider is the maintenance of the instance equivalence rules under updates. Recall that these rules are valid only for a particular database instance. Currently, as a first step, we use simple syntactic checks, such as invalidating only those rules that refer to the table being updated. More advanced techniques for maintaining the instance equivalence rules in the presence of updates are an interesting direction for future work.

We present the optimized algorithm for single tuple auditing, which augments the baseline approach with the rule-based framework, in Algorithm 5.1. The key differences from the original algorithm are as follows. In Steps 4-5, we use the rule-based framework to check whether the differential plan is reachable from the original plan. If so, we skip the execution for this query. Otherwise, we generate a new instance equivalence rule (which is added to the set I) if the query turns out to be safe.

5.4 Reordering Queries

We note that in Algorithm 5.1, we do not change the order in which queries are processed. In this section we present an optimization that reorders the queries in the workload to further improve efficiency. We motivate this optimization with an example.

Example 5.8 Consider the following workload of three queries, presented in order.

1. select * from orders where o_orderdate > <date1> and o_totalprice > (select avg(o_totalprice) from orders)
2. select * from orders where o_orderdate > <date2> and o_totalprice > (select avg(o_totalprice) from orders)
3. select * from orders where o_orderdate > <date3> and o_totalprice > (select avg(o_totalprice) from orders)

As discussed in Example 5.6, we can use instance equivalence rules to avoid any query execution for the second query. However, we would still need to execute the third query, since it is not subsumed by any of the previous queries. Thus, we would save only the execution corresponding to the second query. Suppose instead that we reorder the queries in the workload so that we audit the third query first. If the third query is safe, we can then infer the same for both remaining queries, thus saving two query executions. We formally define the Query Reordering Problem below.
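The reordering idea in Example 5.8 can be sketched as building a subsumption DAG over the workload and auditing queries in topological order, so that the most general query is executed first. The toy date bounds and query names below are hypothetical; Python's standard `graphlib` provides the topological sort.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Queries as name -> lower bound of their o_orderdate range predicate
# (illustrative constants). Query A subsumes query B when B's predicate
# range lies inside A's, i.e. B's bound is at least A's bound.
queries = {"q1": "1995-01-01", "q2": "1995-03-01", "q3": "1994-06-01"}

def subsumed_by(b, a):
    # b (col > date_b) is subsumed by a (col > date_a) iff date_b >= date_a
    return a != b and queries[b] >= queries[a]

# graphlib wants, for each node, the set of its predecessors. Making a
# query depend on every query that subsumes it means subsuming (more
# general) queries are audited first.
deps = {q: {p for p in queries if subsumed_by(q, p)} for q in queries}
order = list(TopologicalSorter(deps).static_order())
print(order)  # the most general query, q3, comes first
```

If q3 is audited first and found safe, the instance equivalence rule it yields covers q1 and q2, matching the two saved executions in the example.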
Query Reordering Problem: Given a query-only workload, find the permutation of the sequence of queries that leads to the minimal number of executions for single tuple auditing.

We solve the above problem by creating a subsumption graph: a directed acyclic graph (DAG) with one node per query in the workload and an edge from node Q1 to node Q2 if Q2 is subsumed by Q1. We process queries in the order yielded by a topological sort of the subsumption graph. The rest of the algorithm is identical to Algorithm 5.1.

6. AUDIT EXPRESSIONS

We now briefly discuss how we extend our solution for single tuple auditing to more general audit expressions. Similar to previous work, our audit expressions are expressed as forbidden views. We begin by considering forbidden views expressed as a predicate over a single table. If the predicate is of the form id = value, then the problem reduces to single tuple auditing.

Example 6.1 Consider the following view over the Patients(patientID, disease) table in a health-care database:

select * from Patients where disease = 'cancer'

The view expresses the intuition that information about cancer patients is sensitive.

We also support a limited class of joins as forbidden views. Our goal in extending forbidden views to joins is to express simple predicates on set-valued attributes. For instance, an extension of Example 6.1 would be a database that stores information about patients in a table called Patients but stores the diseases in a separate table Diseases, to account for the fact that a patient might suffer from multiple diseases over time. In this case, we can think of a patient as having a set-valued attribute containing all diseases they have suffered from over time.
More formally, we allow the following class of forbidden views:

select * from universal-table where condition-list

The term universal table refers to the join of a set of tables: we begin with a key table (e.g., Patients) that intuitively captures the atomic attributes, and join in other tables (e.g., Diseases) through foreign-key lookups that essentially add the set-valued attributes. The condition list consists of simple predicates that do not involve subqueries. We illustrate with the following example.

Example 6.2 Consider the Patients-Diseases example above. We can write the view

select * from Patients P join Diseases D on P.patientID = D.patientID where P.zipcode = <zip> and D.disease = 'cancer'

to capture our desire to hide information about cancer patients in the zip code <zip>. Although our forbidden views are not as general as the class considered by previous work, they cover most of the examples previously considered [4, 5].

We now briefly discuss how our auditing semantics extends to cover the above class of forbidden views. As noted above, we think of the forbidden view as expressing a boolean predicate on a tuple containing a set-valued attribute. For single tuple auditing, we rewrite a query to exclude the tuple. Given a forbidden view, we similarly rewrite the query to exclude any tuples in the universal table that belong to the forbidden view. The rewritten query is called its differential with respect to the forbidden view. A query is deemed safe with respect to the forbidden view if it has the same result as its differential on the current instance of the database. We illustrate with an example.

Example 6.3 We continue with the forbidden view in Example 6.2.
Consider the query Q:

select * from Patients where patientID = 'Alice'

The differential of Q is:

select * from Patients P where patientID = 'Alice' and not (zipcode = <zip> and exists (select * from Diseases D where P.patientID = D.patientID and D.disease = 'cancer'))

If Alice happens to live in the zip code <zip> and has suffered from cancer, then the differential produces a different result. Otherwise, Q is deemed safe with respect to the forbidden view.
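The rewrite in Example 6.3 can be run end-to-end; here is a minimal sketch using `sqlite3`. The schema, the sample rows, and the zip-code constant `'12345'` are all invented placeholders for illustration, since the paper's own example elides the concrete zip code.

```python
import sqlite3

# Runnable sketch of differential rewriting for the forbidden view of
# Example 6.2 (cancer patients in a given zip code). Data is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Patients(patientID TEXT PRIMARY KEY, zipcode TEXT);
    CREATE TABLE Diseases(patientID TEXT, disease TEXT);
    INSERT INTO Patients VALUES ('Alice','12345'), ('Bob','99999');
    INSERT INTO Diseases VALUES ('Alice','cancer'), ('Bob','flu');
""")

query = "SELECT * FROM Patients WHERE patientID = ?"
differential = """
    SELECT * FROM Patients P WHERE patientID = ?
      AND NOT (P.zipcode = '12345' AND EXISTS
               (SELECT * FROM Diseases D
                WHERE D.patientID = P.patientID AND D.disease = 'cancer'))
"""

def is_safe(pid):
    q = con.execute(query, (pid,)).fetchall()
    d = con.execute(differential, (pid,)).fetchall()
    return q == d  # safe iff the differential returns the same result

print(is_safe("Alice"))  # False: Alice matches the forbidden view
print(is_safe("Bob"))    # True
```

The query on Alice is flagged unsafe because removing the forbidden tuples changes its result, exactly the audit criterion defined above.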

In general, each table referred to in a query is rewritten to exclude the forbidden sets. We omit the details of the rewriting, which are straightforward. Interestingly, both our privacy guarantees and our audit optimization techniques extend in a straightforward manner to cover the class of forbidden views above. We assume that the adversary views the database as a probability distribution (over instances) obtained from independent universal-table tuple probabilities. The elements in a set-valued attribute need not be independent. For instance, in Example 6.2, we do not assume that the diseases afflicting a patient are independent of one another; but there is independence between the universal records (patient records in Example 6.2). With this adversary model, it is not difficult to extend the definitions of an attack and a risk-free attack to tuples with set-valued attributes. Just as in Theorem 4.8, we can show that the audit semantics proposed above for forbidden views guarantees that no attack is risk-free.

THEOREM 6.4. No attack is risk-free with respect to the forbidden-view auditing semantics defined above.

Similarly, the baseline algorithm and audit optimization techniques also extend. Instead of an expression of the form T − t for a table T and tuple t, we have an expression of the form T − σp(T), where the predicate p is derived from the forbidden view. With that change, the plan transformation proceeds as in Section 5. For instance, the instance equivalence rule r1: σ1(T) ≡ σ1(T − t) in Figure 2 would instead be the instance equivalence rule r1′: σ1(T) ≡ σ1(T − σp(T)). The rule r1′ can be used in the same way as r1 in Example 5.5.

6.1 Relationship to Previous Work

There are two strains of previous work we now comment on. Our approach to forbidden views outlined above differs from the (instance-dependent) semantics adopted by Agrawal et al. [4]. The semantics proposed by Agrawal et al.
[4] deems a query safe with respect to a forbidden view if they share no indispensable tuples. The following example illustrates how our semantics differs.

Example 6.5 Consider the forbidden view expressed in Example 6.2 and the query

select * from Diseases D where D.disease = 'heart attack' and D.patientID = 'Alice'

Suppose that patient Alice lives in the zip code <zip> and has suffered from both a heart attack and cancer. Alice's disease record corresponding to the heart attack is not indispensable to the forbidden view. Thus, by the semantics proposed by Agrawal et al. [4], the query would be deemed safe. Under our definition, on the other hand, the query would be deemed unsafe. Essentially, our semantics for forbidden views assumes that universal-table tuples are independent but the elements within a single set-valued attribute (e.g., heart attack and cancer, which are both correlated with smoking) are not necessarily independent. Our semantics yields the privacy guarantees discussed above. This differs from the approach proposed by Agrawal et al. [4], where privacy guarantees are not discussed.

The second strain of related previous work is the recent work by Fabbri et al. [11], which focuses on auditing an authorization policy in an instance-dependent approach. As with authorization policies, the auditing semantics presented in this paper also performs query rewriting. However, our input is a forbidden view, not an authorization policy. We can think of our semantics, intuitively, as deriving an authorization policy based on the complement of the forbidden view in order to rewrite the query. However, we formally analyze the privacy guarantees yielded by such an approach. We also propose a generic optimization framework applicable to arbitrary queries; in contrast, the optimizations proposed by Fabbri et al. [11] focus mostly on SPJ queries.
7. EXPERIMENTAL EVALUATION

In this section, we present an experimental evaluation of the techniques presented in Section 5 for reducing the cost of computing differentials for complex queries. We focus primarily on the offline component of the auditing tool. The goals of the experimental study are to measure the impact of our optimization techniques as a function of (1) the data size, (2) the number of queries failing the audit, (3) the selectivity of the audit expression, and (4) the fraction of updates in the workload. We first discuss some details of our experimental setup and then present our results and summarize their implications.

7.1 Implementation and Experimental Setup

Our auditing tool is built as a client application over Microsoft SQL Server 2008. The online part of the audit relies on the auditing infrastructure of SQL Server [19]. The audit optimizer is built as part of the client application (as described in Section 5.3). The current prototype has the basic set of transformation rules (covering joins, selections, aggregates, group-by, etc.) as well as view-matching rules for Apply operator trees. As discussed previously, we build on the hypothetical view API of Microsoft SQL Server 2008 for checking subsumption. We also leverage a client-side parser tool to parse SQL queries into logical plans and to derive instance equivalence rules for queries that are safe. We use the TPC-H benchmark database [20] for our experiments. We use the 1GB version for most of our experiments but also present results on the 100MB and 10GB versions when we examine the sensitivity of the optimization techniques to data size. For our query workload, we choose the following stored procedure on the TPC-H schema, which obtains information about high-priced orders (defined as orders whose price is close to that of the highest-priced order) that were shipped after a certain date. The procedure is based on query 17 in the benchmark.
SELECT * FROM orders, lineitem
WHERE l_orderkey = o_orderkey
  AND l_shipdate > $1
  AND o_totalprice > 0.9 * (SELECT max(o_totalprice) FROM orders)

The query workload is generated using 30 random instances of the above stored procedure. We first consider a scenario for single tuple auditing where we audit a specific order of a particular customer (which could represent sensitive information), specified by a predicate on the o_orderkey column. In other words, we are interested in finding the specific instances of the stored procedure that accessed the information corresponding to an order of a particular customer. For our experiments on audit expressions, we audit for orders in a particular time window, specified by a predicate on the o_orderdate column. We choose the above stored procedure because it tests the performance of our optimizations on a complex query containing aggregation and a subquery. Moreover, stored procedures are commonly used in databases, and it is important to test the performance of our techniques in such scenarios.

7.2 Comparing Different Alternatives

In this experiment, we first compare how the different techniques discussed in Section 5 perform for the basic case of single tuple auditing when all the queries are safe (we revisit this assumption in a later

experiment). We ensure this by auditing for a customer order that is not among the high-priced ones. We present results for the following techniques.

1. Baseline: The baseline approach that checks whether the query and its differential are equal for every query in the workload.
2. Opt: The technique (described in Algorithm 5.1) that leverages previously safe queries by deriving and exploiting instance equivalence rules.
3. Opt-Reorder: The technique (described in Section 5.4) that in addition reorders the sequence of queries to best exploit the instance equivalence rules.

Figure 8: (a) Comparison of different alternatives (b) Varying Data Size

Figure 8(a) shows (1) the total number of query executions and (2) the total execution time in seconds. The graph indicates that Opt-Reorder reduces the number of executions by a factor of 30 compared to the Baseline approach. Note that in this experiment all queries are safe; Opt-Reorder therefore executes only the query with the largest predicate range of l_shipdate first, and the remaining queries are then inferred to be equivalent to their corresponding differential queries using the instance equivalence rules. However, since Opt-Reorder still executes the most expensive query, the corresponding ratios for execution times are smaller (though still significant). The graph indicates that the optimization techniques proposed in this paper have the potential to reduce the time taken for single tuple auditing by an order of magnitude.

7.3 Varying Data Size

In this experiment, we repeat the previous experiment while varying the original database size. We present results for three versions of the TPC-H database (100MB, 1GB and 10GB).
Since the number of executions remains the same for each technique (independent of the database size), we only report the speedup in execution time over the baseline algorithm. The results in Figure 8(b) illustrate the following points. First, for small databases Opt-Reorder can be worse than Opt. This is because the relative overhead of computing the subsumption graph (recall that our implementation leverages the hypothetical views API for checking predicate/view subsumption) can be significant, particularly when query execution times are small. Second, as the database size increases, the relative overhead of computing the subsumption graph shrinks, and Opt-Reorder can provide significant reductions in auditing time compared to Opt.

7.4 Sensitivity to Audit Failure Rate

Both Opt and Opt-Reorder leverage instance equivalence rules derived from previously safe queries. Thus, an important parameter that could influence the performance of the proposed techniques is the audit failure rate, defined as the fraction of the queries that fail the audit. To vary this parameter we repeat the experiment in Section 7.2 but force the first k% of the queries to fail the audit. We note that this experiment is biased against Opt-Reorder (and not Opt), since the first few queries (after reordering) are typically the most expensive queries in the workload and we essentially force Opt-Reorder to execute them. The results in Figure 9(a) indicate that the proposed techniques work well even when the audit failure rate is high. For instance, the speedup obtained using Opt-Reorder is around 3 even when 20% of the queries fail the audit (a large fraction). The performance of Opt and Opt-Reorder is relatively close in this experiment because, as described above, the experiment is biased against Opt-Reorder.
7.5 Sensitivity to Audit Expression Selectivity

The results presented thus far have focused on the scenario of single tuple auditing. As discussed earlier, our optimization techniques naturally extend to support a larger class of audit expressions. In this section, we examine the sensitivity of the optimization techniques to the selectivity of the audit expression. Recall that the audit expression we consider selects the orders in a particular time window, specified by a predicate on the o_orderdate column. Figure 9(b) shows the results as the predicate range on the o_orderdate column is varied from a week to a year. The results show that the speedup obtained shrinks as the date range widens. This is expected because a larger date range increases the audit failure rate. However, the results show that even when the audit expression selects all orders made over a year, our optimization techniques still speed up execution over the baseline.

7.6 Sensitivity to Updates

Finally, we examine the sensitivity of Opt and Opt-Reorder to updates in the workload. Recall that both Opt and Opt-Reorder utilize the notion of instance equivalence rules. However, these

Figure 9: Varying (a) Audit Failure Rate (b) Audit Expression Selectivity

Figure 10: Effect of Updates

rules are only valid for a particular instance of the database and may need to be invalidated in the presence of updates. As discussed in Section 5, the rules can still be applied to the queries in a workload that fall between update statements. To study the effect of updates, we add an increasing number of update statements, uniformly spread through the workload of 30 instances of the stored procedure. Figure 10 shows the results for an audit expression that selects all orders made over a range of three months. For instance, the data point corresponding to 6% updates essentially splits the original 30-query sequence into 3 smaller subsequences of 10 queries each. The results indicate that Opt-Reorder yields a significant speedup even in the presence of a large fraction of updates (e.g., a speedup of a factor of 3 even when the percentage of updates is 15%). We find similar results when the updates are not uniformly distributed but skewed to appear among the last 10 queries of the original workload.

7.7 Summary

We now summarize our empirical results. Our experiments indicate that Opt-Reorder has the potential to significantly reduce the time taken to audit queries (especially for instances of a stored procedure). It is particularly effective when the database size is large and the audit failure fraction is low (which is likely to be common in practice).
Moreover, the performance improvements yielded by Opt-Reorder remain significant as input characteristics such as the data size, audit failure rate, audit expression selectivity, and the fraction of updates in the workload change. We defer a more thorough evaluation of our framework to future work.

8. RELATED WORK

Related work in the space of data auditing is discussed earlier in the paper. This section focuses mostly on work related to the audit optimizer. Our audit optimizer is architected like a rule-based optimizer [12, 13] and can be extended by adding transformation rules. In addition to regular transformation rules, the audit optimizer also leverages instance equivalence rules that are valid only for the current instance. While the notion of leveraging additional semantic rules has previously been studied in the area of semantic query optimization [14, 15], the main difference in our approach is that we model the rules algebraically (in the form of logical plans), which lets us easily integrate the instance equivalence rules with the standard transformation rules in the optimizer search space. Further, auditing previous queries is a novel and natural source of instance equivalence rules. Our techniques for efficiently performing an audit are also reminiscent of previous work on answering queries using views [21]. For instance, instead of using a rule-based framework, we could leverage previously audited queries via the following problem formulation: given a set of views corresponding to previously audited queries, we can avoid executing the current query and its differential if the query can be rewritten completely in terms of the views. We do not use this formulation for two reasons: (1) the problem of answering queries using views is known to be hard for complex queries and views [21], and (2) the view language supported by database systems is typically very limited.
Finally, the techniques proposed in Section 5.1 for simple queries are based on techniques for incremental view maintenance [10]. Most techniques for incremental view maintenance are restricted to simple classes of queries. Moreover, our rule-based framework is complementary in that it attempts to avoid query execution altogether.

9. CONCLUSIONS

In this paper we study the problem of auditing real-world queries, which typically leverage constructs such as subqueries. We establish that the instance-independent semantics is computationally incompatible with complex queries. We therefore revisit the instance-dependent approach and propose a semantics for auditing that is feasible for arbitrary SQL. We formally analyze the privacy guarantees yielded by our approach. We also propose a novel audit optimization framework that is general and attempts to perform auditing without any query execution. The key advantage of both the semantics and the optimization techniques is that they apply to arbitrarily complex SQL queries. Our initial experiments indicate that the audit optimizer can significantly reduce the time taken to audit complex queries. We view our work as an important first step in auditing complex SQL queries with privacy guarantees. Several interesting avenues remain for future work: (1) further improving the efficiency of auditing, (2) understanding the interplay between updates and queries, and (3) drilling down into unsafe queries.

REFERENCES

[1] Guardium.
[2] Lumigent AuditDB.
[3] Oracle Corporation. Oracle Audit Vault.
[4] R. Agrawal, R. J. Bayardo, C. Faloutsos, J. Kiernan, R. Rantzau, and R. Srikant. Auditing compliance with a Hippocratic database. In VLDB.
[5] R. Motwani, S. U. Nabar, and D. Thomas. Auditing SQL queries. In ICDE.
[6] A. Machanavajjhala and J. Gehrke. On the efficiency of checking perfect privacy. In PODS.
[7] G. Miklau and D. Suciu. A formal analysis of information disclosure in data exchange. JCSS, vol. 73, no. 3.
[8] T. S. Jayram, P. G. Kolaitis, and E. Vee. The containment problem for real conjunctive queries with inequalities. In PODS.
[9] S. Rizvi, A. O. Mendelzon, S. Sudarshan, and P. Roy. Extending query rewriting techniques for fine-grained access control. In SIGMOD.
[10] A. Gupta and I. S. Mumick. Maintenance of materialized views: Problems, techniques, and applications. IEEE Data Engineering Bulletin, vol. 18, pp. 3–18.
[11] D. Fabbri, K. LeFevre, and Q. Zhu. PolicyReplay: Misconfiguration-response queries for data breach reporting. In VLDB.
[12] G. Graefe. Volcano: An extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng., vol. 6, no. 1.
[13] H. Pirahesh, T. Y. C. Leung, and W. Hasan. A rule engine for query transformation in Starburst and IBM DB2 C/S DBMS. In ICDE.
[14] M. Hammer and S. B. Zdonik. Knowledge-based query processing. In VLDB.
[15] J. J. King. QUIST: A system for semantic query optimization in relational databases. In VLDB.
[16] C. A. Galindo-Legaria and M. Joshi. Orthogonal optimization of subqueries and aggregation. In SIGMOD.
[17] R. Guravannavar, H. S. Ramanujam, and S. Sudarshan. Optimizing nested queries with parameter sort orders. In VLDB.
[18] Special issue on self-managing systems. IEEE Data Eng. Bull., vol. 29, no. 3.
[19] Microsoft Corporation. SQL Server 2008 Auditing.
[20] The TPC-H Benchmark.
[21] A. Y. Halevy. Answering queries using views: A survey. VLDB J., vol. 10, no. 4.
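The intuition behind auditing without query execution can be conveyed with a minimal sketch. This is illustrative only, not the paper's actual rule framework: it assumes both the query and the audit expression carry a simple range predicate over one numeric column, and certifies the query safe when the two ranges are provably disjoint, so no data access is needed. All names below are hypothetical.

```python
# Toy sketch of static "audit without execution": if a query's selection
# predicate and the audit (sensitive-data) predicate can never select the
# same row, the query is provably safe without running it.
# Illustrative only; real SQL auditing must handle far richer predicates.

from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Closed interval constraint on a single numeric column."""
    column: str
    lo: float
    hi: float


def provably_disjoint(query_pred: Range, audit_pred: Range) -> bool:
    """Return True if the two predicates cannot both hold for any row,
    i.e. the query provably avoided the sensitive data."""
    if query_pred.column != audit_pred.column:
        # Different columns: nothing can be concluded statically here.
        return False
    return query_pred.hi < audit_pred.lo or audit_pred.hi < query_pred.lo


# Example: the query read salaries below 50k, while the audit marks
# salaries of 100k and above as sensitive; the ranges are disjoint.
q = Range("salary", 0, 49_999)
s = Range("salary", 100_000, float("inf"))
print(provably_disjoint(q, s))  # prints True: safe, no execution needed
```

When the static check fails (overlapping or incomparable predicates), an auditor would fall back to the instance-dependent check that actually inspects the data, which is exactly the work the optimizer tries to avoid.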


More information

[Refer Slide Time: 05:10]

[Refer Slide Time: 05:10] Principles of Programming Languages Prof: S. Arun Kumar Department of Computer Science and Engineering Indian Institute of Technology Delhi Lecture no 7 Lecture Title: Syntactic Classes Welcome to lecture

More information

Differential privacy in health care analytics and medical research An interactive tutorial

Differential privacy in health care analytics and medical research An interactive tutorial Differential privacy in health care analytics and medical research An interactive tutorial Speaker: Moritz Hardt Theory Group, IBM Almaden February 21, 2012 Overview 1. Releasing medical data: What could

More information

Relational model. Relational model - practice. Relational Database Definitions 9/27/11. Relational model. Relational Database: Terminology

Relational model. Relational model - practice. Relational Database Definitions 9/27/11. Relational model. Relational Database: Terminology COS 597A: Principles of Database and Information Systems elational model elational model A formal (mathematical) model to represent objects (data/information), relationships between objects Constraints

More information

IMPLEMENTING FORENSIC READINESS USING PERFORMANCE MONITORING TOOLS

IMPLEMENTING FORENSIC READINESS USING PERFORMANCE MONITORING TOOLS Chapter 18 IMPLEMENTING FORENSIC READINESS USING PERFORMANCE MONITORING TOOLS Franscois van Staden and Hein Venter Abstract This paper proposes the use of monitoring tools to record data in support of

More information

The Recovery of a Schema Mapping: Bringing Exchanged Data Back

The Recovery of a Schema Mapping: Bringing Exchanged Data Back The Recovery of a Schema Mapping: Bringing Exchanged Data Back MARCELO ARENAS and JORGE PÉREZ Pontificia Universidad Católica de Chile and CRISTIAN RIVEROS R&M Tech Ingeniería y Servicios Limitada A schema

More information

Databases 2011 The Relational Model and SQL

Databases 2011 The Relational Model and SQL Databases 2011 Christian S. Jensen Computer Science, Aarhus University What is a Database? Main Entry: da ta base Pronunciation: \ˈdā-tə-ˌbās, ˈda- also ˈdä-\ Function: noun Date: circa 1962 : a usually

More information

Incorporating Evidence in Bayesian networks with the Select Operator

Incorporating Evidence in Bayesian networks with the Select Operator Incorporating Evidence in Bayesian networks with the Select Operator C.J. Butz and F. Fang Department of Computer Science, University of Regina Regina, Saskatchewan, Canada SAS 0A2 {butz, fang11fa}@cs.uregina.ca

More information

Security and Scalability in a Biomunicipal Network - Part 1

Security and Scalability in a Biomunicipal Network - Part 1 Simultaneous Scalability and Security for Data-Intensive Web Applications Amit Manjhi, Anastassia Ailamaki, Bruce M. Maggs, Todd C. Mowry, Christopher Olston, Anthony Tomasic Carnegie Mellon University

More information