
COVER FEATURE

Privacy-Preserving Data Mining Systems

Nan Zhang, University of Texas at Arlington
Wei Zhao, Rensselaer Polytechnic Institute

Although successful in many applications, data mining poses special concerns for private data. An integrated architecture takes a systemic view of the problem, implementing established protocols for data collection, inference control, and information sharing.

Data mining successfully extracts knowledge to support a variety of domains (marketing, weather forecasting, medical diagnosis, and national security), but it is still a challenge to mine certain kinds of data without violating the data owners' privacy [1]. How to mine patients' private data, for example, is an ongoing problem in healthcare applications. In recognition of the growing privacy concern, directives such as the US Health Insurance Portability and Accountability Act (HIPAA) and the European Union Privacy Directive mandate privacy protection for data management and analysis systems.

As data mining becomes more pervasive, such concerns are increasing. Online data collection systems are one example of new applications that threaten individual privacy. Companies are already sharing data mining models to obtain a richer set of data about mutual customers and their buying habits. The computing community must address data mining privacy before data mining techniques become widespread and the threat to private information spirals out of control.

The sticking point is how to protect privacy while preserving the usefulness of data mining results. Much research is under way to address the obstacles, but practical privacy-preserving data mining systems are largely in the research and prototyping stages. Many techniques for privacy-preserving data mining concentrate on algorithmic solutions and underlying mathematical tools [2, 3] rather than on system issues.
Our goal in investigating privacy preservation issues was to take a systemic view of architectural requirements and design principles and to explore possible solutions that would lead to guidelines for building practical privacy-preserving data mining systems.

FOUNDATIONAL DESIGN

As Figure 1 shows, privacy-preserving data mining usually has multiple steps that translate to a three-tiered architecture.

At the bottom tier are the data providers, the data owners, which are often physically distributed. The data providers submit their private data to the data warehouse server.

This server, which constitutes the middle tier, supports online analytical processing to facilitate data mining by translating raw data from the data providers into aggregate data that the data mining servers can process more quickly. The data warehouse server stores the collected data in disciplined physical structures, such as a multidimensional data cube, and aggregates and precomputes the data in various forms, such as sum, average, max, and min. In an online survey system, for example, the survey respondents would be data providers who submit their data to the survey analyzer's data warehouse server; an aggregated data point might be the average age of all survey respondents. The aggregated data is more efficient to process than the raw data from the providers.

At the top tier are the data mining servers, which perform the actual data mining. In a privacy-preserving data mining system, these servers do not have free access to all data in the data warehouse. In a hospital system, for example, the accounting department can mine patients' financial data but cannot access patients' medical records. Developing and validating effective rules for the data mining servers' access to the data warehouse is an open research problem [4].

52    Computer    Published by the IEEE Computer Society

Besides constructing data mining models on its local data warehouse server, a data mining server might share information with data mining servers from other systems. The motivation for this sharing is to build data mining models that span systems. For example, several retail companies might opt to share their local data mining models on customer records to build a global data mining model about consumer behavior that would benefit all the companies. As Figure 1 shows, sharing occurs in the top tier, where each data mining server holds the data mining model of its own system. Thus, sharing means sharing local data mining models rather than raw data.

Minimum necessary design principle

Any design of a privacy-preserving data mining system requires a clear definition of privacy. The common interpretation is that a data point is private if its owner has the right to choose whether, to what extent, and for what purpose to disclose the data point to others. In the privacy-preserving data mining literature, most authors assume (either implicitly or explicitly) that a data owner generally chooses not to disclose its private data unless data mining requires it.

This assumption and the accepted information-privacy definition form the basis of the minimum necessary design principle: In a data mining system, disclosed private information (from one entity to another) should be the minimum necessary for data mining. Minimum in this context is a qualitative, not a quantitative, measure.
Since the quantitative measure of privacy disclosure varies among systems, minimum captures the idea that no unnecessary private information (unnecessary in the context of how accurate the data mining results must be) should be disclosed. Minimum thus means that privacy disclosure is on a need-to-know basis. Many privacy regulations, including HIPAA, mandate this minimum necessary rule.

[Figure 1: Data Mining System 1 and Data Mining System 2, each consisting of data providers, a data warehouse server, and data mining servers; information sharing links the data mining servers of the two systems.]
Figure 1. Basic architecture for privacy-preserving data mining. The architecture typically has three tiers: data providers, which are the data owners; the data warehouse server, which supports online analytical processing; and the data mining servers, which perform data mining tasks and share information. The challenge is to control private information transmitted among entities without impeding data mining.

Privacy protocols

On the basis of the architecture in Figure 1 and the minimum necessary design principle, we have evolved a basic strategy for building a privacy-preserving data mining system. Central to the strategy are three protocols that govern privacy disclosure among entities:

Data collection protects privacy during data transmission from the data providers to the data warehouse server.
Inference control manages privacy protection between the data warehouse server and the data mining servers.
Information sharing controls the information shared among the data mining servers in different systems.

Given the minimum necessary rule, a common goal of these protocols is to transmit from one entity to another only the minimum private information necessary to build accurate data mining models. In reality, it is often difficult to build an efficient system that protects private information perfectly. Consequently, there are always tradeoffs between data privacy and data mining model accuracy.
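To make the division of labor among the three protocols concrete, here is a minimal Python sketch of the three-tier flow in Figure 1. It is an illustration under stated assumptions, not the authors' system. The attribute names, the whitelist of allowed aggregate queries, and the toy "model" (a count and sum of ages) are all hypothetical. Each protocol is reduced to its simplest form: providers drop unneeded attributes, the warehouse rejects queries that are not whitelisted, and systems share only model summaries.

```python
from statistics import mean

# Aggregates the warehouse can precompute (sum, average, max, min) plus a count.
AGGREGATES = {"count": len, "sum": sum, "avg": mean, "max": max, "min": min}


def collect(record, needed):
    """Data collection (sketch): a provider submits only the attributes the task needs."""
    return {attr: value for attr, value in record.items() if attr in needed}


class Warehouse:
    """Middle tier (sketch): stores collected records and answers aggregate queries."""

    def __init__(self, allowed):
        self.records = []
        self.allowed = set(allowed)  # hypothetical (aggregate, attribute) whitelist

    def store(self, record):
        self.records.append(record)

    def answer(self, aggregate, attribute):
        """Inference control (sketch): reject any query outside the whitelist."""
        if (aggregate, attribute) not in self.allowed:
            return None
        values = [r[attribute] for r in self.records if attribute in r]
        return AGGREGATES[aggregate](values)


def local_model(warehouse):
    """Top tier (sketch): a toy local 'model' made of the count and sum of ages."""
    return warehouse.answer("count", "age"), warehouse.answer("sum", "age")


def global_average_age(models):
    """Information sharing (sketch): combine local models, never raw records."""
    total_count = sum(count for count, _ in models)
    total_sum = sum(total for _, total in models)
    return total_sum / total_count


# Hypothetical providers in two independent systems.
system_data = [
    [{"name": "A", "age": 23, "zip": "76019"}, {"name": "B", "age": 41, "zip": "76010"}],
    [{"name": "C", "age": 30, "zip": "12180"}],
]
needed = {"age"}
allowed = {("count", "age"), ("sum", "age")}

systems = []
for providers in system_data:
    warehouse = Warehouse(allowed)
    for provider in providers:
        warehouse.store(collect(provider, needed))
    systems.append(warehouse)

print(systems[0].records)               # [{'age': 23}, {'age': 41}]
print(systems[0].answer("max", "age"))  # None (not on the whitelist)
print(global_average_age([local_model(w) for w in systems]))
```

The global average is computed from shared counts and sums rather than by averaging the local averages, which would be biased when systems hold different numbers of records.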
These protocols are based on established methods that the system designer can tailor to particular requirements, choosing the most beneficial tradeoffs. The data collection protocol, for example, can draw from one of two established collection methods, each with its advantages and drawbacks.

April

DATA COLLECTION PROTOCOL

The data collection protocol lets data providers identify the minimum necessary part of their private information (what must be disclosed to build accurate data mining models) and ensures that they transmit only that part to the data warehouse server.

Several requirements shape the data collection protocol. First, it must be scalable, since a data warehouse server can deal with as many as hundreds of thousands of data providers, as in an online survey system. Second, the computational cost to data providers must be small, because they have considerably less computational power than the data warehouse server, and a higher cost could discourage them from participating in data mining. Finally, the protocol must be robust. It must deliver relatively accurate data mining results while protecting data providers' privacy, even if some data providers behave erratically. For example, if some data providers in an online survey system deviate from the protocol or submit meaningless data, the data collection protocol must limit the influence of such erroneous behavior and ensure that global data mining results remain sufficiently accurate.

Figure 2 shows a data collection protocol taxonomy based on two data collection methods.

Value-based method

With the value-based method [5], a data provider manipulates the value of each data attribute or item independently, using one of two approaches. The perturbation-based approach [3] adds noise directly to the original data values, such as changing age 23 to 30 or Texas to California. The aggregation-based approach generalizes data according to the relevant domain hierarchy, such as changing age 23 to an age range or Texas to the US.
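The two value-based approaches can be sketched in a few lines of Python. This is a minimal illustration that assumes numeric ages and a hypothetical one-level region-to-country hierarchy. The noise range and the 10-year age bands are arbitrary choices, not values from the article. The `is_k_anonymous` helper checks the anonymity property that the aggregation-based approach targets: every released record must be identical to at least k - 1 other records. For simplicity, all attributes are treated as quasi-identifiers.

```python
import random
from collections import Counter


def perturb(values, spread, rng):
    """Perturbation-based approach: add independent uniform noise in [-spread, spread]."""
    return [v + rng.uniform(-spread, spread) for v in values]


def generalize_age(age, width=10):
    """Aggregation-based approach: replace an exact age with its enclosing range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical one-level domain hierarchy: state or province -> country.
REGION_TO_COUNTRY = {"Texas": "US", "California": "US", "Ontario": "Canada"}


def generalize(record):
    age, region = record
    return generalize_age(age), REGION_TO_COUNTRY[region]


def is_k_anonymous(records, k):
    """True if every record is identical to at least k - 1 other records."""
    return all(count >= k for count in Counter(records).values())


raw = [(23, "Texas"), (27, "California"), (25, "Texas"), (41, "Ontario"), (44, "Ontario")]

rng = random.Random(42)
noisy_ages = perturb([age for age, _ in raw], spread=5, rng=rng)
generalized = [generalize(r) for r in raw]

print(noisy_ages)
print(generalized)
print(is_k_anonymous(raw, 2), is_k_anonymous(generalized, 2), is_k_anonymous(generalized, 3))
# False True False
```

Uniform noise is only one possible choice here. The generalized table places each person in a group of at least two identical records, whereas the raw table is not even 2-anonymous.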
The perturbation-based approach is highly suitable for arbitrary data, while the aggregation-based approach relies on knowledge of the domain hierarchy but can be effective in guaranteeing the data's anonymity [6]. k-anonymity, for example, means that each perturbed data record is indistinguishable from the perturbed values of at least $k-1$ other data records.

[Figure 2: Data collection protocol, divided into the value-based method (perturbation-based and aggregation-based approaches) and the dimension-based method (blocking-based and projection-based approaches).]
Figure 2. Data collection protocol taxonomy. A designer can choose which of two methods (value- or dimension-based) and its attendant approaches best serve the design.

The value-based method assumes that it would be difficult, if not impossible, for the data warehouse server to rediscover the original private data from the manipulated values, but that the server would still be able to recover the original data distribution from the perturbed data, thereby supporting the construction of accurate data mining models [5].

Dimension-based method

The dimension-based method is so called because the data to be mined usually has many attributes, or dimensions. The basic idea is to remove part of the private information from the original data by reducing the number of dimensions. The blocking-based approach [3] accomplishes this by withholding some private attributes from the data warehouse server. However, this could result in information loss, preventing data mining servers from constructing accurate data mining models. The more sophisticated projection-based approach [7] overcomes this problem by projecting the original data into a carefully designed, low-dimensional subspace in a way that retains only the minimum information necessary to construct accurate data mining models.

Advantages and drawbacks

Each method and attendant approach has pluses and minuses. The value-based method is independent of the data mining task, which makes it suitable for applications involving multiple data mining tasks or tasks that are unknown at data collection time.
In contrast, the dimension-based method fits better with individual data mining tasks, because the information to be retained after dimension reduction usually depends on the particular task. So far, research has not defined an effective and universally applicable projection-based approach. Even so, the projection-based approach promises strong advantages over value-based methods in terms of the tradeoff between accuracy and privacy protection. Most value-based approaches treat different attributes independently and separately, so at least some attributes that are less necessary for data mining are always disclosed to the data warehouse server to the same extent as other attributes. Indeed, a recent study revealed that, with the perturbation-based approach of randomization, the data warehouse server could use privacy intrusion techniques to filter noise from the perturbed data, thereby rediscovering part of the original private data [8]. The projection-based approach avoids this problem by exploiting the relationships among attributes and disclosing only those necessary for data mining.

Guiding data submission can also reduce unnecessary privacy disclosure, enhancing the performance of data perturbation. In earlier work [7], we and our colleague Shengquan Wang proposed a guidance-based dimension reduction scheme for dynamic systems, such as online survey systems, in which data providers (survey respondents and so on) join the system and submit their data asynchronously. To guide data providers that have not yet submitted data, the scheme analyzes the data already collected and estimates the attributes necessary for data mining. The system then sends the estimated useful attributes to data providers as guidance. Our work shows that this guidance-based scheme is more effective than approaches without such guidance.

INFERENCE CONTROL PROTOCOL

Protecting private data in the data warehouse server requires controlling the information disclosed to the data mining servers, which is the aim of the inference control protocol. Following the minimum necessary rule, the inference control protocol ensures that the data warehouse server answers the queries necessary for data mining yet minimizes privacy disclosure.

Several requirements drive the inference control protocol's design and implementation. One is the need to block inferences. If a data mining server becomes an adversary, it will try to infer private information from the query answers it has already received. Figure 3 gives an example. Further, the inference control protocol must be efficient enough to satisfy the data warehouse server's required online response time, which is the time between issuing a query and answering it.
The time that an inference control protocol uses is part of that response time, so it must be kept small enough for the data warehouse server to maintain a short response time. To meet these requirements, inference control protocols must restrict the information included in the query answers so that the data mining server cannot infer private data from the answers it receives.

Item   April     May       June      July      Sum
Book   10        Known     15        Known     Q5 = 25
CD     20        Known     27        Known     Q6 = 47
DVD    Known                                   Q7 = 87
Game   Known     25        Known     14        Q8 = 39
Sum    Q1 = 30   Q2 = 60   Q3 = 58   Q4 = 50

Figure 3. Inference that discloses private information. If the data mining server becomes an adversary, it might be able to infer from the query answers and certain cells (Known) the number of DVDs a data provider sold in June (which is private and should not be disclosed) by computing $Q_1 + Q_3 - (Q_5 + Q_6) = 30 + 58 - (25 + 47) = 16$, where $Q_1$ to $Q_8$ are query answers.

Figure 4 shows an inference control protocol taxonomy based on two inference control methods.

Query-oriented method

The query-oriented method [4] is centered on the concept of a safe query set: query set $\langle Q_1, Q_2, \ldots, Q_n \rangle$ is safe if a data mining server cannot infer private data from the answers to $Q_1, Q_2, \ldots, Q_n$. Thus, query-oriented inference control means that when the data warehouse server receives a query, it answers the query only if the union of the query history (the set of all queries already answered) and the newly received query is safe. Otherwise, it rejects the query.

Relative to query-oriented inference control in statistical databases, inference control in data warehouses involves significantly more data. Consequently, the burden is on inference control protocols to process queries more efficiently. Because dynamically determining a query set's safety (an online query history check) can be time-consuming, a static version of the query-oriented method might be more suitable.
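The inference in Figure 3 and the online safety check of the query history can both be sketched in Python. This is an illustrative sketch, not the protocol of reference 4. It assumes that every query is a SUM over table cells and that the adversary has already subtracted the Known cells from each answer. It uses a standard linear-algebra audit in which a cell is disclosed exactly when its unit vector lies in the row space of the answered queries, tested here through a reduced row echelon form over exact fractions. The cell labels and the `DynamicAuditor` class are hypothetical names.

```python
from fractions import Fraction

# Unknown (private) cells of Figure 3. The "Known" cells are left out: the
# adversary knows their values and can subtract them from every answer.
CELLS = ["Book-Apr", "Book-Jun", "CD-Apr", "CD-Jun",
         "DVD-May", "DVD-Jun", "DVD-Jul", "Game-May", "Game-Jul"]

# Each SUM query, restricted to the unknown cells it covers.
QUERIES = {
    "Q1": {"Book-Apr", "CD-Apr"},             # April column
    "Q2": {"DVD-May", "Game-May"},            # May column
    "Q3": {"Book-Jun", "CD-Jun", "DVD-Jun"},  # June column
    "Q4": {"DVD-Jul", "Game-Jul"},            # July column
    "Q5": {"Book-Apr", "Book-Jun"},           # Book row
    "Q6": {"CD-Apr", "CD-Jun"},               # CD row
    "Q7": {"DVD-May", "DVD-Jun", "DVD-Jul"},  # DVD row
    "Q8": {"Game-May", "Game-Jul"},           # Game row
}
ANSWERS = {"Q1": 30, "Q2": 60, "Q3": 58, "Q4": 50, "Q5": 25, "Q6": 47, "Q7": 87, "Q8": 39}

# The inference from Figure 3: the Book and CD cells cancel, leaving DVD sales in June.
dvd_june = ANSWERS["Q1"] + ANSWERS["Q3"] - (ANSWERS["Q5"] + ANSWERS["Q6"])


def rref(rows):
    """Reduced row echelon form over exact rationals; returns the nonzero rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot = 0
    for col in range(len(m[0]) if m else 0):
        r = next((i for i in range(pivot, len(m)) if m[i][col] != 0), None)
        if r is None:
            continue
        m[pivot], m[r] = m[r], m[pivot]
        lead = m[pivot][col]
        m[pivot] = [x / lead for x in m[pivot]]
        for i in range(len(m)):
            if i != pivot and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[pivot])]
        pivot += 1
        if pivot == len(m):
            break
    return m[:pivot]


def disclosed_cells(query_names):
    """Cells whose exact value follows from some linear combination of the answers."""
    matrix = [[1 if cell in QUERIES[q] else 0 for cell in CELLS] for q in query_names]
    disclosed = set()
    for row in rref(matrix):
        nonzero = [j for j, x in enumerate(row) if x != 0]
        if len(nonzero) == 1:  # a unit row: that single cell is determined
            disclosed.add(CELLS[nonzero[0]])
    return disclosed


class DynamicAuditor:
    """Answer a query only if the query history plus the new query stays safe."""

    def __init__(self):
        self.history = []

    def submit(self, query):
        if disclosed_cells(self.history + [query]):
            return False  # reject: answering would disclose a private cell
        self.history.append(query)
        return True


print(dvd_june)                                   # 16
print(disclosed_cells(["Q1", "Q3", "Q5", "Q6"]))  # {'DVD-Jun'}
auditor = DynamicAuditor()
print([auditor.submit(q) for q in ["Q1", "Q5", "Q6", "Q3", "Q2", "Q4", "Q8", "Q7"]])
# [True, True, True, False, True, True, True, False]
```

In this run, the auditor accepts $Q_1$, $Q_5$, and $Q_6$ but rejects $Q_3$, because those four answers together pin down the June DVD sales. It rejects $Q_7$ for the same reason once $Q_2$, $Q_4$, and $Q_8$ are answered. Each check works over the whole history, which is the cost that motivates precomputing a safe set offline.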
The static version determines a safe set of queries offline (before any query is actually received). If a query set is safe, then any subset of it is also safe. At runtime, when the data warehouse server receives a query, it answers only if the query is in the predetermined safe set. Otherwise, it rejects the query. On the downside, the static method is conservative in selecting a safe set, which might cause it to reject some queries unnecessarily.

[Figure 4: Inference control protocol, divided into the query-oriented method (classify safe and unsafe sets offline; check query history online) and the data-oriented method (do perturbation by data collection; do perturbation online when query received).]
Figure 4. An inference control protocol taxonomy. A designer can choose which of two methods (query- or data-oriented) best serves the design.

Data-oriented method

With the data-oriented method of inference control [9], the data warehouse server perturbs the stored raw data and estimates the query answers as accurately as possible on the basis of the perturbed data. As Figure 4 shows, the data collection protocol can handle the perturbation unless the application requires storing the original data in the data warehouse server. In that case, the data warehouse server might have to perturb the data when processing the query. The data-oriented method assumes that perturbation can protect private information from being disclosed, enabling the data warehouse server to answer all queries freely on the basis of the perturbed data. Research has shown that query answers estimated from the perturbed data can still support the construction of accurate data mining models [5].

Advantages and disadvantages

The two methods have distinct performance considerations. The data-oriented method offers query responsiveness, since the data warehouse server will answer all queries. The query-oriented method, in contrast, normally rejects a substantial number of queries [9], which means that some data mining servers might be unable to complete their data mining tasks.

On the plus side, the query-oriented method can provide more accurate answers than the data-oriented method. When the data warehouse server answers a query, its answer is always precise. The data-oriented method, in contrast, answers queries with estimates, which might not be accurate enough to support data mining, particularly when the construction of data mining models requires highly accurate query answers.
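Here is a minimal sketch of the data-oriented method. It assumes that the warehouse adds independent zero-mean Gaussian noise to every stored cell and answers SUM queries from the perturbed cells. The cell values, the noise level `sigma`, and the function names are hypothetical. Because the noise has mean zero, the estimated sum is unbiased. Its standard deviation grows only as the square root of the number of cells, so aggregates stay fairly accurate while each individual cell is blurred by noise of scale `sigma`.

```python
import math
import random


def perturb_cells(cells, sigma, seed=0):
    """Store each cell value plus independent zero-mean Gaussian noise."""
    rng = random.Random(seed)
    return {key: value + rng.gauss(0.0, sigma) for key, value in cells.items()}


def estimate_sum(perturbed, keys):
    """Answer a SUM query from perturbed cells (an unbiased estimate of the true sum)."""
    return sum(perturbed[k] for k in keys)


# Hypothetical warehouse: 400 cells with values between 40 and 60.
cells = {i: 40 + (i % 21) for i in range(400)}
sigma = 5.0
perturbed = perturb_cells(cells, sigma)

true_total = sum(cells.values())
estimated_total = estimate_sum(perturbed, cells.keys())
noise_sd = sigma * math.sqrt(len(cells))  # standard deviation of the sum's error

print(f"true={true_total}, estimate={estimated_total:.1f}, error sd={noise_sd:.1f}")
print(f"cell 0: true={cells[0]}, perturbed={perturbed[0]:.1f}")
```

With 400 cells, the error of the total has a standard deviation of about 100 against a true total near 20,000. A single cell carries noise with a standard deviation of 5 on values between 40 and 60. This gap is why estimated aggregates can remain useful for mining while individual values stay protected, though the article notes that such estimates may still be too coarse when models need highly accurate answers.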
Efficiency is an important advantage for the static version of the query-oriented method, which has the shortest response time because most of its computational cost is incurred offline. The dynamic version must trade off efficiency and query responsiveness. To answer more queries, the data warehouse server must spend more time analyzing the query history. The data-oriented method also suffers from low efficiency, since the computational overhead for query estimation can be several orders of magnitude higher than for query answering.

One way to enhance inference control protocol performance is to integrate the query- and data-oriented methods. Introducing the query answer-or-reject scheme to the data-oriented method would let the data warehouse server reject some privacy-divulging queries (such as $Q_3$ in Figure 3). This, in turn, would effectively lower the data perturbation level yet retain the same degree of privacy protection. Because the data is perturbed, the server would have to reject far fewer queries and could thus answer most queries fairly accurately while continuing to protect private information.

INFORMATION SHARING PROTOCOL

Because each data mining server constructs local data mining models in its own system, these servers are likely to share their local data mining models rather than the raw data in the data warehouses. Local data mining models can be sensitive, especially when the local models are not globally valid. To protect the privacy of individual data mining systems, some mechanism must control the disclosure of private information in local data mining models. This mechanism is the information sharing protocol, which again follows the minimum necessary rule.
The protocol's objective is to enable data mining servers across multiple systems to construct global data mining models while disclosing only the minimum private information about local data mining models necessary for information sharing. Many information sharing protocols exist for applications other than data mining, such as database interoperation or data integration [10]. Information sharing is necessary for most distributed data mining systems, and much work has focused on designing specific information sharing protocols for data mining tasks.

A major design concern of the information sharing protocol is defending against adversaries that behave arbitrarily within the capability allocated to them. The defense strategy depends on the adversary model, which is the set of assumptions about an adversary's intent and behavior. Two of the more popular adversary models are semihonest [10] and beyond semihonest.

Semihonest adversaries

An adversary is semihonest if it properly follows the designated protocol but records all intermediate computation and communication, which it can later use to derive private information. Cryptographic encryption has proved effective in defending against semihonest adversaries [2, 10, 11]. In this method, each data mining server encrypts its local data mining model and exchanges the encrypted model with other data mining servers. Some properties of encryption schemes, such as the commutative encryption property of the Rivest-Shamir-Adleman (RSA) cryptosystem, make it possible to design algorithms that let data mining servers perform certain data mining tasks and set operations without knowing the private keys of other entities [2, 10, 11]. Tasks include classification, association rule mining, clustering, and collaborative filtering; set operations include set intersection, set union, and element reduction. Because it is not possible to recover the original (local) data mining models from their encrypted values without knowing the private keys, this method is a secure defense against semihonest adversaries. Researchers have already evolved a detailed taxonomy and cryptographic encryption methods for various system settings [2, 3].

Beyond semihonest adversaries

An adversary is considered beyond semihonest if it deviates from the designated protocol, changes its input data, or both. Because it is difficult, if not impossible, to defend against an adversary that behaves arbitrarily, dealing with beyond-semihonest adversaries requires more refined models. One such model is the intent-based adversary model [12], which formulates an adversary's intent as a combination of the intent to obtain accurate data mining results and the intent to compromise other entities' private information. A game-theoretic method has been developed to defend against adversaries that value the accuracy of data mining results more than compromising other parties' privacy [12]. The basic idea is to design the information sharing protocol so that no adversary can both obtain accurate data mining results and intrude on other servers' privacy. Adversaries that are more concerned with the accuracy of data mining results are thus forced to refrain from intruding on the privacy of others to get that accuracy.

OPEN RESEARCH ISSUES

Several issues require additional research to ensure the optimum performance of the techniques described.

Protocol integration

Many systems need a seamless integration of the three protocols, yet little research has addressed this need. Our proposed integrated architecture could serve as a platform for studying protocol interaction. Insights from such studies can pave the way for effective and efficient integration.
Heterogeneous privacy requirements

Privacy-preserving data mining techniques depend on respecting the privacy protection levels that data providers require. Most existing studies assume homogeneous privacy requirements, meaning that all data owners need the same privacy level for all their data and its attributes. This assumption is unrealistic in practice and could even degrade system performance unnecessarily. Designing and implementing techniques that exploit heterogeneous privacy requirements is a challenge with much potential return.

Privacy measurements

The accuracy-versus-protection tradeoff inherent in privacy-preserving data mining means that some mechanism must accurately measure the degree of privacy protection. Although extensive work has focused on privacy measurement, no one has yet proposed a commonly accepted measurement technique for generic privacy-preserving data mining systems. Proper privacy protection measurement has three criteria. It must reflect system settings, because adversaries might have different levels of interest in different data values (for example, they might care more about patients with contagious diseases than about patients with other diseases). It must account for data providers' diverse privacy concerns, since some might consider age private information while others are willing to disclose it publicly. Finally, it must satisfy the minimum necessary rule. A comprehensive study of privacy measurement for all three protocols would be a huge step toward improving the performance of privacy-preserving data mining techniques.

Anomaly detection

A common application of data mining is to detect data-set anomalies, as in mining log file data to detect intrusions. However, few researchers have considered privacy protection in detecting anomalies.
Research on anomaly detection is an important part of data mining and can contribute to multiple disciplines, such as security, biology, and finance. Thoroughly investigating issues related to the design of privacy-preserving data mining techniques for anomaly detection would be extremely beneficial.

Multiple protection levels

In some cases, multiple levels of private information must be protected. The first level might be a data point's value, and the second level might be the data point's sensitivity (knowledge of whether or not the data point is private). Most existing studies focus on protecting the first level and assume that all entities already know the second level. Research has yet to answer how to protect the second (and higher) levels of private information.

Our work is an important first step in addressing the critical systemic issues of privacy preservation in data mining. Much research remains to realize the potential of the architecture and design principles we have described. Much literature already addresses privacy-preserving data mining, but the ideas clearly must cross considerable ground to become practical systems. Studies are needed for the design of privacy-preserving data mining techniques in real-world scenarios, in which data owners can freely address their individual privacy concerns without the data miner's consent. Also critical is work that more closely incorporates designs with specialized applications such as healthcare, market analysis, and finance. Our hope is that others will continue efforts in this important area.

References

1. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann.
2. C. Clifton et al., "Tools for Privacy Preserving Distributed Data Mining," SIGKDD Explorations, vol. 4, no. 2, 2003.
3. V.S. Verykios et al., "State-of-the-Art in Privacy Preserving Data Mining," SIGMOD Record, vol. 33, no. 1, 2004.
4. L. Wang, S. Jajodia, and D. Wijesekera, "Securing OLAP Data Cubes against Privacy Breaches," Proc. 25th IEEE Symp. Security and Privacy, IEEE Press, 2004.
5. R. Agrawal and R. Srikant, "Privacy-Preserving Data Mining," Proc. 19th ACM SIGMOD Int'l Conf. Management of Data, ACM Press, 2000.
6. R.J. Bayardo and R. Agrawal, "Data Privacy through Optimal k-Anonymization," Proc. 21st Int'l Conf. Data Eng., IEEE Press, 2005.
7. N. Zhang, S. Wang, and W. Zhao, "A New Scheme on Privacy-Preserving Data Classification," Proc. 11th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, ACM Press, 2005.
8. Z. Huang, W. Du, and B. Chen, "Deriving Private Information from Randomized Data," Proc. 24th ACM SIGMOD Int'l Conf. Management of Data, ACM Press, 2005.
9. R. Agrawal, R. Srikant, and D. Thomas, "Privacy-Preserving OLAP," Proc. 25th ACM SIGMOD Int'l Conf. Management of Data, ACM Press, 2005.
10. R. Agrawal, A. Evfimievski, and R. Srikant, "Information Sharing across Private Databases," Proc. 22nd ACM SIGMOD Int'l Conf. Management of Data, ACM Press, 2003.
11. Y. Lindell and B. Pinkas, "Privacy Preserving Data Mining," Proc. 12th Ann. Int'l Conf. Advances in Cryptology, Springer-Verlag, 2000.
12. N. Zhang and W. Zhao, "Distributed Privacy Preserving Information Sharing," Proc. 31st Int'l Conf. Very Large Data Bases, ACM Press, 2005.

Nan Zhang is an assistant professor of computer science and engineering at the University of Texas at Arlington. His research interests include databases and data mining, information security and privacy, and distributed systems. Zhang received a PhD in computer science from Texas A&M University. He is a member of the IEEE. Contact him at nzhang@cse.uta.edu.

Wei Zhao is a professor of computer science and the dean for the School of Science at Rensselaer Polytechnic Institute. His research interests include distributed computing, real-time systems, computer networks, and cyberspace security. Zhao received a PhD in computer and information sciences from the University of Massachusetts, Amherst. He is a Fellow of the IEEE and a member of the IEEE Computer Society and the ACM. Contact him at zhaow3@rpi.edu.


More information

Healthcare Measurement Analysis Using Data mining Techniques

Healthcare Measurement Analysis Using Data mining Techniques www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

(b) How data mining is different from knowledge discovery in databases (KDD)? Explain.

(b) How data mining is different from knowledge discovery in databases (KDD)? Explain. Q2. (a) List and describe the five primitives for specifying a data mining task. Data Mining Task Primitives (b) How data mining is different from knowledge discovery in databases (KDD)? Explain. IETE

More information

DATA MINING - 1DL105, 1DL025

DATA MINING - 1DL105, 1DL025 DATA MINING - 1DL105, 1DL025 Fall 2009 An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/ht09 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

More information

A UPS Framework for Providing Privacy Protection in Personalized Web Search

A UPS Framework for Providing Privacy Protection in Personalized Web Search A UPS Framework for Providing Privacy Protection in Personalized Web Search V. Sai kumar 1, P.N.V.S. Pavan Kumar 2 PG Scholar, Dept. of CSE, G Pulla Reddy Engineering College, Kurnool, Andhra Pradesh,

More information

Appendix B Data Quality Dimensions

Appendix B Data Quality Dimensions Appendix B Data Quality Dimensions Purpose Dimensions of data quality are fundamental to understanding how to improve data. This appendix summarizes, in chronological order of publication, three foundational

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October-2013 ISSN 2229-5518 1582

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October-2013 ISSN 2229-5518 1582 1582 AN EFFICIENT CRYPTOGRAPHIC APPROACH FOR PRESERVING PRIVACY IN DATA MINING T.Sujitha 1, V.Saravanakumar 2, C.Saravanabhavan 3 1. M.E. Student, Sujiraj.me@gmail.com 2. Assistant Professor, visaranams@yahoo.co.in

More information

NSF Workshop on Big Data Security and Privacy

NSF Workshop on Big Data Security and Privacy NSF Workshop on Big Data Security and Privacy Report Summary Bhavani Thuraisingham The University of Texas at Dallas (UTD) February 19, 2015 Acknowledgement NSF SaTC Program for support Chris Clifton and

More information

When to consider OLAP?

When to consider OLAP? When to consider OLAP? Author: Prakash Kewalramani Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 03/10/08 Email: erg@evaltech.com Abstract: Do you need an OLAP

More information

Cloud Based Distributed Databases: The Future Ahead

Cloud Based Distributed Databases: The Future Ahead Cloud Based Distributed Databases: The Future Ahead Arpita Mathur Mridul Mathur Pallavi Upadhyay Abstract Fault tolerant systems are necessary to be there for distributed databases for data centers or

More information

A SECURE DECISION SUPPORT ESTIMATION USING GAUSSIAN BAYES CLASSIFICATION IN HEALTH CARE SERVICES

A SECURE DECISION SUPPORT ESTIMATION USING GAUSSIAN BAYES CLASSIFICATION IN HEALTH CARE SERVICES A SECURE DECISION SUPPORT ESTIMATION USING GAUSSIAN BAYES CLASSIFICATION IN HEALTH CARE SERVICES K.M.Ruba Malini #1 and R.Lakshmi *2 # P.G.Scholar, Computer Science and Engineering, K. L. N College Of

More information

International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6

International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6 International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: editor@ijermt.org November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering

More information

PRIVACY PRESERVING ASSOCIATION RULE MINING

PRIVACY PRESERVING ASSOCIATION RULE MINING Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 10, October 2014,

More information

Privacy Preserving Outsourcing for Frequent Itemset Mining

Privacy Preserving Outsourcing for Frequent Itemset Mining Privacy Preserving Outsourcing for Frequent Itemset Mining M. Arunadevi 1, R. Anuradha 2 PG Scholar, Department of Software Engineering, Sri Ramakrishna Engineering College, Coimbatore, India 1 Assistant

More information

Obfuscation of sensitive data in network flows 1

Obfuscation of sensitive data in network flows 1 Obfuscation of sensitive data in network flows 1 D. Riboni 2, A. Villani 1, D. Vitali 1 C. Bettini 2, L.V. Mancini 1 1 Dipartimento di Informatica,Universitá di Roma, Sapienza. E-mail: {villani, vitali,

More information

INTEROPERABILITY IN DATA WAREHOUSES

INTEROPERABILITY IN DATA WAREHOUSES INTEROPERABILITY IN DATA WAREHOUSES Riccardo Torlone Roma Tre University http://torlone.dia.uniroma3.it/ SYNONYMS Data warehouse integration DEFINITION The term refers to the ability of combining the content

More information

Privacy Preserving Mining of Transaction Databases Sunil R 1 Dr. N P Kavya 2

Privacy Preserving Mining of Transaction Databases Sunil R 1 Dr. N P Kavya 2 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 04, 2014 ISSN (online): 2321-0613 Privacy Preserving Mining of Transaction Databases Sunil R 1 Dr. N P Kavya 2 1 M.Tech

More information

Comparison of Request Admission Based Performance Isolation Approaches in Multi-tenant SaaS Applications

Comparison of Request Admission Based Performance Isolation Approaches in Multi-tenant SaaS Applications Comparison of Request Admission Based Performance Isolation Approaches in Multi-tenant SaaS Applications Rouven Kreb 1 and Manuel Loesch 2 1 SAP AG, Walldorf, Germany 2 FZI Research Center for Information

More information

PBKM: A Secure Knowledge Management Framework

PBKM: A Secure Knowledge Management Framework PBKM: A Secure Knowledge Management Framework (extended abstract) Shouhuai Xu and Weining Zhang Department of Computer Science, University of Texas at San Antonio {shxu,wzhang}@cs.utsa.edu Abstract In

More information

How To Create A Multi-Keyword Ranked Search Over Encrypted Cloud Data (Mrse)

How To Create A Multi-Keyword Ranked Search Over Encrypted Cloud Data (Mrse) JJT-029-2015 SEARCHABLE SYMMETRIC ENCRYPTION METHOD FOR ENCRYPTED DATA IN CLOUD P.Vidyasagar, R.Karthikeyan, Dr.C.Nalini M.Tech Student, Dept of CSE,Bharath University, Email.Id: vsagarp@rediffmail.com

More information

Personalization of Web Search With Protected Privacy

Personalization of Web Search With Protected Privacy Personalization of Web Search With Protected Privacy S.S DIVYA, R.RUBINI,P.EZHIL Final year, Information Technology,KarpagaVinayaga College Engineering and Technology, Kanchipuram [D.t] Final year, Information

More information

Website Personalization using Data Mining and Active Database Techniques Richard S. Saxe

Website Personalization using Data Mining and Active Database Techniques Richard S. Saxe Website Personalization using Data Mining and Active Database Techniques Richard S. Saxe Abstract Effective website personalization is at the heart of many e-commerce applications. To ensure that customers

More information

Optimal Replacement of Underground Distribution Cables

Optimal Replacement of Underground Distribution Cables 1 Optimal Replacement of Underground Distribution Cables Jeremy A. Bloom, Member, IEEE, Charles Feinstein, and Peter Morris Abstract This paper presents a general decision model that enables utilities

More information

INTRUSION PREVENTION AND EXPERT SYSTEMS

INTRUSION PREVENTION AND EXPERT SYSTEMS INTRUSION PREVENTION AND EXPERT SYSTEMS By Avi Chesla avic@v-secure.com Introduction Over the past few years, the market has developed new expectations from the security industry, especially from the intrusion

More information

A Secure Decentralized Access Control Scheme for Data stored in Clouds

A Secure Decentralized Access Control Scheme for Data stored in Clouds A Secure Decentralized Access Control Scheme for Data stored in Clouds Priyanka Palekar 1, Abhijeet Bharate 2, Nisar Anjum 3 1 SKNSITS, University of Pune 2 SKNSITS, University of Pune 3 SKNSITS, University

More information

Big Data - Security and Privacy

Big Data - Security and Privacy Big Data - Security and Privacy Elisa Bertino CS Department, Cyber Center, and CERIAS Purdue University Cyber Center! Big Data EveryWhere! Lots of data is being collected, warehoused, and mined Web data,

More information

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,

More information

Associate Prof. Dr. Victor Onomza Waziri

Associate Prof. Dr. Victor Onomza Waziri BIG DATA ANALYTICS AND DATA SECURITY IN THE CLOUD VIA FULLY HOMOMORPHIC ENCRYPTION Associate Prof. Dr. Victor Onomza Waziri Department of Cyber Security Science, School of ICT, Federal University of Technology,

More information

Enhancing Data Security in Cloud Storage Auditing With Key Abstraction

Enhancing Data Security in Cloud Storage Auditing With Key Abstraction Enhancing Data Security in Cloud Storage Auditing With Key Abstraction 1 Priyadharshni.A, 2 Geo Jenefer.G 1 Master of engineering in computer science, Ponjesly College of Engineering 2 Assistant Professor,

More information

Healthcare, transportation,

Healthcare, transportation, Smart IT Argus456 Dreamstime.com From Data to Decisions: A Value Chain for Big Data H. Gilbert Miller and Peter Mork, Noblis Healthcare, transportation, finance, energy and resource conservation, environmental

More information

OLAP Services. MicroStrategy Products. MicroStrategy OLAP Services Delivers Economic Savings, Analytical Insight, and up to 50x Faster Performance

OLAP Services. MicroStrategy Products. MicroStrategy OLAP Services Delivers Economic Savings, Analytical Insight, and up to 50x Faster Performance OLAP Services MicroStrategy Products MicroStrategy OLAP Services Delivers Economic Savings, Analytical Insight, and up to 50x Faster Performance MicroStrategy OLAP Services brings In-memory Business Intelligence

More information

A Privacy-preserving Approach for Records Management in Cloud Computing. Eun Park and Benjamin Fung. School of Information Studies McGill University

A Privacy-preserving Approach for Records Management in Cloud Computing. Eun Park and Benjamin Fung. School of Information Studies McGill University A Privacy-preserving Approach for Records Management in Cloud Computing Eun Park and Benjamin Fung School of Information Studies McGill University Digital transformation Privacy Conflict? Health service

More information

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social

More information

Institute of Southern Punjab, Multan

Institute of Southern Punjab, Multan Institute of Southern Punjab, Multan Network Security Brief Introduction Lecture#1 Mazhar Hussain E-mail: mazhar.hussain@isp.edu.pk Blog https://mazharhussainatisp.wordpress.com/ Grading Policy Classification

More information

Cybersecurity Analytics for a Smarter Planet

Cybersecurity Analytics for a Smarter Planet IBM Institute for Advanced Security December 2010 White Paper Cybersecurity Analytics for a Smarter Planet Enabling complex analytics with ultra-low latencies on cybersecurity data in motion 2 Cybersecurity

More information

AN EFFICIENT STRATEGY OF AGGREGATE SECURE DATA TRANSMISSION

AN EFFICIENT STRATEGY OF AGGREGATE SECURE DATA TRANSMISSION INTERNATIONAL JOURNAL OF REVIEWS ON RECENT ELECTRONICS AND COMPUTER SCIENCE AN EFFICIENT STRATEGY OF AGGREGATE SECURE DATA TRANSMISSION K.Anusha 1, K.Sudha 2 1 M.Tech Student, Dept of CSE, Aurora's Technological

More information

DATA WAREHOUSE AND DATA MINING NECCESSITY OR USELESS INVESTMENT

DATA WAREHOUSE AND DATA MINING NECCESSITY OR USELESS INVESTMENT Scientific Bulletin Economic Sciences, Vol. 9 (15) - Information technology - DATA WAREHOUSE AND DATA MINING NECCESSITY OR USELESS INVESTMENT Associate Professor, Ph.D. Emil BURTESCU University of Pitesti,

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

LDA Based Security in Personalized Web Search

LDA Based Security in Personalized Web Search LDA Based Security in Personalized Web Search R. Dhivya 1 / PG Scholar, B. Vinodhini 2 /Assistant Professor, S. Karthik 3 /Prof & Dean Department of Computer Science & Engineering SNS College of Technology

More information

Privacy-Preserving Outsourcing Support Vector Machines with Random Transformation

Privacy-Preserving Outsourcing Support Vector Machines with Random Transformation Privacy-Preserving Outsourcing Support Vector Machines with Random Transformation Keng-Pei Lin Ming-Syan Chen Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan Research Center

More information

Experimental Analysis of Privacy-Preserving Statistics Computation

Experimental Analysis of Privacy-Preserving Statistics Computation Experimental Analysis of Privacy-Preserving Statistics Computation Hiranmayee Subramaniam 1, Rebecca N. Wright 2, and Zhiqiang Yang 2 1 Stevens Institute of Technology graduate, hiran@polypaths.com. 2

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume, Issue, March 201 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Efficient Approach

More information

4-06-35. John R. Vacca INSIDE

4-06-35. John R. Vacca INSIDE 4-06-35 INFORMATION MANAGEMENT: STRATEGY, SYSTEMS, AND TECHNOLOGIES ONLINE DATA MINING John R. Vacca INSIDE Online Analytical Modeling (OLAM); OLAM Architecture and Features; Implementation Mechanisms;

More information

How To Write A Privacy Preserving Firewall Optimization Protocol

How To Write A Privacy Preserving Firewall Optimization Protocol Asia-pacific Journal of Multimedia Services Convergence with Art, Humanities and Sociology Vol.1, No.2 (2011), pp. 93-100 http://dx.doi.org/10.14257/ajmscahs.2011.12.06 Secure Multi-Party Computation in

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

Data Discovery, Analytics, and the Enterprise Data Hub

Data Discovery, Analytics, and the Enterprise Data Hub Data Discovery, Analytics, and the Enterprise Data Hub Version: 101 Table of Contents Summary 3 Used Data and Limitations of Legacy Analytic Architecture 3 The Meaning of Data Discovery & Analytics 4 Machine

More information

IEEE IoT IoT Scenario & Use Cases: Social Sensors

IEEE IoT IoT Scenario & Use Cases: Social Sensors IEEE IoT IoT Scenario & Use Cases: Social Sensors Service Description More and more, people have the possibility to monitor important parameters in their home or in their surrounding environment. As an

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand

More information

Boarding to Big data

Boarding to Big data Database Systems Journal vol. VI, no. 4/2015 11 Boarding to Big data Oana Claudia BRATOSIN University of Economic Studies, Bucharest, Romania oc.bratosin@gmail.com Today Big data is an emerging topic,

More information

College information system research based on data mining

College information system research based on data mining 2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei

More information

ETPL Extract, Transform, Predict and Load

ETPL Extract, Transform, Predict and Load ETPL Extract, Transform, Predict and Load An Oracle White Paper March 2006 ETPL Extract, Transform, Predict and Load. Executive summary... 2 Why Extract, transform, predict and load?... 4 Basic requirements

More information

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning

More information

CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS

CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS Subbarao Jasti #1, Dr.D.Vasumathi *2 1 Student & Department of CS & JNTU, AP, India 2 Professor & Department

More information

List of Promising Concepts EA6: BIG DATA

List of Promising Concepts EA6: BIG DATA List of Promising Concepts EA6: BIG DATA Project acronym Project title Project number 611961 Starting date 01/10/2013 Duration in months 24 Call identifier FP7-ICT-2013-10 CAPITAL security research Agenda

More information

Taxonomy for Privacy Policies of Social Networks Sites

Taxonomy for Privacy Policies of Social Networks Sites Social Networking, 2013, 2, 157-164 http://dx.doi.org/10.4236/sn.2013.24015 Published Online October 2013 (http://www.scirp.org/journal/sn) Taxonomy for Privacy Policies of Social Networks Sites Sergio

More information

A Novel Technique of Privacy Protection. Mining of Association Rules from Outsourced. Transaction Databases

A Novel Technique of Privacy Protection. Mining of Association Rules from Outsourced. Transaction Databases A Novel Technique of Privacy Protection Mining of Association Rules from Outsource Transaction Databases 1 Dhananjay D. Wadkar, 2 Santosh N. Shelke 1 Computer Engineering, Sinhgad Academy of Engineering

More information

ISSN: 2348 9510. A Review: Image Retrieval Using Web Multimedia Mining

ISSN: 2348 9510. A Review: Image Retrieval Using Web Multimedia Mining A Review: Image Retrieval Using Web Multimedia Satish Bansal*, K K Yadav** *, **Assistant Professor Prestige Institute Of Management, Gwalior (MP), India Abstract Multimedia object include audio, video,

More information

Intrusion Detection System using Log Files and Reinforcement Learning

Intrusion Detection System using Log Files and Reinforcement Learning Intrusion Detection System using Log Files and Reinforcement Learning Bhagyashree Deokar, Ambarish Hazarnis Department of Computer Engineering K. J. Somaiya College of Engineering, Mumbai, India ABSTRACT

More information

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner 24 Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner Rekha S. Nyaykhor M. Tech, Dept. Of CSE, Priyadarshini Bhagwati College of Engineering, Nagpur, India

More information

A Brief Tutorial on Database Queries, Data Mining, and OLAP

A Brief Tutorial on Database Queries, Data Mining, and OLAP A Brief Tutorial on Database Queries, Data Mining, and OLAP Lutz Hamel Department of Computer Science and Statistics University of Rhode Island Tyler Hall Kingston, RI 02881 Tel: (401) 480-9499 Fax: (401)

More information

Formal Methods for Preserving Privacy for Big Data Extraction Software

Formal Methods for Preserving Privacy for Big Data Extraction Software Formal Methods for Preserving Privacy for Big Data Extraction Software M. Brian Blake and Iman Saleh Abstract University of Miami, Coral Gables, FL Given the inexpensive nature and increasing availability

More information

KEY FACTORS AND BARRIERS OF BUSINESS INTELLIGENCE IMPLEMENTATION

KEY FACTORS AND BARRIERS OF BUSINESS INTELLIGENCE IMPLEMENTATION KEY FACTORS AND BARRIERS OF BUSINESS INTELLIGENCE IMPLEMENTATION Peter Mesároš, Štefan Čarnický & Tomáš Mandičák The business environment is constantly changing and becoming more complex and difficult.

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

Data Mining and Database Systems: Where is the Intersection?

Data Mining and Database Systems: Where is the Intersection? Data Mining and Database Systems: Where is the Intersection? Surajit Chaudhuri Microsoft Research Email: surajitc@microsoft.com 1 Introduction The promise of decision support systems is to exploit enterprise

More information

Data Deduplication Scheme for Cloud Storage

Data Deduplication Scheme for Cloud Storage 26 Data Deduplication Scheme for Cloud Storage 1 Iuon-Chang Lin and 2 Po-Ching Chien Abstract Nowadays, the utilization of storage capacity becomes an important issue in cloud storage. In this paper, we

More information

Privacy Protection in Personalized Web Search- A Survey

Privacy Protection in Personalized Web Search- A Survey Privacy Protection in Personalized Web Search- A Survey Greeshma A S. * Lekshmy P. L. M.Tech Student Assistant Professor Dept. of CSE & Kerala University Dept. of CSE & Kerala University Thiruvananthapuram

More information

Design and Implementation of Supermarket Management System Yongchang Rena, Mengyao Chenb

Design and Implementation of Supermarket Management System Yongchang Rena, Mengyao Chenb 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2015) Design and Implementation of Supermarket Management System Yongchang Rena, Mengyao Chenb College

More information

Secure Computation Martin Beck

Secure Computation Martin Beck Institute of Systems Architecture, Chair of Privacy and Data Security Secure Computation Martin Beck Dresden, 05.02.2015 Index Homomorphic Encryption The Cloud problem (overview & example) System properties

More information

A RESEARCH STUDY ON DATA MINING TECHNIQUES AND ALGORTHMS

A RESEARCH STUDY ON DATA MINING TECHNIQUES AND ALGORTHMS A RESEARCH STUDY ON DATA MINING TECHNIQUES AND ALGORTHMS Nitin Trivedi, Research Scholar, Manav Bharti University, Solan HP ABSTRACT The purpose of this study is not to delve deeply into the technical

More information

NON-PROBABILITY SAMPLING TECHNIQUES

NON-PROBABILITY SAMPLING TECHNIQUES NON-PROBABILITY SAMPLING TECHNIQUES PRESENTED BY Name: WINNIE MUGERA Reg No: L50/62004/2013 RESEARCH METHODS LDP 603 UNIVERSITY OF NAIROBI Date: APRIL 2013 SAMPLING Sampling is the use of a subset of the

More information

Business Intelligence meets Big Data: An Overview on Security and Privacy

Business Intelligence meets Big Data: An Overview on Security and Privacy Business Intelligence meets Big Data: An Overview on Security and Privacy Claudio A. Ardagna Ernesto Damiani Dipartimento di Informatica - Università degli Studi di Milano NSF Workshop on Big Data Security

More information

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer. REVIEW ARTICLE ISSN: 2321-7758 UPS EFFICIENT SEARCH ENGINE BASED ON WEB-SNIPPET HIERARCHICAL CLUSTERING MS.MANISHA DESHMUKH, PROF. UMESH KULKARNI Department of Computer Engineering, ARMIET, Department

More information

ENHANCING INTELLIGENCE SUCCESS: DATA CHARACTERIZATION Francine Forney, Senior Management Consultant, Fuel Consulting, LLC May 2013

ENHANCING INTELLIGENCE SUCCESS: DATA CHARACTERIZATION Francine Forney, Senior Management Consultant, Fuel Consulting, LLC May 2013 ENHANCING INTELLIGENCE SUCCESS: DATA CHARACTERIZATION, Fuel Consulting, LLC May 2013 DATA AND ANALYSIS INTERACTION Understanding the content, accuracy, source, and completeness of data is critical to the

More information

ACL Based Dynamic Network Reachability in Cross Domain

ACL Based Dynamic Network Reachability in Cross Domain South Asian Journal of Engineering and Technology Vol.2, No.15 (2016) 68 72 ISSN No: 2454-9614 ACL Based Dynamic Network Reachability in Cross Domain P. Nandhini a, K. Sankar a* a) Department Of Computer

More information

User research for information architecture projects

User research for information architecture projects Donna Maurer Maadmob Interaction Design http://maadmob.com.au/ Unpublished article User research provides a vital input to information architecture projects. It helps us to understand what information

More information

Data Mining and Sensitive Inferences

Data Mining and Sensitive Inferences Template-Based Privacy Preservation in Classification Problems Ke Wang Simon Fraser University BC, Canada V5A S6 wangk@cs.sfu.ca Benjamin C. M. Fung Simon Fraser University BC, Canada V5A S6 bfung@cs.sfu.ca

More information

Marketing Science Institute 2014-2016 Research Priorities

Marketing Science Institute 2014-2016 Research Priorities Marketing Science Institute 2014-2016 Research Priorities Source: www.msi.org Every two years, the Marketing Science Institute asks member companies to help select the priorities that will drive research

More information

Task Scheduling in Hadoop

Task Scheduling in Hadoop Task Scheduling in Hadoop Sagar Mamdapure Munira Ginwala Neha Papat SAE,Kondhwa SAE,Kondhwa SAE,Kondhwa Abstract Hadoop is widely used for storing large datasets and processing them efficiently under distributed

More information

Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology Madras

Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology Madras Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology Madras Lecture - 41 Value of Information In this lecture, we look at the Value

More information

EFFICIENT AND SECURE DATA PRESERVING IN CLOUD USING ENHANCED SECURITY

EFFICIENT AND SECURE DATA PRESERVING IN CLOUD USING ENHANCED SECURITY EFFICIENT AND SECURE DATA PRESERVING IN CLOUD USING ENHANCED SECURITY Siliveru Ashok kumar* S.G. Nawaz ## and M.Harathi # * Student of M.Tech, Sri Krishna Devaraya Engineering College, Gooty # Department

More information

Application of Data Mining Techniques in Intrusion Detection

Application of Data Mining Techniques in Intrusion Detection Application of Data Mining Techniques in Intrusion Detection LI Min An Yang Institute of Technology leiminxuan@sohu.com Abstract: The article introduced the importance of intrusion detection, as well as

More information

Data W a Ware r house house and and OLAP II Week 6 1

Data W a Ware r house house and and OLAP II Week 6 1 Data Warehouse and OLAP II Week 6 1 Team Homework Assignment #8 Using a data warehousing tool and a data set, play four OLAP operations (Roll up (drill up), Drill down (roll down), Slice and dice, Pivot

More information

Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects

Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects Abstract: Build a model to investigate system and discovering relations that connect variables in a database

More information

Assumption Busters Workshop - Cloud Computing

Assumption Busters Workshop - Cloud Computing Assumption Busters Workshop - Cloud Computing Background: In 2011, the U.S. Federal Cyber Research Community conducted a series of four workshops designed to examine key assumptions that underlie current

More information