Ethical Issues in Data Mining
Mandana Mir Moftakhari, PhD Student at Hacettepe University, Department of Information Management. Email: mir_moftakhari@yahoo.com
Güleda Doğan, PhD Student & Research Assistant at Hacettepe University, Department of Information Management. Email: gduzyol@hacettepe.edu.tr
We will discuss: big data, knowledge discovery, data mining, and ethical issues in data mining.
Big Data! Data overload is a serious problem that has grown with technical advances. People have to cope with overwhelming amounts of data and manage them in order to obtain the information and knowledge relevant to solving their problems.
Big Data! Organizations have to cope with massive data volumes to gain opportunities for better decision-making and competitive advantage.
Knowledge Discovery in Databases as a Solution Knowledge discovery in databases (KDD) is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad, Piatetsky-Shapiro, Smyth and Uthurusamy, 1996).
KDD Involves Different Steps: selection, preprocessing, transformation, data mining, and interpretation or evaluation (Fayyad et al., 1996).
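The five KDD steps can be sketched as a chain of small functions. This is only an illustrative toy, not the procedure described by Fayyad et al.: the records, the field names (`customer`, `age`, `spend`), and the "pattern" being mined (average spend per age band) are all assumptions made up for the example.

```python
from statistics import mean

# Hypothetical raw records; field names are illustrative only.
RAW = [
    {"customer": "a", "age": "34", "spend": "120.0"},
    {"customer": "b", "age": "", "spend": "80.0"},   # missing age
    {"customer": "c", "age": "51", "spend": "300.0"},
    {"customer": "d", "age": "29", "spend": "60.0"},
]

def select(records):
    """Selection: keep only the fields relevant to the question."""
    return [{"age": r["age"], "spend": r["spend"]} for r in records]

def preprocess(records):
    """Preprocessing: drop records that have missing values."""
    return [r for r in records if all(v != "" for v in r.values())]

def transform(records):
    """Transformation: convert text fields to numbers."""
    return [{"age": int(r["age"]), "spend": float(r["spend"])} for r in records]

def mine(records):
    """Data mining: a toy pattern, the mean spend per age band."""
    bands = {}
    for r in records:
        band = "under_40" if r["age"] < 40 else "40_plus"
        bands.setdefault(band, []).append(r["spend"])
    return {band: mean(values) for band, values in bands.items()}

def interpret(patterns):
    """Interpretation/evaluation: present patterns as readable statements."""
    return [f"{band}: average spend {avg:.2f}" for band, avg in sorted(patterns.items())]

def run_kdd(records):
    """Run the five steps in order."""
    return interpret(mine(transform(preprocess(select(records)))))

if __name__ == "__main__":
    for line in run_kdd(RAW):
        print(line)
```

Note that the selection step already drops the `customer` identifier, which hints at how privacy considerations can enter the process before any mining takes place.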
What Is Data Mining? Data mining, the central step of KDD, is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner (Hand, Mannila and Smyth, 2001).
Application Areas of Data Mining: customer service support, prediction, estimation, forecasting, decision support, and budget management.
Data Mining Process: identifying the target areas; determining the sources of data; gathering the data and cleaning them into a data warehouse; choosing appropriate analysis tools; finding new patterns; preparing reports and implementing the results.
Ethical Issues in Data Mining Individuals expect not only high-quality services but also a high level of privacy and security for their personal details. These issues cannot be overlooked because of their consequences for consumers, individuals, and society.
Ethical Issues: privacy, data accuracy, database security, and consent.
Privacy Threats Many consumers feel that their privacy is violated by information-gathering practices. Key concerns: secondary use of personal information, handling of misinformation, and granulated access to personal information.
Secondary Use of Personal Information Recent surveys on privacy show great concern about the use of personal data for purposes other than the one for which the data were collected.
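One way to make the concern about secondary use concrete is to store, with each record, the purposes declared when the data were collected, and to refuse any other use. The sketch below is a minimal illustration under that assumption; `PersonalRecord`, `PurposeViolation`, and `use_record` are hypothetical names, not part of any library or of the cited works.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PersonalRecord:
    subject_id: str
    data: dict
    # Purposes the individual was told about at collection time (assumed model).
    allowed_purposes: frozenset = frozenset()

class PurposeViolation(Exception):
    """Raised when data are requested for a purpose that was not declared."""

def use_record(record, purpose):
    """Return the data only if the purpose matches one declared at collection."""
    if purpose not in record.allowed_purposes:
        raise PurposeViolation(
            f"'{purpose}' was not declared when data for {record.subject_id} were collected"
        )
    return record.data

if __name__ == "__main__":
    rec = PersonalRecord("u1", {"email": "user@example.org"}, frozenset({"billing"}))
    print(use_record(rec, "billing"))      # declared purpose: allowed
    try:
        use_record(rec, "marketing")       # secondary use: refused
    except PurposeViolation as err:
        print("refused:", err)
```

A check like this does not settle the ethical question, but it forces every new use of the data to be named explicitly instead of happening silently.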
Handling Misinformation Misinformation can cause serious and long-term damage, so individuals should be able to challenge the correctness of data about themselves.
Granulated Access to Personal Information The access to personal data should be on a need-to-know basis, and limited to relevant information only.
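A need-to-know policy can be approximated by mapping each role to the fields it may see and filtering every record through that map. The roles and field names below (`billing`, `analyst`, `age_band`, and so on) are invented for illustration; a real system would typically enforce this in the database or access-control layer.

```python
# Hypothetical mapping from role to the fields that role needs to know.
ROLE_FIELDS = {
    "billing": {"name", "address", "invoice_total"},
    "analyst": {"age_band", "invoice_total"},
}

def view_for(role, record):
    """Return only the fields the given role is allowed to see.

    Unknown roles get an empty view (deny by default).
    """
    allowed = ROLE_FIELDS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}

if __name__ == "__main__":
    record = {
        "name": "Jane Doe",
        "address": "1 Example Street",
        "age_band": "30-39",
        "invoice_total": 42.5,
    }
    print(view_for("analyst", record))
    print(view_for("intern", record))
```

Denying by default for unknown roles keeps a misconfigured or missing role from exposing a full record.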
Type of data Some types of personal information are seen as being more sensitive than others. What complicates this issue is that sensitivity level varies according to the individual.
Database security Database security inhibits the unauthorized dissemination of personal data.
Data Accuracy Collected data originate from many diverse, possibly external, sources. They may be noisy, obsolete, inaccurate, or incomplete; not recent enough; or different from the present situation of the individuals concerned.
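Some of these accuracy problems, such as incomplete or outdated records, can be flagged automatically before mining. The following is a rough sketch under stated assumptions: the `last_updated` field, the one-year staleness threshold, and the issue labels are all hypothetical choices, and checks like these cannot detect data that are simply wrong.

```python
from datetime import date

def quality_issues(record, required, today, max_age_days=365):
    """List simple quality problems: missing required fields and stale data."""
    issues = []
    for name in required:
        if record.get(name) in (None, ""):
            issues.append(f"missing:{name}")
    updated = record.get("last_updated")  # assumed to be a datetime.date
    if updated is None:
        issues.append("no_timestamp")
    elif (today - updated).days > max_age_days:
        issues.append("stale")
    return issues

if __name__ == "__main__":
    record = {"name": "Jane Doe", "address": "", "last_updated": date(2012, 1, 1)}
    print(quality_issues(record, ("name", "address"), today=date(2014, 5, 10)))
```

Records that come back with issues can be excluded from analysis or sent for verification, ideally with the individual concerned.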
Consent The purpose of data mining is to discover new insights and new uses for the information that companies already have. This makes it nearly impossible to give consumers the opportunity to grant informed consent for each use of their data.
Conclusion and Recommendations Data mining is the process of searching in order to discover relationships between data sets and find useful information. Ethical issues should be observed in all steps of the process.
Conclusion and Recommendations: consider the expectations of customers; develop a customer-oriented privacy policy; research and understand all laws that may have jurisdiction over sensitive data; control access to data warehouses; give customers more control over their data; evaluate the quality of source data.
References
American Library Association. (1995). Code of ethics of the American Library Association.
Bhambri, V. & Gagandeep. (2012). Coexistence of data mining and privacy of data. International Journal of Research in IT & Management, 2(2).
Brankovic, L., & Estivill-Castro, V. (1999, July). Privacy issues in knowledge discovery and data mining. In Australian Institute of Computer Ethics Conference, 89-99.
Cary, C., Wen, H. J. & Mahatanankoon, P. (2003). Data mining: consumer privacy, ethical policy, and systems development practices. Human Systems Management, 22(4), 157-168.
Cavoukian, A. (1998). Data mining: staking a claim on your privacy.
Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P. & Uthurusamy, R. (1996). Advances in knowledge discovery and data mining. Cambridge, Menlo Park, Calif.: AAAI Press. Retrieved on May 10, 2014, from http://www.amazon.ca/gp/product/0262560976
Fayyad, U. M., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17, 37-54.
Nicholson, S. (2003). The bibliomining process: Data warehousing and data mining for library decision-making. Information Technology and Libraries, 22(4), 146-151.
Nicholson, S. & Stanton, J. (2003). Gaining strategic advantage through bibliomining: Data mining for management decisions in corporate, special, digital, and traditional libraries. In Nemati, H. & Barko, C. (Eds.), Organizational data mining: Leveraging enterprise data resources for optimal performance. Hershey, PA: Idea Group Publishing, 247-262.
Payne, D. & Trumbach, C. C. (2009). Data mining: proprietary rights, people and proposals. Business Ethics Quarterly, 18(3).