Survey on Data Cleaning Prerna S.Kulkarni, Dr. J.W.Bakal
Abstract: The data warehouse of an enterprise consolidates data from multiple sources of the organization in order to support enterprise-wide decision making, reporting, analysis, and planning. The processes performed on a data warehouse are highly sensitive to data quality: they depend on the accuracy and consistency of the data. Degraded data quality leads these processes to wrong conclusions, which ultimately wastes resources and assets of all kinds. Data received at the data warehouse from external sources usually contains errors, e.g. spelling mistakes, inconsistent conventions across data sources, and/or missing fields. Significant amounts of time and money are consequently spent on data cleaning, the task of detecting and correcting errors in data. We aim to increase awareness by providing a summary of the impacts of poor data quality on a typical enterprise. These impacts include customer dissatisfaction, increased operating cost, less effective decision-making, and a reduced ability to make and execute plans. More subtly perhaps, poor data quality hurts employee morale, breeds organizational mistrust, and makes it more difficult to align the enterprise.

Index Terms: Data, Homonyms, Synonyms, Quality, Data Warehouse.

I. INTRODUCTION

Recent literature proposes several state-of-the-art solutions for matching and merging data sources. Relevant work and approaches exist in the fields of data integration [3], [4], [7], [8] and data deduplication [5]. As is well known, creating awareness of a problem and its impact is a critical first step toward its resolution [9]. The needed awareness of poor data quality, while growing, has not yet been achieved in many enterprises. After all, the typical executive is already besieged by too many problems: low customer satisfaction, high costs, a data warehouse project that is late, and so forth.
This article aims to increase awareness by providing a summary of the impacts of poor data quality on a typical enterprise. These impacts include customer dissatisfaction, increased operational cost, less effective decision-making, and a reduced ability to make and execute strategy. More subtly perhaps, poor data quality hurts employee morale, breeds organizational mistrust, and makes it more difficult to align the enterprise. Poor data quality and its underlying causes are potent contributors to an information ecology [10] inappropriate for the Information Age. Further, leading enterprises have demonstrated that data quality can be dramatically improved and the impacts mitigated. One study estimates the combined cost of bad data at over US$30 billion in 2006 alone [17]. As business operations rely more and more on computerized systems, this cost is bound to increase at an alarming rate. Companies spend millions of dollars per year detecting errors in their data. Data quality means that data is fit for the purpose of business use: that it is consistent, accurate, complete, and uniform. Data cleaning refers to the activity of determining and detecting unwanted, corrupt, inconsistent, and faulty data in order to enhance data quality [15].

II. DATA WAREHOUSE

The term Data Warehouse (DW) was coined by Bill Inmon in 1990, who defined it in the following way: "A warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management's decision making process." Ralph Kimball provided a much simpler definition of a data warehouse as "a copy of transaction data specifically structured for query and analysis" [23]. Fig. 1 shows the data warehouse architecture.

A. Data Acquisition

Data extraction is one of the most time-consuming tasks of Data Warehouse development.
Data consolidated from heterogeneous systems may have problems, and may need to be transformed and cleaned before being loaded into the Data Warehouse. Data gathered from operational systems may be incorrect, inconsistent, unreadable, or incomplete. Data cleaning is an essential task in the data warehousing process in order to get correct, high-quality data into the Data Warehouse. This process basically comprises the following tasks [24]:
- Converting data from heterogeneous data sources with various external representations into a common structure suitable for the Data Warehouse.
- Identifying and eliminating redundant or irrelevant data.
- Transforming data to correct values (e.g., by looking up parameter usage and consolidating these values into a common format).
- Reconciling differences between multiple sources due to the use of homonyms (the same name for different things), synonyms (different names for the same thing), or different units of measurement.

Fig. 1 Data warehouse architecture [Source: 26]

B. Extraction, Cleansing, and Transformation Tools

The tasks of capturing data from a source system, cleaning and transforming the data, and loading the consolidated data into a target system can be done either by separate products or by a single integrated solution.

III. DATA CLEANING FRAMEWORKS

Madnick et al. [19] analyze the TDQM framework, which advocates continuous data quality improvement by following the cycle of Define, Measure, Analyze, and Improve. Subsequent research developed theories, methods, and techniques for the four phases of the TDQM cycle. The strong points are: TDQM is easy to implement and manageable in an enterprise environment for data cleansing, and defining data quality from the user's perspective helps users clean data so that it is fit for business use. The shortcoming is: the framework mainly focuses on characterizing data quality research along the dimensions of topic and method rather than specifically on a data cleansing framework.

Hao et al. [20] analyze a framework based on a rule base, rule scheduling, and log management. The data cleaning process is divided into four parts: data access interface, data quality analysis, data transformation, and results assessment. The strong points are: the framework design is unified, as the whole data cleaning process is performed in a single place.
The data access interface provides a unified extraction interface for single-source and multi-source data, and the process log management records the operation information of the whole data cleaning process. The shortcomings are: the data quality analysis should be done only once rather than as repeated work, and the process should be sequential and completed in one pass rather than iterative.
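As a concrete illustration, the four-part process above (data access interface, quality analysis, transformation, results assessment) might be sketched roughly as follows. The record fields and the accept/reject rules are illustrative assumptions, not details taken from Hao et al.'s paper:

```python
# Illustrative sketch of a four-stage cleaning pipeline in the spirit
# of Hao et al.'s framework. Field names and rules are assumptions.

def access(sources):
    """Unified extraction interface over one or more data sources."""
    for source in sources:
        yield from source

def analyze(record):
    """Quality analysis: list the required fields that are missing."""
    return [field for field in ("name", "age") if record.get(field) is None]

def transform(record):
    """Transformation: normalize whitespace and capitalization."""
    cleaned = dict(record)
    cleaned["name"] = cleaned["name"].strip().title()
    return cleaned

def clean(sources):
    """Results assessment: accept or reject each row, then transform."""
    accepted, rejected = [], []
    for record in access(sources):
        (rejected if analyze(record) else accepted).append(record)
    return [transform(r) for r in accepted], rejected

ok, bad = clean([[{"name": " alice ", "age": 30}],
                 [{"name": None, "age": 7}]])
```

With this toy input, the first record is normalized and accepted while the second is rejected for its missing name, mirroring how quality analysis feeds the results assessment.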
According to Jebamalar et al. [21], the main objective of data cleaning is to reduce the time and complexity of the mining process and increase the quality of data in the data warehouse. Their framework consists of six elements: selection of attributes, formation of tokens, selection of a clustering algorithm, similarity computation for the selected attributes, selection of an elimination function, and merging.

According to Arora et al. [15], the proposed algorithm deals with duplicated string data across the different data marts of a data warehouse by performing deduplication on the name field. The strong point is: the proposed alliance rules for data cleansing are easily changeable. The shortcomings are: the framework is limited to duplicate-record elimination on the name field, and manual intervention in the alliance rules is almost zero, which makes the algorithm much less interactive from the user's point of view.

According to Yu et al. [22], the framework consists of access to database objects for the user model, definition of the user model, and definition of a quality model based on the user model. The user model is a data model abstracted from the real model from the perspective of the user. The strong point is: the proposed framework is a simple and interactive three-step process. The shortcomings are: the framework requires a user model for cleansing, which is a lengthy process to build, and it does not allow selection of attributes, which makes processing very lengthy and wastes time and resources.

IV. DATA QUALITY

Data quality is the degree to which data meet the specific needs of specific customers, and it comprises several dimensions. Poor data quality costs businesses vast amounts of money every year. Defective data lead to breakdowns in the supply chain, poor business decisions, and inferior customer relationship management.
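A minimal sketch of name-field deduplication in the spirit of the token-based approaches reviewed above. The tokenization rule and the Jaccard similarity threshold are illustrative assumptions, not parameters from the cited papers:

```python
# Hypothetical sketch: token-based deduplication of a name field.
# The 0.5 similarity threshold is an assumption for illustration.

def tokens(name):
    """Form tokens by lowercasing and splitting on periods/whitespace."""
    return set(name.lower().replace(".", " ").split())

def jaccard(a, b):
    """Similarity between two names as token-set overlap."""
    union = tokens(a) | tokens(b)
    return len(tokens(a) & tokens(b)) / len(union) if union else 0.0

def dedupe(names, threshold=0.5):
    """Keep a name only if it matches no previously kept name."""
    kept = []
    for name in names:
        if all(jaccard(name, k) < threshold for k in kept):
            kept.append(name)
    return kept

unique = dedupe(["J. W. Bakal", "Bakal J W", "Prerna Kulkarni"])
# "Bakal J W" is merged into "J. W. Bakal"; two unique names remain.
```

Token formation makes the comparison robust to word order and punctuation, which is exactly the kind of variation that field-level string equality misses.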
Data are a core business asset that must be managed if an organization is to generate a return from it. The following are characteristics and measures of data quality [23]:
- Definition conformance: the definition of the chosen object should carry the complete details and meaning of the real-world object.
- Completeness (of values): the characteristic of having all required values for the data fields.
- Validity (business rule conformance): a measure of the degree of conformance of data values to their domain and business rules. This includes domain values, ranges, reasonability tests, primary key uniqueness, and referential integrity.
- Accuracy (to the source): a measure of the degree to which data agrees with the data contained in an original source.
- Precision: domain values should have the correct precision as the business specifies.
- Non-duplication (of occurrences): the degree to which there is a one-to-one correlation between records and the real-world objects or events being represented.
- Derivation integrity: the correctness with which two or more pieces of data are combined to create new data.
- Accessibility: the characteristic of being able to access data on demand.
- Timeliness: the relative availability of data to support a given process within the timetable required to perform it.

V. DATA CLEANING

Data cleaning is the process used to determine inaccurate, incomplete, or unreasonable data and then to improve quality by correcting the detected errors and omissions. This process may include format checks, completeness checks, reasonableness checks, limit checks, and review of the data to identify outliers (geographic, statistical, temporal, or environmental) or other errors [25].

A. Data Cleaning Problems

There exist severe data quality problems that can be solved by data cleaning and transformation. These problems are closely related and should thus be treated in a uniform way.
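The format, completeness, and limit checks named in the definition of data cleaning above can be sketched as simple record-level validators. The field names and the concrete rules here are illustrative assumptions, not rules from [25]:

```python
# Illustrative record-level checks: completeness, limit (range), and
# format checks. Field names and rules are assumptions for the sketch.

import re

def check_record(record):
    problems = []
    if record.get("age") is None:                 # completeness check
        problems.append("age missing")
    elif not 0 <= record["age"] <= 120:           # limit (range) check
        problems.append("age out of range")
    if not re.fullmatch(r"\d{5}", record.get("zip", "")):  # format check
        problems.append("zip malformed")
    return problems

issues = check_record({"age": 250, "zip": "1ooo1"})
# -> ["age out of range", "zip malformed"]
```

Running such validators over every incoming record produces the list of detected errors and omissions that the correction step then acts on.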
Data transformations are needed to support any changes in the structure, representation, or content of data. These transformations become necessary in many situations, e.g., to deal with schema evolution, to migrate a legacy system to a new information system, or when multiple data sources are to be integrated.

B. Sources of Error in Data

Data errors can creep in at every step of the process, from initial data acquisition to archival storage. An understanding of the sources of data errors can be useful in designing data collection and in developing appropriate post-hoc data cleaning techniques to detect and ameliorate errors. Many of the sources of error in data fall into one or more of the following categories [27]:

Data entry errors: It remains common in many environments for data entry to be done by humans, who typically extract information from speech or copy data from notebooks, documents, or other written or printed sources. In these environments, data errors are often introduced at entry time through typographic errors or misunderstanding of the data source.

Measurement errors: In many cases data is intended to measure some physical process in the world: the depth of a borewell, the density of mud, the value of a pressure, and so on. In some cases these measurements are undertaken by human processes that can have errors in their design (e.g., improper surveys or sampling strategies) and execution (e.g., misuse of instruments). In the measurement of physical properties, the increasing proliferation of sensor technology has led to large volumes of data that are never manipulated via human intervention. While this avoids various human errors in data acquisition and entry, data errors are still quite common: the human design of a sensor deployment (e.g., the selection and placement of sensors) often affects data quality, and many sensors operate in complicated environments where they are influenced by noise.

Distillation errors: In many settings, raw data are preprocessed before being entered into the database.
This data distillation is done for a variety of reasons, e.g., to reduce the complexity or noise in the raw data. All these processes have the potential to produce errors in the distilled data, or in the way the distillation technique interacts with the final analysis.

Data integration errors: In almost all settings, a database contains information collected from multiple sources in many ways over time. Any procedure that integrates data from multiple sources may introduce errors, because the sources often contain redundant data in different representations. In order to provide access to accurate and consistent data, consolidation of the different data representations and elimination of duplicate information become necessary.

VI. DATA QUALITY ISSUES AND DATA ACCURACY

Over the last several years, more and more references to poor data quality and its impact have appeared in the news media, general-readership publications, and the technical literature [7], [11]. An enterprise may have a wide array of data quality problems. One way to categorize these issues is as follows [7]:
- Issues associated with data views (the models of the real world captured in the data), such as relevancy, granularity, and level of detail.
- Issues associated with data values, such as accuracy, consistency, currency, and completeness.
- Issues associated with the presentation of data, such as the appropriateness of the format, ease of interpretation, and so forth.
- Other issues such as privacy, security, and ownership.

The science of data quality has not yet advanced to the point where there are standard measurement methods for any of these issues, and few enterprises routinely measure data quality. But many case studies feature accuracy measures. Measured at the field level, reported error rates vary wildly across studies. Naturally there are difficulties in comparing these error rates.
For the purposes of this article, the following statements may be useful. Unless an enterprise has made extraordinary efforts, it should expect data (field) error rates of approximately 1-5% (error rate = number of erred fields / number of total fields). That which doesn't get measured doesn't get managed, so the enterprise should expect that it has other serious data quality problems as well: specifically, enterprises have redundant and inconsistent databases, and they do not have the data they really need.
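The field error rate defined above (number of erred fields divided by number of total fields) can be computed directly over a table. The toy rows and validity rules below are illustrative assumptions:

```python
# Sketch of the field error rate from the text:
#   error rate = number of erred fields / number of total fields.
# The sample rows and the per-field validity rules are assumptions.

def field_error_rate(rows, validators):
    total = erred = 0
    for row in rows:
        for field, is_valid in validators.items():
            total += 1
            if not is_valid(row[field]):
                erred += 1
    return erred / total

rows = [{"age": 34, "zip": "10001"},
        {"age": -5, "zip": "10001"},   # erred age field
        {"age": 40, "zip": "1ooo1"}]   # erred zip field (letter 'o')
validators = {"age": lambda v: 0 <= v <= 120,
              "zip": lambda v: v.isdigit() and len(v) == 5}

rate = field_error_rate(rows, validators)  # 2 erred of 6 fields, ~0.33
```

Note that the denominator counts fields, not records, so a single bad record with many clean fields depresses the rate; this is one reason field-level rates from different studies are hard to compare.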
VII. CONCLUSION

Poor data quality impacts the typical enterprise in many ways. At the operational level, poor data leads directly to customer dissatisfaction, increased cost, and lowered employee job satisfaction. It also increases operational cost because time and other resources are spent detecting and correcting errors. With wrong data, organizational morale can be lost.

ACKNOWLEDGMENT

We thank all the authors whose work provided information for this survey.

REFERENCES

[1] T. Redman, "The impact of poor data quality on the typical enterprise," Communications of the ACM, vol. 41, no. 3, pp. 79-82.
[2] R. Arora, P. Pahwa, and S. Bansal, "Alliance rules for data warehouse cleansing," IEEE Press, 2009.
[3] M. Hernandez and S. Stolfo, "The merge/purge problem for large databases," in Proceedings of the ACM SIGMOD International Conference on Management of Data.
[4] M. Lenzerini, "Data integration: A theoretical perspective," in Proceedings of the ACM SIGMOD Symposium on Principles of Database Systems, 2002.
[5] R. Ananthakrishna, S. Chaudhuri, and V. Ganti, "Eliminating fuzzy duplicates in data warehouses," in Proceedings of the International Conference on Very Large Data Bases, 2002.
[6] J. Jebamalar Tamilselvi and V. Saravanan, "Handling noisy data using attribute selection and smart tokens," IEEE Press, 2008.
[7] I. Bhattacharya and L. Getoor, "Iterative record linkage for cleaning and integration," in Proceedings of the ACM SIGKDD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2004.
[8] W. W. Cohen, "Data integration using similarity joins and a word-based information representation language," ACM Transactions on Information Systems, vol. 18, no. 3.
[9] J. P. Kotter, Leading Change. Harvard Business School Press, Boston, MA.
[10] T. H. Davenport, Information Ecology. Oxford University Press, New York.
[11] T. C. Redman, Data Quality for the Information Age. Artech House, Boston, MA.
[12] E. Rahm and H. H. Do, "Data cleaning: problems and current approaches," IEEE Data Engineering Bulletin, vol. 23, no. 4, 2000.
[13] J. Hipp and U. Grimmer, "Data quality mining - making a virtue of necessity," in Proceedings of the 6th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2001), Santa Barbara, California, USA.
[14] P. Vassiliadis, Z. Vagena, and S. Skiadopoulos, "ARKTOS: A tool for data cleaning and transformation in data warehouse environments," Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 28, no. 4, December.
[15] R. Arora, P. Pahwa, and S. Bansal, "Alliance rules of data warehouse cleansing," in IEEE International Conference on Signal Processing Systems, Singapore, May 2009.
[16] S. Chaudhuri, K. Ganjam, and V. Ganti, "Data cleaning in Microsoft SQL Server 2005," in Proceedings of the ACM SIGMOD Conference, Baltimore, MD.
[17] Sang-goo Lee, "Challenges and opportunities in information quality," in IEEE International Conference on E-Commerce Technology and the 4th IEEE International Conference on Enterprise Computing, E-Commerce, and E-Services, Tokyo, July 2007.
[18] Deaton, Thao Doan, and T. Schweiger, "Semantic data matching: principles and performance," in Data Engineering, International Series in Operations Research & Management Science, vol. 132, Springer US.
[19] S. E. Madnick, Y. W. Lee, R. Y. Wang, and H. Zhu, "Overview and framework for data and information quality research," ACM Journal of Data and Information Quality, vol. 1, no. 1, Article 2, June.
[20] Hao Yan, Xing-chun Diao, and Kai-qi Li, "Research on information quality driven data cleaning framework," in IEEE International Seminar on Future Information Technology and Management Engineering (FITME '08), China, November 2008.
[21] J. Jebamalar Tamilselvi and V. Saravanan, "A unified framework and sequential data cleaning approach for a data warehouse," IJCSNS International Journal of Computer Science and Network Security, vol. 8, no. 5, May 2008.
[22] Yu Huang, Xiao-yi Zhang, Zhen Yuan, and Guo-quan Jiang, "A universal data cleaning framework based on user model," in ISECS International Colloquium on Computing, Communication, Control, and Management, Sanya, China, August 2009.
[23] T. Manjunath, S. Ravindra, and G. Ravikumar, "Analysis of data quality aspects in data warehouse systems," International Journal of Computer Science and Information Technologies, vol. 2, no. 1, 2010.
[24] B. Pinar, "A comparison of data warehouse design models," Master's thesis, Atilim University, January.
[25] I. Ahmed and A. Aziz, "Dynamic approach for data scrubbing process," (IJCSE) International Journal on Computer Science and Engineering.
[26] E. Rahm and H. Do, "Data cleaning: Problems and current approaches," IEEE Bulletin of the Technical Committee on Data Engineering, vol. 23, no. 4, December.
[27] J. M. Hellerstein, "Quantitative data cleaning for large databases," United Nations Economic Commission for Europe.
[28] C. D. Manning and H. Schutze, Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, 1999.
Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:
Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
Study of Data Cleaning & Comparison of Data Cleaning Tools
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
MASTER DATA MANAGEMENT TEST ENABLER
MASTER DATA MANAGEMENT TEST ENABLER Sagar Porov 1, Arupratan Santra 2, Sundaresvaran J 3 Infosys, (India) ABSTRACT All Organization needs to handle important data (customer, employee, product, stores,
Master Data Management and Data Warehousing. Zahra Mansoori
Master Data Management and Data Warehousing Zahra Mansoori 1 1. Preference 2 IT landscape growth IT landscapes have grown into complex arrays of different systems, applications, and technologies over the
DATA GOVERNANCE AND DATA QUALITY
DATA GOVERNANCE AND DATA QUALITY Kevin Lewis Partner Enterprise Management COE Barb Swartz Account Manager Teradata Government Systems Objectives of the Presentation Show that Governance and Quality are
CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved
CHAPTER SIX DATA Business Intelligence 2011 The McGraw-Hill Companies, All Rights Reserved 2 CHAPTER OVERVIEW SECTION 6.1 Data, Information, Databases The Business Benefits of High-Quality Information
Chapter 6. Foundations of Business Intelligence: Databases and Information Management
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
A Data Warehouse Design for A Typical University Information System
(JCSCR) - ISSN 2227-328X A Data Warehouse Design for A Typical University Information System Youssef Bassil LACSC Lebanese Association for Computational Sciences Registered under No. 957, 2011, Beirut,
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Wienand Omta Fabiano Dalpiaz 1 drs. ing. Wienand Omta Learning Objectives Describe how the problems of managing data resources
DATA QUALITY PROBLEMS IN DATA WAREHOUSING
DATA QUALITY PROBLEMS IN DATA WAREHOUSING Diksha Verma, Anjli Tyagi, Deepak Sharma Department of Information Technology, Dronacharya college of Engineering Abstract- Data quality is critical to data warehouse
Application of Clustering and Association Methods in Data Cleaning
Proceedings of the International Multiconference on ISBN 978-83-60810-14-9 Computer Science and Information Technology, pp. 97 103 ISSN 1896-7094 Application of Clustering and Association Methods in Data
Agile Business Intelligence Data Lake Architecture
Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of
Framework for Data warehouse architectural components
Framework for Data warehouse architectural components Author: Jim Wendt Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 04/08/11 Email: [email protected] Abstract:
THE QUALITY OF DATA AND METADATA IN A DATAWAREHOUSE
THE QUALITY OF DATA AND METADATA IN A DATAWAREHOUSE Carmen Răduţ 1 Summary: Data quality is an important concept for the economic applications used in the process of analysis. Databases were revolutionized
An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies
An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies Ashish Gahlot, Manoj Yadav Dronacharya college of engineering Farrukhnagar, Gurgaon,Haryana Abstract- Data warehousing, Data Mining,
A technical paper for Microsoft Dynamics AX users
s c i t y l a n a g n i Implement. d e d e e N is h c a o r Why a New app A technical paper for Microsoft Dynamics AX users ABOUT THIS WHITEPAPER 03 06 A TRADITIONAL APPROACH TO BI A NEW APPROACH This
Appendix B Data Quality Dimensions
Appendix B Data Quality Dimensions Purpose Dimensions of data quality are fundamental to understanding how to improve data. This appendix summarizes, in chronological order of publication, three foundational
Customer Analysis - Customer analysis is done by analyzing the customer's buying preferences, buying time, budget cycles, etc.
Data Warehouses Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources that support analytical
Foundations of Business Intelligence: Databases and Information Management
Chapter 6 Foundations of Business Intelligence: Databases and Information Management 6.1 2010 by Prentice Hall LEARNING OBJECTIVES Describe how the problems of managing data resources in a traditional
Deriving Business Intelligence from Unstructured Data
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 9 (2013), pp. 971-976 International Research Publications House http://www. irphouse.com /ijict.htm Deriving
Record Deduplication By Evolutionary Means
Record Deduplication By Evolutionary Means Marco Modesto, Moisés G. de Carvalho, Walter dos Santos Departamento de Ciência da Computação Universidade Federal de Minas Gerais 31270-901 - Belo Horizonte
Facilitating Business Process Discovery using Email Analysis
Facilitating Business Process Discovery using Email Analysis Matin Mavaddat [email protected] Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process
Speeding ETL Processing in Data Warehouses White Paper
Speeding ETL Processing in Data Warehouses White Paper 020607dmxwpADM High-Performance Aggregations and Joins for Faster Data Warehouse Processing Data Processing Challenges... 1 Joins and Aggregates are
Dr. Osama E.Sheta Department of Mathematics (Computer Science) Faculty of Science, Zagazig University Zagazig, Elsharkia, Egypt oesheta75@gmail.
Evaluating a Healthcare Data Warehouse For Cancer Diseases Dr. Osama E.Sheta Department of Mathematics (Computer Science) Faculty of Science, Zagazig University Zagazig, Elsharkia, Egypt [email protected]
E-government Supported by Data warehouse Techniques for Higher education: Case study Malaysian universities
E-government Supported by Data warehouse Techniques for Higher education: Case study Malaysian universities Mohammed Abdulameer Mohammed Universiti Utara Malaysia Kedah, Malaysia Ahmed Rasol Hasson Babylon
JOURNAL OF OBJECT TECHNOLOGY
JOURNAL OF OBJECT TECHNOLOGY Online at www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2008 Vol. 7, No. 8, November-December 2008 What s Your Information Agenda? Mahesh H. Dodani,
DATA MINING AND WAREHOUSING CONCEPTS
CHAPTER 1 DATA MINING AND WAREHOUSING CONCEPTS 1.1 INTRODUCTION The past couple of decades have seen a dramatic increase in the amount of information or data being stored in electronic format. This accumulation
Business Intelligence in E-Learning
Business Intelligence in E-Learning (Case Study of Iran University of Science and Technology) Mohammad Hassan Falakmasir 1, Jafar Habibi 2, Shahrouz Moaven 1, Hassan Abolhassani 2 Department of Computer
DATA ANALYSIS USING BUSINESS INTELLIGENCE TOOL. A Thesis. Presented to the. Faculty of. San Diego State University. In Partial Fulfillment
DATA ANALYSIS USING BUSINESS INTELLIGENCE TOOL A Thesis Presented to the Faculty of San Diego State University In Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science
Data Quality in Data warehouse: problems and solution
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 1, Ver. IV (Jan. 2014), PP 18-24 Data Quality in Data warehouse: problems and solution *Rahul Kumar
Data Integration and ETL Process
Data Integration and ETL Process Krzysztof Dembczyński Institute of Computing Science Laboratory of Intelligent Decision Support Systems Politechnika Poznańska (Poznań University of Technology) Software
Fuzzy Duplicate Detection on XML Data
Fuzzy Duplicate Detection on XML Data Melanie Weis Humboldt-Universität zu Berlin Unter den Linden 6, Berlin, Germany [email protected] Abstract XML is popular for data exchange and data publishing
Integrated Data Management: Discovering what you may not know
Integrated Data Management: Discovering what you may not know Eric Naiburg [email protected] Agenda Discovering existing data assets is hard What is Discovery Discovery and archiving Discovery, test
Master Data Management
Master Data Management David Loshin AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO Ик^И V^ SAN FRANCISCO SINGAPORE SYDNEY TOKYO W*m k^ MORGAN KAUFMANN PUBLISHERS IS AN IMPRINT OF ELSEVIER
BUSINESS INTELLIGENCE. Keywords: business intelligence, architecture, concepts, dashboards, ETL, data mining
BUSINESS INTELLIGENCE Bogdan Mohor Dumitrita 1 Abstract A Business Intelligence (BI)-driven approach can be very effective in implementing business transformation programs within an enterprise framework.
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
Metadata Technique with E-government for Malaysian Universities
www.ijcsi.org 234 Metadata Technique with E-government for Malaysian Universities Mohammed Mohammed 1, Ahmed Hasson 2 1 Faculty of Information and Communication Technology Universiti Teknikal Malaysia
Enabling Data Quality
Enabling Data Quality Establishing Master Data Management (MDM) using Business Architecture supported by Information Architecture & Application Architecture (SOA) to enable Data Quality. 1 Background &
