Analyzing and Improving Data Quality
Agustina Buccella and Alejandra Cechich
GIISCO Research Group, Departamento de Ciencias de la Computación
Universidad Nacional del Comahue, Neuquén, Argentina

Gonzalo Domingo
Proyectos de Telesupervisión y Geociencias, D.S.I.
Cuenta E&P - Argentina Sur, Repsol YPF
[email protected]

This work is partially supported by the UNComa project 04/E072 (Identificación, Evaluación y Uso de Composiciones Software).

Abstract

Data quality is a research area that was strongly investigated during the 1990s. However, few companies in Argentina apply data quality methodologies or tools during the analysis, design, or implementation phases of the software development process. Developers generally design systems with techniques such as UML without considering mechanisms for future data quality problems. In this work we propose a methodology in which data quality is an essential part of the whole software development process. Early design decisions on data quality strongly impact the system. Our methodology defines a set of practices to be applied over the software life cycle. In addition, these practices act as a means to evaluate whether systems already in production fulfill minimal data quality requirements.

1 Introduction

Many definitions for the term Data Quality have been proposed in the literature, each considering data from different perspectives. However, all definitions converge on one idea: data quality is strongly related to the use of data [3, 7, 9]. Thus, depending on the use of data, quality can be considered sufficient for some uses and insufficient for others. Rules such as "garbage in, garbage out", "if inaccurate information is entered, only inaccurate information will result", and "pay now or pay more later" still hold for data quality issues. In our work, we consider the quality definition of FUNDIBQ (Iberoamerican Foundation of Quality), in which data quality is defined as the set of essential characteristics of a product, service, system, or process needed to meet the needs and expectations of the interested parties. In short, data quality represents a point of agreement among parties.

An organization with low data quality generates dissatisfied clients when their invoices or requests contain errors in their personal data, dissatisfied employees when they make mistakes because of incorrect or uncertain information, dissatisfied managers when they have to take decisions based on these data, and so on. Therefore, treating data quality as an important characteristic of an organization avoids this type of problem and brings new benefits. For instance, better support for decision making, less time to obtain information, replacement of expensive activities with cheaper ones, and an improved image of the organization are only some of the benefits that good data quality provides.

Several approaches to data quality have been proposed in the literature [1, 4, 5, 6, 8, 10]. In particular, works related to our proposal can be found in [5, 6, 10]. The proposal of Wang et al. [8, 10] defines a methodology, named Total Data Quality Management (TDQM), aimed at generating high-quality information products for the consumers of this information. The methodology analyzes and conceptualizes information products in order to build information manufacturing systems (IMS). Thus, these IMS contain the set of functionalities of a system together with the quality controls that must be implemented.
In this way, the methodology can identify possible data quality problems by analyzing the way data are produced.

In the proposal described in [5], the data quality concept is based on the use of data. The author assumes that the only way to truly improve data quality is to increase the use of that data. His work defines six rules for data quality, including "unused data do not remain correct for a long time", "data quality is defined by how the data are used, not by how they are obtained", and "data quality will be no better than its most rigorous use", among others. Based on these rules, four activities are defined in order to evaluate and analyze data quality: audit, redesign, training, and measurement. The audit activity consists of determining how good the data are today. The redesign activity evaluates the critical data managed by current applications, carefully analyzing the use given to each of them. The training activity prepares users to understand how important data quality is; to do so, training and educational tasks are carried out. Finally, the measurement activity refers to judging data quality constantly, that is, all previous activities must be repeated, generating an iterative process. In comparison, we propose a methodology defined as a practical guide that must be applied to every system during the early phases of the software development process. A set of practices or recommendations is proposed as rules to be evaluated. These rules are not based on the use of data but on the way they influence a set of data quality dimensions. However, several of the rules or practices included in our approach are based on rules defined in [5].

Finally, in [6] data quality is measured through multiple quality dimensions such as accessibility, completeness, reliability, consistency, etc. Control matrices are used to combine data quality problems with quality controls, thus allowing developers to evaluate information products. The columns of the matrix enumerate the data quality problems that affect information products, and the rows represent quality controls applied to the information manufacturing process to prevent, detect, or correct data quality problems. In this way, these controls tend to avoid errors within the information product. As in this proposal, our work is based on the fact that data quality must be an activity considered during the data life cycle. In addition, we implement the evaluation of applications by using a matrix. Then, we classify different quality practices, each one as part of one phase of the data life cycle, analyzing how pre-established parameters are met.

This paper is organized as follows: the next section briefly introduces the data quality concepts used in our approach. Section 3 presents our methodology, describing a set of practices within the data life cycle. Section 4 presents a real case study applying the methodology on a real application. Future work and conclusions are discussed afterwards.

2 Data Quality

In [7] the data life cycle is composed of four main phases: data modeling, obtaining values, storing data, and visualizing data. Figure 1 shows these phases graphically.

Figure 1. The data life cycle

The first phase refers to the process of modeling a universe of discourse. The resulting abstraction represents the reality described by the user requirements and defines a logical model representing data and the flow of information between different roles of users and applications.
In the obtaining values phase, data are obtained from reality through the user interfaces of the system or through interfaces with other systems. Here, aspects such as masks, dates, validations, and rules implemented on the interfaces must be taken into account. In the storing data phase, the data obtained in the previous phase are stored in repositories. Finally, the visualizing data phase refers to the way data are presented to users. Aspects such as inconsistencies, robustness, or identification of errors must be considered.

In order to analyze and evaluate data quality within a software development process we consider four main dimensions [2]:

Accuracy: Do data represent exactly the reality or verified sources? It is related to the source, that is, the level of correspondence between the data and the real world.

Completeness: Are all needed data available? Which data are absent? It refers to the data that must be available within an information system.

Consistency: Were data consistently defined and understood? It refers to the definition of standards and protocols for data. All data must be presented in a compatible format, defined as the best format for the task under development.
Timeliness: Are data available when required? Within this dimension the concept of volatility is defined, denoting the time data remain valid.

These four dimensions are used in our work to evaluate the data quality of an organization, together with practices applied in each phase of the data life cycle. The next section describes each of these practices and how they are influenced by the dimensions.

3 A New Approach for Improving Data Quality

In order to propose a new methodology, which consists of a set of practices, every activity within a software development process has been analyzed with respect to data quality. The different phases of the data life cycle are crucial to classify and evaluate each practice in accordance with the four dimensions described in the previous section. As a result of our analysis, we define 64 practices: 27 in the data modeling phase, 22 in the obtaining values phase, 4 in the storing data phase, and 11 in the visualizing data phase. For brevity we describe only some of these practices in this paper.

Data Life Cycle: Data Modeling

When a user account is deactivated, the user in charge of the flow must be notified in order to take preventive actions: In this way, the actions deriving from this change keep the business flow updated. This practice affects timeliness because the data will remain valid over time: if the data change, the users in charge of these data are notified, allowing them to detect differences with respect to reality early. Accuracy is also affected, as this practice detects variances between stored data and reality in an early phase, minimizing the negative impact of redundant data.

Data must be retrieved from a data source: Applications capturing data close to their generation are considered data sources. Then, when the data are required by another application, an interface between the data source and that application must be implemented. In this way, the uniqueness of the process is guaranteed, improving the use and the implementation of new systems. This practice generates data that remain valid for longer, supporting the timeliness dimension. In addition, consistency is also supported because changes can be detected early, and the definition of the data model does not change when these systems are related to each other. The practice also benefits the completeness dimension because all the elements of each datum can be obtained from the other system. Regarding accuracy, the better use and implementation of data improves the relation between the data and reality. In addition, the practice tries to avoid redundant data, reducing the probability of error.

Avoiding redundant data among systems: This practice not only guarantees uniqueness but also improves the use and implementation of systems. Data remain valid for longer and changes can be detected early; thus, timeliness is supported. With respect to accuracy, this practice intends to improve the relation between data and reality by avoiding storing the same information in different systems. Again, as in the previous practice, by avoiding redundant data the probability of error is reduced.

Data Life Cycle: Obtaining Values

Before executing an insertion, use LIKE queries: This practice refers to verifying whether a datum already exists in the database before inserting it. For instance, if we want to insert a street with the name Rivadavia, we must first check for coincidences between this name and the street names stored in the database. Then, if there are two instances with the same name, the user must choose which of them is the right one in each case. In this way, consistency is fully supported because typical problems of data formats can be avoided, and the probability of errors is reduced by ruling out the possibility of typing errors.
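The paper gives no code for this practice; the following sketch is only an illustration of the kind of LIKE-based check it describes, written in Python with the standard sqlite3 module and a hypothetical street table (the table, column, and function names are ours).

```python
import sqlite3

def insert_street(conn: sqlite3.Connection, name: str) -> bool:
    """Insert a street only if no similar name is already stored.

    Returns True if the street was inserted, False if possible
    duplicates were found and the user should choose among them.
    """
    cur = conn.cursor()
    # Check for coincidences before inserting (the practice's LIKE query).
    cur.execute("SELECT id, name FROM street WHERE name LIKE ?",
                ("%" + name + "%",))
    matches = cur.fetchall()
    if matches:
        # A real application would ask the user to pick one of the
        # existing entries instead of creating a near-duplicate.
        print("Possible duplicates:", matches)
        return False
    cur.execute("INSERT INTO street (name) VALUES (?)", (name,))
    conn.commit()
    return True
```

A call such as insert_street(conn, "Rivadavia") would then surface the existing entry instead of silently adding a second one.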
Evaluating the input of data more than once when they are critical: Critical data must be identified beforehand to avoid data insertion becoming tedious. Accuracy is supported because typing errors within critical data are minimized.

Data Life Cycle: Storing Data

Inferred data must never be entered manually: Applications must implement this practice to avoid typing errors during data input. The four dimensions are supported: timeliness, because data inferred from these data also change when the data sources change; consistency, because the data format is defined within the application; completeness, because these data are not entered manually and the practice must check that the data sources are complete; and accuracy, because the inference rules must be modeled correctly, also guaranteeing the accuracy of the data on which they are based.
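As a minimal sketch of this practice (not taken from the paper), the example below derives an invoice total from its line items instead of accepting a typed value; the invoice structure is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class InvoiceLine:
    description: str
    quantity: int
    unit_price: float

@dataclass
class Invoice:
    lines: List[InvoiceLine] = field(default_factory=list)

    @property
    def total(self) -> float:
        # The total is inferred from the line items, never typed in,
        # so it cannot drift from the data it is based on.
        return sum(line.quantity * line.unit_price for line in self.lines)

invoice = Invoice([InvoiceLine("Pump maintenance", 2, 1500.0),
                   InvoiceLine("Spare valve", 1, 320.5)])
print(invoice.total)  # 3320.5
```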
Business rules must be part of the applications so that generated data can be stored and filtered by these rules: In this way, a rule can reject the input when the input data are not what the application expects. Timeliness is supported because the rules check for outdated inputs; consistency is supported because the rules also consider the format of data; and accuracy is supported because typing errors are intended to be reduced.
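To make the idea concrete, here is a small hypothetical sketch of business rules acting as a filter in front of the repository; the two rules shown (a non-negative value and a date that is not in the future) are illustrative only and do not come from the paper.

```python
from datetime import date

class RuleViolation(Exception):
    """Raised when input data do not satisfy a business rule."""

def store_measurement(value: float, measured_on: date, repository: list) -> None:
    # The business rules filter the input: data the application does not
    # expect are rejected before they are stored.
    if value < 0:
        raise RuleViolation("measurement value must be non-negative")
    if measured_on > date.today():
        raise RuleViolation("measurement date cannot be in the future")
    repository.append({"value": value, "measured_on": measured_on})

repository = []
store_measurement(42.7, date.today(), repository)       # accepted
try:
    store_measurement(-1.0, date.today(), repository)    # rejected by the rule
except RuleViolation as err:
    print("input rejected:", err)
```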
Data Life Cycle: Visualizing Data

The system must alert about expiration dates: The system must inform the person in charge of the data that some data are losing validity in accordance with the logic of the application. Timeliness is supported because preventive actions can be implemented, and accuracy because reality is verified again when data become inexact.

The system must verify and inform about changes in data tendency: In this way, a change in data tendency can be identified. A functional analysis will be required from the owner of the data to determine whether the change was in fact a variation in the tendency of reality or just an error. Accuracy is improved by this practice because errors in the data registry can be detected and localized early.

The supported business process must be opened to other processes (sharing data): This practice is more related to the business than to the systems. However, it is important to identify when a process must interact with other processes. In this way, the use of data is intensified.

The practices defined in our approach are classified by two criteria, Objective and Subjective, depending on the independence of the person performing the analysis. The first classification is assigned by a quality committee in charge of evaluating the practices. The second depends on the person performing the evaluation; it can vary with his/her knowledge of the domain and previous experience. A score table determines an error value when a practice is not fulfilled with respect to each criterion.

The Objective Classification is defined as follows:

Standard practice (E): Considered a standard in the industry; practices in this classification must be implemented.

Good practice (B): Practices here must be implemented when possible. They are very important for guaranteeing data quality over each dimension they affect. However, they are not mandatory as in the first classification, since sometimes they are difficult to apply. When these practices are not implemented, the reasons must be documented.

Recommendation (R): These practices are considered favorable. The project leader is responsible for evaluating the cost/benefit of applying each of them.

The Subjective Classification contains the following values:

No error: The practice is implemented and correctly applied in the application.

No effect: The recommended practice is not implemented. However, this is not an error; the decision is taken at design time and must be documented.

Slight: Situations in which data quality is slightly damaged. For instance, when the application allows entering low-quality data without affecting either critical data or the success of the task that the user is performing.

Grave: Situations in which low data quality can affect the success of the task. For instance, incorrect data generating an erroneous report.

Fatal: Conceptual errors, incorrect application of a model, or errors that prevent a task from finishing successfully. These are dangerous errors because users never know when and why they happen.

3.1 Improving the Software Development Process

Our work was developed in a company in Neuquén, Argentina, which we call El Petróleo SA for confidentiality reasons. The company has an established, predefined methodology for the software development process that must be followed by every development. However, data quality was not considered as a task during its early phases.
During our exhaustive analysis of how to improve data quality within the company, a set of practices and recommendations was defined. These practices changed the predefined methodology, including quality controls applied during the whole process. Figure 2 shows part of the new software development process with these controls.

Figure 2. Part of the new software development process

The figure shows how different stakeholders interact during the development of an application. The stakeholders are: Referent User, Functional Analyst, Application Provider, and Functional Support. During the analysis and requirements specification phase, the referent user delivers scripts describing the functional requirements of the system to be built. At the same time (but in a separate process), the functional analyst determines quality guidelines. In order to define which practices will be applied, the functional analyst, together with the referent user, evaluates the weights and cost/benefit of each of the practices. As a result, a document containing these quality guidelines is created and delivered to the application provider. Then, the project team analyzes whether there is a corporate solution that meets the above needs (this step is not shown in the figure due to lack of space). If a solution exists, the project manager must agree with it. However, if a predefined solution is not available, a new document has to be generated enumerating the needs of the business, the goals of the solution, and its scope.

The requirements specification and design are the first phases to be performed when the development is local. The scripts are delivered to the programming team in order to estimate the time that must be committed to each script. Then, the project team creates the schedule and the plan to be delivered. New interviews with the referent user define the final priorities and delivery times. Finally, a document is written and signed by each party, i.e. the referent user and the software manager. This document is then received by the programming team together with the quality guidelines. The document contains the 64 practices, except for the ones that have been removed by a design decision. After that, unit and integration tests are performed by the programming team. In conformity with the schedule, the source code of the application and the installation manual are delivered. To perform functional tests, the project team installs the application on a test server. In addition, the functional support uses these functional tests as guides to perform data quality tests. To do so, he/she applies the two criteria described before to obtain error values and the subsequent recommendation for accepting or rejecting the tests. Finally, based on these results, the functional support creates a document with recommendations regarding data quality. In this document each practice, together with each dimension, is evaluated, describing whether it is being applied or not. An explanation must be provided where a practice is not applied. The functional analyst and the referent user then have to decide whether the data quality recommendation document satisfies a minimal set of acceptable quality guidelines. The application finishes its development when the referent user approves the document.
If the document is rejected, the application returns to the programming team in order to perform the required changes.

3.2 Verifying Applications

In our approach each practice is scored with respect to the objective classification (Table 1).

Table 1. The score table for the objective classification
  Classification      Score
  Standard practice   3
  Good practice       2
  Recommendation      1
  No action           0

During the verification phase, the functional support evaluates each practice and each dimension using the score table for the subjective classification (Table 2).

Table 2. The score table for the subjective classification
  Classification   Score
  No error         0
  No effect        0
  Slight           5
  Grave            15
  Fatal            40

After that, the error value of each practice (Table 1) multiplied by the error value for each dimension (Table 2) gives the final value for each practice. Finally, all these final error values are added to determine the final result of the verification. If the final result is lower than 46, the application passes the test; if it is between 46 and 119, the application is accepted with observations; and if it is 120 or higher, the application does not pass the test. These thresholds have been chosen to generate a rejection (the application does not pass the test) whenever a standard practice contains a fatal error (equivalent to 40 x 3 = 120); accordingly, any total equal to or greater than this value is rejected. A pass is granted for any total lower than 46; therefore, in order to pass the test, at most one grave error in a standard practice can be accepted. Any combination of intermediate values passes the test with observations.
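The verification described above amounts to a small computation. The sketch below reproduces Tables 1 and 2 and the thresholds in Python; the function and dictionary names are ours, not the paper's, and each observation is assumed to be one (practice classification, error classification) pair.

```python
# Table 1: objective classification scores.
OBJECTIVE = {"standard": 3, "good": 2, "recommendation": 1, "no action": 0}

# Table 2: subjective classification scores.
SUBJECTIVE = {"no error": 0, "no effect": 0, "slight": 5, "grave": 15, "fatal": 40}

def verify(observations):
    """Return (total, verdict) for a list of (objective, subjective) pairs."""
    total = sum(OBJECTIVE[practice] * SUBJECTIVE[error]
                for practice, error in observations)
    if total < 46:
        verdict = "pass"
    elif total < 120:
        verdict = "pass with observations"
    else:
        verdict = "reject"
    return total, verdict

# A fatal error in a standard practice alone is enough to reject.
print(verify([("standard", "fatal")]))  # (120, 'reject')
```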
The results of the test are validated with the referent user. He/she can accept the application in spite of a negative test result. However, this must be a decision made by consensus between the referent user and the project manager; they can assume that some data quality problems will be fixed during the next phases. The decision to approve or reject the application, the reasons, and even the test results must be documented in the data quality recommendation document previously described.

4 Case Study

Our approach has been applied to an application managing the electricity of El Petróleo SA. This application contains information about the electricity the company generates, buys, and sells. In addition, it stores information about the electricity generators. One of the goals of this application is to generate reports for the Argentinean Secretary of Energy. Before beginning the development of this application, a training activity was performed to explain the new software development process. In this activity the programming team, the functional analyst, and the functional support learned about their roles in this new process, in which data quality is strongly involved. Thus, before programming the code of the application, the team received the 64 practices together with the classification of the four dimensions for each phase of the data life cycle. Once the programming team finished the unit and integration tests, the functional support performed the functional, usability, and data quality tests defined in our new development process. To do so, he/she analyzed each practice, checking for errors. The final task was the writing and delivery of the data quality recommendation document. Some observations written in this document are described below.

Data Life Cycle: Data Modeling

When a user account is deactivated, the user in charge of the flow must be notified in order to take preventive actions: The timeliness dimension contains a slight value due to the lack of controls related to this dimension. However, as the number of users is low in this phase, a stricter assessment was not necessary. For the same reason, the accuracy dimension contains a no effect value. An observation explaining the reasons was documented.

Data must be retrieved from a data source: In the application there are data extracted from the source data of another application, for instance the market contracts in SAP. The application defines an interface to extract these data. Therefore, the practice contains a no error value.

Avoiding redundant data among systems: The timeliness dimension contains a slight value and the accuracy dimension a no effect value.
As the data source extracted from SAP cannot be accessed on-line, data are exported to the application once a day. Thus, timeliness is affected because the application can contain invalid data for at most 24 hours. However, accuracy is not affected because the imported data are only slightly dynamic.

Data Life Cycle: Obtaining Values

Before executing an insertion, use LIKE queries: The accuracy dimension contains a slight value, and the consistency and completeness dimensions a no effect value, because in this case redundant data are verified by the database.

Evaluating the input of data more than once when they are critical: The accuracy dimension contains a no effect value because mechanisms to manage critical data are implemented. For instance, an e-mail is triggered by the application when an invoice is entered. The e-mail is received by the people in charge of verifying and approving the payment.
Data Life Cycle: Storing Data

Inferred data must never be entered manually: This practice contains a no error value.

Business rules must be part of the applications so that generated data can be stored and filtered by these rules: This practice contains a no error value.

Data Life Cycle: Visualizing Data

The system must alert about expiration dates: This practice contains a no error value. For instance, a defined rule is that once an invoice is entered into the application there are only 5 days to pay it.

The system must verify and inform about changes in data tendency: The accuracy dimension contains a slight value because this practice is not implemented yet. However, it will be implemented during the next phases of the development process.

The supported business process must be opened to other processes (sharing data): This practice contains a no error value.

The final result of the error values gave a score of 90 points, composed of three standard practices with slight errors (15 points each), four good practices with slight errors (10 points each), and one recommendation with a slight error (5 points). With this final result of 90 points, the data quality test generates an assessment of Approve with observations.
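Using the verify sketch from Section 3.2, this breakdown can be reproduced as follows (again an illustration with our own names; "pass with observations" corresponds to the Approve with observations assessment):

```python
observations = (
    [("standard", "slight")] * 3         # 3 x 15 points
    + [("good", "slight")] * 4           # 4 x 10 points
    + [("recommendation", "slight")]     # 1 x 5 points
)
print(verify(observations))  # (90, 'pass with observations')
```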
Taking into account the above result, the application continues with its development process. The next phases should implement at least some of the observations described in the data quality recommendation document.

5 Conclusions and Future Work

In our work we have assumed that the definition of data quality depends on an agreement among parties, that is, on the set of features a final product must meet to satisfy the expectations of the parties. However, a poor data quality assessment is likely to be obtained when there is a lack of mechanisms to prevent data quality problems and to improve data quality. In order to face the data quality problem we have proposed changes within the software development process, including data quality as an activity in its early phases. In addition, these activities are performed through the implementation of a set of practices to apply to both systems under development and systems in production. We have also defined a data quality test in which each practice and each dimension is scored. Accordingly, applications can be evaluated with respect to data quality in each phase of the data life cycle.

Our approach should be an integral component of an organization, of the development team, and of the mentality of its members. In order to achieve this, processes must be improved, generating a change in the culture of the participants. Business people have to understand why time and money must be spent on checking data quality. Data quality must not be something added to the system; it must be part of the software development process from the beginning.

As future work, we are extending our approach to all the systems of the company. As a result, we will be able to perform more experimental validation and to propose new practices according to the complexity and the type of application.

References

[1] D. Ballou and H. Pazer. Modeling data and process quality in multi-input, multi-output information systems. Management Science, 31(2), 1985.
[2] G. Brackstone. Managing data quality in a statistical agency. Survey Methodology, 25, 1999.
[3] E. M. Burns, O. MacDonald, and A. Champaneri. Data quality assessment methodology: A framework. In Joint Statistical Meetings - Section on Government Statistics.
[4] L. Pipino, Y. W. Lee, and R. Y. Wang. Data quality assessment. Communications of the ACM, 45(4), April 2002.
[5] K. Orr. Data quality and systems theory. Communications of the ACM, 41(2):66-71, February 1998.
[6] E. Pierce. Assessing data quality with control matrices. Communications of the ACM, 47(2):82-86, February 2004.
[7] T. Redman. Data Quality: The Field Guide. Digital Press, January 2001.
[8] G. Shankaranarayanan, R. Y. Wang, and M. Ziad. IP-MAP: Representing the manufacture of an information product. In Proceedings of the MIT Conference on Information Quality, 2000.
[9] G. Tayi and D. Ballou. Examining data quality. Communications of the ACM, 41(2):54-57, February 1998.
[10] R. Y. Wang. A product perspective on total data quality management. Communications of the ACM, 41(2):58-65, February 1998.