USE OF -DHIS 2

Module-4 Data Quality and Validation

Learning objectives: After reading this module you will be able to understand:
1. What data quality is and why it is important.
2. How to check data quality at the point of data entry.
3. How to create data validation rules.
4. How to carry out data triangulation.
5. How to analyze data status.
4.1 Overview of data quality check

Ensuring data quality is a key concern in building an effective information system. Data quality has different dimensions, including:

Correctness: Data should be within the normal range for data collected at that facility. There should be no gross discrepancies when compared with data from related data elements.
Completeness: Data for all data elements from all health facilities/blocks/talukas/districts should have been submitted.
Consistency: Data should be consistent with data entered during earlier months and years, while allowing for changes due to reorganization, increased workload, etc., and consistent with other similar facilities.
Timeliness: All data from all health facilities/blocks/talukas/districts should be submitted at the appointed time.

Poor quality data creates a "garbage in, garbage out" situation: using it leads to ill-informed decisions. The software should therefore have built-in tools for data quality checks and validation.

4.1.1 Data quality checks

Data quality checking can be done through various means, including:
1. At the point of data entry, the software can check whether the value entered falls within the min-max range of that data element over the last six months, or within a range defined by the user.
2. Defining validation rules, which can be run once the user has finished data entry. The user can also check the data entered for a particular period and organisation unit(s) against the validation rules and display any violations of those rules.
3. Analysis of data sets, i.e., examining gaps in the data.
4. Data triangulation, which is comparing the same data or indicator from different sources.

4.2 Data quality check at the point of data entry

Data quality can be checked at the point of data entry in the following two ways:
a) By setting the minimum and maximum value range for each data element manually, or
b) By generating the min-max values using DHIS 2, if historical data is available for that data element.

a) Setting the minimum and maximum value range manually

If you are using the default data entry screen, click on the data element for which you want to set the min-max values, as shown below.
A pop-up window will appear as shown below. Here you can enter the min-max values.

On subsequent data entry, if the value entered does not fall within the set min-max range, the text box will change colour to red. The user will also get a pop-up as shown below. This change in colour is a prompt to check the data entered and make any necessary correction.

On the data entry screen, users also have the option to add a comment explaining the discrepant figure (if required). You can do this using the drop-down menu of the comment box.

The custom data entry screen is displayed when you deselect the default data entry form option in the top right corner of the screen. In this case, the minimum and maximum values can be added by double-clicking on the data entry box instead of the data element.

b) Generated min-max values

If you have at least six months of data entered in DHIS 2, it is possible to generate the min-max values, element by element, using DHIS 2. In that case you merely need to click on the Generate min-max tab as shown below.

On the default data entry screen, the generated min and max values appear on the left and right sides of the data entry box. If you deselect the default data entry form, the generated values appear at the top right of the screen, as shown in the following screenshot.
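The min-max check described in section 4.2 can be illustrated with a small Python sketch. It is not DHIS 2 code: the module does not state how DHIS 2 calculates generated ranges, so the sketch simply assumes the range is the lowest and highest value seen in the last six months. The names MinMaxRange, generate_range and check_entry are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MinMaxRange:
    """A min-max range for one data element (hypothetical structure)."""
    minimum: float
    maximum: float

    def contains(self, value: float) -> bool:
        return self.minimum <= value <= self.maximum


def generate_range(history: Sequence[float],
                   months_required: int = 6) -> Optional[MinMaxRange]:
    """Derive a range from the most recent months of data.

    Assumption: the range is the observed minimum and maximum of the
    last `months_required` values. Returns None if there is too little
    history to generate a range.
    """
    if len(history) < months_required:
        return None
    recent = history[-months_required:]
    return MinMaxRange(min(recent), max(recent))


def check_entry(value: float, value_range: Optional[MinMaxRange]) -> str:
    """Return "ok" if the value is acceptable, "check" if it needs review.

    "check" corresponds to the text box turning red: it is a prompt to
    re-check the figure, not proof that the figure is wrong.
    """
    if value_range is None or value_range.contains(value):
        return "ok"
    return "check"


# Six months of (made-up) monthly values for one data element.
history = [40, 42, 38, 45, 41, 39]
generated = generate_range(history)      # range 38 to 45
manual = MinMaxRange(30, 50)             # range set by the user

print(check_entry(44, generated))        # inside the generated range
print(check_entry(60, manual))           # outside the manual range
```

A manually set range and a generated range are checked the same way; only the way the limits are obtained differs.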
4.3 Defining Validation Rules

Validation rules are a data quality check mechanism based on verifying the logical relation between related data elements. A validation rule is a relational expression made up of related data elements and an operator that states the expected (logical) relation between them. For example, the number of infant deaths cannot be greater than the number of deliveries.

As the example shows, a validation rule has a left side and a right side. Each side must contain a data element or a combination of data elements. The two sides are separated by a validation operator, which states the relation between them. Because validation rules are relational, a rule needs at least two data elements.

4.3.1 Types of validation operators (equal to, less than, greater than)

The following validation operators are used for data quality analysis in DHIS:
Equal to: The rule is satisfied only if both sides are equal.
Not equal to: The rule is satisfied if the two sides are not equal.
Greater than: The rule is satisfied if the left side is greater than the right side.
Greater than or equal to: The rule is satisfied if the left side is greater than or equal to the right side.
Less than: The rule is satisfied if the left side is smaller than the right side.
Less than or equal to: The rule is satisfied if the left side is smaller than or equal to the right side.

4.3.2 Adding a new validation rule

Follow the steps below to add a new validation rule. First, select the Data Quality module from the drop-down menu of the Services module located on the main toolbar.
On the Validation Rule Management screen that is displayed, click on Add new. The following screen will appear.
Enter the first three fields: the validation rule name, a description of the validation rule, and the operator that forms the validation rule. Next, click on the Edit left side button to enter the left side of the validation rule. Follow these steps:
1. Add a description.
2. Select a data element from the Available Data Elements options shown on the right side.
3. Add operators between the data elements to build the desired formula.
4. When you have entered the required fields, click on Update. This will return you to the previous window.

There, click on Edit right side and follow the same steps as for the left side.

4.4 Validation Checks

When you open the Data Quality module, you will see a menu on the left side that lists different options related to validation rules. For validation checks you will need the Run validation option, described below.

4.4.1 Run validation

1. If you select the Run validation option, the following screen will be displayed.
2. Specify the period for which you want to run the validation check by selecting the start and end dates. You can do this using the drop-down calendar provided for the date fields.
3. Next, select the organisation unit(s) for which you want to run the validation.
4. Finally, click on Validate.
5. When you click on the Validate button (number 5 on the screenshot), a pop-up is displayed listing the validation rules that were violated, together with the data values of the data elements that make up each rule.
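The structure of a validation rule (left side, operator, right side) and the Run validation step can be sketched in Python as follows. This is an illustration of the logic only, not the DHIS 2 implementation. The class ValidationRule, the function run_validation, the operator keys and the data element names "infant_deaths" and "deliveries" are all hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# The six validation operators listed in section 4.3.1.
OPERATORS = {
    "equal_to": operator.eq,
    "not_equal_to": operator.ne,
    "greater_than": operator.gt,
    "greater_than_or_equal_to": operator.ge,
    "less_than": operator.lt,
    "less_than_or_equal_to": operator.le,
}

Values = Dict[str, float]  # data element name -> reported value


@dataclass
class ValidationRule:
    """A rule of the form: left side <operator> right side."""
    name: str
    left: Callable[[Values], float]    # formula over data elements
    operator_name: str
    right: Callable[[Values], float]   # formula over data elements

    def is_satisfied(self, values: Values) -> bool:
        compare = OPERATORS[self.operator_name]
        return compare(self.left(values), self.right(values))


def run_validation(rules: List[ValidationRule],
                   data: Dict[Tuple[str, str], Values]) -> List[tuple]:
    """Check every (organisation unit, period) record against every rule.

    Returns one entry per violation, with the left and right side values
    so the user can see which figures caused it.
    """
    violations = []
    for (org_unit, period), values in data.items():
        for rule in rules:
            if not rule.is_satisfied(values):
                violations.append((org_unit, period, rule.name,
                                   rule.left(values), rule.right(values)))
    return violations


# The example from section 4.3: infant deaths cannot exceed deliveries.
rule = ValidationRule(
    name="Infant deaths <= deliveries",
    left=lambda v: v["infant_deaths"],
    operator_name="less_than_or_equal_to",
    right=lambda v: v["deliveries"],
)

# Made-up records for two organisation units in one period.
data = {
    ("PHC A", "2007-04"): {"infant_deaths": 2, "deliveries": 30},
    ("PHC B", "2007-04"): {"infant_deaths": 12, "deliveries": 10},
}

for violation in run_validation([rule], data):
    print(violation)
```

Each side is written as a small formula so that a side can combine several data elements, as the Edit left side and Edit right side screens allow.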
4.4.2 Diagnosing the source of the validation violation

You can do this by selecting the Run validation by average option. Run this validation after entering the required fields on the run validation screen, namely the organisation unit(s) and the period for which you want to run the validation.

The result of Run validation by average is a pop-up screen (see screenshot below) that displays the percentage of validation rules violated by the selected organisation unit(s). Among the organisation units showing violations, select the one with the highest violation percentage and drill down to get its detailed validation list. You can do the same for any other organisation unit as well.

The screen that is displayed lists the validation rules that have been violated by the specific organisation unit (see screenshot below).
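As a rough illustration of the percentage shown by Run validation by average, the sketch below ranks organisation units by the share of validation checks they failed. The module does not give the exact formula DHIS 2 uses, so the calculation here (failed checks divided by checks run) is an assumption, and the function name violation_percentages is hypothetical.

```python
from typing import Dict, List, Tuple


def violation_percentages(
        results: Dict[str, List[bool]]) -> List[Tuple[str, float]]:
    """Rank organisation units by percentage of validation checks failed.

    `results` maps each organisation unit to the outcome of every check
    run for it (True = rule satisfied, False = rule violated).
    Units with no checks run are skipped.
    """
    percentages = []
    for org_unit, outcomes in results.items():
        if not outcomes:
            continue
        violated = sum(1 for passed in outcomes if not passed)
        percentages.append((org_unit, 100.0 * violated / len(outcomes)))
    # Highest violation percentage first: the unit to drill into first.
    return sorted(percentages, key=lambda item: item[1], reverse=True)


# Made-up outcomes of four checks for each of two organisation units.
results = {
    "PHC B": [True, True, True, False],
    "PHC A": [True, False, False, True],
}
for org_unit, percent in violation_percentages(results):
    print(org_unit, percent)
```

The unit at the top of the ranking is the natural starting point for the drill-down described above.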
To drill down into one validation rule, click on it. This gives the detailed validation analysis for the selected organisation unit and its immediate children.
If you click on any organisation unit, you can drill down to its children.

4.5 Analysis of data status

The purpose of analysing data status is to see what percentage of data is missing or unreported, either by data element or by facility.

4.5.1 Types of missing data

Missing data can be listed by facility and/or by data element. Missing data creates different kinds of problems, such as:
1. Incomplete reports.
2. Misleading indicator calculations, because some numerator or denominator is missing.
3. Effective decisions cannot be based on incomplete data.
4. The facility that is not reporting data is probably the one that needs more care and support.

4.5.2 Generating missing data reports by facilities, data elements and periods

The Data Status option provides a tool to analyze how much data has been entered. You can find this option in the Dashboard module, which in turn is listed in the drop-down menu of the Services module. Clicking on the Dashboard module leads to the following screen, where you can find the Data Status option.
Once you click on the Data Status option, you will get the following screen, where you can select the organisation unit, data set and period for which you want to generate the data status. Once you have done this, click on View data status as shown below.
You will get the following output. From here you can drill down to the immediate children of any organisation unit by clicking on it, to obtain a more detailed data status by sub-facility.
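The idea behind the data status report, the percentage of expected data that has actually been entered, can be sketched in Python. This is a simplified illustration for one data set and one period; the function name data_status and the data element names are hypothetical, and the way DHIS 2 itself counts entries is not described in this module.

```python
from typing import Dict, List, Optional


def data_status(expected_elements: List[str],
                entered: Dict[str, Dict[str, Optional[float]]]
                ) -> Dict[str, float]:
    """Percentage of expected data elements filled in, per organisation unit.

    `entered` maps organisation unit -> {data element: value}. A data
    element that is absent or None counts as missing.
    """
    if not expected_elements:
        raise ValueError("the data set must contain at least one data element")
    status = {}
    for org_unit, values in entered.items():
        filled = sum(1 for element in expected_elements
                     if values.get(element) is not None)
        status[org_unit] = round(100.0 * filled / len(expected_elements), 1)
    return status


# A made-up data set of four data elements and two reporting sub-centres.
expected = ["anc_registrations", "deliveries", "bcg_doses", "opv_doses"]
entered = {
    "SC 1": {"anc_registrations": 10, "deliveries": 3,
             "bcg_doses": 5, "opv_doses": 5},
    "SC 2": {"anc_registrations": 7, "deliveries": None},
}
print(data_status(expected, entered))
```

Running the same calculation on the children of an organisation unit corresponds to the drill-down described above.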
4.6 Data Triangulation

Data (for example, on institutional deliveries) is collected from different sources, such as routine health data and NFHS surveys. By plotting this data element across the three NFHS surveys and juxtaposing it with routine data, we have a method of data triangulation.

Data sources used for triangulation:
Routine Health Data: Collected and reported routinely every month.
NFHS: Large-scale, multi-round survey conducted in a representative sample of households throughout India, once in 5 years.
Census: Largest single source of a variety of statistical information on different characteristics of the people of India, once in 10 years.

[Figure: Trends in Institutional Deliveries (%). NFHS 1: 37, NFHS 2: 46, NFHS 3: 55, Apr-Aug 07: 70.]

In the boxes below, the NFHS trends in institutional delivery are compared with the trend of monthly figures from the state routine health data.

[Figure: Trends in Institutional Deliveries (%) for NFHS 1, NFHS 2, NFHS 3 and Apr-Aug 07 (37, 46, 55, 70), shown alongside monthly Institutional Deliveries (%) for Apr-07 to Aug-07 (plotted values: 73.2, 70.7, 70.3, 69, 66.9).]
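The comparison behind data triangulation can be sketched in Python using the figures from the charts above. The sketch assumes that the Apr-Aug 07 value is the average of the five monthly routine figures, which is consistent with the charts but is not stated in the module; the monthly values are listed in the order in which they appear in the chart. The function name triangulate is hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Institutional deliveries (%) from the charts in section 4.6.
survey = {"NFHS 1": 37, "NFHS 2": 46, "NFHS 3": 55}
routine_monthly = [73.2, 70.7, 70.3, 69.0, 66.9]   # Apr-07 to Aug-07


def triangulate(survey_value: float,
                routine_values: List[float]) -> Dict[str, float]:
    """Compare a survey estimate with the average of routine figures.

    Returns the routine average and its difference from the survey
    value, both rounded to one decimal place.
    """
    routine_average = mean(routine_values)
    return {
        "routine_average": round(routine_average, 1),
        "difference": round(routine_average - survey_value, 1),
    }


# Compare the most recent survey round with the routine data.
print(triangulate(survey["NFHS 3"], routine_monthly))
```

A large difference between sources is not an error in itself. It is a prompt to look at how each source collects the data and what period and population each one covers.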