STATISTICS IN MEDICINE
Statist. Med. 2000; 19:1645-1649

Data management in a longitudinal cross-cultural study

R. John Roberts 1, Beverly S. Musick 2, Bola Olley 3, Kathleen S. Hall 1, Hugh C. Hendrie 1 and Allen B. O. O. Oyediran 3

1 Department of Psychiatry, Indiana University School of Medicine, Indianapolis, U.S.A.
2 Division of Biostatistics, Department of Medicine, Indiana University School of Medicine, Indianapolis, U.S.A.
3 Department of Preventive and Social Medicine, University of Ibadan, Ibadan, Nigeria

SUMMARY

The Indianapolis-Ibadan Dementia Project compares the rates of dementia at two sites, one in the U.S.A. and one in Nigeria. This paper concentrates on the data management issues in this longitudinal cross-cultural study. Approximately 2500 elderly people were recruited at each site, and continue to be re-assessed every two years. All the data are collected on paper and then entered into a FoxPro relational database. Most of the data management, including data cleaning, is done in Indianapolis. The design of the data collection forms is particularly important in a cross-cultural study, with the questions and the coding of responses clear and simple. Since Nigeria and the U.S.A. have different levels of technological development, the computer hardware and software were chosen to be suitable for use at either site. Exchange visits have been needed to address data management issues and resolve unexpected problems. The data management on cross-cultural studies can be handled successfully, given careful planning. Copyright © 2000 John Wiley & Sons, Ltd.

INTRODUCTION

The Indianapolis-Ibadan Dementia Project is a longitudinal study of two populations, one in the U.S.A. and one in Nigeria. This paper concentrates on the data management issues which have arisen in this cross-cultural study with one study site in a developing country.
Correspondence to: R. John Roberts, Division of Biostatistics, Department of Medicine, RG 4th, 1001 West Tenth Street, Indianapolis, IN 46202-2859, U.S.A. E-mail: roberts1@iupui.edu
Contract/grant sponsor: National Institute on Aging; contract/grant number: RO1 AG 09956, P30 AG 10133
Contract/grant sponsor: Alzheimer's Association; contract/grant number: II RG-95-084

STUDY DESIGN

The study was designed to compare the rates of Alzheimer's disease and other dementias in population-based samples of people aged 65 or over, in Indianapolis, U.S.A., and Ibadan, Nigeria. It is a collaborative project between Indiana University School of Medicine and University College Hospital, Ibadan. In Indianapolis a random sample of residential addresses was drawn for each of the 29 contiguous census tracts in the study area, and all elderly African-Americans at these addresses were eligible. In Ibadan a complete census was done for all households in the study area, and all elderly residents were eligible.

The study is longitudinal, with subjects being re-assessed every two years. At the first phase subjects at each site were screened to assess cognitive and social functioning (2494 in Ibadan, 2212 in Indianapolis), and approximately 15 per cent of these went on to have a more detailed clinical and neuropsychological assessment. Subsequent phases have also used a two-stage design. The third phase of the study is now almost complete, and about 60 per cent of the subjects remain after losses due to death, refusal, and other reasons. A fourth phase is planned. Details of the study have been reported previously [1, 2].

GENERAL DATA MANAGEMENT

The data are collected on paper and entered into a FoxPro relational database using personal computers. For ease of entry and reduction of errors, the data entry system resembles the paper forms as much as possible. Range limits and consistency checks are included to further reduce entry errors. For the third phase, double entry is being used.

The main data sets are the screening (about 500 variables), clinical (750 variables), and neuropsychological (300 variables), with smaller data sets for blood test data, CT scans and a subject registry. Data are processed using both FoxPro and SAS.

Data are entered at both sites. This allows early detection of errors and, with ready access to the paper documents, eases the process of resolving questions. Also, in Nigeria, notes written on the forms are often in the native language (Yoruba), which would be difficult to interpret if data were entered in Indianapolis. Aside from the entry, most of the data management is carried out in Indianapolis.
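The range limits and double entry described above can be illustrated with a minimal Python sketch. It is not the study's FoxPro entry system: the variable names, the limits and the record layout are all hypothetical, chosen only to show the two kinds of check.

```python
from typing import Any

# Hypothetical range limits; variable names and bounds are illustrative only.
RANGE_LIMITS = {"age": (65, 120), "num_children": (0, 30)}


def range_errors(record: dict[str, Any]) -> list[str]:
    """Return the names of variables whose values fall outside their limits."""
    errors = []
    for name, (low, high) in RANGE_LIMITS.items():
        value = record.get(name)
        if value is not None and not (low <= value <= high):
            errors.append(name)
    return errors


def double_entry_mismatches(first: dict[str, Any], second: dict[str, Any]) -> list[str]:
    """Return the variables on which two independent entries of one form disagree."""
    all_names = first.keys() | second.keys()
    return sorted(name for name in all_names if first.get(name) != second.get(name))


# Two hypothetical entries of the same paper form; the second transposes the age digits.
entry_a = {"subject_id": 101, "age": 72, "num_children": 4}
entry_b = {"subject_id": 101, "age": 27, "num_children": 4}

print(range_errors(entry_b))                      # ['age']
print(double_entry_mismatches(entry_a, entry_b))  # ['age']
```

In this sketch the transposed age is caught twice, once by the range limit and once by the comparison of the two entries; a transposition that stayed within range would be caught only by double entry, which is one reason to use both.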
The data management done in Indianapolis includes cleaning the data, producing reports, drawing samples for clinical assessments, and preparing data for statistical analysis. The effort devoted to data management varies, but has been about the equivalent of one person full-time in Indianapolis, and half that amount in Ibadan.

Occasionally, while the Nigerian data were being processed, the data entry was temporarily suspended in Ibadan. Such interruptions might be avoided by using direct entry over the Internet, if Internet access and phone lines were more reliable than at present. Despite these small inconveniences, distributed data entry generally works well.

Delays in getting the data to Indianapolis mean that sometimes problems have not been spotted quickly. For example, during the early stages of the second phase, some interviews were conducted in Ibadan using the wrong version of the questionnaire. When a new round of data collection starts, it is important to review and enter some data quickly to check for interviewer effects, misinterpreted questions and other problems.

DESIGNING DATA COLLECTION FORMS

The process of developing screening instruments which can be used in different cultures has been described previously [3]. Cultural differences mean that some items are site specific (for example, a long-term memory question asks either about the assassination of Martin Luther King or about the Nigerian Civil War). However, the questions were developed to be equivalent, and this means that generally there is a 1 to 1 correspondence of the variables.
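One way to keep the 1 to 1 correspondence visible is to store each item under a single shared variable name with one wording per site, and to flag any item that lacks a version for a site. The Python sketch below assumes such a catalogue; the variable names, the second item and the structure are hypothetical and are not taken from the study's forms.

```python
# Hypothetical item catalogue: one shared variable name per item,
# with the site-specific wording (or topic) stored for each site.
ITEMS: dict[str, dict[str, str]] = {
    "ltm_event": {
        "indianapolis": "assassination of Martin Luther King",
        "ibadan": "Nigerian Civil War",
    },
    "orient_year": {
        "indianapolis": "current year",
        "ibadan": "current year",
    },
}
SITES = ("indianapolis", "ibadan")


def items_missing_a_site(
    items: dict[str, dict[str, str]], sites: tuple[str, ...] = SITES
) -> dict[str, list[str]]:
    """Map each variable to the sites that have no version of that item."""
    missing = {}
    for variable, wording in items.items():
        absent = [site for site in sites if site not in wording]
        if absent:
            missing[variable] = absent
    return missing


print(items_missing_a_site(ITEMS))  # {}
```

An empty result means every variable has an equivalent item at both sites, so the two sites' data sets can share one set of variable names.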
In designing data collection forms for a cross-cultural study, it is imperative that the meaning of all items and responses be as clear as possible. We reduced ambiguity by using logical, self-evident questions, and we tried to code responses consistently and intuitively. By avoiding open-ended questions, we were able to help define the meaning of the questions and obtain results that could be used directly in analysis.

During extensive training of the interviewers and pilot testing of the forms, most of the errors and ambiguities were eliminated. A few minor problems still emerged. For example, initially the interviewer was asked to rate the reliability and completeness of the interview, but, because two interviews are actually conducted, one with the subject and one with a family member, it was unclear which interview was being assessed. This was later changed so that the interviewer could rate the two interviews separately.

LONGITUDINAL ISSUES

As the study progresses, we learn more about the subjects and, sometimes, discover that previously collected data are incorrect or need modification in light of new information. Deciding how to record this information has been troublesome, since analysis, publications and study decisions have been based on the original information. To remedy this, we archive all data sets used for publications and overwrite the original data when necessary with the more accurate, up-to-date information. For instance, for the few subjects who were accidentally screened twice, we moved their second screening data from the main database into an archive file.

With any longitudinal study spanning several years, hardware and software become outdated. While we have upgraded the personal computers, operating systems and statistical software, we have opted to continue to use FoxPro versions 2.0 and 2.5 for the data entry systems.
Conversion to Visual FoxPro, or indeed any other software, would require extensive re-writing. With each phase of the study, we have weighed the effort to upgrade against the benefits, and have concluded that it would not be worthwhile. Data collected at the beginning of the study in 1991 are stored in the same file format, and sometimes the same files, as the data collected in 1998, which greatly reduces the data management burden.

WORKING IN A DEVELOPING COUNTRY

Communications with the site in Nigeria are more difficult than would be the case with a second site in a Western country. Most communication is by fax or, more recently, by electronic mail. Both of these can be interrupted for days at a time by telephone problems in Nigeria. Until very recently, computer files were exchanged by mailing floppy disks, with concomitant delays. Now files are transmitted as attachments to electronic mail messages. Exchange visits have to be carefully planned, since the distance between the sites means that it is not economical to make a quick trip for a day or two, and visa requirements prevent trips on short notice.

Nigeria is a developing country, and the general technological level is lower than in the U.S.A. Items and services common in the U.S.A. may not be readily available in Nigeria. At the start of the study, paper survey instruments were chosen instead of direct data entry on laptop computers. Initially a laptop had been purchased for data collection, but it was quickly abandoned due to the dusty, humid climate in Ibadan and the absence of electrical power in the subjects' homes there. In retrospect, using paper forms still appears to have been a good decision. Electronic weight scales and sphygmomanometers were also avoided for similar reasons.

Even office-based work can be more difficult in a developing country. The electrical power supply in Ibadan is not always reliable. There are periods with no power, and when the power is on there can be large voltage fluctuations. These problems are partially solved by using a UPS (uninterruptible power supply, a lead-acid battery back-up), which gives the user a few minutes to save the current data if the power fails, and also protects against voltage spikes. Purchase of our own generator was considered, but a recent upgrade of the power supply in Ibadan improved its reliability.

It was expected that exchange visits would be needed between the two sites for training of interviewers and consensus medical diagnosis. In practice, visits have also been needed to address administrative and data management issues, at a rate of about two visits per year (one Indianapolis to Ibadan, one Ibadan to Indianapolis).

UNEXPECTED DIFFICULTIES

The cross-cultural nature of the study has given rise to some unexpected difficulties, despite careful work developing and testing the survey instruments. For example, subject identification in Ibadan has sometimes been difficult. Most of the people in our study are illiterate (85 per cent) and do not know their exact date of birth (97 per cent). (Approximate dates of birth were established by the use of historic landmark events [4] at the first phase of the study.) People are known mainly by informal nicknames derived, for example, from their occupation or where they live. In addition, names are given differently on different occasions, and there is no standardized spelling of names. This has given rise to occasional duplication (11 subjects were screened twice) and the possibility of misidentification.
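Because spellings vary, exact comparison of names will miss many duplicates. As an illustration only, the Python sketch below flags pairs of registry entries whose recorded names are similar, using the standard library's difflib. The paper does not describe automated matching of this kind; the names, subject numbers and the 0.85 threshold are hypothetical, and any flagged pair would still need review by someone who knows the subjects.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def possible_duplicates(
    names_by_subject: dict[int, list[str]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    """Return pairs of subject IDs that share at least one pair of similar names."""
    pairs = []
    subjects = sorted(names_by_subject.items())
    for (id_a, names_a), (id_b, names_b) in combinations(subjects, 2):
        if any(name_similarity(x, y) >= threshold for x in names_a for y in names_b):
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical registry: every formal and informal name recorded for each subject.
registry = {1: ["Adebayo"], 2: ["Adebayor"], 3: ["Ogunleye"]}

print(possible_duplicates(registry))  # [(1, 2)]
```

Storing a list of names per subject lets the comparison cover nicknames as well as formal names, at the cost of more candidate pairs to review.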
Extra effort has therefore been made to collect all formal and informal names. All subjects are now photographed and given study identification cards. As a final check, the interviewer asks various questions, such as previous occupations and number of children, and compares the answers against previously reported information to ensure that the subject has been correctly identified.

CONCLUSIONS

The data management for cross-cultural studies can be handled successfully. In planning this type of study, avoid complicated technology and rely on simple, familiar data management software. The need for exchange visits for managing data and training should not be underestimated. Extra effort is needed in designing the data collection forms and reviewing the initial results.

ACKNOWLEDGEMENTS

Supported by grants from the National Institute on Aging (RO1 AG 09956 and P30 AG 10133) and the Alzheimer's Association (II RG-95-084).
REFERENCES

1. Hall KS et al. A cross-cultural community based study of dementias: Methods and performance of the survey instrument Indianapolis, U.S.A. and Ibadan, Nigeria. International Journal of Methods in Psychiatric Research 1996; 6:129-142.
2. Hendrie HC et al. Prevalence of Alzheimer's disease and dementia in two communities: Nigerian Africans and African Americans. American Journal of Psychiatry 1995; 152:1485-1492.
3. Hall KS et al. The development of a dementia screening interview in two distinct languages. International Journal of Methods in Psychiatric Research 1993; 3:1-28.
4. Ogunniyi AO, Osuntokun BO. Determination of ages of elderly Nigerians through historical events: Validation of Ajayi-Igun 1963 listing. West African Journal of Medicine 1993; 12:189-190.