Portavita Benchmark: A Dataset Generator for Healthcare


Editors: Albana Gaba, Yeb Havinga, Tom van der Weide
License: Creative Commons, Attribution-ShareAlike
Date: February 3, 2015
Contributors: Albana Gaba, Yeb Havinga, Tom van der Weide, Jasper Visser, Evert Jan Hoijtink, Heimen Brons, Jan Willem Kijne, Pieter Spoelstra (Portavita); Willem Dijksta, Fabian Walraven (MGRID)

The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement nr. 318633.

Contents

1. Summary
2. Introduction
3. Background on healthcare information model standards
   3.1. HL7 Version 3
   3.2. Clinical Document Architecture (CDA)
   3.3. Fast Healthcare Interoperability Resources (FHIR)
4. Building data models
   4.1. Overview of Portavita's data representation
      4.1.1. Organizations
      4.1.2. Roles
      4.1.3. Treatments
      4.1.4. Examinations
   4.2. Modeling Clinical Data
      4.2.1. Modeling organizations
      4.2.2. Modeling Patients
      4.2.3. Modeling examinations frequency
      4.2.4. Modeling Examinations
      4.2.5. The problem of missing values
   4.3. Limitations
5. Dataset generator
   5.1. Synthetic data generation process
   5.2. Validation
      5.2.1. Single variable
      5.2.2. Multi-variable
   5.3. Performance evaluation
      5.3.1. Portavita Benchmark v1
      5.3.2. Portavita Benchmark v1 Performance evaluation
      5.3.3. Portavita Benchmark v2
      5.3.4. Portavita Benchmark v2 Performance evaluation
      5.3.5. Conclusions on performance
6. Benchmark queries
   6.1. Queries overview
7. Conclusions
A. Clinical Document Architecture
B. List of Observations per Examination
Bibliography

List of Figures

3.1. Simplified representation of the RIM classes
3.2. UML model of the FHIR Observation resource
4.1. Data generation process
4.2. Organization hierarchy
4.3. Treatment representation
4.4. An examination representation
5.1. Steps for generating a synthetic dataset with only one organization
5.2. Discrete real data
5.3. Discrete synthetic data
5.4. Continuous real data
5.5. Continuous synthetic data
5.6. Discrete real data
5.7. Discrete synthetic data
5.8. Continuous real data
5.9. Continuous synthetic data
5.10. Imputed missing real data
5.11. Imputed missing synth data
5.12. Co-missing real data
5.13. Co-missing synthetic data
5.14. The components involved in the batch processing architecture of Portavita Benchmark v1
5.15. Performance of Portavita Benchmark v1 grouped by component. Time required to create 1GB of synthetic data for various dataset sizes
5.16. Portavita Benchmark v2. Micro-batch architecture
5.17. Performance of Portavita Benchmark v2. Time required to generate 1GB of data
5.18. Comparative performance evaluation. Amount of documents loaded per second for v1 and v2

List of Tables

4.1. Data used to model a patient in relation to a treatment
4.2. Data used to model the distribution of examinations per patient for each year
4.3. An example of lab examinations. Each row represents a lab examination instance

1. Summary

In this document we introduce Portavita Benchmark [6], a data generator for benchmarking on healthcare data. The generator is based on statistical models of anonymized clinical data from Portavita's care management system. It generates organizations, practitioners and nearly 50 different kinds of examinations, which together comprise 940 different kinds of observations. The generated data is fully compliant with the HL7 healthcare interoperability standards. Portavita Benchmark covers both the generation of clinical documents and their transformation into relational database storage, and generates up to 1TB/hour of clinical documents. Benchmark queries are included for database performance measurements.

2. Introduction

To develop effective techniques for processing and storing large medical datasets, it is often necessary to evaluate and compare the performance of the systems involved. In this document we introduce Portavita Benchmark, the first dataset generator specific to healthcare data. It uses real clinical data to build models, which are then used to create arbitrarily large synthetic datasets.

A notable aspect that distinguishes medical data from other data is the use of domain-specific conventions, typically put forward by standards organizations. Standards are instrumental because health data is exchanged between healthcare organizations, or even between departments of the same organization. They provide, among other things, well-defined information models and specific names for each medical concept.

Portavita Benchmark is composed of two parts: data modeling based on existing clinical data, and data generation. In the first part we see how data is structured in the Portavita care management system, in compliance with the Health Level 7 (HL7) standards. In the second part, synthetic health records are generated in two exchangeable HL7 formats, namely CDA and FHIR. This process is followed by the transformation and storage of these documents in a PostgreSQL DBMS.

Finally, we provide a brief overview of the queries that come with Portavita Benchmark for benchmarking purposes.

3. Background on healthcare information model standards

Standards play a fundamental role in facilitating interoperability between healthcare organizations spread across different countries. They determine common ways to model data and to express domain-specific concepts. In this chapter we introduce three important Health Level 7 information models that are widely adopted by the healthcare community. Portavita Benchmark relies heavily on them, as we will see in the next chapters.

3.1. HL7 Version 3

HL7 Version 3 provides an object-oriented development methodology based on a reference information model (RIM). The RIM is an essential part of HL7 Version 3, as it provides a universal information model for healthcare interoperability, covering the entire healthcare domain [2]. The RIM uses six high-level concepts to describe all clinical data: entities, roles, acts, participations, role links, and act relationships, as shown in Figure 3.1. There are several specializations of the main classes. We briefly describe the most important classes that we use in our data model.

[Figure 3.1: Simplified representation of the RIM classes. The diagram shows the classes Entity, Role, Participation, Act and ActRelationship with attributes such as classcode, typecode, moodcode and effectivetime, and the specializations Person, Organization, Employee, Patient, AssignedEntity, CareProvision, Observation and SubstanceAdministration.]

Entities can be organizations or persons. A role is played by one entity and scoped by another. For example, patient is a role that is played by a person (an entity) and scoped by an organization (also an entity). Acts describe events. For example, observations and examinations are acts. Roles participate in acts. For example, a patient role can participate as the subject of an observation, and a practitioner role can participate as its performer. Specializations (e.g., Observation) inherit the properties of the class they specialize (e.g., Act) and add attributes of their own.

3.2. Clinical Document Architecture (CDA)

Clinical Document Architecture, often referred to as CDA, is used, like any clinical documentation, to describe the care provided to a patient, to maintain a patient's medical record and to exchange information between healthcare providers. CDA is an XML-based markup standard built on the HL7 Version 3 RIM and, as such, it fully maps clinical data modeled with the RIM. A CDA document consists of two parts. The header contains contextual information, such as the patient the document applies to, the organization and the person who wrote it, and the time at which it was written. The body contains human-readable narrative text and optional structured clinical statements, including act, observation, substance administration, encounter, procedure, organizer and supply [3]. Appendix A gives an example of a CDA document.

3.3. Fast Healthcare Interoperability Resources (FHIR)

FHIR has been introduced by HL7 as a next-generation standards framework for healthcare. It combines the best features of HL7's Version 2, Version 3 and CDA product lines while focusing tightly on implementability and on easing the exchange of clinical documents. For example, FHIR provides highly modular components called Resources, which can easily be assembled to fully represent clinical data. Most resource elements and data type properties include mappings to the RIM. So, just as with the RIM components, FHIR provides resources for modeling RIM classes such as Observation, Patient and Organization. Figure 3.2 shows the UML diagram of an Observation FHIR resource.
Like the RIM Observation class, it has references to the patient, the performer and the organization and, among other fields, includes the observation code and the corresponding value. FHIR resources are easily accessible in a wide variety of contexts, including mobile phone applications. In particular, FHIR additionally provides a RESTful API that defines a set of common interactions performed on a repository of typed resources (read, update, search, etc.). These interactions follow the RESTful paradigm of managing state through Create/Read/Update/Delete actions on a set of identified resources [4].
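The interactions named above map onto HTTP methods and resource paths. The following Python sketch illustrates that mapping for a few interactions; it is not part of Portavita Benchmark. The helper `fhir_request`, the `HttpRequest` type, the resource id and the query string are hypothetical names chosen for the example, and paths are given relative to an assumed server base URL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HttpRequest:
    method: str
    path: str  # relative to the (assumed) server base URL


# Interactions that address one identified resource instance.
INSTANCE_METHODS = {"read": "GET", "update": "PUT", "delete": "DELETE"}


def fhir_request(interaction: str, resource_type: str,
                 resource_id: Optional[str] = None,
                 query: str = "") -> HttpRequest:
    """Translate an interaction name into an HTTP method and relative path."""
    if interaction == "create":
        # Creating a resource posts to the resource type endpoint.
        return HttpRequest("POST", f"/{resource_type}")
    if interaction == "search":
        suffix = f"?{query}" if query else ""
        return HttpRequest("GET", f"/{resource_type}{suffix}")
    if interaction in INSTANCE_METHODS:
        if resource_id is None:
            raise ValueError(f"'{interaction}' needs a resource id")
        return HttpRequest(INSTANCE_METHODS[interaction],
                           f"/{resource_type}/{resource_id}")
    raise ValueError(f"unsupported interaction: {interaction}")


# Hypothetical id and query string, for illustration only.
print(fhir_request("read", "Observation", "obs-1"))
print(fhir_request("search", "Observation", query="code=example-code"))
```

Keeping the mapping in a pure function makes it easy to check without a running server; an actual client would prepend the base URL and send the request.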

[Figure 3.2: UML model of the FHIR Observation resource.]
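To make the RIM concepts of Section 3.1 concrete, the sketch below models entities, roles, participations and acts as plain Python dataclasses. It is a simplification for illustration only: the class layout, the field names and the free-text `kind` labels are assumptions of this sketch and do not reproduce the normative HL7 attribute names or code systems.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    name: str
    kind: str  # "person" or "organization"


@dataclass
class Role:
    kind: str       # e.g. "patient" or "practitioner"
    player: Entity  # the entity that plays the role
    scoper: Entity  # the entity that scopes the role


@dataclass
class Participation:
    kind: str  # e.g. "subject" or "performer"
    role: Role


@dataclass
class Act:
    kind: str
    participations: List[Participation] = field(default_factory=list)


@dataclass
class Observation(Act):
    # A specialization inherits from Act and adds its own attributes.
    code: str = ""
    value: float = 0.0


def participants(act: Act, kind: str) -> List[str]:
    """Names of the entities playing a role that participates as `kind`."""
    return [p.role.player.name for p in act.participations if p.kind == kind]


# Hypothetical example data.
clinic = Entity("Example Clinic", "organization")
patient = Role("patient", Entity("Alice", "person"), clinic)
practitioner = Role("practitioner", Entity("Bob", "person"), clinic)

obs = Observation(kind="observation", code="systolic blood pressure", value=128.0)
obs.participations.append(Participation("subject", patient))
obs.participations.append(Participation("performer", practitioner))

print(participants(obs, "subject"), participants(obs, "performer"))
```

The same person entity could be wrapped in several Role objects, which mirrors the point that one person can play multiple roles.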

4. Building data models

An important aspect of Portavita Benchmark is that it generates data that resemble real clinical data. Figure 4.1 shows the steps Portavita Benchmark follows to generate synthetic datasets based on real data. First, we retrieve the necessary data, according to the models we aim to create, and aggregate them into CSV documents. Next, the aggregated data is used to create models by training Bayesian Networks. Finally, the generator uses these models to create arbitrarily large datasets of CDA and FHIR documents. We start by introducing the way data is originally structured in Portavita, which will help in understanding the structure of the synthetic datasets later on.

[Figure 4.1: Data generation process. Portavita database -> Aggregate -> Create Bayesian Networks -> Dataset generator -> CDA / FHIR.]

4.1. Overview of Portavita's data representation

Portavita provides a care management system for treating patients with chronic conditions. Portavita covers numerous treatments, but for the purpose of Portavita Benchmark we consider only data from the diabetes, COPD (Chronic Obstructive Pulmonary Disease) and CVRM (CardioVascular Risk Management) treatments, which are the ones with the highest number of records. Portavita uses HL7 Version 3 RIM data models to represent clinical data. This section gives a short overview of the main data concepts, which are also represented in the data generated by Portavita Benchmark.

4.1.1. Organizations

The organizations of Portavita's customers are structured as hierarchies. In Portavita, a top-level organization is called a care group. A care group has a number of sub-organizations, each of which can have sub-organizations of its own. Figure 4.2 gives an example hierarchy.

[Figure 4.2: Organization hierarchy. Nodes shown: Care group, GP Surgery (three times), Pharmacy, Dietitian.]

4.1.2. Roles

Users play a large variety of roles in Portavita's system. The most relevant ones for Portavita Benchmark are:

- A care group employee works for the care group and acts as a superuser. One of the main activities is monitoring the organizations within the care group.
- A quality employee is allowed to compare practitioners within an organization or a set of organizations.
- A practitioner is a care provider who can be active in the treatment of patients.
- A patient is a person who receives treatment.
- A researcher is a person who is granted access to part of the data to perform research.

Note that a person can play multiple roles inside the organization. For example, a practitioner could also be a patient within the same organization.

4.1.3. Treatments

When a new patient is entered in the system, a treatment, such as diabetes or CVRM, is assigned to them. Every treatment has at least two participations: a subject participation (i.e., the patient) and a performer participation (i.e., the principal practitioner). The principal practitioner has the final responsibility for the treatment. The diagram in Figure 4.3 illustrates the relationships between a treatment (a type of act), the examinations (also acts) performed in the context of that treatment, and the patient and principal practitioner participations in the treatment. It is important to note that each observation performed on a patient is typically linked to a care treatment the patient is part of.

[Figure 4.3: Treatment representation. A treatment linked to a patient, a principal practitioner and three examinations.]

4.1.4. Examinations

Examinations make up the largest part of the data in Portavita's database. They are typically performed in the context of a treatment. For example, in a diabetes treatment, a patient typically undergoes examinations on a regular basis, such as yearly checkups, foot examinations and eye checkups. The general structure of an examination is shown in Figure 4.4. An examination has at least four participations: a subject participation (the patient), a performer participation (the practitioner who performs the examination), a data enterer participation (the person who enters the data into the system), and a legal authenticator participation (the practitioner who approves the data that was entered). Most often, though, a single practitioner acts as performer, data enterer and legal authenticator.

[Figure 4.4: An examination representation. An examination with patient, performer, data enterer and legal authenticator participations, containing organizers and observations.]

Examinations consist of simple observations and organizers, which contain observations grouped together in meaningful sets. Typically, an examination has a small number of mandatory observations. However, larger examinations may consist of up to a hundred different observations. Often, certain observations can only be entered if certain other observations have been made. For example, the performer can enter details about a skin rash only if the patient has a skin rash.

4.2. Modeling Clinical Data

We create models of clinical data by training Bayesian Networks on Portavita production data. Bayesian Networks capture the dependencies between variables and the distribution of their values. For each concept to be modeled, we aggregate the necessary data by querying the production Portavita database. The queries are performed on anonymized patient data. We create models for the following concepts: organization, patient, examination, and treatment. We use the R package bnlearn for training Bayesian networks.

To train Bayesian Networks it is important to differentiate between discrete and continuous variables. For example, a discrete variable is used to express whether a patient smokes, with possible values yes, no, or in the past. A continuous variable is, for instance, HbA1c, which takes values in a certain range. Integer numbers (e.g., a number of days) are treated as continuous values by bnlearn.

4.2.1. Modeling organizations

To generate an organization with a certain patient population size and number of practitioners, we learn a Gaussian mixture from the sizes of the actual organizations in one care group. For this, we use the R package mixtools.

4.2.2. Modeling Patients

A patient is represented by their age, the treatment they are part of, and the duration of that treatment. Therefore, for each treatment we train Bayesian networks using the variables shown in Table 4.1.

Field                        Type
Patient Age (days)           Continuous
Treatment                    Discrete
Treatment duration (days)    Continuous

Table 4.1: Data used to model a patient in relation to a treatment.
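The organization model of Section 4.2.1 can be illustrated with a small expectation-maximization (EM) fit of a one-dimensional Gaussian mixture. The sketch below is a plain-Python stand-in written for this illustration and is not the mixtools implementation used by the authors. The organization sizes, the number of components, the quantile-based initialization, and the rounding and clamping in `sample_size` are all assumptions.

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_mixture(data, k=2, iterations=100):
    """Fit a 1-D Gaussian mixture with k components using plain EM."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumed initialization: means at evenly spaced quantiles, shared spread.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    spread = (ordered[-1] - ordered[0]) / (2.0 * k) or 1.0
    stds = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and spread of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), 1e-3)
    return weights, means, stds


def sample_size(weights, means, stds, rng):
    """Draw one organization size; rounding and clamping are assumptions."""
    j = rng.choices(range(len(weights)), weights=weights)[0]
    return max(1, round(rng.gauss(means[j], stds[j])))


# Hypothetical patient counts: many small practices and a few large ones.
rng = random.Random(7)
sizes = ([rng.gauss(200, 30) for _ in range(60)]
         + [rng.gauss(1500, 150) for _ in range(40)])
weights, means, stds = fit_mixture(sizes)
print(sorted(round(m) for m in means), sample_size(weights, means, stds, rng))
```

The fitted weights, means and spreads are all the generator needs to draw the size of a new synthetic organization.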

4.2.3. Modeling examinations frequency

To capture the dependency between the age of the patient and the kinds of examinations performed in one year, we train Bayesian networks using the patient's age and, for each type of examination, the number of examinations performed in that year. Table 4.2 shows the data used to model examinations.

Field                  Type
Patient Age (days)     Continuous
# Examinations A       Continuous
# Examinations B       Continuous
...                    ...

Table 4.2: Data used to model the distribution of examinations per patient for each year.

4.2.4. Modeling Examinations

There are about 50 examinations in Portavita's system for the treatments we have considered, such as the yearly checkup, foot checkup and lab examinations. A list of examinations is provided in Appendix B. We create a model for each of them. For every kind of examination that can be entered into the system, a query is performed that returns a table with one column for each possible observation. Every examination instance yields a single row, which contains a cell for every observation that could be measured within that examination.

As an example, let us look at the lab examination. The blood of the patient must be examined on a regular basis. When performing a lab examination, a blood sample is collected and sent to the lab. The lab examines the blood on a subset of around 30 measurable variables, such as HbA1c and cholesterol. Not all variables are always measured, only those that were requested. Furthermore, the data contain both continuous and discrete variables, and any cell may hold a missing value, as shown in Table 4.3.

HbA1c HDL LDL Triglyceride .. Albumin 47 4.7 .. 27 4.5 3.9 .. 4.7 .. 31 ..

Table 4.3: An example of lab examinations. Each row represents a lab examination instance.

The R package bnlearn cannot be used to learn hybrid models, i.e., models containing both discrete and continuous variables. We have tried other packages such as deal, but without success: the number of variables was too large (e.g., 80), so the package was unable to allocate an array of the right size. Because the data consist of discrete and continuous values, we came up with the following solution. For every examination we train three Bayesian networks:

- a network for the discrete values (missing values are represented explicitly),
- a network for the continuous values, and
- a network for the missing values in the continuous variables.

Note that a missing value of a discrete variable can be treated as simply another value, but that this is not the case for continuous variables. To generate examinations with similar patterns of missing values, we therefore must train a separate network for the patterns in the missing data of continuous variables.

4.2.5. The problem of missing values

Because the input data contain many missing values and many machine learning algorithms cannot deal with missing values, it is important to impute the data. To keep the performance within reasonable levels, we use two R packages for imputation: mice and imputation. We start with the mice imputation method called norm.predict. This method calculates regression weights from the observed data. However, the result of this method may still contain missing values. If this is the case, we continue with the imputation package. If the data contain more than 1000 rows, we use the method gbmimput, which uses boosted regression trees to predict each column x from all other columns. GBM impute is only used when the dataset is large enough; otherwise it does not work well. If there are fewer than 1000 rows, we use lmimput, which fills missing values in a column by running locally weighted least squares regression.

4.3. Limitations

The models created to represent the original data determine the quality of the synthetic dataset. It is therefore worth recalling the limitations of the models built in this stage.

Hybrid Bayesian Networks. The networks we have trained model continuous (numeric) values, discrete values and missing values separately. This implies that within an examination there may be discrete variables that are inconsistent with continuous values. For example, the discrete variable issmoker may be no, while the number of cigarettes smoked per day is a non-zero number.

Time-based observations per patient. Subsequent examinations of a single patient are not correlated. This means that the observations concerning a single patient are not consistent over time.

Imputed missing values. Since a number of observations were imputed, the accuracy of the models that include such observations may be affected. As a consequence, the generated values may be less representative of the original data.

Natural numbers. The models do not distinguish between natural and real numbers. As a result, all generated numeric data are real numbers. For instance, daily smoking units are natural numbers in the original data, but real numbers in the synthetic dataset. Another consequence is that negative numbers appear in the synthetic dataset, even when the corresponding variable in the original dataset has no negative values.
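The three-network approach of Section 4.2.4 starts from one table per examination and derives three training tables from it. The Python sketch below shows one way such a split could look. The column names, the `MISSING` label and both helper functions are hypothetical; the actual models were trained in R with bnlearn.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[object]]
MISSING = "missing"  # assumed label for the explicit "missing" category


def split_for_training(
    rows: Sequence[Row],
    discrete_cols: Sequence[str],
    continuous_cols: Sequence[str],
) -> Tuple[List[Row], List[Row], List[Dict[str, bool]]]:
    """Derive the inputs of the three networks from one examination table."""
    discrete, continuous, missing_mask = [], [], []
    for row in rows:
        # Network 1: discrete values, with "missing" as an ordinary category.
        discrete.append({c: MISSING if row.get(c) is None else row[c]
                         for c in discrete_cols})
        # Network 2: continuous values (gaps still need imputation).
        continuous.append({c: row.get(c) for c in continuous_cols})
        # Network 3: which continuous values are missing.
        missing_mask.append({c: row.get(c) is None for c in continuous_cols})
    return discrete, continuous, missing_mask


def apply_missing_mask(values: Dict[str, float],
                       mask: Dict[str, bool]) -> Dict[str, Optional[float]]:
    """At generation time, blank out the values that the mask marks missing."""
    return {c: (None if mask[c] else v) for c, v in values.items()}


# Hypothetical lab examination rows.
rows = [
    {"smoker": "yes", "hba1c": 47.0, "ldl": None},
    {"smoker": None, "hba1c": None, "ldl": 2.9},
]
discrete, continuous, mask = split_for_training(rows, ["smoker"], ["hba1c", "ldl"])
print(discrete)
print(mask)
print(apply_missing_mask({"hba1c": 50.2, "ldl": 3.1}, mask[0]))
```

Because the three tables are modeled independently, a generated row can combine a discrete value with a continuous value that contradicts it, which is the first limitation listed in Section 4.3.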

5. Dataset generator

5.1. Synthetic data generation process

The generator uses the model of a healthcare organization, as described in Section 4.2.1, to assign a number of patients and practitioners to a synthetically generated organization. The size of a generated dataset is thus determined by the number of organizations to be generated (numorganizations). Figure 5.1 depicts the consecutive steps for constructing synthetic healthcare information. After an organization is created, it is assigned a set of patients and practitioners. Next, each patient p is assigned a number of treatments. For each treatment of patient p, a number of examinations is generated, distributed over the duration of the treatment. Finally, for each examination, a practitioner is randomly assigned from the set of practitioners of the organization and, based on the model of the examination, a number of observation types with their corresponding values are generated.

[Figure 5.1: Steps for generating a synthetic dataset with only one organization. The steps shown are, in order: Organization, Practitioners, Patients, Treatment, Examinations, Observations.]

The patients and the organizations created by Portavita Benchmark are FHIR resources, while the rest (examinations and observations) are CDAs. Both CDA and FHIR are XML formats.

5.2. Validation

To validate that the dataset generator produces meaningful healthcare information, we compare the synthetic data with real healthcare data.
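The generation steps of Section 5.1 form a nested loop from organizations down to observations. The Python sketch below shows that control flow only. Every sampling call is a placeholder for a draw from the corresponding trained model, and the value ranges, class names and the examination list are assumptions made for the illustration.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List

TREATMENTS = ["diabetes", "COPD", "CVRM"]
EXAMINATIONS = ["yearly checkup", "foot examination", "lab examination"]


@dataclass
class Examination:
    kind: str
    day: int  # day offset within the treatment
    practitioner: str
    observations: Dict[str, float]


@dataclass
class Treatment:
    kind: str
    duration_days: int
    examinations: List[Examination] = field(default_factory=list)


@dataclass
class Patient:
    patient_id: str
    treatments: List[Treatment] = field(default_factory=list)


@dataclass
class Organization:
    name: str
    practitioners: List[str]
    patients: List[Patient] = field(default_factory=list)


def sample_observations(kind: str, rng: random.Random) -> Dict[str, float]:
    # Placeholder: the real generator samples from the examination model.
    return {f"{kind} value": rng.gauss(0.0, 1.0)}


def generate_organization(index: int, rng: random.Random) -> Organization:
    # Placeholder sizes: the real generator draws them from the organization model.
    num_practitioners = rng.randint(1, 4)
    num_patients = rng.randint(5, 20)
    name = f"org{index}"
    org = Organization(name, [f"{name}-practitioner{i}"
                              for i in range(num_practitioners)])
    for p in range(num_patients):
        patient = Patient(f"{name}-patient{p}")
        for kind in rng.sample(TREATMENTS, rng.randint(1, 2)):
            treatment = Treatment(kind, duration_days=rng.randint(30, 3650))
            for _ in range(rng.randint(1, 5)):
                exam_kind = rng.choice(EXAMINATIONS)
                treatment.examinations.append(Examination(
                    kind=exam_kind,
                    day=rng.randrange(treatment.duration_days),
                    practitioner=rng.choice(org.practitioners),
                    observations=sample_observations(exam_kind, rng)))
            patient.treatments.append(treatment)
        org.patients.append(patient)
    return org


def generate_dataset(num_organizations: int, seed: int = 0) -> List[Organization]:
    rng = random.Random(seed)
    return [generate_organization(i, rng) for i in range(num_organizations)]


dataset = generate_dataset(2, seed=1)
print([(org.name, len(org.patients)) for org in dataset])
```

Passing one seeded random generator through all steps keeps a run reproducible, which is convenient when benchmark datasets have to be regenerated.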

[Figure 5.2: Discrete real data. Figure 5.3: Discrete synthetic data.]

Taking into account the way in which separate Bayesian Networks are used to model dependencies between variables, we can expect the following structure in the synthetic data:

- Correlation between continuous variables in the same examination instance. Refer to Section 4.1.4 for a definition of an examination and Appendix B for lists of observations that occur within an examination.
- Correlation between discrete variables in the same examination instance.
- Percentage of missing values and correlation between occurrences of missing values.

The synthetic dataset generator does not create the full statistical structure that is present in the real data. See Section 4.3 for a discussion of the limitations of the models. Two kinds of correlation that are not present are worth mentioning:

- Correlation between different instances of examinations. This means that a patient can have an irreversible complication such as retinopathy today, which is no longer present at the following examination. A consequence is that aggregation will cause loss of correlation between variables. For instance, there will be no correlation between yearly averages of systolic and diastolic blood pressures.
- Correlation between discrete and continuous variables, such as the discrete value smoking y/n and the numeric variable amount of daily smoking units. Since the Bayesian network toolkit we used does not support hybrid networks, separate networks for discrete and continuous variables are used to model dependencies, as described in Section 4.2.4.

The real and synthetic datasets are compared in four ways, as described in the next two sections. We compare the real data with the synthetic data on a single variable, for discrete and for continuous attributes. We also compare the datasets on dependencies between two variables. We use Orange v2.7 [5] to analyze and visualize the data.

5.2.1. Single variable

The histograms in Figures 5.2 and 5.3 show the frequency of values of the discrete variable wellbeing in the real and the synthetic dataset. Visual inspection reveals that the distribution of values is similar.

Figure 5.4.: Continuous real data
Figure 5.5.: Continuous synthetic data
Figure 5.6.: Discrete real data
Figure 5.7.: Discrete synthetic data

For continuous variables, Figures 5.4 and 5.5 show box plots of the blood pressure in the real and synthetic data. We can see that the mean and standard deviation of the synthetic and real data are similar. The box plots also reveal a difference in the number of distinct values. As described in Section 4.3, the generator makes no distinction between natural and real numbers and treats all numeric data as real numbers. As a consequence, almost every value in the synthetic dataset is unique, whereas observations with a natural number domain in the real dataset contain fewer distinct values.

5.2.2. Multi-variable

The mosaic diagrams in Figures 5.6 and 5.7 give insight into co-occurrences of pairs of values for the discrete attributes exercise and wellbeing. The size of each area indicates the number of samples with the corresponding values in the dataset. Both graphs show a similar structure for all combinations of values. For continuous variables, correlation between variables is shown using scatter plots. Figures 5.8 and 5.9 show the correlation between systolic and diastolic blood pressure.

Figure 5.8.: Continuous real data
Figure 5.9.: Continuous synthetic data
Figure 5.10.: Imputed missing real data
Figure 5.11.: Imputed missing synth data

Finally, we consider the percentage and co-occurrence of missing values for continuous data.¹ To visualize missing values, we impute missing values for systolic and diastolic blood pressure with the value 300. Figures 5.10 and 5.11 compare the frequencies of missing values in the real and synthetic datasets for 330 samples. We can see that both datasets show a comparable number of missing values for the systolic blood pressure.

Besides the amount of missing data, co-occurrences of missing values in the real dataset should also be reflected in the synthetic data. Again we use data with missing values imputed to the value 300. Figures 5.12 and 5.13 show systolic blood pressure plotted against diastolic blood pressure. The presence of a single dot at the point (300, 300), with no other dots on the lines x = 300 or y = 300, indicates that a missing value for systolic blood pressure is always matched with a missing value for diastolic blood pressure, in both the real and the synthetic dataset.

¹ For discrete data, missing values are modeled with an additional nominal in the value domain, and hence require no additional validation.
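The co-missing check described above can also be done numerically instead of visually. The following is a minimal sketch, assuming missing values are represented as None before imputation; the function names and the sample values are hypothetical and are not taken from the real or synthetic datasets.

```python
MISSING = 300.0  # sentinel value used in the text to make missing values visible


def impute(values, sentinel=MISSING):
    """Replace missing entries (None) with the sentinel value."""
    return [sentinel if v is None else v for v in values]


def co_missing_counts(systolic, diastolic, sentinel=MISSING):
    """Count samples where both, or only one, of the two values is missing."""
    both = only_systolic = only_diastolic = 0
    pairs = zip(impute(systolic, sentinel), impute(diastolic, sentinel))
    for s, d in pairs:
        if s == sentinel and d == sentinel:
            both += 1            # the dot at (300, 300)
        elif s == sentinel:
            only_systolic += 1   # a dot elsewhere on the line x = 300
        elif d == sentinel:
            only_diastolic += 1  # a dot elsewhere on the line y = 300
    return {"both": both,
            "only_systolic": only_systolic,
            "only_diastolic": only_diastolic}


# Made-up sample: missing values always occur in pairs.
systolic = [128.0, None, 141.5, None, 119.0]
diastolic = [82.0, None, 90.0, None, 76.5]
counts = co_missing_counts(systolic, diastolic)
```

A dataset that reproduces the pattern described in the text yields zero for both "only" counters, which corresponds to the absence of dots on the two lines other than at (300, 300).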

Figure 5.12.: Co-missing real data
Figure 5.13.: Co-missing synthetic data

5.3. Performance evaluation

To perform any operation on the dataset created by the generator, it is necessary to transform the CDA/FHIR documents and store them in a relational database. To this end, Portavita has delivered two versions of the database generator. This section describes the architectural differences between the two versions and provides the results of tests that compare the speed of data generation.

5.3.1. Portavita Benchmark v1

As shown in Figure 5.14, Portavita Benchmark v1 consists of the following components:

1. Clinical Document Architecture and FHIR XML message generator (genxml)
2. Message converter from XML to SQL (xml2sql)
3. Loading of SQL documents into a staging database (sql2db)
4. Update of the statistics used by the PostgreSQL planner to determine the most efficient way to execute a query (vacuumanalyze)
5. Transformation of staging data to dimensional warehouse format (transform2dimensional)
6. Loading of the dimensional warehouse format into the final database (copy2dwh)

[Figure 5.14: components shown: CDA/FHIR-generator (GENXML), XML2SQL, SQL2DB, Transform*, Dimensional Data Warehouse; stages are marked "Files on disk" or "In DB". *Transform to dimensional Data Warehouse]

Figure 5.14.: The components involved in the batch processing architecture of Portavita Benchmark v1.

The v1 database generator operates in batches: first, all XML documents are created on the file system in step 1. Then the message converter reads the XML files and produces SQL scripts that can be run against a database, and so forth until the last step, which copies data from the staging database to the final database. The batch-wise approach has the following drawbacks:

- Some of the steps are single-threaded. The benchmark results show that the sequential steps dominate the generation time of large datasets.
- The larger the batches are, the harder it gets to keep the state between the various batches. For instance, using /tmp as storage for intermediate steps will cause problems if /tmp is on the root filesystem with limited space.

5.3.2. Portavita Benchmark v1 Performance evaluation

We measured the time it takes to create datasets of different sizes. These tests were performed on the AXLE Manchester server with a PostgreSQL 9.5 development version from December 11th, 2014. The specifications of this server are:

- 8 x 8 Intel(R) Xeon(R) CPU E5-4620 @ 2.20GHz
- 256GB RAM

Each data point represents the average result of at least two executions with the same parameters.

Figure 5.15.: Performance of Portavita Benchmark v1 grouped by component. Time required to create 1GB of synthetic data for various dataset sizes.

Figure 5.15 shows the time it takes to create 1 GB in the database for scalings of 100,000, 500,000, 2,500,000, 5,000,000 and 10,000,000 XML documents, respectively. These database scalings conform to the requests made by AXLE partners during internal mailing-list discussions. As Figure 5.15 shows, genxml and vacuumanalyze are the least resource-intensive components. In particular, the XML generator genxml requires ca. 3 seconds per GB, which translates to over 1TB/hour of XML data. Most CPU time is spent on tasks 2, 3 and 5. Portavita Benchmark v2 focuses on improving the performance of these tasks, as we will see in the next section.

5.3.3. Portavita Benchmark v2

The design of Portavita Benchmark v2 focused on addressing the shortcomings of v1 as follows:

1. The architecture was redesigned from a batch-oriented to a near-real-time streaming architecture using micro-batches. The purpose-built multidimensional star schema model from v1 was removed; in v2 the HL7v3 RIM model is used directly as the source atomic data of the data warehouse. This eliminated the sequential step 5, transform2dimensional.
2. Steps 2 and 3 were already parallelized on a single node using GNU parallel. We analyzed the performance of each component and mitigated performance bottlenecks. In addition, we added a scale-out option for steps 2 and 3, to go beyond single-node performance.

[Figure 5.16: components shown: CDA/FHIR-generator (GENXML), three XML2SQL instances, three SQL2DB instances, Data Lake]

Figure 5.16.: Portavita Benchmark v2. Micro-batch architecture.

Together, these two design decisions should lead to faster creation of synthetic databases. Nonetheless, scale-out of the system also introduces a new component, the RabbitMQ message broker, and with the broker new configuration and flow control options that require tuning and monitoring to reach maximum throughput.

5.3.4. Portavita Benchmark v2 Performance evaluation

In Portavita Benchmark v2 we use a different way to configure the amount of data generated. Unlike v1, where we specify the number of documents we wish to generate, in v2 we set the number of organizations, which ultimately determines the number of generated documents and hence the database size. Figure 5.17 shows the performance of Portavita Benchmark v2 in terms of the time it takes to generate 1 GB of data for databases of various sizes. The size of the databases is shown by both the number of organizations and the number of documents generated. The graph shows that the generation rate is about 11 hours/TB of data.

[Figure 5.17: x-axis: number of organizations (5, 10, 20), annotated with the total number of documents (1,248,261, 2,470,915, 4,991,004); y-axis: time (seconds) / 1GB]

Figure 5.17.: Performance of Portavita Benchmark v2. Time required to generate 1GB of data.

Since Portavita Benchmark v2 is a streaming architecture based on micro-batches, it is not possible to measure the processing time of each step, as was done for v1. Moreover, as the database format was changed in v2, we can only compare the database generation rate based on the number of documents loaded into the final database per second. Figure 5.18 shows the performance of both v1 and v2 on the same single-node server. The data generation rate of v2 is more than twice that of v1.

[Figure 5.18: x-axis: total number of documents; y-axis: documents/second; series: v1, v2]

Figure 5.18.: Comparative performance evaluation. Number of documents loaded per second for v1 and v2.

5.3.5. Conclusions on performance

The core of the generator of the AXLE synthetic dataset is the XML generator. This generator is the part that is most useful as a benchmarking tool for software related to the exchange of healthcare data, since it emits HL7v3 Clinical Document Architecture (CDA) XML and FHIR messages. The XML generator speed is over 1TB/hour. With the additional transformation to database format and additional processing, the generation speed is 1TB per 11 hours on a single-node server.
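The two rates quoted in this section can be cross-checked with a short unit conversion. The sketch below assumes decimal units (1 TB = 1000 GB), which the text does not state; with binary units the figures shift by about 2% and the conclusions are unchanged. The function names are illustrative only.

```python
SECONDS_PER_HOUR = 3600
GB_PER_TB = 1000  # assumption: decimal units


def hours_per_tb(seconds_per_gb):
    """Convert a cost in seconds per GB into hours per TB."""
    return seconds_per_gb * GB_PER_TB / SECONDS_PER_HOUR


def seconds_per_gb(hours_per_tb_value):
    """Convert a cost in hours per TB into seconds per GB."""
    return hours_per_tb_value * SECONDS_PER_HOUR / GB_PER_TB


# XML generation: ca. 3 seconds per GB (Section 5.3.2).
xml_hours_per_tb = hours_per_tb(3.0)
xml_tb_per_hour = 1.0 / xml_hours_per_tb

# Full pipeline in v2: about 11 hours per TB (Section 5.3.4).
full_seconds_per_gb = seconds_per_gb(11.0)
```

Under this assumption, 3 seconds per GB corresponds to 1.2 TB per hour, consistent with "over 1TB/hour", and 11 hours per TB corresponds to 39.6 seconds per GB, which lies within the 0 to 45 second range of the vertical axis of Figure 5.17.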

6. Benchmark queries

We provide a number of queries for secondary use, i.e., for reporting and analytics purposes. The full document describing the queries was delivered in June 2014, whereas the query sources are included in the GitHub repository of the AXLE Healthcare Benchmark [1].

6.1. Queries overview

Cross-Organization comparative analysis
This query compares all the organizations based on the percentage of the patients who have had one of the following examinations in the last year: Fundus checkup, Foot checkup, Intermediary checkup, Risk inventory, Diabetes medication, Dietary advice.

Cross-practitioner comparative analysis
This query is similar to the previous query in that it compares on the same indicators; the only difference is that it focuses on the performance of each practitioner within an organization.

Deviation of organization performance
This query reports organization average values for a number of important patient measurements, such as glucose level and blood pressure. These measurements give a high-level overview of how well the patients within an organization are doing.

Extreme values
This query shows all patients with observation values that have been classified as extreme. Such patients are shown with some additional data such as gender, age, and most recent HbA1c, triglyceride and blood pressure.

Relative extreme values
The definition of an extreme value is often determined by national benchmarks. But there are local patient populations with local averages that deviate considerably from the national average. Considering that the organizations are geographically distributed across the Netherlands, this query reports the patients that have extreme values compared to the average observation values within the organization where these patients are treated.
Abnormal blood pressure or macroangiopathy
This query retrieves all patients that currently have either an abnormal blood pressure (highly related to the age of the patient) or macroangiopathy. Typically, these patients have an increased risk of developing new complications and are monitored closely.

Data Analysis: Influence of medication on HbA1c
HbA1c is a type of blood value that shows glucose levels over longer periods of time. This query shows, for all patients and for all of their medications, the average HbA1c one year before the patient started the initial medication, the average HbA1c in between the initial medication and the new medication, and the average HbA1c one year after the new medication.

Trend Analysis: Trends in the process
This query aims to gain insight into how often examinations are performed across various organizations and how this changes over time. For every organization, type of examination, and time period, this query shows the average number of times that the examination was performed per patient in that period in that organization. Six periods of three months each are defined starting from the current date.

Trend Analysis: Trends in smoking
This query shows the trends in smoking per organization. For each of the last six half-year periods, this query reports the number of active patients in the organization, how many of them smoked in that period, and how many ceased smoking in that period.
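The period logic of the "Trends in the process" query can be illustrated outside the database. The sketch below is a hypothetical Python rendering, not the benchmark's SQL: it assumes periods run backwards from the reference date, that each period is a half-open interval (start, end], and that day-of-month overflow is clamped to the last day of the month. The sample data are made up.

```python
import calendar
from datetime import date


def shift_months(d, months):
    """Shift a date by a number of months, clamping the day if needed."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def periods_back(today, count=6, months_per_period=3):
    """Return (start, end) pairs, most recent period first."""
    periods = []
    end = today
    for i in range(1, count + 1):
        start = shift_months(today, -i * months_per_period)
        periods.append((start, end))
        end = start
    return periods


def exams_per_patient(exam_dates_by_patient, periods):
    """Average number of examinations per patient in each period."""
    n_patients = len(exam_dates_by_patient)
    averages = []
    for start, end in periods:
        total = sum(1 for dates in exam_dates_by_patient.values()
                    for d in dates if start < d <= end)
        averages.append(total / n_patients)
    return averages


# Made-up examination dates for two patients of one organization.
exams = {
    "p1": [date(2015, 4, 1), date(2015, 1, 15)],
    "p2": [date(2015, 3, 10)],
}
periods = periods_back(date(2015, 5, 31))
averages = exams_per_patient(exams, periods)
```

The half-year periods of the "Trends in smoking" query follow from the same helper by calling periods_back with months_per_period=6.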

7. Conclusions

In this document we presented Portavita Benchmark, a dataset generator specific to healthcare. The generated data are based on models built upon real health records, and comply with the exchangeable HL7 formats, namely CDA and FHIR (XML-based). Portavita Benchmark borrows libraries from MGRID in order to efficiently transform and store the generated data in a PostgreSQL DBMS. The synthetic clinical data include examinations and observations, which are assigned to synthetic patients in the context of a diabetes, COPD or CVRM treatment. The validation of Portavita Benchmark showed that observation values, correlations among various observations, and the occurrence of missing values in the synthetic dataset resemble the original data. We showed that XML data generation is relatively fast, at over 1TB/hour, compared to generation that includes the subsequent transformation and storage processes, which takes about 11 hours per TB on a single-node server.

Appendix

Appendix A. Clinical Document Architecture

The following XML snippet depicts an example of a CDA document. The first part of the CDA consists of contextual information, such as the code to identify the type of document (code), the time it was issued (effectiveTime), the classification level of the document (confidentialityCode), the patient reference (recordTarget) and information about the person who authored, entered and/or authenticated the document. In this case all three roles are covered by the same person. The second part of the CDA starts with the component structuredBody and has information about the observations performed. In this case, the element organizer contains more contextual information about the observations, such as time and performer, and an additional nested organizer with display name Blood pressure. This organizer contains two observation elements that hold the values of the systolic and diastolic measurements.

<?xml version="1.0" encoding="utf-8"?>
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <typeId root="2.16.840.1.113883.1.3" extension="POCD_HD000040"/>
  <id root="2.16.840.1.113883.2.4.3.31.3.1" extension="1aa415ca-61ed-4498-9665-d1523a7477d3"/>
  <code code="68608-9" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="summarization note"/>
  <title>PHR Update</title>
  <effectiveTime value="20130709150524"/>
  <confidentialityCode code="N" codeSystem="2.16.840.1.113883.5.25" codeSystemName="confidentiality" displayName="normal"/>
  <recordTarget>
    <patientRole>
      <id nullFlavor="UNK"/>
      <patient>
        <id root="2.16.840.1.113883.2.4.3.31.3.2" extension="71738452"/>
      </patient>
      <providerOrganization>
        <id root="2.16.840.1.113883.2.4.3.31.3.2" extension="206605404"/>
      </providerOrganization>
    </patientRole>
  </recordTarget>
  <author>
    <time value="20130709150524"/>
    <assignedAuthor>
      <id root="2.16.840.1.113883.2.4.3.31.3.2" extension="-1774162843"/>
      <representedOrganization>
        <id root="2.16.840.1.113883.2.4.3.31.3.2" extension="206605404"/>
      </representedOrganization>
    </assignedAuthor>
  </author>
  <dataEnterer>
    <time value="20130709150524"/>
    <assignedEntity>
      <id root="2.16.840.1.113883.2.4.3.31.3.2" extension="-937850301"/>
      <representedOrganization>
        <id root="2.16.840.1.113883.2.4.3.31.3.2" extension="206605404"/>
      </representedOrganization>
    </assignedEntity>
  </dataEnterer>
  <legalAuthenticator>
    <time value="20130709150524"/>
    <signatureCode code="S"/>
    <assignedEntity>
      <id root="2.16.840.1.113883.2.4.3.31.3.2" extension="667284500"/>
      <representedOrganization>
        <id root="2.16.840.1.113883.2.4.3.31.3.2" extension="206605404"/>
      </representedOrganization>
    </assignedEntity>
  </legalAuthenticator>
  <documentationOf>
    <serviceEvent classCode="PCPR">
      <id root="2.16.840.1.113883.2.4.3.31.3.1" extension="1894524026"/>
      <code code="17074200" codeSystem="2.16.840.1.113883.6.96" codeSystemName="SNOMED-CT" displayName="diabetes treatment"/>
    </serviceEvent>
  </documentationOf>
  <component>
    <structuredBody>
      <component>
        <section>
          <entry>
            <organizer classCode="BATTERY" moodCode="EVN">
              <id root="2.16.840.1.113883.2.4.3.31.3.1" extension="2096295173"/>
              <code code="portavita1234" codeSystem="2.16.840.1.113883.2.4.3.31.2.1" codeSystemName="portavita" displayName="self-check CVRM"/>
              <statusCode code="completed"/>
              <effectiveTime>
                <low value="20120504055554"/>
              </effectiveTime>
              <performer typeCode="PRF">
                <assignedEntity>
                  <id root="2.16.840.1.113883.2.4.3.31.3.2" extension="1236771654"/>
                  <representedOrganization>
                    <id root="2.16.840.1.113883.2.4.3.31.3.2" extension="206605404"/>
                  </representedOrganization>
                </assignedEntity>
              </performer>
              <component>
                <organizer classCode="BATTERY" moodCode="EVN">
                  <id root="2.16.840.1.113883.2.4.3.31.3.1" extension="2096295172"/>
                  <code code="portavita1235" codeSystem="2.16.840.1.113883.2.4.3.31.2.1" codeSystemName="portavita" displayName="blood pressure Self measurement"/>
                  <statusCode code="completed"/>
                  <effectiveTime>
                    <low value="20120504055554"/>
                  </effectiveTime>
                  <component>
                    <observation classCode="OBS" moodCode="EVN">
                      <id root="2.16.840.1.113883.2.4.3.31.3.1" extension="2096295170"/>
                      <code code="portavita1236" codeSystem="2.16.840.1.113883.2.4.3.31.2.1" codeSystemName="portavita" displayName="systolic BP, self measurement"/>
                      <statusCode code="completed"/>
                      <effectiveTime>
                        <low value="20120504055554"/>
                      </effectiveTime>
                      <value value="130.63331049967894" unit="mm Hg" xsi:type="PQ"/>
                    </observation>
                  </component>
                  <component>
                    <observation classCode="OBS" moodCode="EVN">
                      <id root="2.16.840.1.113883.2.4.3.31.3.1" extension="2096295171"/>
                      <code code="portavita1237" codeSystem="2.16.840.1.113883.2.4.3.31.2.1" codeSystemName="portavita" displayName="diastolic BP, self measurement"/>
                      <statusCode code="completed"/>
                      <effectiveTime>
                        <low value="20120504055554"/>
                      </effectiveTime>
                      <value value="68.57299120677533" unit="mm Hg" xsi:type="PQ"/>
                    </observation>
                  </component>
                </organizer>
              </component>
            </organizer>
          </entry>
        </section>
      </component>
    </structuredBody>
  </component>
</ClinicalDocument>
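To illustrate how a consumer might read such a document, the following sketch extracts the observation codes and values with Python's standard xml.etree.ElementTree module. It is an assumption-laden example: the embedded fragment is a heavily trimmed stand-in for the document above, element and attribute names follow the casing of the CDA schema, and the helper name extract_observations is hypothetical.

```python
import xml.etree.ElementTree as ET

HL7_NS = "urn:hl7-org:v3"
NS = {"hl7": HL7_NS}

# Trimmed stand-in for the example document; only the parts used below remain.
CDA_FRAGMENT = """<ClinicalDocument xmlns="urn:hl7-org:v3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <component><structuredBody><component><section><entry>
    <organizer classCode="BATTERY" moodCode="EVN">
      <component>
        <observation classCode="OBS" moodCode="EVN">
          <code code="portavita1236" displayName="systolic BP, self measurement"/>
          <value value="130.63331049967894" unit="mm Hg" xsi:type="PQ"/>
        </observation>
      </component>
      <component>
        <observation classCode="OBS" moodCode="EVN">
          <code code="portavita1237" displayName="diastolic BP, self measurement"/>
          <value value="68.57299120677533" unit="mm Hg" xsi:type="PQ"/>
        </observation>
      </component>
    </organizer>
  </entry></section></component></structuredBody></component>
</ClinicalDocument>"""


def extract_observations(xml_text):
    """Return (code, display name, numeric value, unit) per observation."""
    root = ET.fromstring(xml_text)
    results = []
    # iter() walks the whole tree, so nested organizers are covered too.
    for obs in root.iter(f"{{{HL7_NS}}}observation"):
        code = obs.find("hl7:code", NS)
        value = obs.find("hl7:value", NS)
        if code is None or value is None:
            continue  # skip observations without a code or a value
        results.append((code.get("code"),
                        code.get("displayName"),
                        float(value.get("value")),
                        value.get("unit")))
    return results


observations = extract_observations(CDA_FRAGMENT)
```

Because the lookup is by element name rather than by position, the same function works regardless of how deeply the organizers are nested.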

Appendix B. List of Observations per Examination

Below follows the list of examinations and their observations generated by Portavita Benchmark. The examinations are denoted with ===== and a .json suffix and have a corresponding human-readable display name. Examinations are composed of a number of observations, which are listed after the examination name. Each observation has a code that is preceded by the corresponding data type, which is either PQ (physical quantity, continuous) or CD (coded description, discrete).

Examination/Observation code | displayname
-----------------------------+------------
===== 12133-5.json ===== | Systolic flow/diastolic flow:velrto:pt:cerebral artery anterior^fetus:qn:us.doppler
pq_8462.4 | Intravascular diastolic:pres:pt:arterial system:qn:
pq_8480.6 | Intravascular systolic:pres:pt:arterial system:qn:
===== 127783003.json ===== | Spirometry
cd_portavita1327 | Interpretation
pq_313232000 | Peak expiratory flow rate after bronchodilation
pq_313276007 | Peak expiratory flow rate before bronchodilation
pq_401012008 | FEV1 before bronchodilation
pq_401013003 | FEV1 after bronchodilation
pq_407561008 | Forced vital capacity (FVC) after bronchodilation
pq_407602006 | Forced expiratory volume 1 (FEV1)/ forced vital capacity (FVC) ratio before bronchodilator
pq_407603001 | Forced expiratory volume 1 (FEV1)/ forced vital capacity (FVC) ratio after bronchodilator
pq_portavita534 | FEV1 Pre (% of predicted)
pq_portavita535 | FEV1 Post (% of predicted)
pq_portavita536 | Reversibility FEV1 (%)
pq_portavita537 | FVC Pre (Absolute)
pq_portavita538 | FVC Pre (% of predicted)
pq_portavita539 | FVC Post (% of predicted)
pq_portavita540 | PEF Pre (% of predicted)
pq_portavita541 | PEF Post (% of predicted)
pq_portavita542 | TLC Pre (Absolute)
pq_portavita543 | TLC Pre (% of predicted)
===== 164847006.json ===== | Standard ECG
cd_271921002 | ECG finding
===== 170744004.json ===== | Follow-up diabetic assessment
cd_129863004 | Deficient knowledge of dietary regimen
cd_237635002 | Nocturnal hypoglycaemia
cd_302866003 | Hypoglycaemia
cd_361137007 | Irregular heart beat
cd_365275006 | General well-being finding
cd_portavita1342 | Antihypertensives
cd_portavita1343 | Diuretics
cd_portavita1344 | Beta-blockers
cd_portavita1345 | Calcium antagonists
cd_portavita1346 | Drugs affecting the renin-angiotensin system
cd_portavita1347 | Alpha-blockers
cd_portavita1348 | Other antihypertensives
cd_portavita1349 | Blood-thinning drugs
cd_portavita1350 | Platelet aggregation inhibitors
cd_portavita1351 | Anticoagulants
cd_portavita1352 | Other blood-thinning drugs
cd_portavita1353 | Lipid-lowering drugs
cd_portavita1354 | Statins
cd_portavita1355 | Other lipid-lowering drugs
cd_portavita1428 | Extra attention for Individual care plan
cd_portavita24 | Fasting hypos
cd_portavita25 | Hypos after breakfast
cd_portavita26 | Hypos before lunch
cd_portavita27 | Hypos after lunch
cd_portavita28 | Hypos before dinner
cd_portavita29 | Hypos after dinner
cd_portavita30 | Hypos before bedtime
cd_portavita34 | Therapy compliance
cd_portavita38 | Dietary advice problems
cd_portavita39 | Insufficient application of guidelines
cd_portavita648 | Diabetes medication
pq_170749009 | Frequency of hypoglycaemia attacks
pq_27113001 | Body weight
pq_364075005 | Heart rate
pq_365811003 | Glucose level - finding
pq_50373000 | Body height measure
pq_60621009 | Body mass index
pq_8462.4 | Intravascular diastolic:pres:pt:arterial system:qn:
pq_8480.6 | Intravascular systolic:pres:pt:arterial system:qn:
pq_portavita175 | Fasting blood glucose
pq_portavita176 | Blood glucose after breakfast
pq_portavita177 | Blood glucose before lunch
pq_portavita178 | Blood glucose after lunch
pq_portavita179 | Blood glucose before dinner
pq_portavita180 | Blood glucose after dinner
pq_portavita181 | Blood glucose before bedtime
pq_portavita182 | Nighttime blood glucose
===== 170757007.json ===== | Fundoscopy - diabetic check
cd_portavita220 | Assessment of fundus image
===== 170777000.json ===== | Diabetic annual review
cd_106070007 | Cardiac auscultation finding
cd_129863004 | Deficient knowledge of dietary regimen
cd_162274004 | Visual symptoms
cd_207057006 | [D]Shortness of breath
cd_207260005 | [D]Other specified symptoms
cd_219006 | Current drinker
cd_22298006 | Myocardial infarction
cd_228450008 | Time spent exercising
cd_230690007 | Cerebrovascular accident
cd_237635002 | Nocturnal hypoglycaemia
cd_249475006 | Thirst symptom
cd_266257000 | Transient cerebral ischaemia
cd_28442001 | Polyuria
cd_29857009 | Chest pain
cd_302866003 | Hypoglycaemia
cd_30782001 | Diastolic murmur
cd_309597007 | Foot abnormality - diabetes-related
cd_312975006 | Microalbuminuria
cd_31574009 | Systolic murmur
cd_32738000 | Pruritus
cd_361137007 | Irregular heart beat
cd_365275006 | General well-being finding
cd_365980008 | Tobacco use and exposure - finding
cd_367416001 | Angina pectoris
cd_370992007 | Dyslipidaemia
cd_38341003 | Hypertensive disorder
cd_386137000 | Tortuous coronary artery
cd_400047006 | Peripheral vascular disease
cd_401207004 | Medication side effects present
cd_40733004 | Infectious disease
cd_52311001 | Homocystinaemia
cd_82184000 | Aortic bruit
cd_84114007 | Heart failure
cd_portavita10 | Carotid aorta right
cd_portavita11 | Renal aorta left
cd_portavita12 | Renal aorta right
cd_portavita13 | Femoral aorta left
cd_portavita1342 | Antihypertensives
cd_portavita1343 | Diuretics
cd_portavita1344 | Beta-blockers
cd_portavita1345 | Calcium antagonists
cd_portavita1346 | Drugs affecting the renin-angiotensin system
cd_portavita1347 | Alpha-blockers
cd_portavita1348 | Other antihypertensives
cd_portavita1349 | Blood-thinning drugs
cd_portavita1350 | Platelet aggregation inhibitors
cd_portavita1351 | Anticoagulants
cd_portavita1352 | Other blood-thinning drugs
cd_portavita1353 | Lipid-lowering drugs
cd_portavita1354 | Statins
cd_portavita1355 | Other lipid-lowering drugs
cd_portavita14 | Femoral aorta right
cd_portavita1428 | Extra attention for Individual care plan
cd_portavita24 | Fasting hypos
cd_portavita25 | Hypos after breakfast
cd_portavita26 | Hypos before lunch
cd_portavita27 | Hypos after lunch
cd_portavita28 | Hypos before dinner
cd_portavita29 | Hypos after dinner
cd_portavita30 | Hypos before bedtime
cd_portavita308 | Assessment of ophthalmic examination
cd_portavita34 | Therapy compliance
cd_portavita38 | Dietary advice problems
cd_portavita39 | Insufficient application of guidelines
cd_portavita438 | Motivation to stop smoking
cd_portavita439 | Stop smoking advice given
cd_portavita44 | Symptoms indicating hypoglycemia
cd_portavita440 | Follow-up appointment made
cd_portavita48 | Decrease of physical capacity
cd_portavita5 | Auscultation indicated
cd_portavita50 | Pain in calves when walking
cd_portavita52 | Pain or tingling in legs
cd_portavita54 | Sexual dysfunction disorders
cd_portavita6 | First sound
cd_portavita61 | Hypoglycemia recognition
cd_portavita63 | Patient uses caffeine
cd_portavita64 | Products with glycyrrhizic acid
cd_portavita648 | Diabetes medication
cd_portavita68 | Diabetes in first-degree or second-degree relatives
cd_portavita69 | Lipid metabolism disorder in first-degree relatives
cd_portavita7 | Second sound
cd_portavita70 | Hypertension in first-degree relatives
cd_portavita71 | Cardiovascular diseases in first-degree relatives
cd_portavita8 | Auscultation of arteries performed
cd_portavita9 | Carotid aorta left
pq_160573003 | Alcohol intake
pq_170749009 | Frequency of hypoglycaemia attacks
pq_266918002 | Tobacco smoking consumption
pq_27113001 | Body weight
pq_364075005 | Heart rate
pq_365811003 | Glucose level - finding
pq_396552003 | Abdominal circumference
pq_50373000 | Body height measure
pq_60621009 | Body mass index
pq_8462.4 | Intravascular diastolic:pres:pt:arterial system:qn:
pq_8480.6 | Intravascular systolic:pres:pt:arterial system:qn:
pq_portavita175 | Fasting blood glucose
pq_portavita176 | Blood glucose after breakfast
pq_portavita177 | Blood glucose before lunch
pq_portavita178 | Blood glucose after lunch
pq_portavita179 | Blood glucose before dinner
pq_portavita180 | Blood glucose after dinner
pq_portavita181 | Blood glucose before bedtime
pq_portavita182 | Nighttime blood glucose
===== 183056000.json ===== | Patient advised about diabetic diet
cd_129863004 | Deficient knowledge of dietary regimen
cd_365275006 | General well-being finding
cd_portavita1428 | Extra attention for Individual care plan
cd_portavita34 | Therapy compliance
cd_portavita38 | Dietary advice problems
cd_portavita39 | Insufficient application of guidelines
pq_27113001 | Body weight
pq_396552003 | Abdominal circumference
pq_50373000 | Body height measure
pq_60621009 | Body mass index
===== 27113001.json ===== | Body weight
pq_27113001 | Body weight
===== 282294001.json ===== | Laboratory test finding
pq_102737005 | HDL cholesterol
pq_102739008 | LDL cholesterol
pq_103232008 | HbA1c
pq_166842003 | Total cholesterol:hdl ratio measurement
pq_250745003 | Albumin/creatinine ratio measurement
pq_26091008 | Aspartate aminotransferase
pq_271000000 | Urine albumin measurement
pq_275788007 | Sodium in sample
pq_275789004 | Potassium in sample
pq_275792000 | Creatinine in sample
pq_275795003 | Albumin in sample
pq_38082009 | Haemoglobin
pq_52302001 | Glucose measurement, fasting
pq_56935002 | Alanine aminotransferase
pq_60153001 | gamma-glutamyltransferase
pq_75828004 | Creatine kinase
pq_84698008 | Cholesterol
pq_85600001 | Triacylglycerol
pq_8879006 | Creatinine measurement, 24 hour urine
pq_portavita1338 | PTH
pq_portavita1339 | Vitamin D
pq_portavita1340 | MCV
pq_portavita1356 | Calcium
pq_portavita1357 | Phosphate
pq_portavita189 | Creatinine clearance (Cockcroft)
pq_portavita190 | Non-fasting blood glucose
pq_portavita191 | Creatinine clearance (24-hour urine)
pq_portavita304 | Creatinine clearance (MDRD)
pq_portavita845 | BNP
===== 396552003.json ===== | Abdominal circumference
pq_396552003 | Abdominal circumference
===== 401191002.json ===== | Diabetic foot examination
cd_122480009 | Hallux valgus
cd_201251005 | Neuropathic diabetic ulcer - foot
cd_249802001 | Pes cavus
cd_268068002 | Ankle and/or foot joint stiffness
cd_275520000 | Claudication
cd_299653001 | Amputated foot
cd_403059006 | Onychomycosis of toenails
cd_53226007 | Pes planus
cd_86380000 | Acquired claw toes
cd_portavita107 | Muscle cramp in calves when lying down
cd_portavita109 | Skin defect and/or infection
cd_portavita110 | Autonomic neuropathy
cd_portavita111 | Tylosis or clavus
cd_portavita114 | Pressure sores
cd_portavita116 | Purple discoloration
cd_portavita118 | Temperature difference
cd_portavita120 | Posterior tibial artery
cd_portavita121 | Dorsalis pedis artery
cd_portavita123 | Superficial sensitivity disorder
cd_portavita131 | Deep sensitivity disorders
cd_portavita1326 | Peripheral Arterial Disease (PAD)
cd_portavita712 | SIMMS classification
===== 401221002.json ===== | Ankle brachial pressure index - ABPI
pq_portavita714 | Systolic blood pressure left ankle
pq_portavita715 | Systolic blood pressure right ankle
pq_portavita716 | Systolic blood pressure left arm
pq_portavita717 | Systolic blood pressure right arm
pq_portavita718 | ABPI left
pq_portavita719 | ABPI right
===== 50373000.json ===== | Body height measure
pq_50373000 | Body height measure
===== 77386006.json ===== | Patient currently pregnant
cd_77386006 | Patient currently pregnant
===== BATT204006.json ===== | Smoking
cd_365980008 | Tobacco use and exposure - finding
cd_portavita438 | Motivation to stop smoking
cd_portavita439 | Stop smoking advice given
cd_portavita440 | Follow-up appointment made
pq_266918002 | Tobacco smoking consumption
pq_portavita435 | Smoking history: number of years
pq_portavita436 | Smoking history: units per day
pq_portavita437 | Cessation attempts
pq_portavita457 | Pack years
===== Portavita1224.json ===== | Inhalation technique
cd_portavita466 | Inhalation technique examined
cd_portavita467 | Problems with inhalation technique
===== Portavita1232.json ===== | Diagnosis (Diabetes)
cd_portavita1232 | Diagnosis (Diabetes)
===== Portavita1234.json ===== | Self-check (CVRM)
===== Portavita136.json ===== | Self-check (Diabetes)
cd_169449001 | Trying to conceive
cd_309597007 | Foot abnormality - diabetes-related
cd_365275006 | General well-being finding
cd_77386006 | Patient currently pregnant
cd_portavita34 | Therapy compliance
cd_portavita38 | Dietary advice problems
pq_27113001 | Body weight
pq_60621009 | Body mass index
pq_8462.4 | Intravascular diastolic:pres:pt:arterial system:qn:
pq_8480.6 | Intravascular systolic:pres:pt:arterial system:qn:
===== Portavita140.json ===== | Risk inventory (Diabetes)
cd_106070007 | Cardiac auscultation finding
cd_108365000 | Infection of skin
cd_162274004 | Visual symptoms
cd_169449001 | Trying to conceive
cd_207057006 | [D]Shortness of breath
cd_207260005 | [D]Other specified symptoms
cd_219006 | Current drinker
cd_22298006 | Myocardial infarction
cd_228450008 | Time spent exercising
cd_230690007 | Cerebrovascular accident
cd_249475006 | Thirst symptom
cd_266257000 | Transient cerebral ischaemia
cd_267036007 | Dyspnoea
cd_278542003 | Dental appliance or restoration finding
cd_279333002 | Pruritus of skin
cd_28442001 | Polyuria
cd_29857009 | Chest pain
cd_30782001 | Diastolic murmur
cd_309597007 | Foot abnormality - diabetes-related
cd_312975006 | Microalbuminuria
cd_31574009 | Systolic murmur
cd_32738000 | Pruritus
cd_361137007 | Irregular heart beat
cd_365275006 | General well-being finding
cd_365980008 | Tobacco use and exposure - finding
cd_367416001 | Angina pectoris
cd_370992007 | Dyslipidaemia
cd_38341003 | Hypertensive disorder
cd_386137000 | Tortuous coronary artery
cd_400047006 | Peripheral vascular disease
cd_401207004 | Medication side effects present
cd_402599005 | Acanthosis nigricans
cd_40733004 | Infectious disease
cd_52311001 | Homocystinaemia
cd_56727007 | Vitiligo
cd_77386006 | Patient currently pregnant
cd_82184000 | Aortic bruit
cd_84114007 | Heart failure
cd_portavita10 | Carotid aorta right
cd_portavita11 | Renal aorta left
cd_portavita12 | Renal aorta right
cd_portavita1232 | Diagnosis (Diabetes)
cd_portavita13 | Femoral aorta left
cd_portavita14 | Femoral aorta right
cd_portavita1428 | Extra attention for Individual care plan
cd_portavita438 | Motivation to stop smoking
cd_portavita439 | Stop smoking advice given
cd_portavita44 | Symptoms indicating hypoglycemia
cd_portavita440 | Follow-up appointment made
cd_portavita48 | Decrease of physical capacity
cd_portavita5 | Auscultation indicated
cd_portavita50 | Pain in calves when walking
cd_portavita52 | Pain or tingling in legs
cd_portavita54 | Sexual dysfunction disorders
cd_portavita6 | First sound
cd_portavita63 | Patient uses caffeine
cd_portavita64 | Products with glycyrrhizic acid
cd_portavita68 | Diabetes in first-degree or second-degree relatives
cd_portavita69 | Lipid metabolism disorder in first-degree relatives
cd_portavita7 | Second sound
cd_portavita70 | Hypertension in first-degree relatives
cd_portavita71 | Cardiovascular diseases in first-degree relatives
cd_portavita8 Auscultation of arteries performed cd_portavita9 Carotid aorta left pq_160573003 Alcohol intake pq_266918002 Tobacco smoking consumption pq_27113001 Body weight pq_364075005 Heart rate pq_396552003 Abdominal circumference pq_50373000 Body height measure pq_60621009 Body mass index pq_8462.4 Intravascular diastolic:pres:pt:arterial system:qn: pq_8480.6 Intravascular systolic:pres:pt:arterial system:qn: ===== Portavita141.json ===== Risk factors cd_22298006 Myocardial infarction cd_230690007 Cerebrovascular accident cd_266257000 Transient cerebral ischaemia cd_312975006 Microalbuminuria cd_367416001 Angina pectoris cd_370992007 Dyslipidaemia cd_38341003 Hypertensive disorder cd_386137000 Tortuous coronary artery cd_400047006 Peripheral vascular disease cd_52311001 Homocystinaemia cd_84114007 Heart failure ===== Portavita1422.json ===== Annual check-up (Asthma/COPD) cd_10312003 Prednisone preparation cd_162895003 O/E - accessory resp.m s.used cd_170617002 Respiratory drug side effect cd_22298006 Myocardial infarction cd_228450008 Time spent exercising cd_230690007 Cerebrovascular accident cd_266257000 Transient cerebral ischaemia cd_268929007 O/E - rhonchi present cd_28743005 Productive cough cd_301272007 Chest auscultation finding cd_301282008 Finding of respiration cd_365980008 Tobacco use and exposure - finding cd_367416001 Angina pectoris cd_3723001 Arthritis cd_38341003 Hypertensive disorder cd_386137000 Tortuous coronary artery cd_390800000 Goal achievement finding cd_400047006 Peripheral vascular disease cd_41006004 Depression cd_414059009 Drug therapy compliance finding cd_417523004 Loss of interest in previously enjoyable activity cd_419597003 Respiratory corticosteroid cd_48694002 Anxiety cd_64859006 Osteoporosis cd_66493003 Theophylline cd_69896004 Rheumatoid arthritis cd_73211009 Diabetes mellitus cd_78275009 Obstructive sleep apnoea syndrome cd_79015004 Worried cd_79042003 Crepitation cd_84114007 Heart failure cd_portavita1428 Extra 
attention for Individual care plan cd_portavita359 Undesirable weight loss cd_portavita400 Problems in coughing up slime cd_portavita427 More complaints after exposure to work/hobby cd_portavita438 Motivation to stop smoking cd_portavita439 Stop smoking advice given cd_portavita440 Follow-up appointment made cd_portavita459 Short acting b2-mimetic cd_portavita460 Long-acting b2-mimetic cd_portavita461 Short-acting anticholinergics cd_portavita462 Long-acting anticholinergics cd_portavita463 LTRCA cd_portavita466 Inhalation technique examined cd_portavita467 Problems with inhalation technique cd_portavita565 Barrel chest cd_portavita713 Combination therapy cd_portavita857 Impairment due to COPD pq_266918002 Tobacco smoking consumption pq_27113001 Body weight pq_50373000 Body height measure pq_60621009 Body mass index pq_86290005 Respiratory rate pq_portavita360 Fat Free Mass Index (FFMI) pq_portavita435 Smoking history: number of years pq_portavita436 Smoking history: units per day pq_portavita437 Cessation attempts pq_portavita457 Pack years ===== Portavita1423.json ===== Interim check-up (Asthma/COPD) cd_10312003 Prednisone preparation cd_162895003 O/E - accessory resp.m s.used cd_170617002 Respiratory drug side effect cd_22298006 Myocardial infarction cd_228450008 Time spent exercising cd_230690007 Cerebrovascular accident cd_266257000 Transient cerebral ischaemia cd_268929007 O/E - rhonchi present cd_28743005 Productive cough cd_301272007 Chest auscultation finding cd_301282008 Finding of respiration cd_365980008 Tobacco use and exposure - finding cd_367416001 Angina pectoris cd_3723001 Arthritis

cd_38341003 Hypertensive disorder cd_386137000 Tortuous coronary artery cd_390800000 Goal achievement finding cd_400047006 Peripheral vascular disease cd_41006004 Depression cd_414059009 Drug therapy compliance finding cd_417523004 Loss of interest in previously enjoyable activity cd_419597003 Respiratory corticosteroid cd_48694002 Anxiety cd_64859006 Osteoporosis cd_66493003 Theophylline cd_69896004 Rheumatoid arthritis cd_73211009 Diabetes mellitus cd_78275009 Obstructive sleep apnoea syndrome cd_79015004 Worried cd_79042003 Crepitation cd_84114007 Heart failure cd_portavita1428 Extra attention for Individual care plan cd_portavita359 Undesirable weight loss cd_portavita400 Problems in coughing up slime cd_portavita427 More complaints after exposure to work/hobby cd_portavita438 Motivation to stop smoking cd_portavita439 Stop smoking advice given cd_portavita440 Follow-up appointment made cd_portavita459 Short acting b2-mimetic cd_portavita460 Long-acting b2-mimetic cd_portavita461 Short-acting anticholinergics cd_portavita462 Long-acting anticholinergics cd_portavita463 LTRCA cd_portavita466 Inhalation technique examined cd_portavita467 Problems with inhalation technique cd_portavita565 Barrel chest cd_portavita713 Combination therapy cd_portavita857 Impairment due to COPD pq_266918002 Tobacco smoking consumption pq_27113001 Body weight pq_50373000 Body height measure pq_60621009 Body mass index pq_86290005 Respiratory rate pq_portavita360 Fat Free Mass Index (FFMI) pq_portavita435 Smoking history: number of years pq_portavita436 Smoking history: units per day pq_portavita437 Cessation attempts pq_portavita457 Pack years ===== Portavita1442.json ===== Diagnosis (Asthma/COPD) ===== Portavita154.json ===== Interim check-up (Diabetes) cd_182838006 Change of medication cd_274785000 Examination of blood pressure cd_299478007 Foot problem cd_316360006 [V]Other reasons for encounter cd_33747003 Glucose measurement, blood cd_361137007 Irregular heart beat cd_54777007 
Deficient knowledge cd_78164000 Feeding problem cd_portavita1342 Antihypertensives cd_portavita1343 Diuretics cd_portavita1344 Beta-blockers cd_portavita1345 Calcium antagonists cd_portavita1346 Drugs affecting the renin-angiotensin system cd_portavita1347 Alpha-blockers cd_portavita1348 Other antihypertensives cd_portavita1349 Blood-thinning drugs cd_portavita1350 Platelet aggregation inhibitors cd_portavita1351 Anticoagulants cd_portavita1352 Other blood-thinning drugs cd_portavita1353 Lipid-lowering drugs cd_portavita1354 Statins cd_portavita1355 Other lipid-lowering drugs cd_portavita1428 Extra attention for Individual care plan cd_portavita648 Diabetes medication pq_364075005 Heart rate pq_8462.4 Intravascular diastolic:pres:pt:arterial system:qn: pq_8480.6 Intravascular systolic:pres:pt:arterial system:qn: pq_portavita175 Fasting blood glucose pq_portavita176 Blood glucose after breakfast pq_portavita177 Blood glucose before lunch pq_portavita178 Blood glucose after lunch pq_portavita179 Blood glucose before dinner pq_portavita180 Blood glucose after dinner pq_portavita181 Blood glucose before bedtime pq_portavita182 Nighttime blood glucose ===== Portavita174.json ===== Glucose curve pq_portavita175 Fasting blood glucose pq_portavita176 Blood glucose after breakfast pq_portavita177 Blood glucose before lunch pq_portavita178 Blood glucose after lunch pq_portavita179 Blood glucose before dinner pq_portavita180 Blood glucose after dinner pq_portavita181 Blood glucose before bedtime pq_portavita182 Nighttime blood glucose ===== Portavita305.json ===== Ophthalmic examination (Diabetes) cd_portavita308 Assessment of ophthalmic examination ===== Portavita316.json ===== Asthma Control Questionnaire (ACQ) ===== Portavita323.json ===== Respiratory Illness Questionnaire-Monitoring 10 (RIQ-Mon10) cd_portavita324 Symptoms score cd_portavita325 Activities score ===== Portavita336.json ===== Clinical COPD Questionnaire (CCQ)

cd_portavita337 Symptoms score cd_portavita338 Functional score cd_portavita339 Mental score ===== Portavita351.json ===== Lung specialist consultation cd_portavita1333 Lung function measurement reliable cd_portavita1334 Lung function matching cd_portavita1335 Indication for referral to lung specialist ===== Portavita353.json ===== Intake/Diagnostics (Asthma/COPD) cd_10312003 Prednisone preparation cd_161524000 H/O: hay fever cd_161527007 H/O: asthma cd_161561009 H/O: eczema cd_162895003 O/E - accessory resp.m s.used cd_170617002 Respiratory drug side effect cd_21719001 Allergic rhinitis due to pollen cd_22298006 Myocardial infarction cd_228450008 Time spent exercising cd_230690007 Cerebrovascular accident cd_232347008 Dander (animal) allergy cd_266257000 Transient cerebral ischaemia cd_267036007 Dyspnoea cd_268929007 O/E - rhonchi present cd_28743005 Productive cough cd_301272007 Chest auscultation finding cd_301282008 Finding of respiration cd_365980008 Tobacco use and exposure - finding cd_367416001 Angina pectoris cd_3723001 Arthritis cd_38341003 Hypertensive disorder cd_386137000 Tortuous coronary artery cd_400047006 Peripheral vascular disease cd_41006004 Depression cd_414059009 Drug therapy compliance finding cd_417523004 Loss of interest in previously enjoyable activity cd_419597003 Respiratory corticosteroid cd_48694002 Anxiety cd_64859006 Osteoporosis cd_66493003 Theophylline cd_68154008 Chronic cough cd_69896004 Rheumatoid arthritis cd_73211009 Diabetes mellitus cd_78275009 Obstructive sleep apnoea syndrome cd_79015004 Worried cd_79042003 Crepitation cd_80313002 Palpitations cd_84114007 Heart failure cd_portavita359 Undesirable weight loss cd_portavita378 Frequent respiratory infections cd_portavita380 Complaints triggered by medication cd_portavita381 Respiratory medication discontinued in the past cd_portavita391 Pulmonary diseases of first-degree relatives cd_portavita392 Atopic disorders in first-degree relatives cd_portavita394 Frequency of 
shortness of breath cd_portavita396 Frequency of wheezing cd_portavita400 Problems in coughing up slime cd_portavita402 Periods without symptoms cd_portavita404 Nighttime symptoms cd_portavita409 Shortness of breath, wheezing or coughing in response to specific or non-specific stimuli cd_portavita410 Dusty or wet environment cd_portavita414 Tobacco smoke cd_portavita416 Other non-specific stimuli (cold air, fog, baking odours, paint odours, perfume, etc.) cd_portavita419 RAST test indicated cd_portavita420 Results of previous RAST test cd_portavita422 Skin prick test indicated cd_portavita423 Results previous skin prick test cd_portavita427 More complaints after exposure to work/hobby cd_portavita438 Motivation to stop smoking cd_portavita439 Stop smoking advice given cd_portavita440 Follow-up appointment made cd_portavita441 Ethnicity cd_portavita459 Short acting b2-mimetic cd_portavita460 Long-acting b2-mimetic cd_portavita461 Short-acting anticholinergics cd_portavita462 Long-acting anticholinergics cd_portavita463 LTRCA cd_portavita466 Inhalation technique examined cd_portavita467 Problems with inhalation technique cd_portavita565 Barrel chest cd_portavita566 Wheezing cd_portavita713 Combination therapy cd_portavita857 Impairment due to COPD pq_266918002 Tobacco smoking consumption pq_27113001 Body weight pq_50373000 Body height measure pq_60621009 Body mass index pq_86290005 Respiratory rate pq_portavita360 Fat Free Mass Index (FFMI) pq_portavita435 Smoking history: number of years pq_portavita436 Smoking history: units per day pq_portavita437 Cessation attempts pq_portavita457 Pack years ===== Portavita354.json ===== Follow-up consultation (Asthma/COPD) cd_10312003 Prednisone preparation cd_162895003 O/E - accessory resp.m s.used cd_170617002 Respiratory drug side effect cd_22298006 Myocardial infarction cd_228450008 Time spent exercising

cd_230690007 Cerebrovascular accident cd_266257000 Transient cerebral ischaemia cd_268929007 O/E - rhonchi present cd_28743005 Productive cough cd_301272007 Chest auscultation finding cd_301282008 Finding of respiration cd_365980008 Tobacco use and exposure - finding cd_367416001 Angina pectoris cd_3723001 Arthritis cd_38341003 Hypertensive disorder cd_386137000 Tortuous coronary artery cd_390800000 Goal achievement finding cd_400047006 Peripheral vascular disease cd_41006004 Depression cd_414059009 Drug therapy compliance finding cd_417523004 Loss of interest in previously enjoyable activity cd_419597003 Respiratory corticosteroid cd_48694002 Anxiety cd_64859006 Osteoporosis cd_66493003 Theophylline cd_69896004 Rheumatoid arthritis cd_73211009 Diabetes mellitus cd_78275009 Obstructive sleep apnoea syndrome cd_79015004 Worried cd_79042003 Crepitation cd_84114007 Heart failure cd_portavita1428 Extra attention for Individual care plan cd_portavita359 Undesirable weight loss cd_portavita400 Problems in coughing up slime cd_portavita427 More complaints after exposure to work/hobby cd_portavita438 Motivation to stop smoking cd_portavita439 Stop smoking advice given cd_portavita440 Follow-up appointment made cd_portavita459 Short acting b2-mimetic cd_portavita460 Long-acting b2-mimetic cd_portavita461 Short-acting anticholinergics cd_portavita462 Long-acting anticholinergics cd_portavita463 LTRCA cd_portavita466 Inhalation technique examined cd_portavita467 Problems with inhalation technique cd_portavita565 Barrel chest cd_portavita713 Combination therapy cd_portavita857 Impairment due to COPD pq_266918002 Tobacco smoking consumption pq_27113001 Body weight pq_50373000 Body height measure pq_60621009 Body mass index pq_86290005 Respiratory rate pq_portavita360 Fat Free Mass Index (FFMI) pq_portavita435 Smoking history: number of years pq_portavita436 Smoking history: units per day pq_portavita437 Cessation attempts pq_portavita457 Pack years ===== Portavita571.json ===== 
Cessation of smoking cd_13432000 Nortriptyline cd_365980008 Tobacco use and exposure - finding cd_portavita438 Motivation to stop smoking cd_portavita574 Nicotine replacement therapy cd_portavita577 Smoking cessation appointment made cd_portavita581 Fear of weight gain cd_portavita582 Stress cd_portavita583 Social pressure cd_portavita584 Withdrawal symptoms cd_portavita585 Increase of respiratory complaints cd_portavita586 Previous failed cessation attempts cd_portavita587 Not the right time cd_portavita589 Follow-up appointment made cd_portavita594 Weight gain cd_portavita595 Stress cd_portavita596 Social pressure cd_portavita597 Withdrawal symptoms cd_portavita598 Increase of respiratory complaints cd_portavita843 Varenicline pq_266918002 Tobacco smoking consumption pq_portavita435 Smoking history: number of years pq_portavita436 Smoking history: units per day pq_portavita437 Cessation attempts pq_portavita457 Pack years ===== Portavita652.json ===== Clinimetry Physiotherapy cd_portavita665 Borg Dyspnea scale before the test cd_portavita666 Borg Dyspnea scale after the test cd_portavita668 Borg dyspnea severity before the test cd_portavita669 Borg dyspnea severity after the test cd_portavita671 Bode Index pq_portavita655 Predicted distance pq_portavita656 Test result for distance pq_portavita657 Distance percentage pq_portavita659 Oxygen saturation before the test pq_portavita660 Oxygen saturation after the test pq_portavita662 Heart rate before the test pq_portavita663 Heart rate after the test pq_portavita670 Oxygen saturation pq_portavita673 Predicted quadriceps force pq_portavita674 Test result for quadriceps force pq_portavita675 Quadriceps force percentage pq_portavita677 Predicted hand grip strength pq_portavita678 Test result for hand grip strength

pq_portavita679 Hand grip strength percentage pq_portavita681 Predicted mouth pressure ===== Portavita67.json ===== Family anamnesis (Diabetes) cd_portavita68 Diabetes in first-degree or second-degree relatives cd_portavita69 Lipid metabolism disorder in first-degree relatives cd_portavita70 Hypertension in first-degree relatives cd_portavita71 Cardiovascular diseases in first-degree relatives ===== Portavita684.json ===== Blood pressure 24 hours pq_portavita687 Systolic average pq_portavita688 Diastolic average pq_portavita690 Systolic standard deviation pq_portavita691 Diastolic standard deviation pq_portavita694 Systolic average pq_portavita695 Diastolic average pq_portavita697 Systolic standard deviation pq_portavita698 Diastolic standard deviation pq_portavita701 Systolic average pq_portavita702 Diastolic average pq_portavita704 Systolic standard deviation pq_portavita705 Diastolic standard deviation pq_portavita707 Average heart rhythm pq_portavita708 Average heart rhythm pq_portavita709 Average heart rhythm pq_portavita710 Drop in systolic blood pressure pq_portavita711 Drop in diastolic blood pressure ===== Portavita727.json ===== Annual check-up (CVRM) cd_169449001 Trying to conceive cd_207057006 [D]Shortness of breath cd_207260005 [D]Other specified symptoms cd_219006 Current drinker cd_22298006 Myocardial infarction cd_228450008 Time spent exercising cd_230690007 Cerebrovascular accident cd_266257000 Transient cerebral ischaemia cd_29857009 Chest pain cd_312975006 Microalbuminuria cd_361137007 Irregular heart beat cd_365980008 Tobacco use and exposure - finding cd_367416001 Angina pectoris cd_370992007 Dyslipidaemia cd_38341003 Hypertensive disorder cd_386137000 Tortuous coronary artery cd_400047006 Peripheral vascular disease cd_401207004 Medication side effects present cd_48194001 Pregnancy-induced hypertension cd_69896004 Rheumatoid arthritis cd_73211009 Diabetes mellitus cd_77386006 Patient currently pregnant cd_84114007 Heart failure 
cd_portavita1342 Antihypertensives cd_portavita1343 Diuretics cd_portavita1344 Beta-blockers cd_portavita1345 Calcium antagonists cd_portavita1346 Drugs affecting the renin-angiotensin system cd_portavita1347 Alpha-blockers cd_portavita1348 Other antihypertensives cd_portavita1349 Blood-thinning drugs cd_portavita1350 Platelet aggregation inhibitors cd_portavita1351 Anticoagulants cd_portavita1352 Other blood-thinning drugs cd_portavita1353 Lipid-lowering drugs cd_portavita1354 Statins cd_portavita1355 Other lipid-lowering drugs cd_portavita438 Motivation to stop smoking cd_portavita439 Stop smoking advice given cd_portavita440 Follow-up appointment made cd_portavita48 Decrease of physical capacity cd_portavita50 Pain in calves when walking cd_portavita54 Sexual dysfunction disorders cd_portavita69 Lipid metabolism disorder in first-degree relatives cd_portavita71 Cardiovascular diseases in first-degree relatives cd_portavita735 AAA (abdominal aortic aneurysm) in first-degree relatives cd_portavita748 Intervention cd_portavita751 Knowledge of healthy diet cd_portavita753 Insight into own diet cd_portavita755 Reason for adjustment of diet cd_portavita756 Motivation for adjustment of diet cd_portavita758 Intervention cd_portavita761 Awareness of effects of alcohol use cd_portavita763 Insight into own alcohol use cd_portavita765 Reason for adjustment of alcohol use cd_portavita766 Motivation for adjustment of alcohol use cd_portavita768 Intervention cd_portavita771 Awareness of importance of physical exercise cd_portavita773 Insight into own physical exercise cd_portavita775 Reason for adjustment of physical activity pattern cd_portavita776 Motivation for adjustment of physical activity pattern cd_portavita778 Intervention cd_portavita780 Stress symptoms more than 3 months cd_portavita783 Insight into own stress status cd_portavita785 Intervention cd_portavita788 Awareness of effects of obesity cd_portavita790 Insight into own target weight cd_portavita792 Reason for 
losing weight cd_portavita793 Motivation for losing weight cd_portavita795 Intervention cd_portavita798 Awareness of effects of high blood pressure

cd_portavita800 Insight into own target values cd_portavita802 Intervention cd_portavita805 Awareness of effects of increased cholesterol values cd_portavita807 Insight into own target values cd_portavita809 Intervention cd_portavita817 Reduced walking distance cd_portavita829 Chronic Obstructive Pulmonary Disease (COPD) pq_160573003 Alcohol intake pq_266918002 Tobacco smoking consumption pq_27113001 Body weight pq_364075005 Heart rate pq_396552003 Abdominal circumference pq_50373000 Body height measure pq_60621009 Body mass index pq_8462.4 Intravascular diastolic:pres:pt:arterial system:qn: pq_8480.6 Intravascular systolic:pres:pt:arterial system:qn: pq_portavita435 Smoking history: number of years pq_portavita436 Smoking history: units per day pq_portavita437 Cessation attempts pq_portavita457 Pack years ===== Portavita730.json ===== Family anamnesis (CVRM) cd_portavita69 Lipid metabolism disorder in first-degree relatives cd_portavita71 Cardiovascular diseases in first-degree relatives cd_portavita735 AAA (abdominal aortic aneurysm) in first-degree relatives ===== Portavita746.json ===== Interim check-up (CVRM) cd_219006 Current drinker cd_228450008 Time spent exercising cd_361137007 Irregular heart beat cd_365980008 Tobacco use and exposure - finding cd_portavita1342 Antihypertensives cd_portavita1343 Diuretics cd_portavita1344 Beta-blockers cd_portavita1345 Calcium antagonists cd_portavita1346 Drugs affecting the renin-angiotensin system cd_portavita1347 Alpha-blockers cd_portavita1348 Other antihypertensives cd_portavita1349 Blood-thinning drugs cd_portavita1350 Platelet aggregation inhibitors cd_portavita1351 Anticoagulants cd_portavita1352 Other blood-thinning drugs cd_portavita1353 Lipid-lowering drugs cd_portavita1354 Statins cd_portavita1355 Other lipid-lowering drugs cd_portavita438 Motivation to stop smoking cd_portavita439 Stop smoking advice given cd_portavita440 Follow-up appointment made cd_portavita748 Intervention cd_portavita751 Knowledge of 
healthy diet cd_portavita753 Insight into own diet cd_portavita755 Reason for adjustment of diet cd_portavita756 Motivation for adjustment of diet cd_portavita758 Intervention cd_portavita761 Awareness of effects of alcohol use cd_portavita763 Insight into own alcohol use cd_portavita765 Reason for adjustment of alcohol use cd_portavita766 Motivation for adjustment of alcohol use cd_portavita768 Intervention cd_portavita771 Awareness of importance of physical exercise cd_portavita773 Insight into own physical exercise cd_portavita775 Reason for adjustment of physical activity pattern cd_portavita776 Motivation for adjustment of physical activity pattern cd_portavita778 Intervention cd_portavita780 Stress symptoms more than 3 months cd_portavita783 Insight into own stress status cd_portavita785 Intervention cd_portavita788 Awareness of effects of obesity cd_portavita790 Insight into own target weight cd_portavita792 Reason for losing weight cd_portavita793 Motivation for losing weight cd_portavita795 Intervention cd_portavita798 Awareness of effects of high blood pressure cd_portavita800 Insight into own target values cd_portavita802 Intervention cd_portavita805 Awareness of effects of increased cholesterol values cd_portavita807 Insight into own target values cd_portavita809 Intervention pq_160573003 Alcohol intake pq_266918002 Tobacco smoking consumption pq_27113001 Body weight pq_364075005 Heart rate pq_396552003 Abdominal circumference pq_50373000 Body height measure pq_60621009 Body mass index pq_8462.4 Intravascular diastolic:pres:pt:arterial system:qn: pq_8480.6 Intravascular systolic:pres:pt:arterial system:qn: pq_portavita435 Smoking history: number of years pq_portavita436 Smoking history: units per day pq_portavita437 Cessation attempts pq_portavita457 Pack years ===== Portavita814.json ===== Comorbidities (CVRM) cd_22298006 Myocardial infarction cd_230690007 Cerebrovascular accident cd_266257000 Transient cerebral ischaemia cd_312975006 Microalbuminuria 
cd_367416001 Angina pectoris cd_370992007 Dyslipidaemia cd_38341003 Hypertensive disorder

cd_386137000 Tortuous coronary artery cd_400047006 Peripheral vascular disease cd_48194001 Pregnancy-induced hypertension cd_69896004 Rheumatoid arthritis cd_73211009 Diabetes mellitus cd_84114007 Heart failure cd_portavita829 Chronic Obstructive Pulmonary Disease (COPD) ===== Portavita846.json ===== Thorax cd_168734001 Standard chest X-ray abnormal (939 rows)

Bibliography

[1] AXLE project GitHub page: AXLE Healthcare Benchmark. https://github.com/axleproject/axle-healthcare-benchmark, 2015.
[2] T. Benson. Principles of health interoperability HL7 and SNOMED. Springer Science & Business Media, 2012.
[3] K. Boone. The CDA™ book. Springer London, 2011.
[4] HL7 FHIR documentation. http://hl7-fhir.github.io/index.html, 2014.
[5] Orange Data Mining. http://orange.biolab.si, 2015.
[6] Portavita Benchmark web page. http://portavitabenchmark.com, 2015.