CHALLENGES OF EVOLVING PINCLOUD PHR INTO A PHR-BASED HEALTH ANALYTICS SYSTEM

M. Poulymenopoulou, F. Malamateniou, A. Prentza and G. Vassilacopoulos
Department of Digital Systems, University of Piraeus, Piraeus, 185 34, Greece

Abstract

State-of-the-art technologies such as cloud computing, service-oriented architectures and NoSQL databases enable the creation of a new generation of Personal Health Record (PHR) systems, which are provided as cloud services and collect life-long, cross-institutional information from various sources. Based on this concept, the PINCLOUD PHR is a cloud-based service that seeks to integrate patient health and social care information from various sources such as the patient, non-healthcare providers, home care systems (that store medical information transmitted from Internet-connected medical devices to the patient) and other healthcare information systems (e.g. primary care systems, electronic medical record (EMR) systems, e-prescription and e-referral systems). Such PHR services contain integrated big patient data from many sources that can be used in multiple health analytics scenarios to provide insights for improving diagnosis and treatment accuracy, cutting healthcare costs and improving healthcare delivery. Hence, such PHR services, if enhanced with a health analytics engine, can evolve into PHR analytics systems. However, the big data integration that must precede data analysis poses many challenges, including data trustworthiness, interoperability and security. To this end, this paper presents an overview of the PINCLOUD PHR and discusses the challenges of developing it with a view to evolving it into a health analytics system.

Keywords: PHR, health analytics, systems evolution, cloud computing, NoSQL.
1 INTRODUCTION

In healthcare, the continued pressure to comply with regulatory demands, healthcare-specific compliance requirements, and guidelines like the U.S. Health Insurance Portability and Accountability Act (HIPAA), the Health Information Technology for Economic and Clinical Health (HITECH) legislation and the European Union Directives requires a transformation of healthcare information technology (HIT) to effectively handle exponential data growth, improve operational efficiency, and manage limited budgets (Behara, 2014). Moreover, big data analytics in healthcare is evolving into a promising field for extracting insight from very large data sets and improving outcomes while reducing costs. Hence, the evolution of health analytics has resulted in a shift in HIT from departmental solutions to larger solutions at the enterprise level, and from standalone systems that provide limited and localized solutions to more interconnected ones that provide comprehensive and integrated solutions (Zarrouk, 2014). Personal Health Record (PHR) services that integrate patient health and social information from various, heterogeneous data sources are expected to play a key role in this evolution, since they can provide a source of integrated patient data for analytic engines that apply techniques such as machine learning and text mining. For example, applying advanced analytics to patient profiles (e.g., segmentation and predictive modeling) can proactively identify individuals who would benefit from preventative care or lifestyle changes, and fraud can be predicted and minimized by implementing advanced analytic systems for fraud detection (Raghupathi, 2014, Gearon, 2007).
Based on these concepts, this paper presents a cloud-based PHR service that seeks to integrate different information systems and patient data into a central repository (e.g. data warehouse), so that authorized actors are provided with integrated patient data anytime and anywhere. In particular, the PINCLOUD PHR service, which stems from our involvement in the PINCLOUD project, consists of the following data components: a) a non-healthcare component containing health and social information collected by either the patient or non-healthcare providers (e.g. family members, friends, social care providers); b) a home care component containing health information existing in home care systems (e.g. medical information transmitted from Internet-connected medical devices to the patient); and c) a healthcare professional component containing health information stored in various healthcare information systems (e.g. primary care, electronic medical record (EMR), e-referral and e-prescribing systems). This PHR service follows the patient-oriented paradigm, which means that patients are the owners of their information and are empowered to authorize other subjects to access it. The PINCLOUD PHR service is built on a reliable and rugged platform that warrants stakeholder collaboration and enjoys public trust, by utilizing cloud infrastructure. The PINCLOUD PHR service inevitably contains big data in the sense that data volume can be very large, data variety comprises unstructured (e.g. audio, video), semi-structured (e.g. patient-generated or sensor-generated data) and structured (e.g. healthcare provider data) forms, and data velocity can be very high, while the data may also be incomplete or noisy due to external factors (Bohlouli, 2013, Kononengo, 2001). The analysis of PINCLOUD PHR big data with appropriate analytic techniques is expected to provide ample opportunities for the improvement of healthcare delivery.
Hence, the PINCLOUD PHR can be enhanced with a health analytics engine that uses PHR integrated data and a set of analytic algorithms to transform patient data into actionable knowledge. Despite the great potential of health analytics, the analysis and interpretation of healthcare data is a highly difficult task. In this paper, the challenges related to big data integration, like security and interoperability, are discussed. For instance, a factor that hinders big data interoperability is that most healthcare information systems are designed to meet local needs (Raghupathi, 2014, Kumar, 2013). Facing these challenges requires appropriate solutions at the technological, organizational and environmental levels. To this end, this paper presents an overview of the PINCLOUD PHR and discusses the challenges related to its evolution into a PHR-based health analytics system. Generally, cloud-based PHRs like the PINCLOUD PHR proposed here provide a solution for achieving big data integration, which is a key factor for the realization of health analytics systems in the future.

2 BACKGROUND

Big data analysis typically refers to the analysis of large, complex data sets that yield substantially more information when analyzed as a fully integrated data set than smaller, non-integrated sets of the same data. Healthcare providers have started to realize the value of health analysis and have therefore started to contribute their patient data to large data repositories, to be used for gaining insights that support better-informed health-related decisions. Currently, a series of initiatives aim to build cloud-based platforms that permit secure data sharing among institutions so as to improve the quality of consumer care.
Some examples of electronic, personally-controlled health records delivered as Software as a Service (SaaS) are the Microsoft Corporation HealthVault program and the Intel Corporation Dossia. Cloud-based PHR services increase the amount of patient data produced and processed, resulting in the big data phenomenon (Bohlouli, 2013). The amount of data in cloud-based PHRs like the one presented in this paper is expanding at an extreme pace; such PHRs can therefore contribute to the evolution of health analytics by realizing the value of analyzing larger integrated datasets (Poulymenopoulou, 2014). Health analytics systems usually consist of a health analytics engine that analyzes a set of patient data using analytic techniques like machine learning and data mining algorithms. However, since the data is by definition large, those systems are mostly implemented on the cloud, where processing is broken down and executed across multiple nodes. Cloud computing saves on the cost of storing big datasets while also enabling increased IT efficiency and service levels. Furthermore, open source
platforms such as Hadoop/MapReduce, available on the cloud, have encouraged the application of big data analytics in healthcare (Zarrouk, 2014, Bohlouli, 2013).

In big data analysis of multiple data sets from dispersed data sources, a variety of security issues also need to be considered. These include the regulations that protect patient data and prevent patient re-identification by any means, the healthcare providers' security policies, the agreements based on consent forms, and the patient data sharing preferences imposed by the patient-oriented paradigm. In big data analysis it is widely suggested to aggregate patient data in order to ensure non-identification and anonymization while the original data remain safe from any modification (Kamateri, 2014). However, even with the use of those methods, the healthcare provider security policies and patient sharing preferences should be considered.

Big data interoperability is also a key problem that hinders big data integration in cloud-based PHRs. Big data might come from many diverse sources like electronic health records, clinical decision support systems, government sources, laboratories, pharmacies and insurance companies, residing at multiple locations (geographically as well as at different healthcare provider sites) in numerous legacy and other applications (transaction processing applications, databases, etc.) (Crisholm, 2015). Patient data therefore exist in multiple formats (flat files, .csv, relational tables, ASCII/text, etc.) and with diverse semantics, and they should be semantically integrated before their use by analytic techniques in order to provide valuable results. Hence, several decisions should be made with regard to the data collection approach, the distributed design, the data integration scheme and the data semantics.
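To make the format problem above concrete, the following is a minimal sketch of how records arriving as CSV and as header-less delimited text might be brought into one common layout and serialized as JSON documents. The field names, the `|` separator, the source labels and the helper functions are all illustrative assumptions, not part of the PINCLOUD implementation, and the sketch covers only syntactic conversion, not the semantic integration discussed in the text.

```python
import csv
import io
import json

# Hypothetical common record layout; the field names are illustrative only.
COMMON_FIELDS = ("patient_id", "observation", "value", "unit")


def from_csv(text):
    """Parse CSV text with a header row into common-format records."""
    reader = csv.DictReader(io.StringIO(text))
    return [{f: (row.get(f) or "").strip() for f in COMMON_FIELDS} for row in reader]


def from_flat_text(text, sep="|"):
    """Parse delimiter-separated lines without a header (assumed field order)."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = [p.strip() for p in line.split(sep)]
        # Pad short lines so every record has all common fields.
        parts += [""] * (len(COMMON_FIELDS) - len(parts))
        records.append(dict(zip(COMMON_FIELDS, parts)))
    return records


def to_json_documents(records, source):
    """Tag each record with its source and serialize it as a JSON document."""
    return [json.dumps({"source": source, **r}, sort_keys=True) for r in records]


# Two made-up inputs standing in for different source systems.
csv_text = "patient_id,observation,value,unit\nP1,heart_rate,72,bpm\n"
flat_text = "P1|glucose|5.4|mmol/L\n"

documents = (
    to_json_documents(from_csv(csv_text), "primary_care")
    + to_json_documents(from_flat_text(flat_text), "laboratory")
)
```

Mapping each source onto a shared set of fields keeps the per-source parsers small, but it leaves open whether two sources mean the same thing by a field, which is the semantic question the paragraph above raises.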
Although there are definite benefits to using cloud computing as a means of storing integrated big patient data, there are also many potential risks, because cloud computing combines new technology with many unproven vendors and service providers. Security, reliability and manageability need to be key elements in the planning and selection processes for the use of private and public cloud services. When sensitive information like patient medical information is to be transferred to a public cloud, security issues are sometimes a barrier to cloud computing adoption (Poulymenopoulou, 2014, Crisholm, 2015). However, with the use of appropriate security techniques like encryption, access control policies, data backups and audit logs, it is possible to safeguard data in the cloud and maximize the healthcare benefits of cloud offerings. Although regulatory and security concerns have held back the healthcare industry from widespread adoption of public clouds, the overall cloud computing market in healthcare is expected to grow to $5.4 billion by 2017 (Zarrouk, 2014).

3 THE PINCLOUD PERSONAL HEALTHCARE RECORD

3.1 Overall architecture

The PINCLOUD PHR service offers authorized users access to integrated patient data. It enables healthcare professionals to perform e-prescriptions and e-referrals and to inform and advise patients, and it enables patients to arrange medical appointments electronically, to communicate with healthcare professionals and to manage authorizations to their data. As shown in Figure 1, the PINCLOUD PHR resides in the cloud and communicates through the Internet with other systems like hospitals, medical offices, health insurance companies, diagnostic centers, pharmacies, social care providers, home care systems, and e-referral and e-prescribing systems. Moreover, the PINCLOUD PHR integrates patient data from different organizations and systems and provides integrated services to healthcare professionals, thus resulting in improved patient care.
Other systems connected to PINCLOUD may reside at the same or another cloud provider, or may be implemented on local infrastructures and incorporated into the PINCLOUD PHR through the use of web services. At each source connected to the PINCLOUD PHR (e.g. social and health care providers, home care systems), web and/or Representational State Transfer (REST) services extract a pre-specified subset of patient social and health information (e.g. an extended discharge summary, including critical citizen factors like allergies, extracted from EMRs) as JSON documents that are stored in the PINCLOUD PHR data repository. In particular, web/REST services are used in this project as a means of exchanging patient data among the connected systems. This results in lower cost for the connected healthcare organizations, since they retain their existing systems and infrastructures without needing to invest in new technologies. Moreover, PINCLOUD uses a NoSQL data repository for storing patient data from multiple sources in the form of JSON documents, based on a JSON schema defined according to the Continuity of Care Document (CCD) schema, which is extensively used in PHR systems (Bonnet, 2011).

European, Mediterranean & Middle Eastern Conference on Information Systems 2015 (EMCIS2015)

[Figure 1. The PINCLOUD PHR service interaction with multiple organizations. Labels: Patient's Home, Cloud Computing, Health Insurance Organization, Internet, Hospital, Medical Office, Diagnostic Center, Pharmacy.]

[Figure 2. The PINCLOUD PHR service interaction with multiple organizations. Labels: Healthcare Organization (Web Services, Existing Information System), Social Care Organization (Web Services, Existing Information System), PINCLOUD PHR (Web Portal, Web Services, NoSQL Data Repository, Authorization System).]

More specifically, as shown in Figure 2, the PINCLOUD PHR service comprises the following:
- A NoSQL data repository that stores patient information in the form of JSON patient documents,
- A web portal through which patients can access and manage their lifelong health and social information and set their sharing preferences,
- A set of web services that implement the PINCLOUD PHR functionality and are called by authorized users, and
- An authorization system that enables modelling and enforcing patient-centered authorization policies.

As regards security, each organization connected to PINCLOUD defines access control rules that specify which patient data (objects) may be exported to the PINCLOUD service (subject) and under what circumstances. For example, a healthcare organization's access control rule may state that patient identification data, critical medical data, health problems, hospital encounters and emergency incident information can be shared with the PINCLOUD PHR. Moreover, access requests to patient data held in the PINCLOUD PHR are subject to patient sharing preferences. Hence, patients should also specify their sharing preferences regarding the use of their data by health analytics systems.

3.2 PINCLOUD implementation

The PINCLOUD PHR service has been implemented on a private cloud based on Microsoft Windows Azure and is provided as SaaS. MongoDB has been used as the NoSQL database for storing patient information and Liferay as the web portal of PINCLOUD. In addition, a set of web and REST services has been implemented to realize the PINCLOUD PHR functionality.
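The two layers of control described in Section 3.1 (organization export rules, then patient sharing preferences) can be illustrated with a small sketch. It is only a sketch under stated assumptions: the class names, section names and role names are hypothetical, the "circumstances" part of a rule is not modelled, and nothing here reflects the actual PINCLOUD authorization system.

```python
from dataclasses import dataclass, field


# Hypothetical section names, loosely following the example rule in Section 3.1.
@dataclass(frozen=True)
class ExportRule:
    """Sections an organization allows to be exported to the PHR service."""
    organization: str
    allowed_sections: frozenset


@dataclass
class SharingPreferences:
    """Per-patient map from requester role to the sections that role may read."""
    grants: dict = field(default_factory=dict)

    def allows(self, role, section):
        return section in self.grants.get(role, set())


def export_document(record, rule):
    """Keep only the sections of a local record that the export rule permits."""
    return {k: v for k, v in record.items() if k in rule.allowed_sections}


def read_sections(document, prefs, role):
    """Return the sections of a stored document that the patient shares with a role."""
    return {k: v for k, v in document.items() if prefs.allows(role, k)}


rule = ExportRule(
    "hospital_a",
    frozenset({"identification", "health_problems", "encounters"}),
)
local_record = {
    "identification": {"patient_id": "P1"},
    "health_problems": ["asthma"],
    "encounters": ["2015-01-10 admission"],
    "billing": {"balance": 120},  # not covered by the rule, so never exported
}
stored = export_document(local_record, rule)

prefs = SharingPreferences(grants={
    "physician": {"identification", "health_problems", "encounters"},
    "analytics": {"health_problems"},  # patient shares only problems for analytics
})
analytics_view = read_sections(stored, prefs, "analytics")
```

In this sketch a role with no entry in the patient's preferences sees nothing, which mirrors the patient-oriented paradigm in which access must be granted explicitly.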
At this stage the PINCLOUD PHR service has been connected through a Service Oriented Architecture (SOA) with the following healthcare information systems:

- The healthcare information system of the private hospital Ygeia
- The home care system of Vidavo
- A healthcare system for physicians called 4doctors of SingularLogic
- An e-prescribing service developed
- An e-referral service developed

As regards patient data coding, standard terminology systems like the International Classification of Diseases (ICD-10), as well as national standards for coding medications, have been used in this implementation. In addition, the epSOS patient summary has been used as a basis for creating the patient summary exchanged among the connected systems.

4 PHR-BASED HEALTH ANALYTICS EVOLUTION CHALLENGES

The main objective of the PINCLOUD PHR is to promote knowledge and excellence through the development and investigation of innovative integrated e-health services using cloud computing, SOA and advanced patient monitoring technologies, and to set the basis for the realization of a health analytics system. From the technological point of view, PINCLOUD is expected to contribute to the realization of health analytics systems, an innovative area that deserves to be extensively analyzed and studied so that it can be applied in the future in real, commercial services. In terms of the economy, PINCLOUD is expected to reduce costs in health care and free up resources (e.g. hospital beds). Last but not least, the project is of great importance for society, as patients' health data will be documented by the system and can be used in health analytics scenarios to provide medical insights. As a result, medical errors associated with the non-integrated nature of health services are expected to decrease, and fewer people will be affected by them.
For the realization of large scale health analytics systems, the lag between data collection and data analysis with the use of analytic algorithms has to be addressed. Hence, important issues and challenges for the realization of integrated big patient data should be identified and appropriate solutions should be found. In particular, the main challenges faced during the PINCLOUD design and implementation, related to organizational motivation, technology, interoperability and concerns over data privacy and security, are summarized as follows:

- Participant motivation to join the health analytics effort and give permission for patient summaries to be extracted from locally stored data and transferred to the cloud. Critical to the successful implementation of health analytics systems is the engagement and support of a wide range of healthcare stakeholders like healthcare providers, EMR vendors, governments, pharmacies and insurance companies.
- Lack of a practical mechanism to uniquely identify participants. The integrity and value of patient data depend fundamentally on being able to link the data unambiguously to one and only one individual. This is sometimes achieved by matching records based on several characteristics (probabilistic matching) instead of using a unique identifier.
- Missing well-established laws or regulations mandating the electronic capture of patient data, in addition to laws covering the protection and security of this data. Hence, the EU directives that establish rules on the protection of individuals' personal data, as well as other international regulations like the HIPAA privacy rule, should be taken into account, and appropriate security measures should be taken in order to safeguard the confidentiality of patient data (Crisholm, 2015).
- Health care data is rarely standardized and is often fragmented or generated in legacy IT systems with incompatible formats like CSV, text and XML. Appropriate mechanisms are required in order to transform patient data from one format to another.
- The existence of multiple e-health standards (e.g., DICOM, ISO/TC 215, HL7/CDA) developed by numerous standardization bodies (e.g. ISO, HL7). Many of these are not interoperable or not directly coordinated with each other at an organizational level. Hence, an e-health standard should be selected and the retrieved patient data then mapped into this standard.
- Semi-structured or unstructured data usually exhibit significant noise and should be properly prepared (e.g. using cleaning and stemming algorithms) before being integrated or used for analysis.

Currently, the main benefits gained by the use of the existing PINCLOUD service are summarized below:

- Healthcare providers retained their internal data sources and systems and connected them with external virtual private cloud computing resources. This is extremely useful for small and medium sized healthcare providers, which can utilize advanced IT infrastructures and services to support their healthcare operations without facing high initial and operational costs.
- Better control of HIT costs and right-sizing of IT investments based on the nature of the workload involved.
- Creation of an agile IT environment to support the PINCLOUD PHR, capable of dynamically scaling to meet healthcare needs.
- An integrated medical data repository with data in a standard format, stored as JSON files (according to the JSON schema defined), that can be used by analytics tools to transform big data into actionable knowledge. For example, the integrated data can be used to develop data mining models that discover new medical facts and to conduct medical research that enhances medications, treatments and healthcare services.

The technological solution selected in the PINCLOUD PHR development enables the realization of an integrated big data repository that can then be used by health analytics services. However, regardless of the availability of big data and advanced analytics solutions, there is a need to build analytics competencies to harness semantically integrated data, beyond big data, in order to obtain business insights and improve outcomes (Ferguson, 2012).
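The probabilistic matching mentioned among the challenges above can be illustrated with a deliberately simplified sketch: two records are compared field by field and the similarities are combined into a weighted score. The weights, the threshold, the field names and the sample records are all assumptions made for illustration, and this is a plain weighted-agreement score rather than a full statistical linkage model or the method used in PINCLOUD.

```python
from difflib import SequenceMatcher

# Illustrative field weights and threshold; real systems estimate these from data.
WEIGHTS = {"last_name": 0.35, "first_name": 0.25, "birth_date": 0.30, "postcode": 0.10}
MATCH_THRESHOLD = 0.85


def similarity(a, b):
    """String similarity in [0, 1], case-insensitive; empty values score 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b):
    """Weighted sum of per-field similarities between two patient records."""
    return sum(
        weight * similarity(rec_a.get(name, ""), rec_b.get(name, ""))
        for name, weight in WEIGHTS.items()
    )


def is_same_patient(rec_a, rec_b):
    """Decide a match by comparing the weighted score with the threshold."""
    return match_score(rec_a, rec_b) >= MATCH_THRESHOLD


# Made-up records: the pharmacy entry has a spelling variant of the last name.
hospital = {"last_name": "Papadopoulos", "first_name": "Maria",
            "birth_date": "1970-03-12", "postcode": "18534"}
pharmacy = {"last_name": "Papadopulos", "first_name": "Maria",
            "birth_date": "1970-03-12", "postcode": "18534"}
other = {"last_name": "Nikolaou", "first_name": "Giorgos",
         "birth_date": "1985-11-02", "postcode": "10431"}

same = is_same_patient(hospital, pharmacy)
different = is_same_patient(hospital, other)
```

Character-level similarity is a crude measure for dates and postcodes, so a production matcher would likely use field-specific comparators; the sketch only shows why a spelling variant need not prevent two records from being linked.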
Our future work in this area will therefore focus more on the data interoperability challenge for patient data from dispersed and heterogeneous resources (e.g. physical resources, streams from the cloud, medical sensors and devices). Semantic web technologies like ontologies have been used for handling data heterogeneity and for transforming source patient data into the format required by the data warehouse where analysis is performed.

5 CONCLUSIONS

Personal healthcare records are expected to result in a dramatic increase in the availability of healthcare data. If analyzed properly, this data has the potential to transform healthcare, containing cost while increasing quality. For example, regulators and policy makers can better define policies that increase healthcare value and safety, while pharmaceutical companies can deepen their understanding of diseases and treatments, better direct the design of products, improve their mechanisms for recruiting patients for clinical trials and lead innovative smart solutions, such as clinical decision support for personalized treatments, personalized dosage optimization, and adherence analysis (Ferguson, 2012, Raghupathi, 2014). Quality decisions come from quality data; hence data pre-processing is critical, and considerable work is needed to ensure data consistency and validity across sources, platforms and systems (Figo, 2010). Several challenges, including organizational, policy, technological and environmental issues, should be faced before health analytics systems can be realized. In this paper, the PINCLOUD PHR service is presented, which is a cloud-based PHR that integrates patient data from multiple connected sources. Moreover, the challenges related to evolving the PINCLOUD PHR into a health analytics system are presented. From a technological perspective, cloud-based e-health solutions in conjunction with SOA and NoSQL databases have the potential to enable the realization of integrated big data repositories. However, it is up to future research in this field to create more advanced methods and tools, and it depends on the readiness of the healthcare field to accept and apply these findings and techniques to improve healthcare quality.

References

Behara R., Huang C. and Goo J. 2014.
The evolving regulatory framework for health information technology in the U.S. In Twentieth Americas Conference on Information Systems, Savannah.

Zarrouk M. 2014. Cloud computing: Lowering cost complexity barriers in the healthcare industry. NetApp, White Paper. Available via http://cstor.com/flexpod/wpcontent/uploads/2014/11/converged-architecture-2.pdf. Last accessed May 2015.

Raghupathi W. and Raghupathi V. 2014. Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2:3.

Gearon C. 2007. Perspectives on the future of personal health records. ihealthreports, California Healthcare Foundation. Available via http://www.chcf.org/~/media/media%20library%20files/pdf/p/pdf%20phrperspectives.pdf. Last accessed May 2015.

Bohlouli M., Schulz F., Angelis L., Pahor D., Brandic I., Atlan D. and Tate R. 2013. Towards an integrated platform for big data analysis. Integration of Practice-Oriented Knowledge Technology: Trends and Prospectives, 47-56.

Kononengo I. 2001. Machine learning for medical diagnosis: History, state of the art and perspective. Artificial Intelligence in Medicine, 23(1):89-109.

Kumar P. and Pandeya K. 2013. Big data and distributed data mining: An example of future networks. International Journal of Advance Research and Innovation, 2:36-39.

Poulymenopoulou M., Malamateniou F. and Vassilacopoulos G. 2014. Machine learning for knowledge extraction from PHR big data. Studies in Health Technology and Informatics, 202:36-39.

Poulymenopoulou M., Malamateniou F. and Vassilacopoulos G. 2014. Ontology-driven authorization policies on personal health records for sustainable citizen-centered healthcare. Annals of Information Systems Series, Concepts and Trends in Healthcare Information Systems, Book chapter, Springer International Publishing, Koutsouris D. and Lazakidou A. (eds).

Kamateri E., Kalampokis E., Tambouris E. and Tarampanis K. 2014. The linked medical data access control framework. Journal of Biomedical Informatics, 50:213-225.

Crisholm R., Denny J., Fridsma D., Khertepal S., Masys D. and Ohno-Machado L. 2015. Opportunities and challenges related to the use of Electronic Health Records for data research. White Paper. Available via http://www.nih.gov/precisionmedicine/whitepapers/opportunities-Challenges-Electronic-Health-Records.pdf. Last accessed May 2015.

Bonnet L., Laurent A., Sala M., Laurent B. and Sicard N. 2011. Reduce, you say: What NoSQL can do for data aggregation and BI in large repositories. In 22nd International Workshop on Database and Expert Systems Application, IEEE Computer Society, 483-488.

Ferguson M. 2012. Architecting a big data platform for analytics. Intelligent Business Strategies. Available via http://www.ndm.net/datawarehouse/pdf/netezza%20-%20Architecting%20A%20Big%20Data%20Platform%20for%20Analytics.pdf

Figo D., Diniz P., Ferreira D. and Cardoso J. 2010. Preprocessing techniques for context recognition from accelerometer data. Personal and Ubiquitous Computing, 14:645-662.