Prediction Models for a Smart Home based Health Care System




Vikramaditya R. Jakkula, Diane J. Cook, Gaurav Jain
Washington State University, School of Electrical Engineering and Computer Science, Pullman, WA 99164, USA.

Abstract

Technology holds great potential for improving health care in intelligent environments, with homes becoming centers for proactive health care. Smart health care systems in the home can provide such solutions. A technology-assisted smart health care system would enable elderly people to lead an independent lifestyle away from hospitals and avoid the expense of dedicated caregivers. In this paper, we present one such solution, in which a prediction model embedded in an intelligent smart home system identifies health trends over time and predicts future trends, which can aid in providing preventive care.

INTRODUCTION

A major issue in health care today is the provision of adequate and effective care for the elderly: people aged 65 and older are the fastest growing segment of the US population, and by 2030 more than 4 million Americans will be over the age of 85 [1]. Studies reveal that older Americans prefer an independent lifestyle [2]. With a growing aging population that wants to maintain independent living, the need for smart, cost-effective, in-home health care technology to replace existing home caregivers becomes critical. The next generation of smart in-home systems built for health care would enable people to lead healthier lives. The major challenge for these systems is early prediction of health variations over time, which may act as early indicators of chronic disease. Time series analysis is often associated with the discovery of patterns (such as frequent sequences, or sequences appearing with increasing or decreasing regularity) and with the prediction of future values (forecasting). In this paper we employ machine learning techniques in Weka to predict trends in health data over time, and we use Box-Jenkins seasonal ARIMA models to forecast health data ranges over time. The Box-Jenkins seasonal ARIMA model is frequently used for the analysis and forecasting of time series data [3]. We hypothesize that the Box-Jenkins ARIMA method can be effective for analyzing health data in an intelligent environment. To validate our hypothesis, we test the method using health data collected from the MavHome smart home project.

ENVIRONMENT SENSING AND DATA COLLECTION

The major goal of the MavHome project [4, 5] is to design a smart environment that acts as an intelligent agent. We define a smart environment as one with the ability to adapt to its inhabitants and meet the goals of comfort and efficiency. To achieve these goals, the house should be able to predict, reason, and adapt to its inhabitant. In MavHome, sensor network data is the primary source of data collection. The data collection system consists of an array of motion sensors which collect information through the Argus sensor network [6]. In this study, we enhance the collected information with critical health parameters (for example, blood pressure and pulse) collected using digital instruments. The data collected for this study spanned a period of sixty-one days, based on a single inhabitant living in the MavPad on-campus apartment (see Figure 1). Our evaluation environment is a student apartment with deployed Argus and X-10 networks.
The apartment consists of a living/dining room, kitchen, bathroom, and bedroom. There are over 150 sensors deployed in the MavPad, including light, temperature, humidity, and reed switches. Figure 1: (a) MavHome Argus Sensor Network; (b) MavHome Apartment Environment.
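For the experiments that follow, each day's health readings are paired with an observed trend label (increase, decrease, or constant). As a rough illustration only, the following sketch shows how such daily observations could be organized for the Weka-based analysis described in the next section; it assumes a recent Weka release, and the class name, attribute names, and values are hypothetical rather than taken from the MavHome codebase.

import java.util.ArrayList;
import java.util.Arrays;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class HealthObservations {

    // Build a small Weka dataset of daily readings with a nominal trend label.
    public static Instances build() {
        ArrayList<Attribute> attrs = new ArrayList<Attribute>();
        attrs.add(new Attribute("systolic"));
        attrs.add(new Attribute("diastolic"));
        attrs.add(new Attribute("pulse"));
        attrs.add(new Attribute("trend",
                new ArrayList<String>(Arrays.asList("increase", "decrease", "constant"))));

        Instances data = new Instances("daily_health", attrs, 61);
        data.setClassIndex(data.numAttributes() - 1);

        // One illustrative day; real values would come from the collected readings.
        DenseInstance day = new DenseInstance(data.numAttributes());
        day.setDataset(data);
        day.setValue(0, 134);   // systolic
        day.setValue(1, 76);    // diastolic
        day.setValue(2, 72);    // pulse
        day.setValue(data.attribute("trend"), "decrease");
        data.add(day);
        return data;
    }
}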

EXPERIMENTATION EVALUATION

The goal of this experiment is to evaluate a time series based forecasting model that would form part of a smart health care system in a smart home. The expectation is good predictive accuracy from the limited time series data available from the smart health care system. Such predictions would enable caregivers and patients to prepare for possible critical situations before they actually occur. The basic idea behind self-projecting time series forecasting models is to find a mathematical formula that will approximately generate the historical patterns in a time series. A time series is a historical record of some activity, with measurements taken at equally spaced intervals and with consistency in the activity and the method of measurement. Box-Jenkins forecasting models are based on statistical concepts and principles and are able to model a wide spectrum of time series behaviors [7]. We use the Box-Jenkins methodology for analysis and forecasting rather than developing our own because it is widely regarded as an efficient forecasting technique and is used extensively, especially for univariate time series [10]. Compared to other techniques, Box-Jenkins forecasting provides some of the most accurate short-term forecasts. A limitation, however, is that it requires a large amount of data; our currently limited data leads to lower accuracy.

EXPERIMENTATION AND RESULTS

The experiment consists of two parts. The first part uses a support vector machine algorithm found in Weka [8] to predict whether health trends are increasing, decreasing or constant. The second part utilizes an automated forecasting tool, Phicast [9], which enables us to use the Box-Jenkins method to forecast the range of health data values. The first experiment employs the SVM in Weka to predict trends in the health data values over time; we aim to predict an increasing, decreasing or constant trend. For this experiment we use a data set consisting of 40 days of health parameter data collected from the inhabitant in the MavHome apartment. For the second experiment, we included observations from one additional day as they became available; thus the second experiment used 41 days of health parameter data. In the prediction experiment, we use observed trend accuracy as the performance measure, defined as the number of correctly classified observed instances divided by the total number of observed instances. We also report the error rate, which is distinct from trend accuracy and is defined as the number of incorrectly classified instances divided by the total number of classified instances in the observed output test set. Most of the experiment is run in the Weka environment, and in all of the experiments reported here a percentage split was used as the evaluation technique. This consists of dividing the data into two subgroups. The first subgroup, called the training set, is used to build the model for the classifiers. The second subgroup, called the test set, is used to calculate the accuracy of the constructed model. The test set consisted of 20 data points for experiment one and 21 data points for experiment two; the total dataset consisted of 41 data points. We also performed 3-fold cross validation on the dataset for experiment one, and the observed results are given in Table 4; a sketch of this evaluation setup using the Weka Java API follows.
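As an illustration only, and not the authors' original experiment code, the sketch below trains an SMO (support vector) classifier on a percentage split and then runs 3-fold cross validation. It assumes a recent Weka release and a hypothetical ARFF file of trend-labelled daily observations, such as one exported from the dataset sketched in the previous section.

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrendEvaluation {
    public static void main(String[] args) throws Exception {
        // Hypothetical file holding the daily readings and their trend labels.
        Instances data = DataSource.read("systolic_trend.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Percentage split: first half for training, remainder for testing.
        int trainSize = (int) Math.round(data.numInstances() * 0.5);
        Instances train = new Instances(data, 0, trainSize);
        Instances test = new Instances(data, trainSize, data.numInstances() - trainSize);

        SMO svm = new SMO();   // sequential minimal optimization SVM, as used in Weka
        svm.buildClassifier(train);

        Evaluation split = new Evaluation(train);
        split.evaluateModel(svm, test);
        System.out.println("Trend accuracy: " + split.pctCorrect() + "%");
        System.out.println("Error rate:     " + split.pctIncorrect() + "%");

        // 3-fold cross validation on the full dataset (compare Table 4).
        Evaluation cv = new Evaluation(data);
        cv.crossValidateModel(new SMO(), data, 3, new Random(1));
        System.out.println("Correctly classified:   " + cv.correct());
        System.out.println("Incorrectly classified: " + cv.incorrect());
        System.out.println("Mean absolute error:    " + cv.meanAbsoluteError());
    }
}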
We chose 3-fold cross validation because of the size of the dataset used for the experimentation.

Table 1: Systolic trend observations.
Inst. No. | Actual | Predicted | Error
1 | 2:decrease | 2:decrease | No
4 | 2:decrease | 2:decrease | No
5 | 2:decrease | 2:decrease | No
6 | 2:decrease | 2:decrease | No
10 | 2:decrease | 2:decrease | No
11 | 2:decrease | 2:decrease | No
13 | 2:decrease | 1:increase | Yes
15 | 2:decrease | 2:decrease | No
16 | 2:decrease | 2:decrease | No
17 | 2:decrease | 2:decrease | No
18 | 2:decrease | 2:decrease | No
19 | 1:increase | 2:decrease | Yes
20 | 2:decrease | 2:decrease | No

Table 1 displays the predicted systolic data trends and compares them with the actual trends of the data collected from the inhabitant. We observe the accuracy to be fairly high, with an error rate of 10%. Table 2 displays the predicted trends in the diastolic data values and compares them with the actual trends of the data collected from the inhabitant. Here the accuracy was lower, with an error rate of 67%, because this method implements a sequential minimal optimization algorithm for training a support vector classifier and converts nominal values to binary for prediction. In Table 3 we compare predicted trends to actual trends in the pulse rate values collected from the inhabitant. These predictions have a high accuracy, with an error rate of 5%.

Table 2: Diastolic trend observations.
Inst. No. | Actual | Predicted | Error
1 | 2:decrease | 1:increase | Yes
4 | 2:decrease | 1:increase | Yes
5 | 2:decrease | 1:increase | Yes
6 | 3:constant | 1:increase | Yes
10 | 2:decrease | 1:increase | Yes
11 | 2:decrease | 1:increase | Yes
12 | 3:constant | 1:increase | Yes
13 | 1:increase | 2:decrease | Yes
15 | 2:decrease | 1:increase | Yes
16 | 1:increase | 1:increase | No
17 | 2:decrease | 1:increase | Yes
18 | 2:decrease | 1:increase | Yes
19 | 2:decrease | 1:increase | Yes
20 | 1:increase | 1:increase | No

Table 3: Pulse trend observations.
Inst. No. | Actual | Predicted | Error
1 | 2:decrease | 2:decrease | No
4 | 2:decrease | 2:decrease | No
5 | 2:decrease | 2:decrease | No
6 | 2:decrease | 2:decrease | No
10 | 2:decrease | 2:decrease | No
11 | 2:decrease | 2:decrease | No
13 | 1:increase | 2:decrease | Yes
15 | 2:decrease | 2:decrease | No
16 | 2:decrease | 2:decrease | No
17 | 2:decrease | 2:decrease | No
18 | 2:decrease | 2:decrease | No
20 | 2:decrease | 2:decrease | No

We also performed cross validation on the experiment test data, and the resulting outputs are displayed in Table 4.

Table 4: Observations of cross validation on the datasets.
 | Systolic | Diastolic | Pulse
Correctly classified instances | 19 | 15 | 19
Incorrectly classified instances | 1 | 5 | 1
Mean absolute error | 0.2333 | 0.3111 | 0.2333

The second experiment evaluates the ability of ARIMA methods to perform effective prediction, or forecasting, of health data value ranges. This involves the use of an automated forecasting tool, which predicts a range of values for the time series. The experiment compares the predicted ranges with the actual test data, and performance is measured as the ratio of test cases predicted accurately. The plots comparing the predicted range with the actual data values are shown in Figures 2, 3 and 4. In Figure 2 we compare the collected systolic data values to the predicted range of values over the time series and observe 62% accuracy in the prediction. The observations are summarized in Table 5.

Table 5: Comparing the actual values to the predicted range of the systolic data collected.
No. | Actual Systolic | Predicted Low | Predicted High | Outside Range
1 | 134 | 126.4 | 142.1 | No
2 | 129 | 127.93 | 144.52 | No
3 | 132 | 128.13 | 144.44 | No
4 | 134 | 128.32 | 144.56 | No
5 | 126 | 128.1 | 144.7 | Yes
6 | 123 | 130.11 | 144.8 | Yes
7 | 119 | 129.32 | 145.34 | Yes
8 | 122 | 128.2 | 145.37 | Yes
9 | 135 | 129.33 | 145.6 | No
10 | 139 | 127.76 | 145.45 | No
11 | 133 | 126.63 | 145.44 | No
12 | 132 | 129.11 | 145.78 | No
13 | 134 | 128.21 | 146.34 | No
14 | 124 | 129.34 | 146.37 | Yes
15 | 153 | 129.57 | 146.45 | Yes
16 | 140 | 129.76 | 146.76 | No
17 | 137 | 130.95 | 146.95 | No
18 | 135 | 131.2 | 147.15 | No
19 | 120 | 131.34 | 147.34 | Yes
20 | 145 | 131.53 | 147.5 | No
21 | 123 | 131.73 | 147.73 | Yes
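The ranges in Tables 5 through 7 are produced by Box-Jenkins seasonal ARIMA models. For reference, such a model is conventionally written in the standard multiplicative form below; the particular orders for each health parameter are chosen by the automated forecasting tool, so only the general form is stated here:

\[
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\, y_t \;=\; \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t ,
\]

where \(B\) is the backshift operator (\(B y_t = y_{t-1}\)), \(s\) is the seasonal period, \(\phi_p\), \(\Phi_P\), \(\theta_q\), and \(\Theta_Q\) are polynomials of the indicated orders, \(d\) and \(D\) are the non-seasonal and seasonal differencing orders, and \(\varepsilon_t\) is white noise.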

Figure 2: Comparing the predicted range with the actual systolic data values (April 22, 2005 to May 6, 2005).

In Figure 3 we compare the collected diastolic data values to the predicted range of values over the time series and observe 62% accuracy for the predicted values. The observations are summarized in Table 6.

Table 6: Observations comparing the actual values to the predicted range of the diastolic data collected from the inhabitant.
No. | Actual Diastolic | Predicted Low | Predicted High | Outside Range
1 | 76 | 69 | 73.5 | Yes
2 | 73 | 68.88 | 73.56 | No
3 | 80 | 68.45 | 74.15 | Yes
4 | 72 | 68.13 | 74.4 | No
5 | 69 | 67.78 | 73.92 | No
6 | 68 | 67.56 | 73.8 | No
7 | 68 | 67.5 | 73.68 | No
8 | 72 | 67.43 | 73.56 | No
9 | 73 | 67.38 | 73.44 | No
10 | 84 | 67.32 | 73.32 | Yes
11 | 72 | 67.21 | 73.21 | No
12 | 73 | 67.9 | 73.9 | No
13 | 73 | 66.97 | 74.93 | No
14 | 78 | 66.85 | 74.78 | Yes
15 | 91 | 66.73 | 74.73 | Yes
16 | 80 | 66.61 | 74.45 | Yes
17 | 112 | 66.49 | 74.36 | Yes
18 | 71 | 66.37 | 74.27 | No
19 | 76 | 66.26 | 75.22 | No
20 | 75 | 66.14 | 75.18 | No
21 | 78 | 66.2 | 76.8 | Yes

Figure 3: Comparing the predicted range with the actual diastolic data values (April 22, 2005 to May 6, 2005).

In Figure 4 we compare the collected pulse data values to the predicted range and observe 76% predictive accuracy. The observations are summarized in Table 7.

Figure 4: Comparing the predicted range with the actual pulse data values (April 22, 2005 to May 6, 2005).

The observations made from Table 5 reveal an accuracy of 62%, Table 6 shows an accuracy of 62%, and Table 7 shows an accuracy of 76%. The accuracy is low because this method needs more data for accurate predictions. Data is collected continuously, so as more data becomes available we expect the predictive accuracy to increase. The data also includes a few outliers, which affect the accuracy of the results. The first model has higher accuracy, making it more effective than the second model, which predicts the health data ranges. These models can be more effective when used together.
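The accuracy figures quoted for Tables 5 through 7 are simply the fraction of observed values that fall inside the forecast range for the corresponding day. A minimal sketch of that bookkeeping is given below; the arrays hold illustrative placeholder values in the spirit of Table 7 rather than the full data from the tables.

public class RangeAccuracy {

    // Fraction (as a percentage) of actual readings falling inside [low, high].
    static double withinRangeAccuracy(double[] actual, double[] low, double[] high) {
        int hits = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] >= low[i] && actual[i] <= high[i]) {
                hits++;
            }
        }
        return 100.0 * hits / actual.length;
    }

    public static void main(String[] args) {
        // Illustrative forecast ranges and actual pulse readings.
        double[] actual = {76, 75, 79, 82, 74, 85};
        double[] low    = {71, 71, 71, 71, 69, 69};
        double[] high   = {82, 82, 82, 82, 82, 82};
        System.out.printf("Accuracy: %.0f%%%n", withinRangeAccuracy(actual, low, high));
    }
}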

Table 7: Observations comparing the actual values to the predicted range of the pulse data collected from the inhabitant.
No. | Actual Pulse | Predicted Low | Predicted High | Outside Range
1 | 76 | 71 | 82 | No
2 | 75 | 71 | 82 | No
3 | 79 | 71 | 82 | No
4 | 82 | 71 | 82 | No
5 | 74 | 69 | 82 | No
6 | 72 | 69 | 82 | No
7 | 70 | 69 | 82 | No
8 | 73 | 69 | 82 | No
9 | 75 | 69 | 82 | No
10 | 85 | 69 | 82 | Yes
11 | 69 | 69 | 82 | No
12 | 71 | 69 | 82 | No
13 | 69 | 69 | 82 | No
14 | 77 | 69 | 82 | No
15 | 96 | 69 | 82 | Yes
16 | 88 | 69 | 82 | Yes
17 | 81 | 69 | 82 | No
18 | 75 | 69 | 82 | No
19 | 70 | 69 | 82 | No
20 | 103 | 69 | 82 | Yes
21 | 65 | 69 | 82 | Yes

CONCLUSION AND FUTURE WORK

Predicting time series data is generally difficult. In this case, however, we show good overall prediction performance. We have used the Box-Jenkins method with relatively small amounts of data, keeping in mind that the amount of data will continuously increase as this tool is used over time. Since the method performs best with large amounts of data [10], its performance will steadily improve. We observe that the prediction models act as useful components of the health care system in smart homes. Future work includes improving the predictions, collecting more data over time (which itself helps improve prediction), and detecting and removing outliers to increase prediction accuracy. Providing accurate tele-health care systems in smart homes would be a breakthrough for the health care industry.

ACKNOWLEDGEMENT

This work is supported by NSF grant IIS-0121297.

REFERENCES

[1] Anderson, R. N. Method for Constructing Complete Annual U.S. Life Tables. National Center for Health Statistics, Vital and Health Statistics, Series 2(129) (1999).
[2] T. K. Hareven. Historical Perspectives on Aging and Family Relations. Handbook of Aging and the Social Sciences, 5th Edition, 141-159 (2001).
[3] Box, G.E.P. and Jenkins, G.M. Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day (1970).
[4] G. Michael Youngblood, Lawrence B. Holder, and Diane J. Cook. Managing Adaptive Versatile Environments. Proceedings of the IEEE International Conference on Pervasive Computing and Communications (PerCom) (2005).
[5] D. J. Cook, M. Youngblood, E. Heierman, K. Gopalratnam, S. Rao, A. Litvin, and F. Khawaja. MavHome: An Agent-Based Smart Home. Proceedings of the IEEE International Conference on Pervasive Computing and Communications, 521-524 (2003).
[6] G. Michael Youngblood. http://mavhome.uta.edu/argus (2006).
[7] Box, G.E.P., Jenkins, G.M., and Reinsel, G.C. Time Series Analysis: Forecasting and Control, Third Edition. Prentice Hall (1994).
[8] Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann, San Francisco (2005).
[9] Rob J. Hyndman, Anne B. Koehler, Ralph D. Snyder, and Simone Grose. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting, http://www.elsevier.com/locate/ijforecast (2002).
[10] O. D. Anderson. Box-Jenkins in government: a development in official forecasting. Statist. News 32, 14-20 (1977).
[11] S. Wheelwright and S. Makridakis. Forecasting Models for Management. Wiley, New York (1975).