BIG DATA IN HEALTHCARE THE NEXT FRONTIER



Divyaa Krishna Sonnad 1, Dr. Jharna Majumdar 2
2 Dean R&D, Prof. and Head; 1,2 Dept of CSE (PG), Nitte Meenakshi Institute of Technology

Abstract: Health care has changed greatly in recent years, with both positive and negative consequences. Technology is advancing at a pace that leaves older systems outdated and of little use in the modern digital world. In health care, these advances have widened consumer access, and the amount of data generated every day is huge. This volume of data has created problems for both storage and accessibility. Earlier attempts to fix these problems proved unfruitful, which turned researchers' attention toward Big Data. Research in this area has been ongoing for over a decade. This paper proposes the use of Big Data for the storage and extraction of very large volumes of data.

Keywords: Hadoop, Big data

I. INTRODUCTION
Enormous amounts of data are generated almost every day, and every bit of information that is stored and accessed plays its part in solving various problems.[1][2] In health care, data is precious: every piece of information helps to solve a particular problem and can lead to solving bigger ones. Databases are filling up at high speed, so storage is a serious problem, and missing information in this field may lead to casualties. Because storing this data is a high-priority problem, various databases have been developed to address it, but the problem remains unsolved.
To overcome this problem, researchers turned to Big Data.[1][2][3] Big data analysis appears to be the most logical solution for health care. In this field a nanosecond can determine whether a person survives, so the data about that person, research information, past evaluations, and so on must all be accessible. In the words of Google's Executive Chairman, Mr. Eric Schmidt, from the very beginning of human civilization until the year 2003 the world generated five exabytes of data, but the modern world now generates five exabytes every two days, and the pace keeps accelerating. These words ring true: enormous amounts of data can be found everywhere, and especially in healthcare, where information about patients, research details, medical details, and so on is generated. Applying data mining techniques to big data can resolve the problems of existing data storage technology.

II. PROPOSED WORK
The project deals with specialized techniques that allow its consumers, i.e., doctors, to predict future consequences for a particular patient. The application starts with the Home Page, where users can log in with their credentials or register as a new user. A new user registers as either a doctor or a patient. Once registration is done, the user can log in to the application. There is no separate login for patients and doctors; the application segregates the users and provides the functions each one requires. A patient user can enter all the essential details about himself/herself and is assigned to a doctor. For a doctor user, there are six flows in this application.
Search Patient's Medical Fitness: The doctor enters the unique ID of a patient and checks that patient's fitness.
Estimate Patient's Medical Fitness: The doctor enters the patient's test details and finds the patient's medical fitness.
Patient's Dosage Details: The doctor selects All, or In and Out Patient Dosage, to view the medical details, and can then click on a medicine to see the medicine details and the patient's dosage details.
Medicine Price Details: The doctor enters a medicine code and gets a list of medicine details; clicking on a particular record shows the price of the medicine.
Emergency Admission: The doctor enters the admission total and finds out the results of the admissions.
Gastro Palpation Details: The doctor enters the patient's palpation findings and gets the result of the rectal exam.

www.ijtre.com Copyright 2015. All rights reserved.

In this application, data mining algorithms provide the main functionality for Big Data analytics. Five such algorithms are used in this project:
Naïve Bayes Algorithm
Linear Regression Algorithm
Decision Tree Algorithm
K-Nearest Neighbors Algorithm
Artificial Bee Colony Algorithm
The application and description of each algorithm are as follows.

A. Naïve Bayes Algorithm
The Naïve Bayes classifier is a probabilistic classifier based on Bayes' theorem with strong independence assumptions. It is also called the Independent Feature Model, because the parameters are treated as independent of one another. Even if one parameter depends on another, directly or indirectly, the algorithm still treats them as independent. Depending on the nature of the probability model, Naïve Bayes can be trained very efficiently in a supervised learning setting. In many applications, its parameters are estimated by the method of maximum likelihood. The main advantage of this algorithm is that it needs only a small amount of training data to estimate the parameters necessary for classification.

B. Decision Tree Algorithm
Decision Tree is a predictive model that maps observations about an item to conclusions about its target value. It is widely used in data mining and statistics. Tree models in which the target variable takes a finite set of values are called classification trees.
In these trees, the leaves represent class labels and the branches represent the conjunctions of features that lead to those class labels. Decision Tree is one of the data mining techniques that cannot handle continuous variables directly, so continuous attributes must be converted to discrete attributes, a process called discretization. The Decision Tree algorithm uses binary discretization for continuous-valued features. However, multi-interval discretization methods are known to produce more accurate decision trees than binary discretization. The two main issues that affect the performance of Decision Trees are:
The data discretization method used
The type of Decision Tree used

Figure 1: Flow Chart of Naïve Bayes Algorithm
Fig 2: Flow Chart of Decision Tree Algorithm

Linear Regression Algorithm
Linear regression is a mathematical technique that relates one variable to another, i.e., an independent variable to a dependent variable, in the form of an equation for a straight line. The linear equation is as follows:
y = a + bx
where
y = Dependent variable
a = Intercept
b = Slope of the line
x = Independent variable
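The straight-line relation y = a + bx between an independent and a dependent variable can be illustrated with a minimal sketch. The paper does not say how the intercept and slope are estimated, so ordinary least squares is assumed here; the function names and the sample admission counts are hypothetical and serve only as illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope of the line
    a = y_bar - b * x_bar    # intercept
    return a, b


def predict(a, b, x):
    """Evaluate the fitted line at x."""
    return a + b * x


# Hypothetical data: day number (independent) vs. emergency admissions (dependent)
days = [1, 2, 3, 4, 5]
admissions = [12, 15, 19, 22, 27]

a, b = fit_line(days, admissions)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
print(f"predicted admissions on day 6: {predict(a, b, 6):.1f}")
```

Here the slope b is the expected change in the dependent variable per unit change in the independent variable, and the intercept a is the value of the line at x = 0.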

Fig 3: Flow Chart of Linear Regression Algorithm

C. K-Nearest Neighbors Algorithm
K-Nearest Neighbors (KNN) is a predictive algorithm that stores all available cases and classifies new cases based on a similarity measure. A case is classified by a majority vote of its neighbors: it is assigned to the class most common among its K nearest neighbors, as measured by a distance function. When the dataset mixes numerical and categorical variables, the numerical variables need to be standardized to the range 0 to 1. KNN has been used in statistical estimation and pattern recognition.

Fig 4: Flow Chart of K-Nearest Neighbour Algorithm

D. Artificial Bee Colony Algorithm
The Artificial Bee Colony (ABC) algorithm is a population-based meta-heuristic algorithm inspired by the intelligent behavior of honey bees. Here it is used for clustering, in which the data is organized into groups called clusters and similar data is stored in the same cluster. The advantages of this algorithm are as follows:
It employs only three control parameters.
It has a very fast convergence speed.
It is robust, simple, and flexible.
It can easily be optimized with other algorithms.
It has three phases:
Employed bee phase
Onlooker bee phase
Scout bee phase

Fig 5: Flow Chart of Artificial Bee Colony Algorithm

III. IMPLEMENTATION
Big data is an emerging technology. It has not yet been adopted in many applications, but wherever it has been implemented it has given strong results in terms of efficiency. In this project, I use data mining techniques to estimate patient data and analyse the requirements. Big data is now relevant to a very large number of problems in the field of healthcare.
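As one illustration of how such a technique might be applied to patient data, the K-Nearest Neighbors procedure described in Section II (scaling numerical variables to the range 0 to 1, measuring distance, then taking a majority vote) can be sketched as follows. This is a minimal sketch and not the project's implementation: Euclidean distance, the two feature columns, the labels, and every number in the sample data are assumptions made for the example.

```python
from collections import Counter
from math import dist


def fit_min_max(rows):
    """Return the per-column minimum and maximum of a list of numeric rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(row, lows, highs):
    """Rescale one row to [0, 1] per column; constant columns map to 0.0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)]


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training cases."""
    by_distance = sorted(zip(train_x, train_y),
                         key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical test details: [blood glucose, body-mass index]
records = [[85, 22.0], [90, 24.5], [160, 31.0],
           [170, 33.5], [95, 23.0], [155, 30.0]]
labels = ["fit", "fit", "unfit", "unfit", "fit", "unfit"]

lows, highs = fit_min_max(records)
train = [scale(r, lows, highs) for r in records]

# The new case is scaled with the same minima and maxima as the stored cases.
query = scale([150, 29.0], lows, highs)
prediction = knn_predict(train, labels, query, k=3)
print(prediction)
```

Scaling before measuring distance keeps a variable with a large numeric range, such as blood glucose in this made-up data, from dominating the vote.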

Comparison of the Algorithms:
Table: Comparison of Algorithm
The table above shows the following:
For the Diabetes scenario, the Naïve Bayes Algorithm is compared with the Decision Tree Algorithm and the K-Nearest Neighbour Algorithm.
For Medical Fitness, the Decision Tree Algorithm is compared with the Naïve Bayes Algorithm and the Artificial Bee Colony Algorithm.
For Emergency Admission, the Linear Regression Algorithm is best suited in terms of efficiency when compared with the K-Nearest Neighbour Algorithm and the Artificial Bee Colony Algorithm.
For Gastro Palpation, the Decision Tree Algorithm is compared with the Naïve Bayes Algorithm.

Sequence Diagram
Figure 6: Application Sequence Diagram
Figure 7: Data processing, mining and statistical simulation modelling workflow

IV. CONCLUSION AND FUTURE SCOPE
In this project I have used various data mining techniques in a big data application, which shows the efficiency of using big data in the healthcare field. Technology is upgraded every day, and for any issue many solutions emerge, yet efficiency remains the main criterion for a technology to be accepted. Big data analysis is a revolutionary technology that is changing the efficiency of applications in healthcare. The efficiency gained by applying data mining techniques to big data has been a turning point that has led researchers across the world to look into big data. This application helps doctors to predict and analyse possible solutions for their patients. It gives a doctor the ability to foresee future consequences for a particular patient and to help resolve health issues.
REFERENCES
[1] Kuo Lane Chen, Huei Lee, "The Impact of Big Data on the Healthcare Information Systems," Transactions of the International Conference on Health Information Technology Advancement, 2013.
[2] Harsh Kupwade Patil, Ravi Seshadri (Nanthealth), "Big data security and privacy issues in healthcare," 2014 IEEE International Conference on Big Data.
[3] Mai Shouman, Tim Turner, Rob Stocker, "Using Decision Tree for Diagnosing Heart Disease Patients," Proceedings of the 9th Australasian Data Mining Conference, Australia, 2011.
[4] G. Subbalakshmi, K. Ramesh, M. Chinna Rao, "Decision Support in Heart Disease Prediction System using Naive Bayes," Indian Journal of Computer Science and Engineering (IJCSE), 2011.
[5] Twinkle Gupta, Dharmender Kumar, "Optimization of Clustering Problem Using Population Based Artificial Bee Colony Algorithm: A Review," International Journal of Advanced Research in Computer Science and Software Engineering, 2014.
[6] Vimal Nayak, Haresh Suthar, Jagrut Gadit, "Implementation of Artificial Bee Colony Algorithm," IAES International Journal of Artificial Intelligence (IJ-AI), 2012.
[7] Ashiq Imran, Rajeev Agrawal, Jessie Walker, Anthony Gomes, "A Layer Based Architecture for Provenance in Big Data," 2014 IEEE International Conference on Big Data.
[8] Tae-Woong Kim, Kwang-Ho Park, Sang-Hoon Yi, Hee-Cheol Kim, "A Big Data Framework for u-healthcare Systems Utilizing Vital Signs," 2014 International Symposium on Computer, Consumer and Control.