BIG DATA IN HEALTHCARE: THE NEXT FRONTIER

Divyaa Krishna Sonnad 1, Dr. Jharna Majumdar 2
2 Dean R&D, Prof. and Head, 1,2 Dept of CSE (PG), Nitte Meenakshi Institute of Technology

Abstract: Health care has changed substantially in recent years, in both positive and negative ways. Technology is advancing at a pace that leaves older systems outdated and of little use in the modern digital world. In health care, this technological advancement has widened consumer access, and the amount of data generated every day is huge. This volume of data creates problems for both storage and accessibility. Various attempts to solve these problems proved unfruitful, which turned researchers' attention towards Big Data. Research in this area has been ongoing for over a decade. This paper proposes the use of Big Data for the storage and extraction of very large volumes of data.

Keywords: Hadoop, Big data

I. INTRODUCTION
Tremendous amounts of data are generated almost every day, and every piece of information that is stored and accessed plays its part in solving various problems.[1][2] In health care, data is precious: every single piece of information helps to solve a particular problem and can lead to solving even bigger ones. Databases are filling up at high speed, so storage is a serious problem, and missing information in this field may lead to casualties. Because storing this data is a high-priority problem, various databases have been developed to address it, but a satisfactory solution is still lacking.
To overcome this problem, researchers turned to Big Data.[1][2][3] Big data analysis appears to be the most logical solution for health care. In health care a nanosecond can determine whether a person survives, so all the data about that person, including research information and past evaluations, must be accessible. In the words of Eric Schmidt, Executive Chairman of Google, from the very beginning of human civilization until the year 2003, humanity generated five exabytes of data, but the modern world now generates five exabytes every two days, and the pace keeps accelerating. These words ring true: enormous amounts of data can be found in every domain, and in healthcare especially, information about patients, research details, medical details and so on is continually being generated. Applying data mining techniques to big data can resolve the problems of existing data storage technology.

II. PROPOSED WORK
The project deals with specialized techniques that allow its consumers, i.e., doctors, to predict future consequences for a particular patient. The application starts with the Home Page, where users can log in with their credentials or register as a new user. A new user registers as either a doctor or a patient. Once registration is done, the user can log in to the application. There is no separate login for patients and doctors; the application distinguishes between the two kinds of user and presents the appropriate functions. A patient user can enter all the essential details about himself/herself and is assigned to a doctor. For a doctor user there are 6 flows in this application.
Search Patient's Medical Fitness: The doctor enters the unique ID of the patient and checks that patient's fitness.
Estimate Patient's Medical Fitness: The doctor enters the patient's test details and finds the patient's medical fitness.
Patient's Dosage Details: The doctor selects All, or In and Out Patient Dosage, to find medical details, and can then click on a medicine for the Medicine Detail and the Patient Dosage details.
Medicine Price Details: The doctor enters the Medicine Code and gets the list of Medicine Details; clicking on a particular record shows the price of the medicine.
Emergency Admission: The doctor enters the admission total and finds the results of the admissions.
Gastro Palpation Details: The doctor enters the patient's Palpation and gets the result of the Rectal Exam.

www.ijtre.com Copyright 2015.All rights reserved. 3164

Data mining algorithms provide the main functionality for Big Data Analytics in this application. Five such algorithms are used in this project:
Naïve Bayes Algorithm
Linear Regression Algorithm
Decision Tree Algorithm
K Nearest Neighbors Algorithm
Artificial Bee Colony Algorithm
The application and description of each algorithm are as follows.

A. Naïve Bayes Algorithm
The Naïve Bayes classifier is a probabilistic classifier based on Bayes' theorem with strong independence assumptions. It is also called the Independent Feature Model because the parameters are treated as independent of one another. Even if a parameter depends on another parameter directly or indirectly, the algorithm still treats them as independent. Depending on the nature of the probability model, Naïve Bayes can be trained very efficiently in a supervised learning setting. In many applications, the Naïve Bayes parameters are estimated using the method of maximum likelihood. The main advantage of this algorithm is that it needs only a small amount of training data to estimate the parameters necessary for classification.

B. Decision Tree Algorithm
A Decision Tree is a predictive model that maps observations about an item to conclusions about the item's target value. It is widely used in data mining and statistics. Tree models in which the target variable takes a finite set of values are called classification trees.
In these trees, the leaves represent class labels and the branches represent conjunctions of features that lead to those class labels. Decision Tree learners typically do not handle continuous variables directly, so continuous attributes are converted to discrete attributes, a process called discretization. The Decision Tree algorithm uses binary discretization for continuous-valued features. However, multi-interval discretization methods are known to produce more accurate decision trees than binary discretization. The two main issues that affect the performance of Decision Trees are:
The data discretization method used
The type of Decision Tree used

Figure 1: Flow Chart of Naïve Bayes Algorithm
Fig 2: Flow Chart of Decision Tree Algorithm

Linear Regression Algorithm
Linear regression is a mathematical technique that relates one variable to another, i.e., an independent variable to a dependent variable, in the form of the equation of a straight line. The linear equation is as follows:
y = a + bx
where
y = Dependent variable
a = Intercept
b = Slope of the line
x = Independent variable
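To make the straight-line model y = a + bx concrete, the intercept a and slope b can be estimated from example data by ordinary least squares. The sketch below is illustrative only: the paper does not state how its coefficients are fitted, and the function names `fit_line` and `predict` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Estimate intercept a and slope b of y = a + b*x by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x (proportional to the variance of x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    # Sum of cross-deviations of x and y (proportional to their covariance).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope of the line
    a = mean_y - b * mean_x    # intercept
    return a, b


def predict(a, b, x):
    """Evaluate the fitted line at a new value of the independent variable."""
    return a + b * x


# Hypothetical sample data (not from the paper).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]
a, b = fit_line(xs, ys)
print(a, b, predict(a, b, 4.0))
```

Once a and b are fitted, `predict` gives the dependent variable for any new value of the independent variable, which is how such a model would be used for estimation.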
Fig 3: Flow Chart of Linear Regression Algorithm

C. K-Nearest Neighbors Algorithm
The K-Nearest Neighbors (KNN) algorithm is a predictive algorithm that stores all available cases and classifies new cases based on a similarity measure. A case is classified by a majority vote of its neighbors: it is assigned to the class most common among its K nearest neighbors. Similarity is measured with a distance function. When the dataset contains a mixture of numerical and categorical variables, the numerical variables need to be standardized to the range 0 to 1. KNN has been used in statistical estimation and pattern recognition.

D. Artificial Bee Colony Algorithm
The Artificial Bee Colony (ABC) algorithm is a population-based meta-heuristic algorithm inspired by the intelligent behavior of honey bees. In this work it is used for clustering, in which the data is organized into groups called clusters, with similar data stored in the same cluster. The advantages of this algorithm are as follows:
It employs only three control parameters.
It has a very fast convergence speed.
It is robust.
It is simple.
It is flexible.
It can easily be combined with other algorithms.
It has three phases:
Employed bee phase
Onlooker bee phase
Scout bee phase

Fig 4: Flow Chart of K-Nearest Neighbour Algorithm
Fig 5: Flow Chart of Artificial Bee Colony Algorithm

III. IMPLEMENTATION
Big data is an emerging technology. It has not yet been applied in many application areas, but wherever it has been implemented it has delivered strong results in terms of efficiency. In this project I use data mining techniques to evaluate patient data and analyse the requirement. Big data has thus become a means of addressing a great many problems in the field of healthcare.
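As an illustration of one of the data mining techniques used here, the K-Nearest Neighbors procedure described in Section II (standardizing numerical variables to the range 0 to 1, measuring similarity with a distance function, and taking a majority vote) can be sketched as follows. This is a minimal sketch, not the application's code: Euclidean distance is an assumed choice of distance function, and the function names, feature values and labels are hypothetical.

```python
import math
from collections import Counter


def scale_row(row, bounds):
    """Min-max scale one row using per-column (min, max) bounds."""
    # A constant column carries no information, so it is mapped to 0.0.
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, bounds)]


def min_max_scale(rows):
    """Rescale every numeric column of the training rows to [0, 1]."""
    bounds = [(min(col), max(col)) for col in zip(*rows)]
    return [scale_row(r, bounds) for r in rows], bounds


def euclidean(p, q):
    """Distance function (assumed here to be Euclidean)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training cases."""
    scaled, bounds = min_max_scale(train_rows)
    q = scale_row(query, bounds)
    ranked = sorted(zip(scaled, train_labels),
                    key=lambda pair: euclidean(pair[0], q))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature cases with made-up labels.
train_rows = [[85, 25], [90, 30], [160, 55], [170, 60]]
train_labels = ["low", "low", "high", "high"]
print(knn_predict(train_rows, train_labels, [165, 50], k=3))
```

The query is scaled with the bounds learned from the training cases, so both are compared on the same 0 to 1 scale before distances are computed.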
Comparison of the Algorithms

Table: Comparison of Algorithms

The table above shows the following comparisons:
For the Diabetes scenario, the Naïve Bayes Algorithm is compared with the Decision Tree Algorithm and the K-Nearest Neighbour Algorithm.
For Medical Fitness, the Decision Tree Algorithm is compared with the Naïve Bayes Algorithm and the Artificial Bee Colony Algorithm.
For Emergency Admission, the Linear Regression Algorithm is best suited in terms of efficiency when compared with the K-Nearest Neighbour Algorithm and the Artificial Bee Colony Algorithm.
For Gastro Palpation, the Decision Tree Algorithm is compared with the Naïve Bayes Algorithm.

Sequence Diagram
Figure 6: Application Sequence Diagram
Figure 7: Data processing, mining and statistical simulation modelling workflow

IV. CONCLUSION AND FUTURE SCOPE
In this project I have applied various data mining techniques to big data, which demonstrates the efficiency of using big data in the healthcare field. Technology is upgraded every day, and for any issue many solutions emerge, yet efficiency remains the main criterion for a technology to be accepted. Big data analysis is a revolutionary technology that is changing the efficiency of applications in healthcare. The efficiency gained by applying data mining techniques to big data has been a turning point that has led researchers across the world to look into big data. This application helps doctors to predict and analyse possible solutions for their patients. It gives a doctor the ability to foresee future consequences for a particular patient and to help resolve that patient's health issues.
REFERENCES
[1] Kuo Lane Chen, Huei Lee, "The Impact of Big Data on the Healthcare Information Systems," Transactions of the International Conference on Health Information Technology Advancement, 2013.
[2] Harsh Kupwade Patil, Ravi Seshadri (Nanthealth), "Big Data Security and Privacy Issues in Healthcare," IEEE International Conference on Big Data, 2014.
[3] Mai Shouman, Tim Turner, Rob Stocker, "Using Decision Tree for Diagnosing Heart Disease Patients," Proceedings of the 9th Australasian Data Mining Conference, Australia, 2011.
[4] G. Subbalakshmi, K. Ramesh, M. Chinna Rao, "Decision Support in Heart Disease Prediction System using Naive Bayes," Indian Journal of Computer Science and Engineering (IJCSE), 2011.
[5] Twinkle Gupta, Dharmender Kumar, "Optimization of Clustering Problem Using Population Based Artificial Bee Colony Algorithm: A Review," International Journal of Advanced Research in Computer Science and Software Engineering, 2014.
[6] Vimal Nayak, Haresh Suthar, Jagrut Gadit, "Implementation of Artificial Bee Colony Algorithm," IAES International Journal of Artificial Intelligence (IJ-AI), 2012.
[7] Ashiq Imran, Rajeev Agrawal, Jessie Walker, Anthony Gomes, "A Layer Based Architecture for Provenance in Big Data," IEEE International Conference on Big Data, 2014.
[8] Tae-Woong Kim, Kwang-Ho Park, Sang-Hoon Yi, Hee-Cheol Kim, "A Big Data Framework for u-healthcare Systems Utilizing Vital Signs," International Symposium on Computer, Consumer and Control, 2014.