Mining Signatures in Healthcare Data Based on Event Sequences and its Applications

Siddhanth Gokarapu 1, J. Laxmi Narayana 2
1 Student, Computer Science & Engineering Department, JNTU Hyderabad, India
1 siddhanth.gokarap@gmail.com
2 Assistant Professor, Computer Science & Engineering Department, Aurora's Research & Technological Institute, Warangal, India
2 laxmanjakkani.mtech@gmail.com

REVIEW ARTICLE

Abstract

Temporal event signature mining for knowledge discovery is a difficult problem. In this paper we propose a framework for large-scale temporal signature mining of longitudinal heterogeneous event data. The framework mines high-order latent event structure and the relationships within single and multiple event sequences. It maps heterogeneous event sequences to a geometric image by encoding their structure into spatial-temporal data. We propose a probabilistic language that models the usage and extraction of higher-order events from large-scale data sources, and we apply constrained convolutional sparse matrix coding to learn interpretable, shift-invariant latent temporal event signatures and their applications. Another domain is surveillance, where temporal event signatures aid the detection of suspicious events that tend to occur at a specific location or that appear as variance in the outcome of an approach. In the medical domain we treat the patient as an entity, and a visit involving the patient and the doctor at the doctor's office is treated as an event. Mining temporal event signatures from such data is hard because the vast amounts of complex event data pose present-day challenges.

Keywords: Data mining, Signature mining, Knowledge representation, Probabilistic model, Cluster.

I. INTRODUCTION

Comparing alternative options is one of the most essential steps in everyday decision making. For example, a person who wants to purchase a mobile phone would like to know the available alternatives and compare them before buying. Finding latent temporal signatures is an important task in many domains. Such signatures encode temporal concepts such as event trends, cycles, episodes, and abnormalities. In the medical domain they can be found in patient files, together with the issues they relate to and the course of treatment adopted by the practitioner [1].

Fig. 1. System Architecture

In this paper we try to answer two fundamental questions: what is an appropriate knowledge representation for mining longitudinal event data, and how can large and complex datasets be represented? A good event knowledge representation (EKR) should take advantage of human capabilities, so that vital information in complex event data can be grasped quickly and transformed into actionable knowledge.
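As a rough illustration of what an event knowledge representation could look like, the sketch below encodes one patient's events as a binary matrix with one row per event type and one column per day, and then scans for a small signature at every time offset. The event names, the `build_event_matrix` and `find_signature` helpers, and the one-day time step are all assumptions made for this example; they are not taken from the proposed system, and a fixed signature is matched here rather than learned.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (day index, event type); hypothetical input format


def build_event_matrix(events: List[Event], event_types: List[str], n_days: int) -> List[List[int]]:
    """Encode an event sequence as a binary matrix: rows are event types, columns are days."""
    row_of = {name: i for i, name in enumerate(event_types)}
    matrix = [[0] * n_days for _ in event_types]
    for day, name in events:
        if 0 <= day < n_days and name in row_of:
            matrix[row_of[name]][day] = 1
    return matrix


def find_signature(matrix: List[List[int]], signature: List[List[int]]) -> List[int]:
    """Return every day offset at which all 1-cells of the signature are present in the matrix."""
    n_days = len(matrix[0])
    width = len(signature[0])
    hits = []
    for offset in range(n_days - width + 1):
        matched = all(
            matrix[r][offset + c] == 1
            for r in range(len(signature))
            for c in range(width)
            if signature[r][c] == 1
        )
        if matched:
            hits.append(offset)
    return hits


event_types = ["visit", "lab_test", "prescription"]
patient_events = [(0, "visit"), (1, "lab_test"), (5, "visit"), (6, "lab_test"), (8, "prescription")]
matrix = build_event_matrix(patient_events, event_types, n_days=10)

# Hypothetical signature: a visit followed by a lab test on the next day.
visit_then_lab = [
    [1, 0],  # visit
    [0, 1],  # lab_test
    [0, 0],  # prescription
]
offsets = find_signature(matrix, visit_then_lab)
print(offsets)
```

The same signature is found at two different offsets in this toy sequence, which is the sense in which a signature can be shift invariant. A learning method would estimate the signatures from data instead of taking them as given.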
We address the following aspects of the problem.

First, the EKR should provide a time-invariant representation for multiple event entities. Different entities typically show the same temporal signatures at different time intervals or locations, so the EKR must be flexible enough to represent different types of event data structures.

Second, signature mining in the healthcare environment works on data made up of event sequences. The representation should allow both single and multivariate events over time intervals, so that very complex data and the relationships between distinct events can be represented richly.

Third, event signatures and the applications built on the EKR should be scalable and should support analysis and inference on large-scale databases. This rests on constructing an event matrix representation and a learning framework for temporal signature mining on large-scale, longitudinal, heterogeneous event datasets.

II. RELATED WORK

Many papers have been published in this area. The works most relevant to ours are the following.

Linden et al. describe how related signatures of an entity can be discovered. Our work is similar to this research on recommender systems, which recommend signatures to a user and rely mainly on similarities between signatures and/or statistical correlations derived directly from user log data [1].
Ravichandran and Hovy recommend health products to patients based on their own anomaly histories and on similar patients who previously had the same disease histories. A similarity search is performed between patients; however, recommending an item is not equivalent to finding comparable cure data items [2].

Kozareva et al. make recommendations that encourage customers to add more items to their shopping carts. The shopping carts are analyzed and similar or related item sets are suggested, which are then compared with the products that are centrally available in the health care unit of the proposed system [3].

Jeh and Widom describe the comparison of cases, which helps users explore alternatives and make an appropriate decision among comparable patient disease sets. The comparative questions that users are likely to post are predicted simply on the basis of disease similarity [4].

Jindal and Liu mine comparative sentences and their relations. Their method uses class sequential rules and label sequential rules learned from annotated corpora to identify comparative sentences and extract comparative relations in any domain [5].

Ensuring high recall is crucial in the intended application scenario, where every user can issue any number of arbitrary queries. This problem is addressed by a weakly supervised bootstrapping pattern learning method that effectively leverages unlabeled questions [6].

Smeulders et al. describe a classic content-based image retrieval (CBIR) system that takes a single query image as the source for retrieving similar images. In localized CBIR, the user is interested in only a portion of the image and the rest is irrelevant. The user explicitly marks the region of interest, and a localized CBIR system relies on a number of images labeled as positive or negative to learn which portion of the image interests the end user. The remaining challenge is how to represent the image in a localized way [7].

Frieß, Cristianini, and Campbell note that training a support vector machine requires solving a quadratic programming (QP) problem in a number of coefficients equal to the number of training examples. Standard numeric techniques for QP become infeasible for very large datasets, so practical techniques decompose the problem [8].
III. EXISTING SYSTEM

The use of the Internet and the World Wide Web is increasing dramatically, through both computers and mobile phones. The web acts as a medium of exchange through which most users obtain more services at lower cost. The information available on the web is useful not only to individual users but also to business organizations, hospitals, educational institutions, and research areas such as health care. There are many different approaches, and conflicts between them, in the medical and health care sectors. Despite those differences, the sector has a growing need for data mining, in particular for mining diseases and the cures achieved with the medication given to patients.

IV. PROPOSED SYSTEM

We propose mining signatures in healthcare data based on event sequences, together with its applications. The approach performs hospital data mining on a set of comparable, related questions, which is used to understand the user's search goal by predicting and comparing search goals. The process starts from the user's click-through logs, which are analyzed session by session to build feedback sessions. Feedback sessions are considered a better representation than the raw click-through log files, because every session is analyzed and represented in the question bank.
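The grouping of click-through logs into per-session feedback sessions described above can be sketched as follows. The log fields (`session_id`, `query`, `url`, `clicked`), the split into clicked and skipped results, and the `build_feedback_sessions` helper are hypothetical; the paper does not specify the log format or the contents of a feedback session.

```python
from dataclasses import dataclass, field
from typing import Dict, List, NamedTuple


class LogEntry(NamedTuple):
    """One row of a click-through log (assumed layout)."""
    session_id: str
    query: str
    url: str
    clicked: bool


@dataclass
class FeedbackSession:
    """Results shown in one session, split by whether the user clicked them."""
    query: str
    clicked: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


def build_feedback_sessions(log: List[LogEntry]) -> Dict[str, FeedbackSession]:
    """Group log rows by session id, keeping rows in their logged order."""
    sessions: Dict[str, FeedbackSession] = {}
    for entry in log:
        session = sessions.setdefault(entry.session_id, FeedbackSession(entry.query))
        target = session.clicked if entry.clicked else session.skipped
        target.append(entry.url)
    return sessions


log = [
    LogEntry("s1", "fever treatment", "doc/12", True),
    LogEntry("s1", "fever treatment", "doc/40", False),
    LogEntry("s2", "diabetes tests", "doc/7", True),
    LogEntry("s1", "fever treatment", "doc/55", True),
]
sessions = build_feedback_sessions(log)
for session_id, session in sessions.items():
    print(session_id, session.query, session.clicked, session.skipped)
```

Each session keeps its own query and result lists, so later analysis can work on one session at a time instead of on the flat log.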
The health sector covers not only public health but also private health data. It is overloaded with knowledge gained from computerized e-health records. The bulk of the data streams stored in these databases makes it extremely difficult for humans to sift through them and discover the knowledge a patient requires. Some researchers and medical experts believe that medical breakthroughs have slowed down, and they attribute this to the prohibitive scale and complexity of present-day medical information. Computers, which can hold and mine enormous amounts of data, are best suited to this task, which is the purpose of this paper.

Similar techniques can be adopted to implement signature mining in healthcare data based on event sequences and its applications, and to extract comparators from a set of queries or sub-queries in order to perform predictive entity mining on medical data. Applying data mining at medical institutions may lead to encouraging results. The main target is the data generated on a daily or hourly basis, from which new, useful, and potentially lifesaving knowledge can be discovered in time to save a patient. For instance, consider studies of hospital safety: safety may be around 80 to 90 percent in most hospitals, and sometimes the inverse, which may account for most hospital deaths.

Fig. 2. Proposed system architecture

Pattern evaluation suffers from incomplete knowledge about reliable comparator pairs. Very few reliable pairs are normally discovered in the early stage of bootstrapping, so the importance of a pattern may be miscalculated, which can reduce the effectiveness of distinguishing the intended IEPs from non-reliable patterns. The proposed architecture shown in Fig. 2 tackles this problem with a look-ahead procedure that considers the set of candidate patterns at every iteration. We also describe how to maintain the comparator pairs, such as doctor details, patient details, and the event details pertaining to a patient. The click-through log records the interactions between users and the targeted test engine and serves as an extensive, indirect survey of user experience.
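One possible way to keep the doctor, patient, and event details mentioned above together, so that each patient's events form a time-ordered sequence, is sketched below. All class names, field names, and event kinds are illustrative assumptions, not the schema of the implemented system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class Doctor:
    doctor_id: str
    name: str


@dataclass(frozen=True)
class Event:
    when: date
    kind: str       # e.g. "visit", "medical_test", "discharge" (hypothetical labels)
    doctor: Doctor


@dataclass
class Patient:
    patient_id: str
    name: str
    events: List[Event] = field(default_factory=list)

    def add_event(self, event: Event) -> None:
        """Store the event and keep the patient's events sorted by date."""
        self.events.append(event)
        self.events.sort(key=lambda e: e.when)

    def sequence(self) -> List[str]:
        """Return the patient's event kinds in chronological order."""
        return [e.kind for e in self.events]


doctor = Doctor("d1", "Dr. A")
patient = Patient("p1", "Patient X")
patient.add_event(Event(date(2014, 3, 5), "medical_test", doctor))
patient.add_event(Event(date(2014, 3, 1), "visit", doctor))
patient.add_event(Event(date(2014, 3, 9), "discharge", doctor))
sequence = patient.sequence()
print(sequence)
```

Keeping the events sorted per patient means the sequence can be read off directly when event sequences are needed for mining.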
V. RESULTS

We have implemented the proposed system in Java. Some of its screens are listed below.

Fig. 3. Login page of the proposed system
Fig. 4. Homepage of the proposed system
Fig. 5. Doctors Home Screen
Fig. 6. Patient Home Screen
Fig. 7. Medical Tests Screen
Fig. 8. Operation Theater Screen
Fig. 9. Rendezvous Screen
Fig. 10. Discharge Summary Screen
VI. CONCLUSION

In this paper we have proposed mining signatures in healthcare data based on event sequences, and its applications. The framework has wide applicability to a variety of applications and domains that involve large-scale longitudinal event data. We have demonstrated that the framework, implemented in the academic environment of our college, can cope with the double sparsity problem by inducing a double sparsity constraint, to the greatest possible extent, in large electronic health record databases and their datasets.

AUTHORS

Siddhanth Gokarapu is presently pursuing his M.Tech degree in Software Engineering in the Computer Science and Engineering department of Aurora's Research & Technological Institute, affiliated to JNTU Hyderabad.

J. Laxmi Narayana is presently working as Assistant Professor at Aurora's Research & Technological Institute and has a total teaching experience of 6 years.

REFERENCES

[1] G. Linden, B. Smith, and J. York, "Amazon.com Recommendations: Item-to-Item Collaborative Filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan./Feb. 2003.
[2] D. Ravichandran and E. Hovy, "Learning Surface Text Patterns for a Question Answering System," Proc. 40th Ann. Meeting on Assoc. for Computational Linguistics (ACL '02), pp. 41-47, 2002.
[3] Z. Kozareva, E. Riloff, and E. Hovy, "Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs," Proc. Ann. Meeting of the Assoc. for Computational Linguistics: Human Language Technologies (ACL-08: HLT), pp. 1048-1056, 2008.
[4] G. Jeh and J. Widom, "Scaling Personalized Web Search," Proc. 12th Int'l Conf. World Wide Web (WWW '02), pp. 271-279, 2003.
[5] N. Jindal and B. Liu, "Identifying Comparative Sentences in Text Documents," Proc. 29th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '06), pp. 244-251, 2006.
[6] E. Riloff and R. Jones, "Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping," Proc. 16th Nat'l Conf. Artificial Intelligence and the 11th Innovative Applications of Artificial Intelligence Conf. (AAAI '99/IAAI '99), pp. 474-479, 1999.
[7] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-Based Image Retrieval," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, Dec. 2000.
[8] T.-T. Frieß, N. Cristianini, and C. Campbell, "The Kernel Adatron Algorithm: A Fast and Simple Learning Procedure for Support Vector Machines," in 15th Int. Conf. Machine Learning, Morgan Kaufmann, 1998.
[9] C. Cardie, "Empirical Methods in Information Extraction," Artificial Intelligence Magazine, vol. 18, pp. 65-79, 1997.
[10] D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge Univ. Press, 1997.