Mining Signatures in Healthcare Data Based on Event Sequences and its Applications



Siddhanth Gokarapu 1, J. Laxmi Narayana 2
1 Student, Computer Science & Engineering Department, JNTU Hyderabad, India, siddhanth.gokarap@gmail.com
2 Assistant Professor, Computer Science & Engineering Department, Aurora's Research & Technological Institute, Warangal, India, laxmanjakkani.mtech@gmail.com

REVIEW ARTICLE

Abstract
Temporal event signature mining for knowledge discovery is a difficult problem. In this paper we propose a framework for large-scale signature mining of longitudinal heterogeneous event data. The framework mines high-order latent event structure and the relationships within single and multiple event sequences. Heterogeneous event sequences are mapped to a geometric image by encoding their structure as spatial-temporal data. We also propose a probabilistic language that models the use and extraction of higher-order events from large-scale data sources, and we apply constrained convolutional sparse matrix coding to learn interpretable, shift-invariant latent temporal event signatures and their applications. A further application area is the surveillance domain, where temporal event signatures help detect suspicious events that occur at a specific location or that appear as variance in the outcome of an approach. In the medical domain we treat the patient as an entity and a visit between doctor and patient at the doctor's office as an event. Temporal event signature mining for knowledge discovery remains difficult here because the vast amounts of complex event data pose present-day challenges.

Keywords: Data mining, Signature mining, Knowledge representation, Probabilistic model, Cluster.

I. INTRODUCTION
Comparing alternative options is one of the most essential steps in the decisions we make every day. For example, someone buying a mobile phone would like to know the available alternatives and compare them before purchasing the product. Finding latent temporal signatures is likewise an important task in many domains. It involves encoding and decoding temporal concepts such as event trends, cycles, episodes, and abnormalities. In the medical domain these concepts appear in patient files, together with the related issues and the course of cure adopted by the practitioner [1].

Fig.1. System Architecture

In this paper we try to answer two fundamental questions: what knowledge representation is appropriate for mining longitudinal event data, and how large and complex datasets can be represented. An effective event knowledge representation (EKR) should draw on human capabilities, so that complex event data carrying vital information can be explored quickly and transformed into an actionable knowledge base.
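One way to picture an event knowledge representation is as a matrix with one row per event type and one column per time step. The following Python sketch encodes a single patient's event sequence in that form. It is a minimal illustration under stated assumptions: the event names, the day-based time grid, and the function name `build_event_matrix` are hypothetical and are not taken from the framework itself.

```python
from typing import Dict, List, Sequence, Tuple


def build_event_matrix(
    events: Sequence[Tuple[int, str]],
    event_types: Sequence[str],
    num_steps: int,
) -> List[List[int]]:
    """Return a len(event_types) x num_steps binary matrix.

    events: (time_step, event_type) pairs for a single entity (patient).
    A cell is 1 when that event type occurred at that time step.
    """
    row_of: Dict[str, int] = {name: i for i, name in enumerate(event_types)}
    matrix = [[0] * num_steps for _ in event_types]
    for step, name in events:
        # Skip events outside the observation window or of unknown type.
        if 0 <= step < num_steps and name in row_of:
            matrix[row_of[name]][step] = 1
    return matrix


# Hypothetical visit history: (day index, event type).
patient_events = [
    (0, "office_visit"),
    (0, "lab_test"),
    (3, "prescription"),
    (5, "office_visit"),
]
types = ["office_visit", "lab_test", "prescription"]
matrix = build_event_matrix(patient_events, types, num_steps=6)
for name, row in zip(types, matrix):
    print(f"{name:13s} {row}")
```

Because every patient is mapped onto the same grid, sequences of different lengths and event mixes become comparable, which is the property a signature mining step would rely on.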

In this paper we address the following problems in this research area.

First, the EKR must provide a time-invariant representation of multiple event entities, since entities typically share the same temporal signatures occurring at different time intervals or locations. The EKR should also be flexible enough to represent different types of event data structures.

Second, the representation must support mining signatures in healthcare data that is built from event sequences. This includes single multivariate events at varying time intervals, so that very complex data and the relationships between distinct events can be represented richly.

Third, the event signatures and the applications built on the EKR should be scalable and should support analysis and inference on large-scale databases. This rests on constructing the event matrix representation and on a learning framework that performs temporal signature mining over large-scale longitudinal and heterogeneous event datasets.

II. RELATED WORK
Researchers have published many papers in this area. The main works we draw on are summarized below.

Linden et al. describe the process of discovering related signatures for an entity. In this respect our work is similar to research on recommender systems, which recommend signatures to a potential user. Such systems rely mainly on similarities between signatures and/or on statistical correlations derived directly from user log data [1].
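The similarity-based recommendation described above can be illustrated with a small sketch. It computes cosine similarity between signatures represented as event-count vectors derived from log data. The choice of cosine similarity, the vector layout, and all names below are assumptions made for illustration; they are not taken from the proposed system.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty signature is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical event counts taken from three patients' logs.
signatures = {
    "patient_a": {"office_visit": 3, "lab_test": 1},
    "patient_b": {"office_visit": 6, "lab_test": 2},
    "patient_c": {"prescription": 4},
}


def most_similar(target: str) -> str:
    """Return the other signature with the highest similarity to target."""
    others = (name for name in signatures if name != target)
    return max(
        others,
        key=lambda n: cosine_similarity(signatures[target], signatures[n]),
    )


print(most_similar("patient_a"))
```

Cosine similarity ignores the overall number of events, so a patient with twice as many visits of the same kinds still counts as a close match.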
Ravichandran and Hovy describe recommending health products to patients based on their own anomaly histories and on those of similar patients with earlier disease histories, using a similarity search between patients. Recommending an item, however, is not equivalent to finding comparable cure data items [2].

Kozareva et al. describe the purpose of recommendation as enticing customers to add more items to their shopping carts. Shopping carts are analyzed and similar or related items are suggested, which are then compared with the products that are centrally available in the proposed health care unit system [3].

Jeh and Widom describe comparing cases to help users explore alternatives and make appropriate decisions among comparable patient disease sets. The comparative questions that users are likely to post are predicted simply on the basis of disease similarity [4].

Jindal and Liu describe mining comparative sentences and their relations. The proposed method uses class sequential rules and label sequential rules learned from annotated corpora to identify comparative sentences and extract comparative relations in any domain [5].

Ensuring high recall is crucial in the intended application scenario, where every user can issue any number of arbitrary queries. To address this, a weakly supervised bootstrapping pattern learning method is developed that effectively leverages unlabeled questions [6].

Smeulders et al. describe a classic content based question retrieval (CBQR) system that takes a single query image as the request for retrieving similar images. In localized content-based image retrieval (CBIR), the user marks the portion of the image that is of interest as the input and leaves the rest as irrelevant content. A localized CBIR system must then rely on a number of images labeled as positive or negative to learn which portion of the image interests the end user, so the challenge becomes how to represent the image locally [7].

Cristianini and co-authors note that training a support vector machine requires solving a quadratic programming (QP) problem in a number of coefficients equal to the number of training examples. Standard numeric QP techniques are infeasible for very large datasets, so practical techniques decompose the problem [8].

III. EXISTING SYSTEM
Usage of the internet and the World Wide Web is increasing dramatically, driven by access through both computers and mobile phones. The web acts as a medium of exchange through which users obtain more services at lower cost. The information available on the web is useful not only to individual users but also to business organizations, hospitals, educational institutions, and research areas such as health care. Despite the many differing and conflicting approaches in the medical and health care sectors, the sector has a growing need for data mining, in particular mining of diseases and of the cures provided through the medication given to patients.

IV. PROPOSED SYSTEM
In this paper we propose mining signatures in healthcare data based on event sequences, together with its applications. The approach rests on mining hospital data over a set of comparable questions, which is used for understanding the user's search goal by predicting and comparing search goals. The process starts from the user's click-through logs. The logs are analyzed session by session to produce feedback sessions, which are considered a better representation than the raw click-through log files. Each analyzed session is then represented in the question bank session.
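The session-by-session analysis of click-through logs can be sketched as follows. This is only an illustrative assumption about how sessions might be formed: the log tuple layout, the 30-minute inactivity timeout, and the function name `split_sessions` are hypothetical and do not come from the implemented system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

LogEntry = Tuple[str, int, str]  # (user_id, timestamp in seconds, clicked item)


def split_sessions(
    log: List[LogEntry], timeout: int = 1800
) -> Dict[str, List[List[str]]]:
    """Group each user's clicks into sessions separated by inactivity gaps."""
    sessions: Dict[str, List[List[str]]] = defaultdict(list)
    last_seen: Dict[str, int] = {}
    for user, ts, item in sorted(log, key=lambda e: (e[0], e[1])):
        # Start a new session on the user's first click or after a long gap.
        if user not in last_seen or ts - last_seen[user] > timeout:
            sessions[user].append([])
        sessions[user][-1].append(item)
        last_seen[user] = ts
    return dict(sessions)


# Hypothetical click-through log.
log = [
    ("u1", 0, "fever"),
    ("u1", 120, "malaria test"),
    ("u2", 50, "x-ray"),
    ("u1", 4000, "discharge summary"),
]
result = split_sessions(log)
print(result)
```

Each inner list is one candidate feedback session. A real system would likely attach more than the clicked item to each entry, for example the query that led to the click.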
The health sector covers not only public health but also the private health sector, and both are overloaded with the wealth of knowledge gained from computerized e-health records. The bulk of the data streams stored in these databases makes it extremely difficult for humans to sift through them and discover the knowledge a patient requires. Some present-day researchers and medical experts believe that medical breakthroughs have slowed down, and they attribute this to the prohibitive scale and complexity of present-day medical information systems. Such enormous amounts of data are well suited to the data mining proposed in this paper.

Similar techniques can be adopted to implement signature mining in healthcare data based on event sequences and its proposed applications, and to extract comparators from a set of queries or sub-queries in order to perform predictive entity mining in medical data mining. Applying data mining at medical institutions may lead to very encouraging results. The data generated on a daily or hourly basis is the main target, since from it we can discover new, useful, and potentially lifesaving knowledge that may save a patient in time. For instance, consider studies of hospital safety: safety may be around 80 to 90 percent in most hospitals, but sometimes the inverse holds, which may account for most hospital deaths.

Fig.2. Proposed system architecture

Pattern evaluation suffers from incomplete knowledge of reliable comparator pairs. Very few reliable pairs are normally discovered in the early stage of bootstrapping, so the importance of a pattern may be miscalculated, which reduces the effectiveness of distinguishing the intended IEPs from non-reliable patterns. The proposed framework, shown in Figure 2, tackles this problem with a look-ahead procedure that considers the set of candidate patterns at every iteration. We also describe how to maintain the comparator pair, such as the doctor details, the patient details, and the event details pertaining to a patient. The user click-through log records the interactions between users and the targeted tests engine, and it serves as an extensive, indirect survey of user experience.
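The comparator pair maintained by the system (doctor details, patient details, and the event details pertaining to a patient) could be modelled with a simple record type. The sketch below is an assumption for illustration only: the class names, every field name, and the day-based ordering are hypothetical and are not taken from the implemented system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    day: int   # day index of the event (assumed time unit)
    kind: str  # hypothetical label, e.g. "medical_test" or "discharge"


@dataclass
class ComparatorRecord:
    doctor: str
    patient: str
    events: List[Event] = field(default_factory=list)

    def add_event(self, day: int, kind: str) -> None:
        """Append an event and keep the sequence ordered by day."""
        self.events.append(Event(day, kind))
        self.events.sort(key=lambda e: e.day)

    def sequence(self) -> List[str]:
        """Return the event kinds in temporal order."""
        return [e.kind for e in self.events]


record = ComparatorRecord(doctor="Dr. A", patient="P-001")
record.add_event(4, "discharge")
record.add_event(1, "medical_test")
record.add_event(2, "operation")
print(record.sequence())
```

Keeping the events sorted at insertion time means the record can be handed directly to a sequence mining step without a separate ordering pass.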

V. RESULTS
We have implemented our proposed system in Java, and some of the screens are:

Fig.3. Login page of the proposed system
Fig.4. Homepage of the proposed system
Fig.5. Doctors Home Screen
Fig.6. Patient Home Screen
Fig.7. Medical Tests Screen
Fig.8. Operation Theater Screen
Fig.9. Rendezvous Screen
Fig.10. Discharge Summary Screen

VI. CONCLUSION
In this paper we have proposed mining signatures in healthcare data based on event sequences, together with its applications. The framework has wide applicability across a variety of applications and domains that involve large-scale longitudinal event data. We have demonstrated that the proposed framework, implemented in the academic environment of our college, copes with the double sparsity problem by applying the double sparsity constraint, to the maximum extent, on large databases of electronic health records and their datasets.

AUTHORS
Siddhanth Gokarapu is presently pursuing his M.Tech degree in Software Engineering in the Computer Science and Engineering department of Aurora's Research & Technological Institute, affiliated to JNTU Hyderabad.

J. Laxmi Narayana is presently working as Assistant Professor at Aurora's Research & Technological Institute and has a total teaching experience of 6 years.

REFERENCES
[1] G. Linden, B. Smith, and J. York, "Amazon.com Recommendations: Item-to-Item Collaborative Filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan./Feb. 2003.
[2] D. Ravichandran and E. Hovy, "Learning Surface Text Patterns for a Question Answering System," Proc. 40th Ann. Meeting on Assoc. for Computational Linguistics (ACL '02), pp. 41-47, 2002.
[3] Z. Kozareva, E. Riloff, and E. Hovy, "Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs," Proc. Ann. Meeting of the Assoc. for Computational Linguistics: Human Language Technologies (ACL-08: HLT), pp. 1048-1056, 2008.
[4] G. Jeh and J. Widom, "Scaling Personalized Web Search," Proc. 12th Int'l Conf. World Wide Web (WWW '03), pp. 271-279, 2003.
[5] N. Jindal and B. Liu, "Identifying Comparative Sentences in Text Documents," Proc. 29th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '06), pp. 244-251, 2006.
[6] E. Riloff and R. Jones, "Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping," Proc. 16th Nat'l Conf. Artificial Intelligence and the 11th Innovative Applications of Artificial Intelligence Conf. (AAAI '99/IAAI '99), pp. 474-479, 1999.
[7] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-Based Image Retrieval," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, Dec. 2000.
[8] T.-T. Frieß, N. Cristianini, and C. Campbell, "The Kernel Adatron Algorithm: A Fast and Simple Learning Procedure for Support Vector Machines," Proc. 15th Int'l Conf. Machine Learning, Morgan Kaufmann, 1998.
[9] C. Cardie, "Empirical Methods in Information Extraction," Artificial Intelligence Magazine, vol. 18, pp. 65-79, 1997.
[10] D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge Univ. Press, 1997.