Temporal Data Mining in Hospital Information Systems: Analysis of Clinical Courses of Chronic Hepatitis


Vol. 1, No. 1, Issue 1, Page 11 of 19. Copyright 2007, TSI Press. Printed in the USA. All rights reserved.

Temporal Data Mining in Hospital Information Systems: Analysis of Clinical Courses of Chronic Hepatitis

Shoji Hirano and Shusaku Tsumoto
Department of Medical Informatics, Shimane University, School of Medicine
89-1 Enya-cho, Izumo, Shimane, Japan
{hirano,

Received 1 January 2007; revised 2 February 2007; accepted 3 March 2007

Abstract
This paper presents a new approach to finding interesting knowledge in temporal data on chronic diseases, based on the combination of advanced sequence comparison techniques and a cluster analysis procedure. First, we briefly introduce the cluster analysis system for temporal data that we have developed. Second, we apply it to the analysis of platelet (PLT) count data from patients with chronic viral hepatitis. Third, we present the results of a PLT value-based temporal analysis, conducted on the basis of the cluster analysis results, which aims at estimating the years required to reach F4 (liver fibrosis stage 4), the years elapsed between stages, and their relationships with virus types and fibrotic stages. The results yielded some interesting findings: (1) the temporal courses of PLT could be grouped into several patterns exhibiting similar average PLT levels and increase/decrease trends, and (2) liver fibrosis might proceed faster in some exacerbating cases.

Keywords: Temporal Data Mining, Multiscale Matching, Clustering, Chronic Hepatitis, KDD Process

1. INTRODUCTION
The steady operation of hospital information systems over the past two decades has given them a new role: archiving temporal data about the long-term condition of patients, in addition to their basic function of providing the information necessary for daily clinical services. Such archives of longitudinal time-series data can serve as a new source for retrospective studies of chronic diseases, which may lead to the discovery of novel knowledge useful for diagnosis or treatment.
However, large-scale, cross-patient analysis of time-series medical data is a challenging task because of the multidimensionality and temporal irregularity of the data, caused by the variety of laboratory tests and by changes in patient condition over time, as well as the difficulty of determining observation scales appropriate for capturing both short-term and long-term events. Consequently, practical application of data mining methods to longitudinal medical time-series data is still limited.

In this paper, we present a new approach to finding interesting knowledge in temporal data on chronic diseases, based on the combination of advanced sequence comparison techniques and a cluster analysis procedure. First, we briefly introduce the cluster analysis system for temporal data that we have developed. Second, we apply it to the analysis of platelet (PLT) count data from patients with chronic viral hepatitis. Platelet count has been receiving considerable interest as an index of liver dysfunction, because thrombopoietin [1], a hematopoietic factor that promotes the production of platelets, is produced in the liver. Matsumura et al. reported that the PLT count correlated with fibrotic stage [2]: PLT counts differed significantly among patients at different fibrotic stages, becoming smaller as liver fibrosis progressed. However, few studies have investigated the temporal relationship between the decrease patterns of PLT and the progress of liver fibrosis using the time-series data of individual patients. Our cluster analysis results indicate that the temporal courses of PLT can be grouped into several patterns, each showing similarity in average PLT level and increase/decrease trends. Third, we present the results of a PLT value-based temporal analysis aiming at estimating the years required to reach F4 (fibrosis stage 4), the years elapsed between stages, and their relationships with virus types and fibrotic stages. This value-based analysis was motivated by the quickly decreasing patterns revealed through the cluster analysis. Its results suggest that liver fibrosis may proceed faster in exacerbating cases.

2. CLUSTER ANALYSIS SYSTEM
The cluster analysis system we have developed consists of two components, sequence comparison and clustering, so that advanced sequence comparison methods capable of handling the temporal irregularity of medical data can be used. The sequence comparison component implements two methods: dynamic time warping (DTW) [5,6] and modified multiscale structure matching (MMSM) [8]. The clustering component employs two methods: conventional hierarchical clustering (HC) [4] and rough set-based clustering (RC) [7]. The sequence comparison component compares all possible pairs of time series and produces a dissimilarity matrix. The clustering component then groups the time series according to this dissimilarity matrix.

Figure 1 provides a screenshot of the system. The left window shows the dendrogram generated when HC is used as the clustering method. The right window shows the constitution of the clusters, as well as the number of cases in each cluster. When the user selects a cluster, the sequences that belong to it are visualized. The two windows are linked internally: when the user specifies a cutting point on the dendrogram, the corresponding cluster constitution and sequences are displayed interactively.

3. CLUSTER ANALYSIS OF TIME-SERIES PLT COUNT DATA
Data Sets
We employed the chronic hepatitis dataset [3], which was provided as a common dataset for the ECML/PKDD Discovery Challenge 2002 and The dataset contained time-series data on laboratory examinations collected at a university hospital in Japan. The subjects were 771 patients with Type B or Type C chronic viral hepatitis who received hospital laboratory examinations during the period from 1982 to A total of 720 patients received at least one platelet count examination.
Out of these 720 cases, 222 were excluded from the analysis because their biopsy information was not available, and an additional 10 were excluded because of their short examination periods (less than 2 weeks). Consequently, a total of 488 series were used for the analysis.

Figure 1. Cluster analysis system for time-series.

Experimental Procedure
The procedure of the cluster analysis was as follows.
1. Sequence rebuild: Resample the PLT data of each patient at one-week intervals by linear interpolation.
2. Dataset split by virus type and administration of interferon (IFN) therapy: Split the dataset into Type B and Type C cases, and further split the Type C cases into those with and without IFN therapy. We call these the Type B subset, the Type C with IFN subset and the Type C without IFN subset. The numbers of cases in the subsets were: Type B = 193, Type C with IFN = 196, Type C without IFN = 99. The following steps were applied independently to each subset.
3. Creation of a dissimilarity matrix: Compare two PLT sequences using the modified multiscale matching, and apply this process to every possible pair of sequences in the subset to fill in the dissimilarity matrix. To perform a comprehensive comparison, we set the parameters for multiscale matching as follows: number of scales = 150, starting scale = 0.1, scale interval = 0.5. The weight for the replacement cost was set to 0.2 based on a preliminary experiment.
4. Cluster analysis: Generate dendrograms by agglomerative hierarchical clustering and perform the cluster analysis, using group average as the cluster merge criterion.

Figure 2 shows the three dendrograms obtained from the Type B, Type C with IFN and Type C without IFN subsets, respectively. We manually determined the cutting points on the dendrograms so that the clusters represent the global structure of the data while retaining the meaningful features of the sequences. As a result, we obtained 16, 23 and 6 clusters for the three subsets, respectively. The horizontal line on each dendrogram represents the cutting point.

Figure 2. Dendrograms for PLT sequences. Left: Type B, Middle: Type C with IFN, Right: Type C without IFN.

Table 1 provides the constitution of the clusters stratified by fibrotic stage. The three sub-tables correspond, from left to right, to the Type B, Type C with IFN and Type C without IFN subsets. Each row of a sub-table represents one cluster: the leftmost column contains the cluster number, the next five columns contain the number of cases in the cluster at each fibrotic stage (F0-F4), and the rightmost column contains the total number of cases in the cluster. The tables suggest that the clusters could be roughly classified into two categories: (1) clusters containing mainly high-stage (progressed) cases, and (2) clusters containing mainly low-stage (early) cases.

Table 1. Cluster constitutions w.r.t. fibrotic stages. Small clusters (fewer than 3 cases) were omitted. Left: Type B, Center: Type C with IFN, Right: Type C without IFN.
[Columns of each sub-table (B, C IFN, C noifn): Cls; # of Cases / Fibrosis Stage F0, F1, F2, F3, F4; Total]
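The comparison-then-clustering pipeline of steps 3 and 4 can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' system: plain dynamic time warping (DTW), the other comparison method the system offers, stands in for the modified multiscale matching used in the experiments (its scale parameters and replacement-cost weight are not modeled), and stopping at a fixed `n_clusters` stands in for the manual choice of a cutting point on the dendrogram. The function names and the sample values are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance with an absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def dissimilarity_matrix(series: List[Sequence[float]]) -> List[List[float]]:
    """Compare every pair of sequences once and fill a symmetric matrix."""
    k = len(series)
    d = [[0.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        d[i][j] = d[j][i] = dtw_distance(series[i], series[j])
    return d


def group_average_clustering(d: List[List[float]], n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering with the group-average merge criterion.

    Merging stops when n_clusters clusters remain, which plays the role of
    cutting the dendrogram at a manually chosen height.
    """
    clusters = [[i] for i in range(len(d))]

    def average_link(c1: List[int], c2: List[int]) -> float:
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average dissimilarity.
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: average_link(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters


if __name__ == "__main__":
    # Hypothetical weekly PLT values (x10^4/ul); sequences may differ in length.
    weekly_plt = [
        [22.0, 21.5, 21.0, 21.2],       # flat, normal range
        [21.8, 21.9, 21.4],             # flat, normal range
        [18.0, 15.0, 12.0, 10.0, 9.0],  # steadily decreasing
        [17.5, 14.0, 11.5, 9.5],        # steadily decreasing
    ]
    d = dissimilarity_matrix(weekly_plt)
    print(group_average_clustering(d, n_clusters=2))  # [[0, 1], [2, 3]]
```

Because DTW aligns sequences of different lengths, the two flat courses and the two decreasing courses end up in separate clusters even though their lengths differ. This naive group-average loop is roughly cubic in the number of sequences; it is meant to show the criterion, not to be efficient.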

Due to space limitations, we mainly describe the results for the Type C with IFN subset. According to the middle table in Table 1, there were two remarkable clusters containing many progressed (F4 or F3) cases: clusters 5 (8/11) and 8 (25/40). In addition, there were three remarkable clusters containing many early-stage (F0-F2) cases: clusters 11 (34/46), 12 (33/42) and 23 (18/19).

Figure 3 provides examples of sequences grouped into clusters 5 and 8. Each figure is composed of 16 sub-windows, each containing one sequence. The two horizontal lines in each sub-window represent the normal high ( /µl) and normal low ( /µl) limits, respectively. In cluster 5, most of the sequences showed decreasing or flat courses below the normal low limit, indicating the severe states of the patients. Sequences in cluster 8 exhibited similar courses, but with slightly higher values than those in cluster 5.

Figure 3. Clusters containing many progressed-stage (F4 or F3) cases (Type C with IFN). Left: cluster 5. Center and Right: cluster 8 (32 cases selected by MID order).

Figure 4 provides the sequences grouped into clusters 11, 12 and 23. In contrast to clusters 5 and 8, the sequences in these clusters showed flat courses within the normal range. Clusters 11, 12 and 23 differ in their global PLT levels: low, middle and high, respectively.

Figure 4. Clusters containing many early-stage (F0, F1 or F2) cases (Type C with IFN). Left: cluster 11. Center: cluster 12. Right: cluster 23. (16 cases selected by MID order.)

Other interesting courses were found in clusters 4, 6 and 10, which showed clearly decreasing or increasing patterns, as shown in Figure 5. The left of Figure 5 shows the sequences in cluster 4 (F1=3, F2=1). Although 3/4 of them were at stage F1, their PLT counts kept decreasing and finally fell below the normal low limit within a relatively short period. The middle of Figure 5 shows the sequences in cluster 6 (F1=1, F3=1, F4=1). Their global levels were lower than those in cluster 4, which might be due to the F3 and F4 cases. The right of Figure 5 shows the sequences in cluster 10 (F1=1, F3=1, F4=3), which represent recovery courses after IFN therapy.

Figure 5. Clusters containing remarkably increasing/decreasing cases (Type C with IFN). Left: cluster 4. Center: cluster 6. Right: cluster 10.

We observed similarly interesting patterns in the other two subsets. The findings are summarized below.
1. In both Type B and Type C, some clusters contained relatively large numbers of progressed cases. PLT counts in these cases commonly showed decreasing or flat courses below the normal low limit. Some F1 and F2 cases showed PLT levels as low as those of F4 cases (Type B cluster 7, Type C with IFN cluster 5).
2. In both Type B and Type C, some clusters contained relatively large numbers of early-stage cases. PLT counts in these cases commonly showed flat courses within the normal range (Type B clusters 5, 15 and 16; Type C with IFN clusters 11, 12 and 23). F4 cases could also remain within the normal range; however, the number of such cases in a cluster decreased as the global PLT level of the cluster increased (Type B clusters: 16 > 15 > 5; Type C with IFN clusters: 11 = 12 > 23).
3. In Type C, there were remarkable cases, including F1 and F2 cases, in which the PLT count decreased continuously and finally fell below the normal range (Type C with IFN clusters 4 and 6; Type C without IFN cluster 1). In Type C without IFN, this decreasing trend was observed rather frequently (Type C without IFN clusters 1 and 3).
4. In Type C with IFN, there were F4 cases in which PLT levels increased toward the normal range after IFN administration.

4. ANALYSIS OF YEARS FOR REACHING F4 AND ELAPSED YEARS BETWEEN STAGES BASED ON PLT COUNTS
The stage of liver fibrosis is usually determined by liver biopsy, which is an invasive examination. In recent years, platelet count has been receiving considerable attention as a non-invasive index reflecting liver dysfunction, which may be associated with the fibrotic stage in chronic hepatitis. Several researchers have reported relationships between platelet counts and fibrotic stages [2,9]. For example, Matsumura et al. [2] reported the following values (×10^4/µl): F1: 20.3±5.2, F2: 16.0±4.9, F3: 13.0±4.0, F4: 11.8±4.1, and LC: 11.8±4.1. The results of our cluster analysis were consistent with these differences.
In addition, visual inspection of the clustered sequences suggested that there might be several types of temporal courses of PLT values.

Matsumura et al. [2] also reported the progression speed of liver fibrosis in patients with Type C chronic hepatitis in Japan. They used the date of blood transfusion, which could be associated with F0, together with the dates and results of liver biopsies to calculate the progression speed, obtaining about 0.12±0.15 stage/year.

To investigate the temporal characteristics of PLT counts, we attempted to utilize the time-series data. The goal of this study was to analyze the progression speed of liver fibrosis without information about blood transfusions. As a preliminary step, we attempted to calculate (1) the years required to reach stage F4 and (2) the years elapsed between stages, by combining the fibrotic stages predicted from PLT levels with those observed by liver biopsy. We made the following assumption: if the PLT level of a patient stays continuously below the normal range for at least 6 months, and thereafter never stays within the normal range for more than 6 months, then the patient is at F4. Based on this assumption, we first examined whether and when each patient reached F4. Then, by subtracting the dates and stages obtained by biopsy, we calculated the elapsed years.

As preprocessing, we selected the cases for analysis according to the following procedure.
1. Exclude from the analysis the cases that meet any of the following three conditions: (1) No biopsy: biopsy information was not available. (2) Short sequence: the number of examinations was less than 2, or the duration of examination was shorter than 2 years. (3) Inhomogeneous sequence: the deviation of examination intervals was larger than 1 year.
2. Resample each sequence at one-week intervals. The starting date of resampling was selected independently for each case based on two criteria: (1) it was the day of the week on which the patient most frequently received examinations, and (2) it was the date closest to the first examination. Where examination data were missing, we inserted values obtained by linear interpolation of the nearest examination results. These resampled sequences were used in the following steps.

3. Smooth each sequence to remove short-term changes, by convolution with a discrete Gaussian kernel with a support width of 6 months (26 weeks; σ=2.8).
4. From the head of the sequence, search for the first point that satisfies both of the following conditions: (a) the PLT level stays continuously below the normal range for the next 6 months, where periods of IFN therapy are not counted because the therapy may induce a short-term decrease of PLT; (b) any subsequent recovery of the PLT level does not remain continuously within the normal range for 6 months.
5. If such a point is found, take it as the date of declination from the normal range. Otherwise, the case is considered to maintain a normal PLT range and is removed from the analysis.

Table 2 shows the result of sequence classification by the above procedure. A total of 97 cases classified as 'declinated' were the subject of the analysis.

Table 2. Result of sequence classification. The judging criteria for declination are: (1) PLT becomes continuously lower than the normal range for over 6 months; (2) a recovered PLT level cannot continuously maintain the normal range for 6 months. Both criteria must be satisfied.
[Categories: No biopsy; Short; Inhomogeneous; Available; Declinated; Normal; Total]

Table 3 summarizes the calculated years for reaching F4 (first-examination basis) for the 97 declinated cases in Table 2, stratified by virus type and fibrotic stage. Note that years = 0 if the date of declination was earlier than the date of the first examination.

Table 3. Years for reaching F4 (first-examination basis) stratified by virus type and fibrotic stage. Summary for the 97 declination cases in Table 2.*
[Columns: Type; Fibrotic stage; Cases; Years for reaching F4 (first-exam basis, years): Mean, Median, SD. Rows: B, C IFN and C w/o IFN, each with a subtotal; Total.]
*Fibrotic stages in the second column are based on biopsy. Years for reaching F4 are the years from the first examination to the date of declination, under the assumption that the fibrotic stage at the date of declination was F4. If the date of declination was the same as or before the first examination, the years were treated as 0.

For each of the Type B, Type C with IFN and Type C without IFN groups, we performed statistical tests (ANOVA) to detect differences in the mean years for reaching F4 with respect to the biopsy-based fibrotic stages. The result for Type C with IFN was p=0.012 (< 0.05), indicating significant differences in years among fibrotic stages. However, this was primarily due to one exceptionally long case at F0; tests after removing this case yielded p=0.291, indicating no significant difference in the years for reaching F4 among fibrotic stages. The results for Type B and Type C without IFN were p=0.357 and p=0.613, respectively, indicating no significant differences. Kruskal-Wallis tests yielded the same conclusion. A between-group comparison of the Type B, Type C with IFN and Type C without IFN groups resulted in p=.

The years for reaching F4 in Table 3 were calculated as the years between the first date of PLT examination and the date of PLT declination. Here we assumed that the fibrotic stage at the first examination was the same as that at the first biopsy. However, the date of the first biopsy and the date of the first PLT examination generally differed, in some cases by several years, which implies that the stages might also differ. Therefore, we calculated the years for reaching F4 on a biopsy basis, i.e., the years from the date of the first biopsy to the date of PLT declination. Additionally, based on the assumption that the stage at PLT declination should be F4, we calculated the elapsed years between stages by the following formula:

elapsed years per stage = (date of declination - date of first biopsy) / (4 - fibrotic stage at biopsy)

If declination occurred before the first biopsy, the years were treated as 0. Table 4 summarizes the results. As with the first-examination basis results, for each of the Type B, Type C with IFN and Type C without IFN groups, we performed ANOVA tests to detect differences in the mean years for reaching F4 with respect to fibrotic stage. The results were p=0.421, (<0.05), for the respective groups.
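The declination detection and the two elapsed-time measures described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: examination dates are day offsets in strictly increasing order, the weekly grid simply starts at the first examination (the weekday-based choice of start date is omitted), σ=2.8 is interpreted in weeks with a 27-tap kernel spanning the 26-week support, the kernel is renormalized at the sequence ends, IFN-therapy periods are not masked out in criterion (a), the case-exclusion step is not implemented, a year is taken as 365.25 days, and `normal_low` is a caller-supplied threshold because the numeric lower limit of the normal range is not stated here. All function names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Optional, Sequence

WEEK_DAYS = 7.0         # resampling step in days
SIX_MONTHS = 26         # weeks, as in the text
DAYS_PER_YEAR = 365.25  # assumption for converting days to years


def resample_weekly(days: Sequence[float], values: Sequence[float]) -> List[float]:
    """Linearly interpolate exams (strictly increasing day offsets) onto a weekly grid
    that starts at the first examination (a simplification of the text's start rule)."""
    grid: List[float] = []
    t = days[0]
    while t <= days[-1]:
        k = bisect_right(days, t) - 1  # index of the last exam at or before t
        if k == len(days) - 1:
            grid.append(values[-1])
        else:
            frac = (t - days[k]) / (days[k + 1] - days[k])
            grid.append(values[k] + frac * (values[k + 1] - values[k]))
        t += WEEK_DAYS
    return grid


def gaussian_smooth(x: Sequence[float], sigma: float = 2.8, half_width: int = 13) -> List[float]:
    """Convolve with a discrete Gaussian kernel of 2*half_width+1 taps (spanning 26 weeks).
    Weights are renormalized where the kernel overhangs the sequence ends (assumption)."""
    offsets = range(-half_width, half_width + 1)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    out: List[float] = []
    for i in range(len(x)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            if 0 <= i + k < len(x):
                num += w * x[i + k]
                den += w
        out.append(num / den)
    return out


def find_declination(x: Sequence[float], normal_low: float, span: int = SIX_MONTHS) -> Optional[int]:
    """Return the first week index satisfying criteria (a) and (b), or None."""
    for i in range(len(x) - span + 1):
        # (a) continuously below the normal range for the next `span` weeks
        if not all(v < normal_low for v in x[i:i + span]):
            continue
        # (b) no later recovery stays within the normal range for `span` weeks
        run, recovered = 0, False
        for v in x[i + span:]:
            run = run + 1 if v >= normal_low else 0
            if run >= span:
                recovered = True
                break
        if not recovered:
            return i
    return None


def years_to_f4(decl_week: int, grid_start_day: float, reference_day: float) -> float:
    """Years from a reference date (first exam or first biopsy) to declination; 0 if earlier."""
    decl_day = grid_start_day + decl_week * WEEK_DAYS
    return max(0.0, (decl_day - reference_day) / DAYS_PER_YEAR)


def years_per_stage(years: float, stage_at_biopsy: int) -> float:
    """(years for reaching F4) / (4 - stage at biopsy); undefined for F4 cases."""
    if not 0 <= stage_at_biopsy < 4:
        raise ValueError("elapsed years per stage are defined only for F0-F3")
    return years / (4 - stage_at_biopsy)


if __name__ == "__main__":
    # Hypothetical patient and hypothetical lower limit of the normal range.
    exam_days = [0.0, 90.0, 180.0, 365.0, 450.0, 540.0, 730.0, 900.0]
    plt = [21.0, 20.5, 19.8, 18.0, 13.5, 12.0, 11.0, 10.5]
    week = find_declination(gaussian_smooth(resample_weekly(exam_days, plt)), normal_low=15.0)
    if week is not None:
        yrs = years_to_f4(week, grid_start_day=0.0, reference_day=0.0)
        print(week, round(yrs, 2), round(years_per_stage(yrs, stage_at_biopsy=2), 2))
```

Passing the first-examination day or the first-biopsy day as `reference_day` gives the first-exam basis and biopsy basis variants, respectively; F4 cases are rejected by `years_per_stage`, mirroring their exclusion from the elapsed-years statistics.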
In Type C with IFN, a significant difference among stages appeared; however, this was primarily due to one exceptionally long case at F0. Tests after removing this case yielded p=0.970, indicating no significant difference in the years for reaching F4 even with the biopsy-date basis measurement. Kruskal-Wallis tests led to the same conclusion.

Similarly, for each of the Type B, Type C with IFN and Type C without IFN groups, we performed ANOVA tests to detect differences in the mean elapsed years between stages with respect to fibrotic stage. In this test we removed the F4 cases, because their elapsed years could not be measured; for the same reason, we excluded F4 cases when calculating values such as the mean and SD in Table 4. The ANOVA results were p=0.836, 0.425 and 0.340, indicating no significant differences among stages, including F0, in any of the three groups.

Table 4. Years for reaching F4 (biopsy basis) and years between stages stratified by virus type and fibrotic stage. Summary for the 97 declination cases in Table 2.*
[Columns: Type; Fibrotic stage; Cases; Years for reaching F4 (biopsy basis, years): Mean, Median, SD; Years between stages (years/stage): Mean, Median, SD. Rows: B, C IFN and C w/o IFN, each with a subtotal; Total.]
*Fibrotic stages in the second column are based on biopsy. Years for reaching F4 are the years from the first biopsy to the date of declination, under the assumption that the fibrotic stage at the date of declination was F4. If the date of declination was the same as or before the first biopsy, the years were treated as 0. Years between stages were calculated as (years for reaching F4)/(4 - stage at biopsy).

In summary, within this limited analysis, no significant difference was observed in the years for reaching F4 or the years elapsed between stages with respect to fibrotic stage, virus type or administration of IFN. However, it is interesting that the elapsed years between stages were 1-2 years/stage in almost all groups. Simply inverting this value into a progression speed for comparison with other sources gives, for example, about 1/1.32 = 0.76 stage/year for the Type C without IFN cases. This is faster than the value reported in [2] (0.12±0.15 stage/year), implying that liver fibrosis might proceed faster in these cases. Note that these results should not be generalized, because (1) we assumed that a patient reached F4 when the PLT level continuously declined below the normal range over a long period, (2) we selected only exacerbating cases in which PLT decreased continuously, and (3) we did not take into account patient background information such as drinking history. Nevertheless, we consider that our approach of measuring elapsed years between stages, by combining the fibrotic stages obtained from biopsy with those inferred from PLT levels, led to interesting results.

5. CONCLUSIONS
In this paper we have introduced a cluster analysis system for time-series medical data and reported the results of a temporal analysis of PLT data in chronic hepatitis patients. The results revealed that the temporal courses of PLT might be classified into several patterns according to their levels and trends, which might be further related to fibrotic stages. The results also suggest that, in some exacerbating cases, liver fibrosis may proceed several times faster than in the natural course.
In future work, we will validate the clinical reasonability of the results and the usefulness of the system on other datasets.

ACKNOWLEDGEMENTS
This work was supported in part by the Grant-in-Aid for Scientific Research on Priority Area (# ), "Development of the Active Mining System in Medicine Based on Rough Sets", by the Ministry of Education, Culture, Science and Technology of Japan.

REFERENCES
[1] H. Miyazaki, Future Prospect of Thrombopoietin. Jpn. J. Transfusion Medicine, Vol. 46, No. 3, pp.
[2] H. Matsumura, M. Moriyama, I. Goto, N. Tanaka, H. Okubo and Y. Arakawa, Natural course of progression of liver fibrosis in patients with chronic liver disease type C in Japan: a study of 527 patients at one establishment in Japan. J. Viral Hepat., Vol. 7, pp.
[3] URL:
[4] B. S. Everitt, S. Landau and M. Leese, Cluster Analysis, Fourth Edition. Arnold Publishers.
[5] D. Sankoff and J. Kruskal, Time Warps, String Edits, and Macromolecules. CSLI Publications.
[6] S. Chu, E. J. Keogh, D. Hart and M. J. Pazzani, Iterative Deepening Dynamic Time Warping for Time Series. In Proc. the Second SIAM Int'l Conf. Data Mining, pp.
[7] S. Hirano and S. Tsumoto, An Indiscernibility-Based Clustering Method with Iterative Refinement of Equivalence Relations: Rough Clustering. Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol. 7, No. 2, pp. (2003).
[8] S. Tsumoto, S. Hirano and K. Takabayashi, Development of the Active Mining System in Medicine Based on Rough Sets. Journal of Japan Society for Artificial Intelligence, Vol. 20, No. 2, pp.

AUTHOR INFORMATION
Shoji Hirano received the Ph.D. degree in electronics in 2001 from Himeji Institute of Technology, Japan. He joined the Department of Medical Informatics, Shimane Medical University, as a research associate in April 2001, and has served as an associate professor since July His research interests include data mining, rough sets, image processing and medical informatics. He received the Best Paper Award at the Fourth Biannual World Automation Congress in 2000, and the Annual Conference Award at the 19th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI) in He is a member of the IEEE, JSAI and the Japan Society for Fuzzy Theory and Intelligent Informatics.

Shusaku Tsumoto graduated from Osaka University, School of Medicine in He received his Ph.D. (Computer Science) for the application of rough sets to medical data mining from Tokyo Institute of Technology in 1997 and became a Professor at the Department of Medical Informatics, Shimane University in His interests include approximate reasoning, data mining, fuzzy sets, granular computing, knowledge acquisition, mathematical theory of data mining, medical informatics and rough sets (alphabetical order). He served as President of the International Rough Set Society from 2000 to 2005 and as PC chair of RSCTC2000, IEEE ICDM2002, RSCTC2004 and ISMIS


More information

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

More information

Determining optimal window size for texture feature extraction methods

Determining optimal window size for texture feature extraction methods IX Spanish Symposium on Pattern Recognition and Image Analysis, Castellon, Spain, May 2001, vol.2, 237-242, ISBN: 84-8021-351-5. Determining optimal window size for texture feature extraction methods Domènec

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Evaluation of Lump-sum Update Methods for Nonstop Service System

Evaluation of Lump-sum Update Methods for Nonstop Service System International Journal of Informatics Society, VOL.5, NO.1 (2013) 21-27 21 Evaluation of Lump-sum Update Methods for Nonstop Service System Tsukasa Kudo, Yui Takeda, Masahiko Ishino*, Kenji Saotome**, and

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

Time series clustering and the analysis of film style

Time series clustering and the analysis of film style Time series clustering and the analysis of film style Nick Redfern Introduction Time series clustering provides a simple solution to the problem of searching a database containing time series data such

More information

International Journal of Advance Research in Computer Science and Management Studies

International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent

More information

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

More information

Grid Density Clustering Algorithm

Grid Density Clustering Algorithm Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2

More information

Low-resolution Character Recognition by Video-based Super-resolution

Low-resolution Character Recognition by Video-based Super-resolution 2009 10th International Conference on Document Analysis and Recognition Low-resolution Character Recognition by Video-based Super-resolution Ataru Ohkura 1, Daisuke Deguchi 1, Tomokazu Takahashi 2, Ichiro

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm

Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm R.Karthiyayini 1, J.Jayaprakash 2 Assistant Professor, Department of Computer Applications, Anna University (BIT Campus),

More information

A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan

A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan , pp.217-222 http://dx.doi.org/10.14257/ijbsbt.2015.7.3.23 A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan Muhammad Arif 1,2, Asad Khatak

More information

Visual Data Mining with Pixel-oriented Visualization Techniques

Visual Data Mining with Pixel-oriented Visualization Techniques Visual Data Mining with Pixel-oriented Visualization Techniques Mihael Ankerst The Boeing Company P.O. Box 3707 MC 7L-70, Seattle, WA 98124 mihael.ankerst@boeing.com Abstract Pixel-oriented visualization

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

ISSUES IN MINING SURVEY DATA

ISSUES IN MINING SURVEY DATA ISSUES IN MINING SURVEY DATA A Project Report Submitted to the Department of Computer Science In Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science University

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Meta-learning. Synonyms. Definition. Characteristics

Meta-learning. Synonyms. Definition. Characteristics Meta-learning Włodzisław Duch, Department of Informatics, Nicolaus Copernicus University, Poland, School of Computer Engineering, Nanyang Technological University, Singapore wduch@is.umk.pl (or search

More information

Increase Hepatitis C Virus Screening and Treatment

Increase Hepatitis C Virus Screening and Treatment 18 Increase Hepatitis C Virus Screening and Treatment Situation The number of deaths from liver cancer in Japan has been rising rapidly since 1975, and now stands at more than 30,000 per year. About 80

More information

A Review of Anomaly Detection Techniques in Network Intrusion Detection System

A Review of Anomaly Detection Techniques in Network Intrusion Detection System A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In

More information

Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Personalized Hierarchical Clustering

Personalized Hierarchical Clustering Personalized Hierarchical Clustering Korinna Bade, Andreas Nürnberger Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, D-39106 Magdeburg, Germany {kbade,nuernb}@iws.cs.uni-magdeburg.de

More information

DHL Data Mining Project. Customer Segmentation with Clustering

DHL Data Mining Project. Customer Segmentation with Clustering DHL Data Mining Project Customer Segmentation with Clustering Timothy TAN Chee Yong Aditya Hridaya MISRA Jeffery JI Jun Yao 3/30/2010 DHL Data Mining Project Table of Contents Introduction to DHL and the

More information

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

More information

INVESTIGATIONS INTO EFFECTIVENESS OF GAUSSIAN AND NEAREST MEAN CLASSIFIERS FOR SPAM DETECTION

INVESTIGATIONS INTO EFFECTIVENESS OF GAUSSIAN AND NEAREST MEAN CLASSIFIERS FOR SPAM DETECTION INVESTIGATIONS INTO EFFECTIVENESS OF AND CLASSIFIERS FOR SPAM DETECTION Upasna Attri C.S.E. Department, DAV Institute of Engineering and Technology, Jalandhar (India) upasnaa.8@gmail.com Harpreet Kaur

More information

COMBINING THE METHODS OF FORECASTING AND DECISION-MAKING TO OPTIMISE THE FINANCIAL PERFORMANCE OF SMALL ENTERPRISES

COMBINING THE METHODS OF FORECASTING AND DECISION-MAKING TO OPTIMISE THE FINANCIAL PERFORMANCE OF SMALL ENTERPRISES COMBINING THE METHODS OF FORECASTING AND DECISION-MAKING TO OPTIMISE THE FINANCIAL PERFORMANCE OF SMALL ENTERPRISES JULIA IGOREVNA LARIONOVA 1 ANNA NIKOLAEVNA TIKHOMIROVA 2 1, 2 The National Nuclear Research

More information

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable

More information

MapReduce Approach to Collective Classification for Networks

MapReduce Approach to Collective Classification for Networks MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Chapter 5. Warehousing, Data Acquisition, Data. Visualization Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives

More information

Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems

Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Ran M. Bittmann School of Business Administration Ph.D. Thesis Submitted to the Senate of Bar-Ilan University Ramat-Gan,

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,

More information

Discretization and grouping: preprocessing steps for Data Mining

Discretization and grouping: preprocessing steps for Data Mining Discretization and grouping: preprocessing steps for Data Mining PetrBerka 1 andivanbruha 2 1 LaboratoryofIntelligentSystems Prague University of Economic W. Churchill Sq. 4, Prague CZ 13067, Czech Republic

More information

Data Mining: Overview. What is Data Mining?

Data Mining: Overview. What is Data Mining? Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

More information

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

More information

How To Identify Noisy Variables In A Cluster

How To Identify Noisy Variables In A Cluster Identification of noisy variables for nonmetric and symbolic data in cluster analysis Marek Walesiak and Andrzej Dudek Wroclaw University of Economics, Department of Econometrics and Computer Science,

More information

Strategic Online Advertising: Modeling Internet User Behavior with

Strategic Online Advertising: Modeling Internet User Behavior with 2 Strategic Online Advertising: Modeling Internet User Behavior with Patrick Johnston, Nicholas Kristoff, Heather McGinness, Phuong Vu, Nathaniel Wong, Jason Wright with William T. Scherer and Matthew

More information

How To Cluster

How To Cluster Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

Resource-bounded Fraud Detection

Resource-bounded Fraud Detection Resource-bounded Fraud Detection Luis Torgo LIAAD-INESC Porto LA / FEP, University of Porto R. de Ceuta, 118, 6., 4050-190 Porto, Portugal ltorgo@liaad.up.pt http://www.liaad.up.pt/~ltorgo Abstract. This

More information

Analecta Vol. 8, No. 2 ISSN 2064-7964

Analecta Vol. 8, No. 2 ISSN 2064-7964 EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,

More information

ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM

ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM IRANDOC CASE STUDY Ammar Jalalimanesh a,*, Elaheh Homayounvala a a Information engineering department, Iranian Research Institute for

More information

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data

More information

Use of Human Big Data to Help Improve Productivity in Service Businesses

Use of Human Big Data to Help Improve Productivity in Service Businesses Hitachi Review Vol. 6 (216), No. 2 847 Featured Articles Use of Human Big Data to Help Improve Productivity in Service Businesses Satomi Tsuji Hisanaga Omori Kenji Samejima Kazuo Yano, Dr. Eng. OVERVIEW:

More information

ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS

ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS 1 PANKAJ SAXENA & 2 SUSHMA LEHRI 1 Deptt. Of Computer Applications, RBS Management Techanical Campus, Agra 2 Institute of

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

Analysis of Software Process Metrics Using Data Mining Tool -A Rough Set Theory Approach

Analysis of Software Process Metrics Using Data Mining Tool -A Rough Set Theory Approach Analysis of Software Process Metrics Using Data Mining Tool -A Rough Set Theory Approach V.Jeyabalaraja, T.Edwin prabakaran Abstract In the software development industries tasks are optimized based on

More information

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,

More information

Mining of predictive patterns in Electronic health records data

Mining of predictive patterns in Electronic health records data Mining of predictive patterns in Electronic health records data Iyad Batal and Milos Hauskrecht Department of Computer Science University of Pittsburgh milos@cs.pitt.edu 1 Introduction The emergence of

More information

SEISMIC CAPACITY OF EXISTING RC SCHOOL BUILDINGS IN OTA CITY, TOKYO, JAPAN

SEISMIC CAPACITY OF EXISTING RC SCHOOL BUILDINGS IN OTA CITY, TOKYO, JAPAN SEISMIC CAPACITY OF EXISTING RC SCHOOL BUILDINGS IN OTA CITY, TOKYO, JAPAN Toshio OHBA, Shigeru TAKADA, Yoshiaki NAKANO, Hideo KIMURA 4, Yoshimasa OWADA 5 And Tsuneo OKADA 6 SUMMARY The 995 Hyogoken-nambu

More information

Visualization of Breast Cancer Data by SOM Component Planes

Visualization of Breast Cancer Data by SOM Component Planes International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian

More information

Why do statisticians "hate" us?

Why do statisticians hate us? Why do statisticians "hate" us? David Hand, Heikki Mannila, Padhraic Smyth "Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data

More information

A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images

A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images Małgorzata Charytanowicz, Jerzy Niewczas, Piotr A. Kowalski, Piotr Kulczycki, Szymon Łukasik, and Sławomir Żak Abstract Methods

More information

Cluster Analysis using R

Cluster Analysis using R Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other

More information

Data Mining for Risk Management in Hospital Information Systems

Data Mining for Risk Management in Hospital Information Systems Data Mining for Risk Management in Hospital Information Systems Shusaku Tsumoto and Shoji Hirano Department of Medical Informatics, Shimane University, School of Medicine, 89-1 Enya-cho, Izumo 693-8501

More information

How To Use Neural Networks In Data Mining

How To Use Neural Networks In Data Mining International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

Visualization of large data sets using MDS combined with LVQ.

Visualization of large data sets using MDS combined with LVQ. Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk

More information

Modeling and Design of Intelligent Agent System

Modeling and Design of Intelligent Agent System International Journal of Control, Automation, and Systems Vol. 1, No. 2, June 2003 257 Modeling and Design of Intelligent Agent System Dae Su Kim, Chang Suk Kim, and Kee Wook Rim Abstract: In this study,

More information

Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

More information

Chapter 7: Data Mining

Chapter 7: Data Mining Chapter 7: Data Mining Overview Topics discussed: The Need for Data Mining and Business Value The Data Mining Process: Define Business Objectives Get Raw Data Identify Relevant Predictive Variables Gain

More information

Biometric Authentication using Online Signatures

Biometric Authentication using Online Signatures Biometric Authentication using Online Signatures Alisher Kholmatov and Berrin Yanikoglu alisher@su.sabanciuniv.edu, berrin@sabanciuniv.edu http://fens.sabanciuniv.edu Sabanci University, Tuzla, Istanbul,

More information

Data Mining Analysis of a Complex Multistage Polymer Process

Data Mining Analysis of a Complex Multistage Polymer Process Data Mining Analysis of a Complex Multistage Polymer Process Rolf Burghaus, Daniel Leineweber, Jörg Lippert 1 Problem Statement Especially in the highly competitive commodities market, the chemical process

More information

Clustering UE 141 Spring 2013

Clustering UE 141 Spring 2013 Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or

More information

A Stock Pattern Recognition Algorithm Based on Neural Networks

A Stock Pattern Recognition Algorithm Based on Neural Networks A Stock Pattern Recognition Algorithm Based on Neural Networks Xinyu Guo guoxinyu@icst.pku.edu.cn Xun Liang liangxun@icst.pku.edu.cn Xiang Li lixiang@icst.pku.edu.cn Abstract pattern respectively. Recent

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Prediction of Heart Disease Using Naïve Bayes Algorithm

Prediction of Heart Disease Using Naïve Bayes Algorithm Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,

More information

Standardization and Its Effects on K-Means Clustering Algorithm

Standardization and Its Effects on K-Means Clustering Algorithm Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

More information

A Reliability Point and Kalman Filter-based Vehicle Tracking Technique

A Reliability Point and Kalman Filter-based Vehicle Tracking Technique A Reliability Point and Kalman Filter-based Vehicle Tracing Technique Soo Siang Teoh and Thomas Bräunl Abstract This paper introduces a technique for tracing the movement of vehicles in consecutive video

More information

2.1. Data Mining for Biomedical and DNA data analysis

2.1. Data Mining for Biomedical and DNA data analysis Applications of Data Mining Simmi Bagga Assistant Professor Sant Hira Dass Kanya Maha Vidyalaya, Kala Sanghian, Distt Kpt, India (Email: simmibagga12@gmail.com) Dr. G.N. Singh Department of Physics and

More information

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Siddhanth Gokarapu 1, J. Laxmi Narayana 2 1 Student, Computer Science & Engineering-Department, JNTU Hyderabad India 1

More information