Pattern Discovery on Australian Medical Claims Data: a systematic approach

Ah Chung Tsoi, Senior Member, IEEE, Shu Zhang, Markus Hagenbuchner, Member, IEEE

Abstract: The national health insurance system in Australia records details on the medical services and claims provided to its population. An effective method for the discovery of temporal behavioral patterns in this dataset is proposed in this paper. The method consists of a two step approach which is applied recursively to the dataset. First, a clustering algorithm is used to segment the data into classes. Then, hidden Markov models (HMMs) are employed to find the underlying temporal behavioral patterns. These steps are applied recursively to features extracted from the dataset until convergence. The main objective is to minimize the misclassification of patient profiles into the various classes. This results in a hierarchical tree model consisting of a number of classes; each class groups similar patient temporal behavioral patterns together. The capabilities of the proposed method are demonstrated through an application to a subset of the Australian national health insurance dataset. It is shown that the proposed method not only clusters the data into various categories of interest, but also automatically marks the periods in which similar temporal behavioral patterns occurred.

Index Terms: H.2.8.d Data mining, H.3.3 Information Search and Retrieval, H.2.8.b Clustering, classification, and association rules.

I. INTRODUCTION

Australia's population of about 20 million is covered by a universal publicly funded medical insurance scheme called Medicare. A government agency called the Health Insurance Commission (HIC)

administers the Medicare program. The functions of the HIC include, among many other tasks, the detailed recording of medical transactions, mainly for internal accounting purposes. Access to the Australian Medicare program is granted to all Australian residents and certain categories of visitors; the covered services include medical and hospital services. Where an eligible person incurs medical expenses in respect of a professional service, Medicare will pay benefits for that particular service as outlined in the Medical Benefit Schedule (MBS), which essentially indicates the amount of rebate that the patient can claim under various circumstances 1. Medical transactions for the entire population have been collected since the HIC was set up. Each medical transaction record contains much valuable information, e.g., the name and address of the patient, age, gender, the medical service provider's name and practice location, the type of medical service provided, the date the service was provided, the date payment was made, the date the claim was processed, the way the payment was claimed, the Medicare office in which the claim was made, the officer who served the customer, etc. This information is collected primarily for accounting purposes. It is noted that medical treatments for which no claim to the HIC is made are not recorded. In general, the total number of Medicare transaction records is approximately 375,000,000 records per annum, which equates to about 13 GB of data. From a data mining perspective, the Medicare transaction record database presents a treasure trove for many possible pattern discovery and data mining exercises. We have obtained seven quarters of de-identified 2 medical transaction record data from the HIC. Our aim is to detect similar temporal behavioral patterns amongst patients in the dataset. In this paper, we will use hidden Markov models (HMMs) as a suitable approach for temporal behavioral pattern discovery. HMMs are commonly applied to pattern recognition tasks since they allow a formal representation of a stochastic dynamic process, a systematic analysis of the data, and prediction based on such models.

1 Details of the MBS and the various situations in which a patient can claim a rebate can be found on the HIC web site.

2 The records are de-identified so as to remove any possibility of identifying a person, providing anonymity to patients.

Overall, the method proposed in this paper consists of three steps:

Step 1 Feature extraction
Step 2 Segmentation into clusters
Step 3 Pattern recognition

Because of the richness and the format of the data, the first task is to decide on the extraction of the features most suitable for the task at hand. Feature extraction is performed by applying content filtering techniques, in that we filter those items in the dataset which are representative of the underlying dynamical behavior of the system. Step 2 segments the dataset into clusters of similarly behaving patients. We propose to segment the data into suitable age cohorts, and then to cluster each age cohort using a popular clustering technique, viz., the K-means clustering algorithm. Based on the result of the segmentation process, Step 3 can then be applied to the task of pattern discovery. We propose the use of HMMs as a suitable approach for this task. These three steps yield a set of classes, each grouping similarly behaving temporal patterns together. We will call the application of one cycle of this three step process a coarse classification method, as it yields a set of classes from the dataset. The K-means clustering algorithm clusters the dataset according to a criterion based on the entire patient profile, while the HMM groups the dataset into similar temporal behavioral patterns. Hence the K-means clustering algorithm and the HMM group the dataset quite differently. Since our primary aim is to group patients with similar temporal behavioral patterns together, we could have used the HMM alone on the features extracted from the dataset, without first using the K-means clustering algorithm. However, we found from preliminary experiments that this produces a large number of classes, some of which do not appear to be assigned correctly when inspected visually. The K-means clustering algorithm was introduced to first cluster the dataset according to overall behavior, as it considers the characteristics of the entire patient profile. In other words, the K-means clustering algorithm acts as a filtering process, grouping similar overall patient behaviors together first. The HMM is then applied to these similar overall behaviors to further separate those with similar temporal behavioral patterns into subclasses.

We find from our initial experiments that this two step process yields a smaller number of subclasses. Further, profiles within the same subclass bear resemblance to one another when inspected visually. As will be shown later, the recursive application of Step 2 and Step 3 allows for a further refinement of the coarse classification of the data by the K-means clustering algorithm. This yields a hierarchical tree model consisting of a number of layers, each layer being described by classes of patients with similar temporal behavioral patterns. The main innovative idea in this paper is the provision of a practical method for the unsupervised grouping of a dataset into a hierarchical tree model, in which each layer of the tree consists of clusters (classes) of similar temporal behavioral patterns. The subclasses representing the successive layers of the hierarchical tree model bear similar temporal behavioral characteristics to the classes in the preceding layer, their parent classes. The structure of this paper is as follows: the feature extraction approach presented in Section II generates individual patient profiles as output. In Section III we use a K-means clustering algorithm to coarsely segment the data into clusters. Such segmentation allows us to employ a hidden Markov model (HMM) to classify the data in each cluster into similar temporal behavioral patterns. This is addressed in Section IV. In Section V we show that K-means clustering and HMMs applied recursively to the dataset produce a fine decomposition of the data into classes, and result in a generalization performance which is superior to the initial approach of deploying the K-means clustering algorithm and HMM methodology alone, without the hierarchical decomposition. Experimental results presented in Section VI show that the proposed approach is effective in classifying patient records into classes such that similar profiles are grouped together. Finally, conclusions are presented in Section VII.

II. FEATURE EXTRACTION AND CONSTRUCTION OF PROFILES

Given the vast amount of data, a way is needed to extract some pertinent features of the data to allow for further processing by a pattern discovery algorithm. In addition, a segmentation of the data can be employed to support the mechanism of pattern discovery.

It is recognized that a patient's medical record changes dramatically with age. For example, a patient engages medical services differently during childhood years than at an older age. Also, female patients of child bearing age utilize medical services differently than male patients of the same age. Moreover, a patient's medical record can be influenced by other factors such as location of residence (e.g., rural or metropolitan area), seasonal changes, and others. We analyzed the data and found that a patient's age and gender influence the transactions most strikingly. In order to reduce the effect of dependencies on age we decided to sort patients into age cohorts. As a consequence, we partitioned the data into groups of similar age, such as those shown in Table I.

TABLE I
COHORT GROUPS AND SIZES. FOR EACH AGE COHORT (0-3 INFANTS, PRE-SCHOOL AGE, SCHOOL AGE, YOUNG ADULT, ADULT, MATURE ADULT, MID-LIFE, RETIREMENT AGE, AND >71 ELDERLY), THE TABLE LISTS, SEPARATELY FOR FEMALE AND MALE PATIENTS, THE NUMBER OF PATIENTS, THE AVERAGE NUMBER OF CLAIMS FILED, AND THE AVERAGE VALUE OF A CLAIM. ON AVERAGE, FEMALE PATIENTS FILED 9.1 CLAIMS AND MALE PATIENTS 8.36 CLAIMS.

From Table I we find that both the cost and the frequency of medical services used tend to increase with the age of a patient. Moreover, a striking observation is that female patients older than 16 years of age use medical services considerably more frequently than their male counterparts.

This difference is plainly associated with the physiological differences between the sexes and their ongoing consequences (e.g., child bearing). In contrast, more boys (male children younger than 16) seek medical services than girls of the same age. This may be explained by the general observation that boys are more exploratory outdoors, thus incurring a higher exposure to risky activities than girls. Further analysis of the data showed that a patient's behavior changes not only with age and gender but also with the address of the patient (e.g., whether a patient is from a rural or metropolitan area). These two observations guide us to the following feature extraction process: first, segment the dataset into age cohorts so that patients with similar behaviors are grouped together. Secondly, in order to provide generalization capability, we have chosen to consider patients from addresses in metropolitan, semi-rural, and rural areas. This selection of patients provides possibilities for generalization of the models built, as it includes diversity of data for the same age cohort. This paper will utilize the cohort of patients who are between 45 and 55 years of age for demonstration purposes and for subsequent experiments 3. We further mark those patients depending on their location of residence, which is either a metropolitan area or a rural region. This cohort group is used to present the modeling methodology. For the task of extracting features, it is possible to use a number of indicators such as the number and type of services used, the way a payment is made, the provider's specialization, etc. For demonstration purposes, we decided to use the total benefits paid as a feature, since it not only reflects a patient's behavior but also encodes most types of service implicitly. Most patients do not see a medical service provider daily. This motivates us to consider a rolling time window through the data, rather than individual days, as a basic time unit. Hence, we will extract features by applying a rolling time window of fixed size to the total benefits paid during a period. The result is a temporal profile of a patient. In order to avoid border effects, profiling is performed on one year of data, rather than the full seven quarters. This is because some patients may not claim for the expenses until some time after having received the services from the medical service providers.

3 We could have divided the age cohort further into male and female patient profiles. However, we decided not to present such results as it adds complexity to the presentation without adding much value to the proposed methodology.

Had we used all seven quarters of data, this would have introduced border effects. Considering the total benefits within a time window, instead of working on each day as a unit of processing, has an intuitive appeal. Often a patient is required to undertake a course of treatments over a short period of time, e.g., 14 days or less. During this period, the patient may need to see a medical service provider a number of times, incurring medical expenses. Hence, by summing the total benefits paid over a time window, the underlying nature of the course of treatment is captured. Mathematically this simple step can be represented as follows: we are given a longitudinal set of medical records of a patient over a total of D days; in our case D = 365 days (one year). We are only interested in the benefits paid by the HIC on servicing this person, hence only the benefits paid relevant to the person are extracted from the longitudinal medical records. Let these be denoted by z_1, z_2, ..., z_D. Note that we use the day when the patient visits the medical service provider as the day on which the benefit is paid, rather than the actual date when the benefit is paid. This is because the medical service provider may choose to bulk bill, so that claims may be sent in batches to the HIC; this would distort the profiles. If there is no visit to a medical service provider on day i, then z_i = 0. Given a time window of W days, the total benefits paid over this time window is:

y_t = \sum_{i=t}^{t+W-1} z_i, \quad t = 1, 2, \ldots, D - W + 1 \quad (1)

It is possible to slide the time window to the next day, perform the same computation, and obtain another point on the profile. Thus, over D days, a total of D - W + 1 points on the profile are obtained. We have experimented with different values of the fixed time window W, and found that using a period of 14 days helps to achieve good results. Thus, each profile will be of dimension D - W + 1 = 352. An example of the effect of using a time window is shown in Figure 1. The profile on the left shows a patient's profile which uses individual days as the basic time unit. The same profile using a sliding time window of W = 14 is shown on the right. It is observed that profiles consist of segments of activities over a period of time.
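As a concrete illustration of Eq. (1), the following minimal sketch (in Python with numpy; the function and variable names are ours, not the paper's) computes the rolling-window profile from one year of daily benefit totals:

```python
import numpy as np

def window_profile(z, W=14):
    """Rolling-window profile of Eq. (1): y_t is the sum of the daily
    benefits z_t, ..., z_{t+W-1}. Input z holds one entry per day
    (0 on days with no visit); output has D - W + 1 entries."""
    z = np.asarray(z, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(z)))  # cumulative sums, O(D)
    return c[W:] - c[:-W]                      # all window sums at once

# Example: D = 365 days of synthetic daily benefits, so the profile
# has 365 - 14 + 1 = 352 points, matching the dimension in the text.
rng = np.random.default_rng(0)
z = np.where(rng.random(365) < 0.05, rng.uniform(20.0, 120.0, 365), 0.0)
y = window_profile(z, W=14)
assert y.shape == (352,)
```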

Fig. 1. A patient's profile. The horizontal axis shows the number of the time window; the vertical axis displays y_t, the sum of benefits paid in a time window. The profiles are from the same patient, using W = 1 (left) and W = 14 (right).

The area under the curve for the segment [t_1, t_2] is given by:

A_{t_1,t_2} = \sum_{t=t_1}^{t_2} y_t = \sum_{t=0}^{t_2-t_1} \sum_{i=1}^{W} z_{t_1+t+i-1}, \quad t_1, t_2 = 0, 1, 2, \ldots, 352 \quad (2)

Thus the area enclosed by the segment will be a defining feature 4.

In summary, this feature extraction approach creates a one-dimensional temporal pattern of fixed dimension for each patient in the dataset. These profiles can now be grouped into homogeneous clusters such that profiles within a cluster share similar properties. A simple approach to the clustering of data is the application of a K-means clustering algorithm.

III. CLUSTERING

The feature extraction step produced a 352-dimensional numerical data vector for each patient in the data set. Because of the time consuming aspects of performing the calculation on the entire age cohort, we created a smaller subset of profiles totaling 104,417 patients aged between 45 and 55, 32,476 of whom reside in one of the 10 major Australian cities; all other patients live in rural or semi-rural areas. We randomly selected 73,942 profiles (approximately two thirds of the total number of patients) to serve as a training set; all remaining profiles serve as a testing dataset. The feature vectors can differ significantly in value, and hence produce points in the data space which are located in well separated areas. As a consequence, it can be assumed that profiles which share certain properties form clusters.

4 It will be shown in Section VI that this is close to the mean value of the output of the HMM.
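The segment-area feature of Eq. (2) is then just a partial sum of the window profile. Continuing the sketch above (0-based array indexing is our convention, whereas the text counts windows from 1):

```python
import numpy as np

def segment_area(y, t1, t2):
    """Eq. (2): area under the profile over the segment [t1, t2],
    i.e. the sum of the window totals y_t for t1 <= t <= t2
    (here t1 and t2 are 0-based array positions)."""
    return float(np.sum(y[t1:t2 + 1]))
```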

There are a number of algorithms which are capable of segmenting data into clusters in the input space. Perhaps the simplest algorithm for such a task is the well known K-means clustering algorithm [3]. In short, the K-means clustering algorithm works as follows [3]: given a set of unlabeled data y_1, y_2, ..., y_n, where each y_i is a d-dimensional vector, we wish to compute the classes into which this set of data can be grouped. We use a coherence criterion to group the data points together:

J = \sum_{k=1}^{K} \sum_{y \in D_k} \| y - m_k \|^2 \quad (3)

where D = \bigcup_{k=1}^{K} D_k is a possible partition of the n data points into K partitions with D_i \cap D_j = \emptyset for i \neq j, \| \cdot \| denotes the Euclidean norm, and m_k denotes the mean of the k-th cluster. Note that this criterion measures the total distortion from the mean over the entire duration of the profile. The criterion does not pay attention to the underlying temporal differences within the profiles. Thus two profiles with significantly different temporal behaviors may give rise to the same distortion measure. A simple K-means clustering algorithm is given as follows [3]:

Step 1 Initialize a partition. Randomly choose K points y_1, y_2, ..., y_K as centers. Every other point y_j is assigned to cluster D_i if it is closer to the center y_i than to any other center.
Step 2 Compute the mean of each cluster D_i, i = 1, 2, ..., K, as follows:

m_i = \frac{1}{|D_i|} \sum_{y_k \in D_i} y_k \quad (4)

Step 3 For k = 1, 2, ..., n compute d_j^2 = \| y_k - m_j \|^2, j = 1, 2, ..., K. Assign y_k to cluster D_i where i = \arg\min \{ d_1^2, d_2^2, ..., d_K^2 \}.
Step 4 Exit if there is no update; otherwise repeat Step 2 through Step 4.

There are many variants of the K-means clustering algorithm. We will use this simple K-means clustering algorithm, as all we wish to obtain is an approximate clustering of profiles. Other clustering algorithms could be used instead, e.g., the fuzzy K-means algorithm [3], the self organizing map method [3], Gaussian mixtures [4], or EM clustering [5], [6]. We have experimented with various values of K. For example, when applying the K-means algorithm to the training set of 73,942 profiles with K = 10, we find 10 clusters to be sufficient.
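A compact sketch of this plain K-means procedure (our own illustrative code, assuming the profiles are stacked in an (n, d) numpy array):

```python
import numpy as np

def kmeans(Y, K, max_iter=100, seed=0):
    """Plain K-means following Steps 1-4 above. Y is an (n, d) array
    of profiles; returns cluster labels and the K cluster means."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=K, replace=False)].copy()  # Step 1
    labels = None
    for _ in range(max_iter):
        # Step 3: squared Euclidean distance of every profile to every center.
        d2 = np.stack([((Y - c) ** 2).sum(axis=1) for c in centers], axis=1)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                              # Step 4: no update, stop
        labels = new_labels
        for k in range(K):                     # Step 2: cluster means, Eq. (4)
            if np.any(labels == k):
                centers[k] = Y[labels == k].mean(axis=0)
    return labels, centers
```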

The six largest clusters contain more than T = 300 profiles each and are shown in Table II. Clusters containing fewer than 300 profiles are ignored at this stage 5; their profiles, however, are grouped into the Others category. The reasons for this step will be addressed in greater detail later in this paper.

5 The figure of T = 300 is based on the intuition that an HMM is a stochastic model and requires a large set of training data to estimate its parameters; a training set of fewer than 300 profiles is considered too small to reliably estimate the parameters of an HMM. In practice, the results obtained stay approximately the same when the value of T is varied over a range. Hence, the algorithm appears to be relatively insensitive to the choice of T.

TABLE II
THE SIX LARGEST CLUSTERS FOUND BY THE K-MEANS CLUSTERING ALGORITHM (CLUSTER1 THROUGH CLUSTER6, PLUS THE OTHERS CATEGORY). THE NUMBER OF PROFILES IN EACH CLUSTER, AND THE MINIMUM / MAXIMUM VALUE OF TOTAL BENEFITS PAID OVER A 14 DAY PERIOD FOR THE PROFILES IN A CLUSTER, ARE SHOWN.

Six typical profiles from each cluster are shown in Figure 2. The profiles from different clusters differ significantly in terms of activity and benefit values paid. In the current context, cluster 6 contains the profiles which feature the greatest activity and the largest amounts of benefit paid, while in contrast, profiles in cluster 1 feature little activity. The K-means clustering algorithm by itself is not particularly useful for the task of temporal pattern discovery. This is because the K-means clustering algorithm has a number of shortfalls: The algorithm is not very efficient on sparse vectors, since the Euclidean norm often becomes zero. Thus, the K-means algorithm does not reliably cluster profiles which show little activity.

Fig. 2. Typical profiles in each cluster (one panel per cluster, Cluster 1 to Cluster 6). The vertical scales differ in each cluster. In this diagram we are only interested in portraying the variations in the shape of the samples in each cluster, rather than their actual magnitudes.

The algorithm is not particularly reliable in detecting overlapping clusters; the optimal border zone between overlapping clusters may not be found. Moreover, the clustering of data does not equate to the discovery of temporal patterns which may be encoded in the input data. For the task at hand, a context sensitive classification of the data set is desired. Nevertheless, the K-means clustering algorithm is fast and effective in the segmentation of the input space into clusters of various degrees of interest. Building on this, a temporal pattern discovery algorithm such as an HMM can be trained to detect sequences of events embedded in the profiles. This will be addressed in the following section.

IV. HIDDEN MARKOV MODELS

It is noted that the profiles resemble signatures. We have a large number of profiles, and wish to find out how unseen profiles can be classified. In this type of situation, HMMs are well known to be a very good tool, as they have been widely deployed in speech processing [2], image processing, and many other areas.

This motivates us to study whether it is possible to classify unseen profiles with HMMs. This section briefly describes the HMM, following closely the development in [2]; it is presented here to keep the explanation of our proposed method self contained. There are a number of variants of HMMs [1], e.g., the discrete HMM, the continuous observation HMM, and the input-output HMM. Here we describe the hidden Markov model with continuous observations [2]. The sample HMM given in Figure 3 will help in understanding the mathematical formulation.

Fig. 3. Example HMM with an initial state π, hidden states x_i, and state transitions a_{ij}.

It is assumed that the observations y_1, y_2, ..., y_T are generated by a multivariate probability density function. For simplicity, we will assume that this is a Gaussian mixture:

f_{y|x}(\xi \mid i) = \sum_{m=1}^{M} c_{im} N(\xi; \mu_{im}, C_{im}) \quad (5)

where N(\xi; \mu_{im}, C_{im}) denotes a Gaussian probability density function with mean \mu_{im} and covariance matrix C_{im}. The notation f_{y|x}(\cdot) denotes the probability density of the observation y given the hidden state x. The constants c_{im} are known as mixing coefficients. In order for this to be a probability density function, we must have \sum_{m=1}^{M} c_{im} = 1 for 1 \le i \le S, where S is the size of the alphabet (the dimension of the state space). It is further assumed that the observation probability density functions are generated by a hidden state x, where x is an S dimensional vector. This state follows the evolution equation:

x(t+1) = A x(t) \quad (6)

where A is the state transition matrix, with initial condition x(0) = π.
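As an illustration of Eq. (5), a minimal sketch of the emission density for one state (ours, not the paper's code; since the profile values y_t here are scalar benefit totals, each covariance matrix C_im reduces to a variance):

```python
import numpy as np

def emission_density(xi, c, mu, var):
    """Eq. (5): f(xi | state i) as an M-component Gaussian mixture.
    c, mu and var are length-M arrays holding the mixing coefficients,
    means and variances of state i; the observation xi is scalar."""
    dens = np.exp(-0.5 * (xi - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    return float(np.dot(c, dens))  # sum_m c_im * N(xi; mu_im, C_im)
```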

The parameters of the model are then M = \{ S, \pi, A, \{ f_{y|x}(\xi \mid i), 1 \le i \le S \} \}. The problem of HMM estimation can be divided into two sub-problems [2]:

1) Given a series of training observations for a given entity, say, a label, how do we train an HMM to represent this label? This problem becomes one of finding a procedure for estimating an appropriate state transition matrix A and an observation probability density function f_{y|x} for each state.
2) Given a trained HMM, how do we find the likelihood that it produced an incoming observation sequence?

We will not derive the HMM estimation algorithm here, as it is readily available in, e.g., [2], [7], [8], but we summarize the training algorithm as follows. Let

\nu(i; t, l) = P(x(t) = i, y(t) \text{ produced in accordance with mixture density } l \mid M) = \frac{\alpha(y_1^t, i) \, \beta(y_{t+1}^T \mid i)}{\sum_{j=1}^{S} \alpha(y_1^t, j) \, \beta(y_{t+1}^T \mid j)} \cdot \frac{c_{il} N(\xi; \mu_{il}, C_{il})}{\sum_{m=1}^{M} c_{im} N(\xi; \mu_{im}, C_{im})} \quad (7)

where y_1^t denotes the sequence y_1, y_2, ..., y_t, and \alpha and \beta are quantities, respectively called the forward and backward probability sequences, which are associated with the estimation of the parameters and are defined as follows:

\alpha(y_1^t, i) = P(y_1^t, x(t) = i \mid M), \qquad \beta(y_{t+1}^T \mid i) = P(y_{t+1}^T \mid x(t) = i, M) \quad (8)

Both \alpha and \beta can be estimated efficiently from the given set of training data as follows:

\alpha(y_1^{t+1}, i) = \sum_{j=1}^{S} \alpha(y_1^t, j) \, a(i \mid j) \, b(y(t+1) \mid i) \quad (9)

and

\beta(y_{t+1}^T \mid i) = \sum_{j=1}^{S} \beta(y_{t+2}^T \mid j) \, a(j \mid i) \, b(y(t+1) \mid j) \quad (10)

where a(i \mid j) denotes the probability of a transition from state j to state i, and b(y(t) \mid i) = f_{y|x}(y(t) \mid i) denotes the emission density of state i.
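In code, the recursions (9) and (10) amount to one forward and one backward sweep over the sequence. The sketch below is ours; it uses the convention A[i, j] = P(state j at t+1 | state i at t) and assumes the emission densities b[i, t] = f(y(t) | state i) have been precomputed (e.g., with the mixture sketch above). For long sequences one would additionally rescale each step to avoid numerical underflow:

```python
import numpy as np

def forward_backward(A, b, pi):
    """Forward and backward passes of Eqs. (9)-(10).
    A:  (S, S) transition matrix, A[i, j] = P(next state j | state i).
    b:  (S, T) emission densities, b[i, t] = f(y(t) | state i).
    pi: (S,)  initial state distribution.
    Returns alpha[t, i] and beta[t, i]."""
    S, T = b.shape
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))
    alpha[0] = pi * b[:, 0]
    for t in range(T - 1):                      # Eq. (9)
        alpha[t + 1] = (alpha[t] @ A) * b[:, t + 1]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):              # Eq. (10)
        beta[t] = A @ (b[:, t + 1] * beta[t + 1])
    return alpha, beta
```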

If we let

\nu(i; \cdot, l) = \sum_{t=1}^{T} \nu(i; t, l) \quad (11)

where

\nu(j, t) = \frac{\alpha(y_1^t, j) \, \beta(y_{t+1}^T \mid j)}{P(y \mid M)}, \quad t = 1, 2, \ldots, T \quad (12)

then we have:

c_{il} = \frac{\nu(i; \cdot, l)}{\sum_{m=1}^{M} \nu(i; \cdot, m)} \quad (13)

\mu_{il} = \frac{\sum_{t=1}^{T} \nu(i; t, l) \, y(t)}{\nu(i; \cdot, l)} \quad (14)

C_{il} = \frac{\sum_{t=1}^{T} \nu(i; t, l) \, [y(t) - \mu_{il}][y(t) - \mu_{il}]^T}{\nu(i; \cdot, l)} \quad (15)

These equations, derived using an expectation maximization algorithm [2], will converge to a set of consistent parameters. As for the problem of finding class labels given a set of observations, we will use the Viterbi algorithm [2], which can be described as follows:

Step 1 Initialization:
\delta_0(i) = 1 for i = 1, and \delta_0(i) = 0 for i > 1.

Step 2 Recursion, for 1 \le t \le T and 1 \le j \le n:
\delta_t(j) = \max_{1 \le i \le n} \delta_{t-1}(i) \, a_{ij} \, b_j(y_t)
\Psi_t(j) = \arg\max_{1 \le i \le n} [\delta_{t-1}(i) \, a_{ij}]

Step 3 Termination (maximum probability P^* of y and best exiting state i_T^*):
P^* = \max_{1 \le i \le n} \delta_T(i)
i_T^* = \arg\max_{1 \le i \le n} \delta_T(i)

Step 4 Backtracking of the state sequence, for t = T-1, T-2, ..., 1:
i_t^* = \Psi_{t+1}(i_{t+1}^*)
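The Viterbi recursion translates directly into code. In this sketch (ours) the fixed initial state of Step 1 is generalized to an initial distribution pi; setting pi = (1, 0, ..., 0) recovers Step 1 exactly:

```python
import numpy as np

def viterbi(A, b, pi):
    """Viterbi decoding, Steps 1-4 above. Same conventions as the
    forward/backward sketch; returns the most likely state path and
    its probability."""
    S, T = b.shape
    delta = np.zeros((T, S))
    psi = np.zeros((T, S), dtype=int)
    delta[0] = pi * b[:, 0]                    # Step 1: initialization
    for t in range(1, T):                      # Step 2: recursion
        scores = delta[t - 1][:, None] * A     # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * b[:, t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()              # Step 3: termination
    for t in range(T - 2, -1, -1):             # Step 4: backtracking
        path[t] = psi[t + 1][path[t + 1]]
    return path, float(delta[-1].max())
```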

In order to understand how the HMM performs in the context of this paper, we apply HMMs to the dataset described earlier.

A. Applying HMM

As described in Section III, the set of patient profiles has been grouped into 6 individual clusters. Here we apply one HMM to each of the six clusters of data. Each HMM is trained on 90% of the data in its cluster, while ensuring that the size of the training set does not exceed a given limit (here 2,000). This step effectively splits the profiles of each cluster into a training set and a data pool. There are a number of reasons for this step: first, it balances the sizes of the training sets so that each HMM is trained on a similarly sized set of data. However, a more important reason will become clear in Section V, where HMMs are applied and trained recursively on the data. There, the data pool allows for an optimization of the generalization performance, while providing an automated mechanism which determines the minimal number of training data required to obtain an optimal solution. Table III shows the sizes of the data sets used in this section. The data in the training sets are used to train the HMMs.

TABLE III
SIZE OF TRAINING SET AND DATA POOL. THE CLUSTERS WERE OBTAINED THROUGH THE K-MEANS ALGORITHM.

Cluster     Total     Training    Data Pool
Cluster1    61,325    2,000       59,325
Cluster2    7,280     2,000       5,280
(the remaining rows, for Cluster3 through Cluster6, are not legible in this transcription)
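The split just described (90% of each cluster, capped at 2,000 profiles) can be sketched as follows; the helper name and the use of numpy arrays are our own assumptions:

```python
import numpy as np

def split_cluster(profiles, cap=2000, frac=0.9, seed=0):
    """Split one cluster into a training set (90% of the cluster, but
    no more than `cap` profiles) and a data pool (the remainder)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(profiles))
    n_train = min(int(frac * len(profiles)), cap)
    return profiles[idx[:n_train]], profiles[idx[n_train:]]
```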

Once trained, the HMMs are evaluated on the combined training set and data pool 6. The classification results for the trained HMMs are shown in Table IV. For example, 46,119 patterns from cluster 1 are misclassified as patterns from cluster 2. The diagonal elements in Table IV give the number of patterns for which the classification by the HMMs agrees with the clustering produced by the K-means clustering algorithm. Thus, it is noted that the HMMs do not agree very well with the K-means clustering at this stage, as Table IV contains many non-zero and relatively large off-diagonal elements.

TABLE IV
CLASSIFICATION RESULTS BY THE HIDDEN MARKOV MODELS. THE LABEL A IS USED TO DENOTE PROFILES IN CLUSTER 1 IN TABLE III, B FOR CLUSTER 2, ETC. ROWS (A THROUGH F, PLUS OTHERS) GIVE THE K-MEANS GROUPING; COLUMNS (A THROUGH F) GIVE THE HMM CLASSIFICATION.

When comparing the HMMs' classification results with those obtained from the K-means clustering algorithm, the HMM classification appears perceptually more agreeable. From Figure 4 it is observed that the K-means clustering algorithm and the HMMs grouped the same set of profiles differently. On a global scale it is noted that the K-means clustering algorithm grouped each profile as a whole (using the Euclidean distance as a coherence criterion). The HMMs, on the other hand, group profiles together based on the time evolution of the profiles, and hence the grouping of profiles is influenced by the context of events. Thus, the classification of profiles is the result of considering the context in which benefit is paid.

6 We chose to use the combined training set and data pool in this evaluation step to facilitate further refinement of the modeling process through cycling, as will be described in a later section.

For example, the HMM assesses whether a benefit paid is out of the ordinary by considering the context within which the benefit was paid. As such, HMMs may classify large payments of benefit as not atypical if the context within which such a payment is made is commonly observed. Similarly, an HMM may classify relatively small amounts of benefit paid as atypical if the context within which such a payment is made is rarely observed in the dataset.

Fig. 4. The same profiles as in Figure 2, classified by the trained hidden Markov models (one panel per class, Class A to Class F).

This approach neglected an important issue: clusters produced by the K-means clustering algorithm which contained fewer than 300 profiles were ignored, and not used for the training of HMMs. This is in recognition of the fact that probabilistic methods such as HMMs require large amounts of training patterns in order to be trained successfully.

However, neglecting profiles from small clusters can have a negative side effect, since it is the particularly interesting atypical profiles which are normally grouped in small clusters. Hence, we suggest an approach which utilizes profiles from clusters which are too small for the training of HMMs. It is possible to combine profiles from the small clusters with the rest of the dataset during the classification phase. The effect is indicated by the row Others in Table IV. By doing this, all profiles from the training set are classified and hence can be used for further processing. It is recognized that such a procedure may not produce a particularly good classification result for some of the profiles. This is due to the fact that the K-means clustering algorithm discovered more clusters than there are HMMs to work with, and hence the separation of the profiles may not be optimal. An approach which solves this dilemma is addressed in the next two sections. In addition, an interesting observation is that the HMMs classified just 14,229 profiles as patterns from class A, and hence reduced the number of patterns in the original cluster 1 significantly. Similarly, the HMMs resized the other classes. The fact that the HMMs re-arrange the way patterns are classified suggests re-applying the HMMs to these new sets of data. For example, we can re-train an HMM on the set of 14,229 profiles which belong to class A. Similarly, an HMM can be trained on the patterns from each of the other classes. It can be assumed that such a procedure will reduce the number of off-diagonal, or misclassified, patterns. This idea is elaborated in the following section.

V. RECURSION

In order to reduce the number of misclassifications, we propose to apply HMMs recursively to the data set as follows:

Step 1 For each class, randomly choose 90% of the profiles, but no more than N 7, and place them in the active training set. Place the remaining profiles in a data pool. There is one data pool for each class.

7 In this paper we use N = 2,000.

Step 2 Train one HMM on each active training set.
Step 3 Classify all data by the newly trained HMMs. The result is a new classification of the data.
Step 4 Cycle through from Step 1 if the number of misclassifications is greater than zero, or until a maximum number of recursions has been performed.

This approach of iteratively re-training HMMs has a number of advantages: In Step 2, only a relatively small set of patterns is used to train the HMMs, which reduces the computational time required considerably. Since recursion is performed through Step 1, the HMMs are eventually presented with all profiles in the training set. But since recursion stops when no profiles are misclassified, an HMM may not necessarily need to be trained on all profiles. Hence, this approach provides a tool to minimize training requirements on large sets containing redundant data. Finally, the number of misclassified profiles is minimized. A code sketch of this loop is given at the end of this subsection.

Executing the HMMs recursively produces Figure 5, which illustrates that the number of misclassified profiles approaches zero with the number of iterations. Here, training is stopped after 50 iterations, at which point we observed only 216 misclassified profiles. Details of the classification result are shown in Table V.

Fig. 5. The convergence of the sum of off-diagonal elements through the iterative training of the HMMs. The horizontal axis gives the number of iterations; the vertical axis the sum of the off-diagonal elements.
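The sketch below makes Steps 1 to 4 concrete. It is an illustration under stated assumptions, not the authors' code: train_hmm and classify are hypothetical helpers standing in for the HMM estimation of Section IV and for maximum-likelihood classification, and split_cluster is the sketch given earlier:

```python
def retrain_until_stable(classes, max_rounds=50, cap=2000):
    """Iterative HMM re-training, Steps 1-4 of Section V.
    `classes` maps a class label to an array of profiles.
    `train_hmm(training_set)` fits one HMM; `classify(models, classes)`
    re-assigns every profile to the highest-likelihood HMM and reports
    how many profiles changed class (both helpers are hypothetical)."""
    models = {}
    for _ in range(max_rounds):
        # Step 1: per class, 90% of the profiles capped at `cap`.
        train_sets = {lab: split_cluster(p, cap=cap)[0]
                      for lab, p in classes.items()}
        # Step 2: one HMM per active training set.
        models = {lab: train_hmm(t) for lab, t in train_sets.items()}
        # Step 3: re-classify all profiles with the new HMMs.
        classes, n_misclassified = classify(models, classes)
        # Step 4: stop once no profile changes class.
        if n_misclassified == 0:
            break
    return models, classes
```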

In this set of experimental results it is observed that the number of misclassified profiles converges within fewer than 50 iterations. It is possible to interrupt training earlier, as the number of misclassified profiles can converge to zero at an earlier iteration. Table V, as an example, presents the confusion matrix obtained after 47 iterations. The sum of the off-diagonal elements, i.e., the number of misclassifications, has improved significantly when compared with Table IV.

TABLE V
FINAL RECOGNITION RESULT AFTER TRAINING THE HIDDEN MARKOV MODELS ITERATIVELY (CONFUSION MATRIX OVER THE CLASSES A THROUGH F).

Note that the recursive training of the HMMs is valid, as we do not have any a priori information as to which profile belongs to which cluster. In other words, we do not have any ground truth data to guide the grouping of the data. Our task of finding a good grouping of the data is achieved by recursively training the HMMs, as described in this section, since this minimizes the off-diagonal values of the confusion matrix. In other words, the data set is grouped into classes with minimum misclassification. Intuitively, this should produce groups of profiles which are most similar to one another. The recursive application of the HMMs re-classified the data into different classes. However, the number of HMMs is influenced by the clustering process. It is possible that within each of the current classes, more than one temporal behavioral pattern exists. Hence, it is possible to diversify the classification by partitioning the profiles of particular classes. For example, the 996 profiles belonging to class A after having applied the HMMs recursively can again be segmented into individual clusters by applying the K-means clustering algorithm.

Such an approach allows us to refine the classification of patterns in class A. Once we have the new partitioning of this subset, it is then possible to re-train HMMs on the new clusters. The idea and its effects will become clearer in the following section.

A. Refining the clustering

The outcome of the previous section is a set of profiles grouped into a fixed number of classes. What we wish to explore is the possibility of refining the classification by building sub-classes. We achieve this by applying the K-means clustering algorithm (or any other clustering algorithm) to each of the identified classes. Once we divide a class of profiles into sub-clusters using the K-means clustering algorithm, we can train an HMM on each sub-cluster and iterate through the previous steps to obtain a set of sub-classes which constitutes the class. We can repeat this process for all classes. The outcome of this step is that we obtain a number of sub-classes for each class of profiles, which allows for a refined labeling of the profiles. In other words, we obtain a hierarchical tree of classes. A flowchart of the proposed approach is given in Figure 6.

Fig. 6. The procedures involved in the proposed algorithm, illustrating the recursive approach: cluster the data; if more than one cluster results, train HMMs until convergence and classify the data with them; if more than one class results, recurse into each class, otherwise stop.

It is shown that this approach allows the K-means clustering algorithm to be applied to each sub-group recursively, until no further clustering of the data is possible.

Fig. 7. A flowchart of the iterative K-means clustering and hidden Markov model refinement process. At the top level, 6 clusters and 50 HMM iterations yield the 6 classes A to F. At sublevel 1, class A splits into 2 clusters (21 iterations, 2 classes), class B into 6 clusters (6 classes), class C into 4 clusters (4 classes), classes D and E into 3 clusters each (3 classes each), and class F remains a single cluster.

For example, the result of re-applying the clustering algorithm to each of the six classes is illustrated in Figure 7. In Figure 7, Top Level refers to the results (the classes A to F) obtained earlier in this section. Sublevel 1 refers to the results obtained when re-applying K-means clustering, and HMM training and classification, to each of the 6 classes. For example, it is shown that the application of the K-means clustering algorithm to the data belonging to class A produced 2 clusters. Similarly, the K-means clustering algorithm separated the data belonging to class B into 6 clusters, and so on. Such clustering of data within a class allows us to refine the classification of the data in that class. Consequently, HMMs can then be trained recursively on each of the new clusters. This is indicated in Figure 7 by the number of iterations executed when training the HMMs. It is possible for a recursively trained HMM to respond with fewer classes than clusters. Evidently, the recursive application of HMMs and the K-means clustering algorithm allows us to find more finely separated sub-classes, and hence allows for a refined classification of the data. This refinement of the classification is achieved through a tree-like diversification of the partitioning of the data. The recursion can continue until either the K-means clustering algorithm or the HMMs are unable to separate the patterns into more clusters and classes. A set of experiments which visualizes the effect of the proposed approach is given in the following section.
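Before turning to those experiments, the recursive refinement of Figure 6 can be summarized as a short recursive function. This is an illustrative sketch under our own assumptions: kmeans_classes is a hypothetical helper that runs K-means and groups clusters smaller than the threshold T into an Others category, and retrain_until_stable is the sketch given in Section V:

```python
def build_tree(profiles, min_size=300):
    """Recursive refinement of Figure 6: cluster, re-train HMMs until
    stable, then recurse into each resulting class. `min_size`
    mirrors the T = 300 small-cluster threshold."""
    clusters = kmeans_classes(profiles, min_size=min_size)  # K-means step
    if len(clusters) < 2:            # no further clustering possible
        return profiles              # leaf of the hierarchical tree
    models, classes = retrain_until_stable(clusters)        # HMM step
    if len(classes) < 2:             # the HMMs collapse to one class
        return profiles
    # recurse into every class; the result is a tree of sub-classes
    return {lab: build_tree(p, min_size) for lab, p in classes.items()}
```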

VI. EXPERIMENTAL RESULTS

We continued to apply the K-means clustering algorithm and the HMMs recursively on the dataset until the stopping criterion was reached. The final clustering of the data is illustrated in Figures 8 to 12. Each of these figures corresponds to one of the six classes produced at the 47th iteration of the top level. Figure 8 gives an example by illustrating the clustering of the data belonging to class A.

Fig. 8. Applying the K-means clustering algorithm and HMMs iteratively to the 996 profiles in class A. The values in the brackets give the class label and the number of profiles in the sub-class.

It is shown that the recursive application of the K-means clustering algorithm and HMMs discovered 9 subclasses for class A, denoted A1, A2, ..., A9, and that the algorithm reached the stopping condition after recursing at most 6 times, which corresponds to Level 5 in Figure 8. A total of 71 classes are found when combining the results for all classes. Some of the properties of the profiles found in each sub-cluster are visualized in Figure 13, which presents the maximum, minimum, and average magnitude of the total benefit values found in profiles within any given class. It is shown that the value ranges overlap considerably between classes, and hence it can be concluded that the total benefit paid to a patient over the course of a year does not contribute significantly to the clustering result.

Fig. 9. Applying the K-means clustering algorithm and HMMs iteratively to profiles in class B. The values in the brackets give the class label and the number of profiles in the sub-class.

Fig. 10. Applying the K-means clustering algorithm and HMMs iteratively to profiles in class C. The values in the brackets give the class label and the number of profiles in the sub-class.

Fig. 11. Applying the K-means clustering algorithm and HMMs iteratively to profiles in class D. The values in the brackets give the class label and the number of profiles in the sub-class.

Fig. 12. Applying the K-means clustering algorithm and HMMs iteratively to profiles in class E. The values in the brackets give the class label and the number of profiles in the sub-class.

Each of the 71 classes is represented by an HMM. The HMM output is modeled by a Gaussian mixture, which is formed by combining a number of Gaussian functions. We find that in our case the HMM gives us a model which requires only one Gaussian function at the output. Figure 14 shows the magnitude of the mean and the corresponding variance of the output of each HMM. We notice that the mean values are distinct from class to class. For example, the HMMs representing profiles in class A generally produced the smallest mean values. In fact, after sorting the mean values, the subclasses are still segmented into non-overlapping sections of the parent classes A to F.

Fig. 13. Benefits paid for each of the 71 classes, grouped by the parent classes A to F. The horizontal axis lists the classes and class labels; the vertical axis shows the benefits paid. A zoom into the first 23 classes is also shown.

From this, we find that a main contributing factor leading to the separation of profiles is the mean value as obtained by the HMMs. Another observation which can be made from Figure 13 and Figure 14 is that profiles which display large amounts of benefits paid generally generate a large mean and variance. This indicates that large benefit payments correspond to patients requiring frequent medical services. This means that patients making isolated instances of large claims can be atypical, as such cases are typically not observed in the dataset.
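With one HMM per class, classifying a profile (seen or unseen) reduces to scoring it under all 71 models and choosing the best. A minimal sketch, with loglik as an assumed scoring helper (e.g., the log of P(y | M) obtained from the forward pass sketched in Section IV):

```python
import numpy as np

def classify_profile(y, models):
    """Assign a profile to the class whose HMM explains it best.
    `models` maps the 71 class labels to trained HMMs; `loglik` is a
    hypothetical helper returning log P(y | M) for one model."""
    labels = list(models)
    scores = [loglik(models[lab], y) for lab in labels]
    return labels[int(np.argmax(scores))]
```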

Fig. 14. The variance (top) and mean (bottom) values of the 71 HMMs, plotted against the class labels A to F.

A. Classification of the 36 profiles using the developed methodology

In this section we show that the refined HMMs can provide a better explanation of patient behavior. In Section III, we used 36 sample profiles to demonstrate the grouping of clusters using the K-means clustering algorithm (see Figure 2). In Section IV-A, the same 36 profiles were classified by the HMMs obtained from the first iteration at the top level (see Figure 4). It is interesting to see how these profiles are classified using the methodology developed here, i.e., by the final tree of HMMs. The results are shown in Figure 15. It is observed that the final hierarchical tree of HMMs provides a finer classification of the profiles. Take class C in Figure 4 for example: The first five profiles of class C in Figure 4 are classified into sub-class C1 shown in Figure 15. It is clear these patients had frequent visits to doctors within the specified 12 months; most of their fortnightly claims are under $300. The sixth and seventh profiles of class C in Figure 4 are classified into sub-class C12 in Figure 15, where each patient had one fortnightly claim reaching about $800.

The last three profiles of class C in Figure 4 are classified into sub-class C13 in Figure 15. All of them had a sudden change in their medical behavior which incurred about $800 in benefits. The overall classification of all data in the training set is illustrated by Table VI. Note that the sizes of most classes differ from those seen in Figures 8 to 12. This is due to the fact that the final classification is performed on the entire training set, as opposed to the training phase, during which classification is performed only on the set of training data associated with the top level class.

B. Classification of profiles from the test set

The algorithm has been demonstrated to be efficient in grouping temporally similar patient behaviors together. In this section we wish to investigate the generalization capabilities of the approach. More specifically, the algorithm was trained on a relatively small sub-set of patient profiles; it is important to find out how well data which were not used for training are classified. For this, we utilize 30,475 profiles, none of which has been used in the generation of the HMM models. These 30,475 patients are from the same age cohort and domain as the training patterns. We compare the classification results of the HMMs generated after the first training iteration with those of the HMMs obtained after the training procedure converged. We will refer to the former set of HMMs as the initial HMM set, and to the latter as the final HMM set. The general recognition results of the initial and the final HMM sets are given in Tables VII and VIII respectively. An obvious observation is that the initial HMM set provides us with an aggregate recognition, which means that some classes contain a large number of profiles, such as class D, where 7,824 profiles are grouped. A large group can imply an approximate clustering, since the number of classes is too restricted to allow a more efficient separation of profiles. This problem is not observed in Table VIII, where the algorithm has converged to a final set of HMMs. Overall, the classification of the test patterns closely reflects the results obtained on the training set, in that the class sizes are proportional to those observed in Table V and Table VI.


More information

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

More information

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

Hardware Implementation of Probabilistic State Machine for Word Recognition

Hardware Implementation of Probabilistic State Machine for Word Recognition IJECT Vo l. 4, Is s u e Sp l - 5, Ju l y - Se p t 2013 ISSN : 2230-7109 (Online) ISSN : 2230-9543 (Print) Hardware Implementation of Probabilistic State Machine for Word Recognition 1 Soorya Asokan, 2

More information

A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images

A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images Małgorzata Charytanowicz, Jerzy Niewczas, Piotr A. Kowalski, Piotr Kulczycki, Szymon Łukasik, and Sławomir Żak Abstract Methods

More information

A Study of Web Log Analysis Using Clustering Techniques

A Study of Web Log Analysis Using Clustering Techniques A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept

More information

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns

. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties

More information

Automatic parameter regulation for a tracking system with an auto-critical function

Automatic parameter regulation for a tracking system with an auto-critical function Automatic parameter regulation for a tracking system with an auto-critical function Daniela Hall INRIA Rhône-Alpes, St. Ismier, France Email: Daniela.Hall@inrialpes.fr Abstract In this article we propose

More information

Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

More information

Introduction to Machine Learning Using Python. Vikram Kamath

Introduction to Machine Learning Using Python. Vikram Kamath Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression

More information

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

More information

Clustering & Visualization

Clustering & Visualization Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

More information

Advanced Signal Processing and Digital Noise Reduction

Advanced Signal Processing and Digital Noise Reduction Advanced Signal Processing and Digital Noise Reduction Saeed V. Vaseghi Queen's University of Belfast UK WILEY HTEUBNER A Partnership between John Wiley & Sons and B. G. Teubner Publishers Chichester New

More information

Chapter ML:XI (continued)

Chapter ML:XI (continued) Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

A Learning Based Method for Super-Resolution of Low Resolution Images

A Learning Based Method for Super-Resolution of Low Resolution Images A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 emre.ugur@ceng.metu.edu.tr Abstract The main objective of this project is the study of a learning based method

More information

Time series clustering and the analysis of film style

Time series clustering and the analysis of film style Time series clustering and the analysis of film style Nick Redfern Introduction Time series clustering provides a simple solution to the problem of searching a database containing time series data such

More information

Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

D-optimal plans in observational studies

D-optimal plans in observational studies D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

More information

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,

More information

Probabilistic Latent Semantic Analysis (plsa)

Probabilistic Latent Semantic Analysis (plsa) Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References

More information

Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

More information

Lecture 9: Introduction to Pattern Analysis

Lecture 9: Introduction to Pattern Analysis Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Norbert Schuff Professor of Radiology VA Medical Center and UCSF Norbert.schuff@ucsf.edu

Norbert Schuff Professor of Radiology VA Medical Center and UCSF Norbert.schuff@ucsf.edu Norbert Schuff Professor of Radiology Medical Center and UCSF Norbert.schuff@ucsf.edu Medical Imaging Informatics 2012, N.Schuff Course # 170.03 Slide 1/67 Overview Definitions Role of Segmentation Segmentation

More information

Clustering through Decision Tree Construction in Geology

Clustering through Decision Tree Construction in Geology Nonlinear Analysis: Modelling and Control, 2001, v. 6, No. 2, 29-41 Clustering through Decision Tree Construction in Geology Received: 22.10.2001 Accepted: 31.10.2001 A. Juozapavičius, V. Rapševičius Faculty

More information

Time series Forecasting using Holt-Winters Exponential Smoothing

Time series Forecasting using Holt-Winters Exponential Smoothing Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract

More information

A hidden Markov model for criminal behaviour classification

A hidden Markov model for criminal behaviour classification RSS2004 p.1/19 A hidden Markov model for criminal behaviour classification Francesco Bartolucci, Institute of economic sciences, Urbino University, Italy. Fulvia Pennoni, Department of Statistics, University

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA Prakash Singh 1, Aarohi Surya 2 1 Department of Finance, IIM Lucknow, Lucknow, India 2 Department of Computer Science, LNMIIT, Jaipur,

More information

CLUSTERING LARGE DATA SETS WITH MIXED NUMERIC AND CATEGORICAL VALUES *

CLUSTERING LARGE DATA SETS WITH MIXED NUMERIC AND CATEGORICAL VALUES * CLUSTERING LARGE DATA SETS WITH MIED NUMERIC AND CATEGORICAL VALUES * ZHEUE HUANG CSIRO Mathematical and Information Sciences GPO Box Canberra ACT, AUSTRALIA huang@cmis.csiro.au Efficient partitioning

More information

Data Preprocessing. Week 2

Data Preprocessing. Week 2 Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.

More information

MapReduce Approach to Collective Classification for Networks

MapReduce Approach to Collective Classification for Networks MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University

More information

Unsupervised learning: Clustering

Unsupervised learning: Clustering Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What

More information

Predict Influencers in the Social Network

Predict Influencers in the Social Network Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Coding and decoding with convolutional codes. The Viterbi Algor

Coding and decoding with convolutional codes. The Viterbi Algor Coding and decoding with convolutional codes. The Viterbi Algorithm. 8 Block codes: main ideas Principles st point of view: infinite length block code nd point of view: convolutions Some examples Repetition

More information

Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

More information

203.4770: Introduction to Machine Learning Dr. Rita Osadchy

203.4770: Introduction to Machine Learning Dr. Rita Osadchy 203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:

More information

A Review of Anomaly Detection Techniques in Network Intrusion Detection System

A Review of Anomaly Detection Techniques in Network Intrusion Detection System A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In

More information

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

More information

Machine Learning Big Data using Map Reduce

Machine Learning Big Data using Map Reduce Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Stock Trading by Modelling Price Trend with Dynamic Bayesian Networks

Stock Trading by Modelling Price Trend with Dynamic Bayesian Networks Stock Trading by Modelling Price Trend with Dynamic Bayesian Networks Jangmin O 1,JaeWonLee 2, Sung-Bae Park 1, and Byoung-Tak Zhang 1 1 School of Computer Science and Engineering, Seoul National University

More information

How To Perform An Ensemble Analysis

How To Perform An Ensemble Analysis Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier

More information

Visualization methods for patent data

Visualization methods for patent data Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

How to Get More Value from Your Survey Data

How to Get More Value from Your Survey Data Technical report How to Get More Value from Your Survey Data Discover four advanced analysis techniques that make survey research more effective Table of contents Introduction..............................................................2

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

More information

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three

More information

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

More information

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Data mining and statistical models in marketing campaigns of BT Retail

Data mining and statistical models in marketing campaigns of BT Retail Data mining and statistical models in marketing campaigns of BT Retail Francesco Vivarelli and Martyn Johnson Database Exploitation, Segmentation and Targeting group BT Retail Pp501 Holborn centre 120

More information