Interactive Clustering for Data Exploration
Joel R. Brandt, Jiayi Chong, Sean Rosenbaum
Stanford University

Figure 1: A complete view of our system. In the top left, the Solution Explorer is shown. Below this is the Member Table, and to the right are the visualizations of two solutions.

ABSTRACT

Clustering algorithms are widely used in data analysis systems. However, these systems are largely static in nature. There may be interaction with the resulting visualization of the clustering, but there is rarely interaction with the process. Here, we describe a system for visual data exploration using clustering. The system makes the exploration and understanding of moderately-large ( instances) multidimensional (10-20 dimensions) data sets easier.

CR Categories: H.5.0 [Information Systems]: Information Interfaces and Presentation - General; I.5.3 [Computing Methodologies]: Pattern Recognition - Clustering

Keywords: clustering, data exploration, interaction

1 INTRODUCTION

Clustering is a natural part of the human cognitive process. When shown a set of objects, an individual naturally groups and organizes these items within his or her mind. In the domain of machine learning, unsupervised clustering algorithms have been developed to mimic this intrinsic process. Yet these algorithms are usually employed within rigid frameworks, removing the fluid, exploratory nature of human clustering. In this paper, we present a visual system that uses clustering algorithms to aid data exploration in a natural, fluid way.

1.1 Motivation

Clustering techniques are widely used to analyze large, multidimensional data sets [1, 2]. However, this use is typically static in nature: the user loads a data set, selects a few parameters, runs a clustering algorithm, and then views the results. The process stops there; clustering is used simply to analyze the data, not to explore it.
We believe that with the right visualization environment, clustering can be used to provide a very natural way for users to explore complex data sets. For example, when given a small, low-dimensional data set to explore (such as a collection of objects on a table), a typical individual intuitively groups, or clusters, similar items mentally. The individual may then compare clusters, break up individual groups by different attributes, completely re-cluster the set based on different attributes, and so on. Without aid, however, both the size of the data set and the types of attributes that an individual can operate on are quite limited. Our system allows the user to perform these intrinsic operations within a much larger space.

1.2 Major Contributions

Our system makes contributions in two main areas: data exploration techniques and visualization. More specifically, our system provides an intuitive mechanism for visually exploring moderately-large multi-dimensional data sets, supports a fluid, iterative process of clustering, refinement, and re-clustering to enable this exploration, and proposes a novel, faithful visualization of high-level characteristics of the clustering results.

1.3 Organization

The rest of this paper proceeds as follows. In Section 2 we begin with an analysis of prior work in this area. We then detail our data exploration and visualization contributions in Sections 3 and 4 respectively. In Section 5, we give a complete example of the system in use. Finally, we conclude in Section 6 with a plan for future work.

2 PRIOR WORK

A great deal of prior work has been done in the areas of visualizing clustering results and interacting with these visualizations. Comparatively little work has been done on interacting with the clustering process itself. We examine each of these areas in turn, and then consider some related work that leverages techniques other than clustering.

2.1 Visualization of Clustering Results

Given the complex output of clustering algorithms, good visualizations are necessary for users to interpret the results accurately.
Visualizations generally display either micro-level or macro-level characteristics of the clustering. Micro-level visualizations show explicitly the pairwise relations between instances. Conversely, macro-level visualizations attempt to express the quality of the clustering result, such as the size and compactness of each cluster and the separation of clusters relative to each other. We believe that understanding macro-level characteristics is most useful for data exploration, whereas interpreting micro-level characteristics lies in the domain of data analysis. This distinction is described in detail in Section 3.1. Here, we examine existing visualizations for each class of characteristics separately.

Micro-level Visualizations

Many micro-level visualizations begin by projecting the clusters into 2-dimensional or 3-dimensional space [5, 8]. The projection is chosen, as far as possible, such that nearby items lie in the same cluster and nearby clusters are similar. However, 3-dimensional visualizations are often unintuitive due to occlusion and depth-cuing problems. Likewise, 2-dimensional projections are often problematic because of the issues associated with accurately projecting high-dimensional data into a low-dimensional space. Lighthouse [5] takes an interesting approach by allowing the user to switch between 2- and 3-dimensional views. The usability of this feature, however, is not well studied.

A colored matrix representation is another widely used method for visualizing clustering results [1, 2, 9, 10]. In this representation, instances lie on one axis, and features lie on the other. Each cell (corresponding to an instance/feature pair) is colored according to the value of that feature for that instance. The instances are ordered by a hierarchical clustering method so that similar rows appear next to each other.
gcluto [9] takes this a step further by allowing the user to also cluster the transpose of the data set, and sort the features by similarity.

Alternatively, this same colored matrix representation is often used to express pairwise distances between all instances. In this visualization, each instance lies on both axes. Each cell is colored according to the relative distance between the two instances represented. Hierarchical clustering methods are used to produce an ordering on the axes, so that the majority of cells corresponding to nearby points lie near the diagonal.

Macro-level Visualizations

A relatively smaller amount of effort has been devoted to producing compelling macro-level visualizations. gcluto presents a Mountain visualization technique. The centroids of each cluster are projected into the plane as mentioned above. Then, at each centroid, a mountain is formed. Attributes of the mountain are mapped to attributes of the clusters: the height is mapped to the internal similarity of the cluster, the volume is mapped to the number of objects in the cluster, and the color of the peak is mapped to the internal standard deviation of the cluster's objects. While these are all important attributes to consider, the method of displaying them is arguably a bit unintuitive.

2.2 Interaction with Clustering

Work on interaction with clustering algorithms is best divided into two categories: interaction with the result set and interaction with the clustering process.

Interaction with the Result Set

Most commonly, systems provide a means of interacting with the result set. Because these result sets are often too large to be represented in their entirety on a typical display, these interactions usually center around hiding data. When hierarchical clustering is performed, visualization tools often provide a means for collapsing portions of the hierarchy.
The Hierarchical Clustering Explorer [10] supports this through dynamic queries, and gcluto [9] supports this through a typical expandable tree structure. While these methods are effective for reducing screen clutter, little semantic meaning is tied to the directional branching of the tree, so it can be difficult to select only the regions of interest. Domain-specific methods of interacting with the result set are also common [5, 8]. For example, Lighthouse produces a visualization of web search results using clustering, and then allows the user to select a point in the representation to visit the corresponding site. Such domain-specific interactions are of little interest in this work. Finally, many clustering visualizations use detail on demand techniques [5, 9, 10]. Positioning a cursor over a particular data point, for example, often brings up a small window with metadata about the corresponding instance. Such techniques are necessary because of the large amount of data being displayed.
Figure 2: The data exploration pipeline for our system (stages: Visualize Solutions, Subset Data, Generate Sub-problem, Cluster).

Interaction with the Clustering Process

Systems that allow interaction with the clustering process are somewhat rarer. Many systems let users define initial cluster centroids in a visual way, rather than choosing them randomly. The value of such a system, however, is unclear: if it is easy for the user to select centroids, it is probably unnecessary to cluster the data! Some systems go a bit further and allow the user to interact with the clustering process as it is occurring. For example, Looney's work [6] allows the user to eliminate or merge clusters at various steps in the algorithm. While this work takes strides to solve some of the major problems with clustering, it requires that the user understand the data set in order to produce a result. We seek exactly the opposite paradigm: the user iteratively produces results in order to understand the data set.

Alternatives to Clustering

Self-organizing maps are an unsupervised learning technique built on neural networks. They have been widely explored as a tool to aid both visualization and data exploration [3, 4]. They are typically employed to aid in the production of low-dimensional visualizations of high-dimensional data. The benefits of self-organizing maps are somewhat contrary to the goals of this work: they automatically reduce the dimensionality of the data, while providing little evidence of why a particular projection was chosen. Instead, we enable the user to explore and re-weight dimensions at his or her discretion, helping the user to understand links between these dimensions.

3 DATA EXPLORATION

The principal goal of our system is to enable intuitive data exploration of moderately-large, multi-dimensional data sets. In this paper, we center our data exploration process around k-means, a straightforward clustering algorithm [7].
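To make the role of k-means and of user-chosen attribute weights concrete, the following is a minimal sketch rather than the system's actual implementation. The text does not say how weights enter the distance or how centroids are initialized, so two assumptions are made here: each attribute is scaled by the square root of its weight (which turns plain Euclidean distance into a weighted one), and the starting centroids are taken from the instances named in the hypothetical `init` argument. The function name `weighted_kmeans` is likewise illustrative.

```python
import math


def weighted_kmeans(X, k, weights=None, init=None, n_iter=100):
    """Lloyd's k-means on attribute-weighted data (illustrative sketch).

    `init` lists the indices of the instances used as starting centroids;
    it defaults to the first k instances so that results are reproducible.
    Returned centroids live in the weighted (rescaled) attribute space.
    """
    n_attrs = len(X[0])
    w = [1.0] * n_attrs if weights is None else list(weights)
    # Assumption: scaling attribute a by sqrt(w[a]) makes the plain
    # Euclidean distance below behave as a weighted Euclidean distance.
    Xw = [[x[a] * math.sqrt(w[a]) for a in range(n_attrs)] for x in X]
    init = list(range(k)) if init is None else list(init)
    centroids = [list(Xw[i]) for i in init]

    def assign(cs):
        # Index of the nearest centroid for every instance.
        return [min(range(k), key=lambda j: math.dist(x, cs[j])) for x in Xw]

    labels = assign(centroids)
    for _ in range(n_iter):
        new_centroids = []
        for j in range(k):
            members = [x for x, lab in zip(Xw, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)]
                if members else centroids[j]
            )
        centroids = new_centroids
        new_labels = assign(centroids)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids
```

Setting a weight to zero removes that attribute from consideration, which is one simple way to model clustering the same instances on different attribute subsets.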
However, we believe our data exploration pipeline is applicable when coupling user interaction with any automated technique. We begin by explaining the difference between data exploration and data analysis. Then, we discuss the details of our data exploration pipeline.

3.1 Data Exploration versus Data Analysis

The distinction between data analysis and data exploration seems subtle at first. Most simply, in data analysis, the user knows what he or she is looking for; in data exploration, the user does not.

Data analysis tasks typically investigate specific data instances, and their relation to other instances. The analyst usually has a good understanding of the structure of the data set he or she is working with. That is, the relations between attributes are typically well understood, or at least the characteristics of a particular attribute are well known. The examination of a gene array clustering, for example, is a typical data analysis task [1, 2]. The analyst knows what each gene is, and what each experiment is, and is attempting to determine which genes respond in similar ways to particular experiments. Such a task is completely static: a clustering is produced, visualized, and analyzed.

Data exploration tasks are those which attempt to uncover the general structure of the data. Here, the user may not know which attributes best separate or explain the data, may not know the relationships between attributes, and may not even know which attributes are useful. However, the user is likely to have domain knowledge about the data set being explored. For example, the user may have high-level knowledge about instances in the data set, and may be interested in determining which attributes are most useful in predicting or explaining that knowledge.

Data exploration is an iterative process of discovery. As such, tools for data exploration must support this iterative search.
Specifically, we believe tools for data exploration must make it easy for the user to explore the data along multiple paths, create branches in those exploration paths, and compare various exploration paths.

3.2 The Data Exploration Pipeline

The use of our system centers around our Data Exploration Pipeline, shown in Figure 2. The user explores the data by iterating through this pipeline. After loading the data set, the user is presented with a visualization of a solution to a trivial clustering problem: clustering all of the data into one cluster. From this visualization, the exploration begins:

1. The user explores a solution, visualizing it through the techniques described in Section 4.
2. The user selects a subset of the data to continue exploring. This subset may, of course, be the entire set.
3. The user generates a sub-problem using this subset of the data. This involves choosing the values of several parameters, such as the number of clusters to form, which attributes to use when clustering, and the relative weights of each of those attributes.
4. The clustering is performed and the sub-solution is stored.
5. The process repeats using the new sub-solution.

Of course, the user has more control over the pipeline than what is given here. For example, the user can generate several different sub-problems from any solution, varying the parameters (and even the sets) in each sub-problem. As a result of this flexibility, the pipeline results in a hierarchy of clustering solutions, where each sub-solution is a refinement of a subset of its parent solution. Furthermore, as will be discussed in Section 4, we allow the user to open up visualizations of as many solutions as desired, and link the display of these solutions so that similarities and differences can easily be seen. In this way, it is easy for the user to explore the effects of clustering using different features and parameters.

When each solution is generated, we keep track of the parameters used in the clustering, as well as the parameters defining the subset of instances to be clustered. With this information, if new data is added to the system (for example, if we want to classify additional instances), we can place the new instances in the appropriate clusters in all solutions within the system.

Figure 3: A view of an individual cluster. The centroid is shown in red. The currently selected point is shown in blue. Points that lie close to the selected point (in high-dimensional space) are shown in gray.

4 VISUALIZATION TECHNIQUES

In this section, we examine the visualization techniques used to support the data exploration pipeline discussed in Section 3. We devote the majority of our attention to the techniques used to visualize a particular solution, and to compare several solutions, as this is the novel portion of our system. However, in Section 4.2.1, we discuss the interfaces for managing a hierarchy of solutions and for generating new solutions.

The visualization techniques presented here have been developed with the goals of effectively representing macro-level characteristics of clustering results and enabling intuitive comparison of multiple clustering solutions. These are the techniques required for data exploration. While we provide some drill-down into the micro-level characteristics of a solution as a part of our brushing and highlighting techniques, these characteristics are not our primary concern.
Investigation of micro-level characteristics lies mainly in the domain of data analysis rather than data exploration. We believe that much of the prior work discussed in Section 2 accomplishes the data analysis task successfully. So, in a complete system for both data exploration and analysis, we propose marrying our new techniques with extensions of existing analysis techniques. This is discussed further in our section on future work (Section 6.1).

4.1 Small Multiple Histograms for Cluster Visualization

In the simplest sense, we visualize a clustering solution as a collection of histograms. Each cluster is represented by a histogram, as shown in Figure 3. The centroid of the cluster is placed on the left of the histogram, and the instances are arranged according to their Euclidean distance from the centroid. All of the histogram axes are scaled the same within one clustering solution. Complete sets of small multiples for a solution can be seen in Figures 1 and 7.

Figure 4: Four views of the same centroid histogram. In each histogram, the centroid indicated is used as the basis.

Centroid Histogram

Furthermore, we produce a histogram layout of the centroids that mimics the individual cluster histograms. The user is able to select the centroid to serve as the basis, and the other centroids are placed in the histogram according to their Euclidean distance from the basis centroid. (This rocking of the basis element is discussed further in the section on rocking below.) Examples of this visualization can be seen in Figure 4.

Together, these histograms provide an intuitive summary of macro-level cluster characteristics. Cluster size and distribution can easily be seen within one cluster histogram. Comparing cluster histograms gives an understanding of the relative compactness of each cluster. Finally, the centroid histogram summarizes the inter-cluster separation.

One Dimension versus Two Dimensions

Our histograms can be thought of as a one-dimensional projection of the data.
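The one-dimensional projection behind the cluster histograms and the centroid histogram can be sketched as follows. This is an illustration under stated assumptions, not the system's code: the function names, the fixed `n_bins` parameter, and the choice of equal-width bins are all hypothetical. The only properties taken from the text are that instances are placed by Euclidean distance from their cluster centroid, that all histograms in one solution share the same axis scale, and that centroids are placed by distance from a chosen basis centroid.

```python
import math


def distance_histograms(X, labels, centroids, n_bins=10):
    """Bin each instance by distance to its own centroid, on a shared axis."""
    dist = [math.dist(x, centroids[lab]) for x, lab in zip(X, labels)]
    d_max = max(dist) or 1.0          # avoid a zero-width axis
    width = d_max / n_bins            # one bin width for the whole solution
    hists = {j: [0] * n_bins for j in range(len(centroids))}
    for d, lab in zip(dist, labels):
        # The largest distance is placed in the last bin.
        b = min(int(d / width), n_bins - 1)
        hists[lab][b] += 1
    return hists, width


def centroid_positions(centroids, basis):
    """Position of every centroid in the centroid histogram for a given basis."""
    return [math.dist(c, centroids[basis]) for c in centroids]
```

Because every cluster uses the same bin width, a histogram whose mass sits in the leftmost bins indicates a compact cluster, while the centroid positions give the inter-cluster separation relative to the chosen basis.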
At first consideration, it may seem that projecting into one dimension (instead of two or three) gives up a great deal of flexibility. However, we believe that when combined with the decoupling of inter- and intra-cluster characteristics mentioned above, our histograms lead to a more faithful representation of the macro-level characteristics than would be possible in a typical two-dimensional projection.

Consider a projection of all instances into two dimensions using a technique such as multi-dimensional scaling. In such a technique, one attempts to find the projection that preserves the pairwise distances between instances to the greatest extent possible. In most cases, the actual distance between two points will be well reflected by their distance in projected space. However, for some pairs of instances, such results may be impossible to achieve. It may be that the best projection overall still places a significant number of points close to other points that are actually far apart. Such misrepresentations make understanding macro-level characteristics of the clustering result more difficult using this representation.

Furthermore, even without these misrepresentations, the decoupling of inter- and intra-cluster characteristics in our method is not easily achievable in a two-dimensional projection of all data. Instead, the user must segment the space mentally to perform such comparisons. Our decoupling allows the user to more easily examine the characteristics he or she is concerned about, without having to block out additional information.

As mentioned in Section 2, other two-dimensional techniques, such as colored matrix representations, exist for expressing cluster results. These techniques, however, are more suited toward micro-level examination, which is outside the scope of this work.

Rocking

Rocking is a technique often used to solve depth-cuing problems when visualizing three-dimensional point data in two dimensions. If the point set is rotated slightly, the necessary depth cue is provided: points in front move one direction while points in back move the opposite direction. We borrow this idea of rocking to improve our histogram displays.

In our histograms, it is often the case that two distant points will end up nearly the same distance from the centroid, and thus in the same place on the histogram. (Note that this is not a misrepresentation; we make no claim that distant points will end up far apart in the histogram.) We allow the user to select two points, one of which may be the centroid. After doing so, the first point is used as the basis for computing distances to build the histogram. A slider can be used to rock the basis point along the line between the first and second points. As before, points that move left are closer to the second point than the first, and conversely for points that move right. An example of this is shown in Figure 5.

As mentioned earlier, we allow a similar type of rocking within the centroid display. We allow the user to select a basis centroid by clicking.
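The rocking of a cluster histogram described above can be sketched as a linear interpolation of the basis point. This is a minimal illustration, assuming the slider maps to a parameter `t` in [0, 1] that moves the basis from the first selected point toward the second; the function name and the parameterization are hypothetical, since the text does not state how far the slider is allowed to move the basis.

```python
import math


def rocked_positions(X, first, second, t):
    """Histogram positions of all instances when the basis is rocked.

    t = 0 uses `first` as the basis; t = 1 uses `second`.
    `first` and `second` are attribute vectors (either may be a centroid).
    """
    basis = [(1.0 - t) * p + t * q for p, q in zip(first, second)]
    return [math.dist(x, basis) for x in X]


# Two instances on opposite sides of the centroid share one histogram position.
X = [[3.0, 0.0], [-3.0, 0.0]]
centroid, other = [0.0, 0.0], [4.0, 0.0]
at_rest = rocked_positions(X, centroid, other, 0.0)   # both at distance 3
rocked = rocked_positions(X, centroid, other, 0.25)   # basis moved to (1, 0)
```

In this example the two instances coincide at rest, and a small amount of rocking moves the first one left and the second one right, which is the cue that separates them.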
When the user changes basis centroids, new positions for each centroid in the histogram are computed, and the movement between old and new positions is animated. As before, the movements express the relative locations of other centroids as compared to the two bases. Figure 4 shows all possible rockings of a single centroid histogram. From these views, it is clear that clusters 2 and 3 are located quite close to each other, whereas 1 and 4 are both far from each other and far from 2 and 3.

Figure 5: Rocking of a cluster. The two points used to compute the rocking line are shown in yellow. The amount of rocking is controlled by moving the slider.

Figure 6: A dynamic query within a cluster histogram, used for both data subsetting and range selection in view linking.

4.2 Data Subsetting

A crucial part of the data exploration pipeline is the subsetting and reprocessing of data. We support data subsetting through dynamic queries as shown in Figure 6. The user makes a range selection simply by dragging the range selection brackets so that they enclose only the points of interest. Note that this range selection may be made with any point selected as the basis for distance calculation (and even when the basis point is being rocked). This allows the user to select points that are close to (or far from) any point, rather than making selections with respect to the centroid only.

4.2.1 Generating Solution Hierarchies

Once the desired subset is selected within each cluster histogram, a new sub-solution may be generated. The Sub-Solution Generator window (not shown) presents a simple user interface to select the attributes for clustering, assign weights to those attributes, choose the number of clusters to be formed, and initiate the clustering.

When a new clustering solution is generated, it is placed in the Solution Explorer (shown in Figure 1) as a child of its parent solution. Any number of child and grandchild solutions may be created from any parent solution. In this way, the user is able to traverse multiple exploration paths, branching as desired. The Solution Explorer keeps each solution organized, providing a summary of the attributes used to generate the clustering.

4.3 Brushing and View Linking

With the generation of multiple solutions comes the need to explore them in concert. We support this exploration through brushing and view linking. As shown in Figure 3, brushing is used to highlight instances near a selected instance (in high-dimensional space). When a user hovers over an instance, it is highlighted in blue. Additionally, all nearby instances are colored gray. Note that this highlighting is somewhat akin to rocking: only the instances which are actually near the instance of interest are highlighted.

In addition to highlighting close-by instances within the cluster, we also highlight instances corresponding to the selected instance in all other views. Similarly, we highlight all instances contained within the active region of the currently selected cluster in all solution visualizations. (The currently selected cluster is defined by the chosen basis centroid in the currently focused window.) This highlighting is shown in Figure 7.

View linking allows the user to easily visualize the relative consistency of clustering between different solutions using different attributes and weights. In this way, relations between attributes can be easily found and understood, helping the user uncover the structure of the data, the ultimate goal of data exploration.

4.4 Micro-level Data Examination

A minimal amount of support is provided within the system for micro-level data examination. The member table, shown in Figure 1, lists all instances in the currently selected cluster.
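Brushing and view linking both rely on instances keeping the same identity across every view, which is also what lets the member table stay in sync. The sketch below illustrates the two lookups from Section 4.3 under assumptions of our own: "nearby" is modeled as a fixed `radius` in the full attribute space (the text does not define the threshold), each solution is represented as a list of cluster labels indexed by instance, and both function names are hypothetical.

```python
import math
from collections import Counter


def brush_neighbors(X, hovered, radius):
    """Indices of instances within `radius` of the hovered instance,
    measured in the full attribute space rather than in histogram position."""
    return [i for i, x in enumerate(X)
            if i != hovered and math.dist(x, X[hovered]) <= radius]


def linked_members(labels_a, selected, labels_b):
    """Instances of the cluster selected in solution A, and how they are
    spread over the clusters of solution B (what linking would highlight)."""
    members = [i for i, lab in enumerate(labels_a) if lab == selected]
    spread = Counter(labels_b[i] for i in members)
    return members, spread
```

If the members of a cluster selected in one solution fall almost entirely into a single cluster of the other solution, the two clusterings are consistent for that group; a wide spread flags instances worth a closer look.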
Brushing is supported between the member table and solution visualization: the selected instance in the visualization is highlighted in the member table, and likewise, a selected instance in the member table is highlighted in the solution visualization. Finally, if a range selection is made within the active cluster, only the selected instances are shown in the member table.

Support for micro-level data examination could be greatly enhanced by building upon much of the prior work mentioned in Section 2. We discuss our plans for this further in Section 6.1. Nonetheless, we believe that the linking of brushing between macro- and micro-level visualizations presented here would prove to be a very useful feature regardless of the micro-level visualization used.

5 EXAMPLE USE

In this section, we present a brief example of one possible use of our system. Consider a network administrator who is attempting to locate machines that are behaving atypically. The administrator believes that the usage patterns of most machines stay consistent from month to month, and that a change in behavior might be an indication of an intrusion or other exploit. However, she has a large number of traffic metrics available to her, and is not sure which of these metrics best express a machine's usage pattern.

The network administrator begins her data exploration by compiling a variety of traffic metrics for each month for each machine. Each of these values becomes an attribute. Each instance (a machine) has a group of attributes for each month of data, resulting in n × m attributes, where n is the number of traffic metrics, and m is the number of months. All of this data is loaded into the system.

She first decides to cluster the entire data set into 5 clusters using all of the first month's attributes. To do this, she opens the initial solution (a clustering of everything into one cluster) present in the Solution Explorer.
She does not need to subset the data, so she simply uses the Sub-Problem Generator to define her clustering problem: she selects the attributes of interest and chooses 5 clusters.

After the clustering is completed, she visualizes the new solution. She observes that two clusters contain most of the instances. She quickly hovers over the few instances that lie in the other three clusters. She discovers that all of these machines are servers of one sort or another. Since she watches these machines closely using other tools, she decides to exclude them from her exploration. She makes an empty range selection in each of these clusters, leaves the entire range selected in the two dense clusters, and opens up the Sub-Problem Generator again.

She decides now to try to confirm her theory that the behavior of most machines does not change from month to month. Using the subset of machines selected, she produces two new sub-solutions: one using the first month's attributes, and one using the second month's attributes. She opens up both solutions, and uses the view linking features to explore their similarity. This exploration is shown in Figure 7. She selects each cluster in the first month's visualization in turn. As she does so, the corresponding instances in the second month's visualization are highlighted. As is shown in Figure 7, the clusters stay relatively consistent.

She quickly examines those instances that change clusters. For some, she easily observes that the machines are outliers in both clustering solutions. This suggests that the attributes or weights in use may not be optimal. For others, she uses her domain knowledge to explain the differences. For example, perhaps one of the instances is a machine that was added in the middle of the first month. For a few others, she decides to do further investigation.

From this point, her data exploration process could go in any number of directions.
She could produce several clustering solutions for the same month using different attributes to explore links between the attributes. She could produce clustering solutions of varying sizes for the same attributes, giving her a clearer picture of how many types of machines she really has. The system easily supports these and many other tasks. Furthermore, once she has determined the set of attributes and weights that best characterizes her data, she can carry this information over into the data analysis domain and use these same clustering techniques in her daily network monitoring.

Figure 7: An example of view linking. In each group, the same two solutions are shown. The cluster indicated at the bottom of each group is selected in the left visualization. Instances in the right visualization are colored blue if they are members of the cluster selected on the left.

6 CONCLUSION

We have presented a visualization system that harnesses clustering algorithms to make exploration of moderately-large, high-dimensional data sets more intuitive. An iterative process of visualization, query refinement, and re-processing was presented that we believe accurately represents the ideal data exploration process. Furthermore, we proposed a novel method for visualizing macro-level clustering characteristics. Finally, we showed an example use of our system.

6.1 Future Work

We believe that a complete solution would provide means for both data exploration and data analysis. In this work, we have only explored the domain of data exploration. We are interested in augmenting the visualizations presented here with adaptations of some of the techniques presented in Section 2 to produce such a complete system.

We also plan to characterize more clearly the trade-offs between existing two-dimensional visualizations and the one-dimensional approach presented here when visualizing macro-level characteristics of clustering results.

Furthermore, we plan to explore ways to afford the user more control over the visualizations produced using our techniques. For example, the user could control the distance metric used to perform the layout separately from the weights used in clustering. This would allow exploration of the tightness of various dimensions. Similarly, we would like to investigate ways to make rocking more general and intuitive.

We also plan to explore the affinity of other clustering techniques (as well as other non-clustering-based machine learning techniques) to our data exploration framework.

Ultimately, an in-depth user study of an extended version of this system would be quite valuable. This is the only way to accurately judge the usefulness and applicability of these techniques.

7 ACKNOWLEDGEMENTS

We would like to thank Alexis Battle, Dan Ramage, and Ling Xiao for their helpful insights. We would also like to thank Pat Hanrahan and all of the Winter 2006 CS448b class for their useful comments.

REFERENCES

[1] U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. In Proceedings of the National Academy of Sciences, volume 96.
[2] Amir Ben-Dor, Ron Shamir, and Zohar Yakhini. Clustering gene expression patterns. Journal of Computational Biology, 6(3/4).
[3] Arthur Flexer. On the use of self-organizing maps for clustering and visualization. In Principles of Data Mining and Knowledge Discovery, pages 80-88.
[4] M.Y. Kiang, U.R. Kulkarni, and Y.T. Kar. Self-organizing map network as an interactive clustering tool - an application to group technology. Decision Support Systems, 15(4), December.
[5] Anton Leuski and James Allan. Lighthouse: Showing the way to relevant information. In INFOVIS.
[6] Carl G. Looney. Interactive clustering and merging with a new fuzzy expected value. Pattern Recognition, 35(11), November.
[7] J. McQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, volume 1.
[8] Sougata Mukherjea, James D. Foley, and Scott E. Hudson. Interactive clustering for navigating in hypermedia systems. In ECHT '94: Proceedings of the 1994 ACM European Conference on Hypermedia Technology, New York, NY. ACM Press.
[9] Matt Rasmussen and George Karypis. gcluto: An interactive clustering, visualization, and analysis system. Technical Report TR-04021, University of Minnesota.
[10] Jinwook Seo and Ben Shneiderman. Interactively exploring hierarchical clustering results. IEEE Computer, 35(7):80-86, July 2002.
