2D, 3D and High-Dimensional Data and Information Visualization


University of Hannover
Institut für Wirtschaftsinformatik (IWI)

2D, 3D and High-Dimensional Data and Information Visualization

Kim Bartke
Tutor: Prof. Michael H. Breitner
Seminar on Data and Information Management, SS 05

Contents

1 Introduction
2 Data visualization in knowledge discovery
2.1 Data types
3 Data visualization techniques
3.1 Two-dimensional data
3.2 Three-dimensional data
3.3 High-dimensional data
3.3.1 Icon-based methods
3.3.2 Hierarchical methods
3.3.3 Geometrical methods
4 Projection techniques
4.1 Multi-dimensional scaling (MDS)
4.2 Self-organizing maps (SOMs)
5 Interaction
5.1 Filtering
5.2 Linking and brushing
5.3 Zooming
5.4 Manipulation of data
6 Software and Applications
7 Outlook and Conclusion

List of Figures

Scatterplot of car data set [Hoffmann99]
Total crimes
3D linegraph (surface) [generated with Matlab]
Chernoff faces [Ward99]
Star glyphs [Oellien03]
Dimensional stacking
Fractal foam [Hoffmann99]
Parallel coordinates for points A, B, C
Parallel coordinates (Iris data set) [Grinstein01]
Parallel coordinates (Iris data set) [Hoffmann99]
Andrews curves for data points A, B, C
Andrews curves (Iris data set) [Hoffmann99]
Scatter plot matrix (Car data set) [Hoffmann99]
RadViz (Iris data set) [Grinstein01]
PolyViz (Iris data set) [Grinstein01]
Hyperbolic patch embedded into $\mathbb{R}^3$
Circle Limit III (1958)
Self organizing map of the Iris data set [Grinstein01]
Tramnetwork F+C technique [Keahey]

1 Introduction

We do it in the city, in the countryside and on mountains; we even do it in space. You may be asking yourself what I am talking about: we collect data. Since the beginning of mankind, people have gathered data. They used to do it by hand, counting mammoths or cattle and watching the sun and the clouds. Nowadays we use electronic devices to inspect and monitor our environment. Police departments keep huge databases about criminals and crime. All over the planet, weather stations collect meteorological data, and in space, spacecraft examine planets, their atmospheres and the Earth. These electronic devices enable us to collect vast amounts of data, and today's computers make it possible to store and process it.

We make the effort to accumulate data in such quantities because of the information it can yield. Police officers, for example, try to find patterns in criminals' behaviour so that they can react to crimes more quickly and more effectively; meteorological stations are used to predict the weather; and, who knows, we may even find signs of life in outer space. Data analysis therefore has two main goals: the generation of hypotheses and their verification. We use the gathered information either to describe the current situation or to make predictions about the behaviour of future data.

As mentioned before, computers allow us not only to store data but also to process large amounts of it. To discover knowledge we often use numerical solutions such as data mining algorithms. But as both the number of variables (or dimensions) and the number of cases increase, it becomes very demanding to recognize patterns in the data. It is also difficult to understand the data structure itself and the exploration process. Moreover, numerical solutions are very susceptible to noise and incorrect data. Presenting the data visually can remedy these problems.

Data visualization is the mapping of data into a Cartesian space. This method integrates a person's creativity and expertise into the knowledge discovery process and therefore allows a symbiosis of computational power and our visual abilities. Visualization also gives the user the opportunity to interact with the computer, e.g. to change the data and observe the reactions in order to gain a better insight into the data structure.

The greatest challenge in visualizing data is to find a good spatial representation. Most of the time, the data that needs to be presented is high-dimensional. To communicate the information we therefore have to scale the data so that we can display it in two-dimensional (2D) or three-dimensional (3D) coordinate systems while losing as little information as possible. This is fairly easy for scientific data. Problems arise if the data depends on nonspatial variables such as customer satisfaction or purchase channels.

In this paper I will present the uses of data and information visualization in the knowledge discovery process by explaining and evaluating data visualization techniques for both 2D and 3D representation. I will then deal with human-machine interaction methods and provide some examples of software packages and their applications. I will conclude with a preview of the direction of further research, e.g. the further development of the group of visualization tools.

2 Data visualization in knowledge discovery

As mentioned before, knowledge discovery in databases (KDD) is the way of gaining interpretable information from a set of raw data. Fayyad et al. [Rhodes02] distinguish between KDD, the overall process of discovering knowledge, and data mining, the method of extracting patterns from the data. Bearing this in mind, the following steps in the KDD process can be identified [Oellien03, p. 2].

The first step is pre-processing and data preparation. This includes the selection of a particular part of the data set that seems suitable for the subsequent process. It is also very important to apply noise reduction algorithms and to exclude false data points so that the result reflects the nature of the data as well as possible. The data set might also need to be adjusted; normalization is a very useful and frequently performed transformation. Another key technique is the projection of high-dimensional data into two- or three-dimensional space.

The second step is the actual use of data mining algorithms. The goal of this step is the classification of the data as well as its clustering and summarization [Fayyad02]. It concludes with pattern recognition.

Finally, the information gained from the data needs to be communicated to the user(s).

All these elements of the analytic process are often carried out with the help of visualization, and it is this visualization that this paper focuses on. Since the techniques always depend on the data type, this chapter starts with an overview of data types.

2.1 Data types

Every data set has a general structure. It is always characterised by a group of variables (also called dimensions) and by the records the database contains. One way of categorizing data is to differentiate between sets that can be described by dimensionality and sets that cannot [Keim02].

The first group consists of one-dimensional, two-dimensional, three-dimensional and high-dimensional data sets. The variable in one-dimensional data is usually time; an example is the log of interrupts in a processor. Two-dimensional data can often be found in statistics, such as the number of financial transactions in a certain period of time. Three-dimensional data can be positions in three-dimensional space or points on a surface that varies with time (the third dimension). High-dimensional data comprises all data sets with more than three considered variables. Examples are locations in space that vary with time (here time is the fourth dimension) or any other combination of more than three variables, e.g. product, channel, territory, period and customer's income.

In the second group we distinguish between text and graphs. Especially since the birth of the World Wide Web, analysing text (or in this case hypertext) has become more and more important. Text itself is not easily analysed by data visualization techniques, so a transformation into numbers is necessary. One technique for this change from textual to numerical data is word counting. A graph is a set of objects, called nodes, and connections between these objects, called edges [Keim02]. Relational databases are examples of this type of data set.

In the following chapters I will present several data visualization techniques for two-, three- and high-dimensional data sets. I will also give an overview of two non-linear projection techniques for reducing the dimensionality of high-dimensional data, namely multi-dimensional scaling (MDS) and Kohonen's self-organizing maps (SOMs) as an example of a neural network algorithm.

© 2005 Kim Bartke: 2D, 3D and High-Dimensional Data and Information Visualization

3 Data visualization techniques

3.1 Two-dimensional data

Two-dimensional data can be visualized in different ways. A very common form of visualization is the scatterplot. In a scatterplot the frame for the data presentation is a Cartesian coordinate system in which the axes correspond to the two dimensions. The data is usually represented by points in the first quadrant of the coordinate system (assuming the data values are not negative). If two or more data sets are displayed in the same coordinate system, different colours can be used to distinguish between the plots. A problem with this way of displaying data arises when the number of data points becomes very high, as the points then lie too densely. To avoid this, Becker suggests binning the data set [Sahling03]. The quality of the visualization then depends on the number of bins and their sizes. Figure 1 shows the distribution of miles per gallon (MPG) vs. horsepower for American (red), European (blue) and Japanese (green) cars.

Figure 1: Scatterplot of car data set [Hoffmann99]

Another important visualization technique for two-dimensional data is the linegraph. The difference from scatterplots is that here the relation between the dimension on the horizontal axis and the one on the vertical axis is definite. Figure 2 shows an example of a linegraph displaying the number of crimes in Niedersachsen for the years from 1993 onwards.

Survey plots are an extension of linegraphs. They are obtained by turning the plot 90 degrees clockwise, halving the length of the rays and adding this half on the other side of the now vertical axis.

The last technique I would like to mention here is the visualization of data as barcharts.
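Returning briefly to the binning suggested for dense scatterplots: a minimal sketch of how points could be counted per rectangular bin is given below. The function name `bin_points`, the bin counts and the sample values are illustrative assumptions; the values are not taken from the car data set.

```python
from collections import Counter

def bin_points(points, x_range, y_range, nx, ny):
    """Count 2D points per rectangular bin; returns {(ix, iy): count}."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the plotted area
        # Clamp so that values on the upper edge fall into the last bin.
        ix = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        iy = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        counts[(ix, iy)] += 1
    return counts

# Hypothetical (horsepower, MPG) pairs, for illustration only.
sample = [(50, 35), (55, 33), (52, 36), (150, 15), (155, 14), (230, 10)]
bins = bin_points(sample, x_range=(0, 240), y_range=(0, 40), nx=4, ny=4)
for (ix, iy), count in sorted(bins.items()):
    print(f"bin ({ix}, {iy}): {count} point(s)")
```

Each bin would then be drawn once, with its count mapped to colour or marker size. As noted above, the result depends strongly on the chosen number and size of the bins.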
Considering the last figure, a barchart representation would look the same as the linegraph but with the area under the graph filled in. Histograms are particular barcharts in which each bar stands for the number of data points in a class [Hoffmann02].

Figure 2: Total crimes

3.2 Three-dimensional data

The two-dimensional techniques can easily be extended to three dimensions. In scatterplots and barcharts the third dimension is achieved by adding a further axis, orthogonal to the other two. In a linegraph representation the additional dimension has the effect that the resulting plot is a surface. Figure 3 shows an example that has been generated with Matlab. A very widespread technique for visualizing the third dimension in a two-dimensional coordinate system is the use of colour or a variation of the data point size. Another very interesting visualization technique is animation, for instance to show the variation of the plot with time.

Figure 3: 3D linegraph (surface) [generated with Matlab]

3.3 High-dimensional data

The visualization of high-dimensional data raises a very severe problem: the visualization space is limited to three dimensions, or even to only two, since data is usually displayed on screens or paper. One of the obstacles in discovering the information in high-dimensional data sets, as Mihalisin [Mihalisin02] points out, is that techniques for extracting and displaying low-dimensional information cannot automatically be employed for high-dimensional data because the data set size is too large.

Next we have to study how the number of possible data sets is affected if we increase the number of variables or the number of values they can hold. To do this, consider the following example [Mihalisin02]: We have a data set consisting of six columns which represent the attributes product, territory, sales channel, method of payment, time of payment and a unique identifier. Furthermore we have 100,000 rows representing the records. Our company sells five products in five different territories via two sales channels. We also offer two distinct methods of payment, all divided into five quarters. This means there are $5 \cdot 5 \cdot 2 \cdot 2 \cdot 5 = 500$ possible cell results. Distributing 100,000 records, each having one of the 500 cell results, leads to

\[ \frac{100{,}499!}{100{,}000! \cdot 499!} \]

different data sets. This is a huge number (larger than the number of atoms in the universe!), and it is only a very small database.

Coming now to the different visualization techniques, we distinguish between icon-based, hierarchical and geometrical methods.

3.3.1 Icon-based methods

Icon-based methods are approaches that use icons (or glyphs) to represent high-dimensional data. They map data components to graphical attributes. The most famous technique is the use of Chernoff faces [Hoffmann02]. Here a data point is represented by an individual face whose features map the data dimensions. Five different eye sizes could correspond to the five products of the example above, and the mouth might symbolize the two methods of payment. This scheme exploits a person's ability to recognize faces. Figure 4 shows examples of Chernoff faces.

Figure 4: Chernoff faces [Ward99]

Probably the most common icon-based technique is the use of star glyphs to denote data points. A star glyph consists of a centre point with equally angled rays. These rays correspond to the different dimensions, and the length of each ray marks the value of that particular dimension for the data point under study. A polygon line connects the outer ends of the rays [Oellien03]. Figure 5 illustrates the star glyph approach.

These icon-based techniques are very vivid but have several disadvantages.
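Before turning to those disadvantages, the construction of a star glyph described above can be sketched in a few lines. This is only an illustration: the function name `star_glyph`, the scaling of each value by a per-dimension maximum and the sample values are assumptions, not part of the cited work.

```python
import math

def star_glyph(values, max_values, radius=1.0):
    """Return the polygon vertices (x, y) of a star glyph for one data point.

    Assumes non-negative values; each ray is scaled by the maximum of its
    dimension so that the longest possible ray has length `radius`.
    """
    n = len(values)
    vertices = []
    for i, (v, v_max) in enumerate(zip(values, max_values)):
        angle = 2 * math.pi * i / n   # rays are equally angled
        length = radius * v / v_max   # ray length encodes the value
        vertices.append((length * math.cos(angle), length * math.sin(angle)))
    return vertices

# Hypothetical four-dimensional data point with maximum 5 in every dimension.
glyph = star_glyph([2, 4, 1, 5], [5, 5, 5, 5])
for x, y in glyph:
    print(f"({x:+.2f}, {y:+.2f})")
```

Connecting the returned vertices in order and closing the polygon gives the outline of the glyph; one glyph is drawn per data point.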
A very severe problem is the organisation of the glyphs on the screen, as no coordinate system representing two of the dimensions is provided. Even if you decided to use a Cartesian system, it would put more weight on these two dimensions and so probably distort the data pattern. Another obstacle is the number of variables and the size of the data set itself. If the number of rays becomes too high, it is no longer possible to distinguish between the different rays and the values they represent. A similarly unclear picture emerges if the number of data points exceeds a certain amount.

Figure 5: Star glyphs [Oellien03]

3.3.2 Hierarchical methods

The most important representative of the group of hierarchical visualization techniques is dimensional stacking. It is a method of embedding coordinate systems recursively into each other [Grinstein02a]. Consider again the example with the five products, five territories, two sales channels, two methods of payment and five quarters [Mihalisin02]. First of all you have to select the two outermost dimensions. We choose the quarters and the pay types. Our horizontal axis is now divided into five parts while the vertical axis is halved. We then decide that the sales channel is to be embedded into the method of payment, so each part of the pay type axis is further divided into two parts that represent the different channels. The axis corresponding to the quarters embeds the products, so its elements become subdivided as well. Finally, the vertical axis takes up the five territories. The resulting system of combined coordinate axes is shown in Figure 6. It shows that goods of product type four, sold in quarter one in territory four via the first sales channel and the first type of payment, are represented by the coloured rectangle. To visualize the number of data points you can use a colour or grey scale. According to the colour scale drawn next to the plot, the filled rectangle represents fewer than 40,000 items. This value is binned since otherwise a clear visualization would not be possible.
The common depiction of the dimensional stacking technique is more compact and less elaborately presented than the one above. In the depiction above the rectangles are spaced apart to make the distinction between the different attribute combinations easier; usually they lie directly next to each other, separated only by a thicker line.
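The recursive embedding used in the dimensional stacking example can be sketched as an index computation. The sketch below assumes the nesting described above (quarters embedding products on the horizontal axis; pay types embedding sales channels embedding territories on the vertical axis) and uses 0-based values; the function names are hypothetical.

```python
def stacked_index(levels):
    """Combine (value, size) pairs, ordered from the outermost to the
    innermost dimension, into one position along an axis (0-based)."""
    index = 0
    for value, size in levels:
        index = index * size + value
    return index

def cell(quarter, product, pay_type, channel, territory):
    """Return the (column, row) of an attribute combination in the plot."""
    column = stacked_index([(quarter, 5), (product, 5)])
    row = stacked_index([(pay_type, 2), (channel, 2), (territory, 5)])
    return column, row

# Product four, quarter one, territory four, first channel, first pay type
# (written 0-based, as in the example from the text).
print(cell(quarter=0, product=3, pay_type=0, channel=0, territory=3))
```

With these sizes the plot has 25 columns and 20 rows, i.e. one rectangle for each of the 500 attribute combinations; the colour of a rectangle would then encode the (binned) number of records in it.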

Figure 6: Dimensional stacking

This method is very useful for hierarchical data sets that have only a small number of dimensions, as otherwise the embedding process makes the resulting plot too crowded. Labelling is a great challenge; the way chosen in the example is one possibility of naming the different variables in the plot.

A technique that recursively displays the correlation between dimensions (not the data itself!) is the fractal foam [Hoffmann02]. The starting point is a chosen dimension that is depicted by a coloured circle. Attached to this circle are further circles, which symbolize the other dimensions. The size of these circles corresponds to their correlation with the inner circle: a high correlation results in a large circle. Attached to the second layer of circles is a third layer, which describes the correlations of those dimensions, and so on. An example of fractal foam can be found in Figure 7.

3.3.3 Geometrical methods

Geometrical methods are a very large group of visualization techniques. Probably the easiest and most commonly used one is the method of parallel coordinates. Here the dimensions are represented by parallel, equally spaced lines. They are scaled linearly so that the bottom of each axis stands for the lowest possible value and the top for the highest. A data point is drawn into this system of axes as a polygonal line that crosses each axis at the value the data point holds for that dimension. A simple example with three points and four dimensions is shown in Figure 8. The points displayed are A = (1; 3; 2; 5), B = (2; 4; 1; 6) and C = (1; 4; 3; 5).

The method is not limited to data sets as simple as this one. One of the familiar high-dimensional data sets used to explain data visualization techniques is the Iris data set.
It consists of three different Iris types, namely Iris Setosa, Iris Versicolor and Iris Virginica. The variables of this data set are the sepal length, the sepal width, the petal length and the petal width, all measured in millimetres. As you can see in Figure 9, the parallel coordinate technique helps you to find attributes that allow a categorization of the different flower types. In the diagram the petal width seems to be a good classifier for the red Iris type. It is also a fairly good attribute for distinguishing between the violet and the green flower category.

Figure 7: Fractal foam (sepal length: centre (white), petal length: right (red), petal width: top (yellow), sepal width: bottom (green)) [Hoffmann99]

Figure 8: Parallel coordinates for points A, B, C

Figure 9: Parallel coordinates (Iris data set) [Grinstein01]

A very significant feature of this visualization technique is that the dimensions are treated equally. This permits a rearrangement of the displayed dimensions, which gives another view of the data and might therefore lead to the recognition of patterns (or classification attributes) that would otherwise be hidden in the current arrangement. Figure 10 shows the same Iris data set, but this time normalized and with the dimensions sepal width and sepal length swapped. The resulting graph looks very different and much clearer.

Another interesting geometrical visualization technique is the use of Andrews curves [Hoffmann02]. This method plots each data point as a function of its data values using a specific equation. The curves are usually drawn in the interval $-\pi < t < \pi$. The function that defines these curves is

\[ f(t) = \frac{x_1}{\sqrt{2}} + x_2 \sin(t) + x_3 \cos(t) + x_4 \sin(2t) + x_5 \cos(2t) + \dots, \]

where $x = (x_1, x_2, \dots, x_n)$ and $x_i$ is the value of the data point in dimension $i$. Consider the three data points already used to explain the parallel coordinates technique, A = (1; 3; 2; 5), B = (2; 4; 1; 6) and C = (1; 4; 3; 5):

\[ f_A(t) = \frac{1}{\sqrt{2}} + 3 \sin(t) + 2 \cos(t) + 5 \sin(2t) \]
\[ f_B(t) = \frac{2}{\sqrt{2}} + 4 \sin(t) + 1 \cos(t) + 6 \sin(2t) \]
\[ f_C(t) = \frac{1}{\sqrt{2}} + 4 \sin(t) + 3 \cos(t) + 5 \sin(2t) \]

If you plot these three functions in one coordinate system using Matlab, you obtain the result depicted in Figure 11. Applying the same algorithm to the Iris data set mentioned before results in a graph (Figure 12) that looks slightly more complex.
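The Andrews curve of a data point can be evaluated directly from the equation above. The following is a minimal sketch in Python rather than Matlab; the function name `andrews` and the sampling grid are illustrative choices.

```python
import math

def andrews(x, t):
    """Evaluate the Andrews curve of data point x at parameter t:
    x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..."""
    value = x[0] / math.sqrt(2)
    for k, coefficient in enumerate(x[1:], start=1):
        harmonic = (k + 1) // 2          # 1, 1, 2, 2, 3, 3, ...
        if k % 2 == 1:
            value += coefficient * math.sin(harmonic * t)
        else:
            value += coefficient * math.cos(harmonic * t)
    return value

points = {"A": (1, 3, 2, 5), "B": (2, 4, 1, 6), "C": (1, 4, 3, 5)}
ts = [-math.pi + i * 2 * math.pi / 8 for i in range(9)]  # samples in [-pi, pi]
for name, x in points.items():
    print(name, [round(andrews(x, t), 2) for t in ts])
```

Plotting the sampled values of each point against `ts` (with a finer grid) gives one curve per data point; similar data points yield curves that stay close to each other.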

Figure 10: Parallel coordinates (Iris data set) [Hoffmann99]

Figure 11: Andrews curves for data points A, B, C

Figure 12: Andrews curves (Iris data set) [Hoffmann99]

The advantage of this algorithm is that it is easily applied to data with a large number of dimensions. The disadvantage is the long computation time, as every data point requires the evaluation of trigonometric functions [Hoffmann02].

A very basic technique for visualizing high-dimensional data is the use of multiple views. They are often used with scatterplots or barcharts, leading to a matrix of $n \times n$ cells, where $n$ is the number of dimensions. Each cell of this matrix is then a scatterplot or a barchart respectively. This method is widely employed for data sets that contain diverse attributes. It reveals correlations and disparities between variables, since showing the different component combinations next to each other allows a visual comparison of the possible connections. In the next example the method has been applied to the car data set, another data set widely used to demonstrate visualization techniques. This table contains the combinations of miles per gallon (MPG), year of manufacture, cylinders, acceleration, horsepower and weight for three different car types. The red spots in Figure 13 symbolize American cars, the green ones Japanese cars and the blue ones European cars. The figure clearly shows a positive correlation between horsepower and weight, whereas the combination of MPG and weight reveals a negative correlation [Hoffmann02].

Figure 13: Scatter plot matrix (Car data set) [Hoffmann99]

Even though this method is a very functional tool for the visualization of data, it has several disadvantages. A very problematic one is that users become overwhelmed by the number of charts they have to evaluate and keep in mind while doing so. The use of space is a more practical aspect that needs consideration. The car example produces a $6 \times 6$ matrix, which is a manageable quantity both to work with and to display. If the data set were extended to ten dimensions, for instance, a clear presentation of the corresponding graph would no longer be possible.

The last two techniques I would like to present in this paper belong to the group of anchor visualization methods. Both are fairly new approaches to the problem, the second being a further development of the first. Radial Coordinate Visualization (RadViz) uses the spring paradigm [Hoffmann02]. From a centre point, n equally spaced rays of the same length spread out, each representing one dimension. The ends of the rays mark the dimensional anchors (DAs) of the respective variables; connecting them forms a circle. Before the data points can be visualized by this technique they need to be normalized. After that, one end of a spring is fastened to each dimensional anchor and the other end to the data point. The spring constant of each spring is the value of the data point in the respective dimension. The data point is located where the sum of the spring forces equals zero. Applying this method to the well-known Iris data set yields Figure 14.

Figure 14: RadViz (Iris data set) [Grinstein01]

An advantage of RadViz is that it preserves certain symmetries of the data set [Hoffmann02]. The major disadvantage is the overlap of points. The second dimensional anchor technique, named PolyViz, remedies this. The resulting plot is a combination of RadViz and the barchart technique. It illustrates the DAs not as points, as in RadViz, but as lines, so that the graph becomes a polygon. The technique still shows the clustering of the data points in the middle of the polygon, as it uses the same spring paradigm. But it also makes it possible to study the distribution along the different dimensions, since it plots this scattering along the axes using the barchart technique [Hoffmann02] (Figure 15).

Figure 15: PolyViz (Iris data set) [Grinstein01]

All the techniques explained above visualize data sets without trying to change them in order to simplify the visualization. In the following chapter I will introduce non-linear projection methods that reduce the size of the dimension vector so that the data sets become easier to display.

4 Projection techniques

The general goal of projection techniques is to reduce the dimensionality of the data in order to obtain a spatial mapping of the data set in the available space. I will distinguish between two methods that differ in their side condition. The first method, multi-dimensional scaling, tries to preserve the distances between data points, whereas neural networks focus on preserving structure.

4.1 Multi-dimensional scaling (MDS)

As stated above, multi-dimensional scaling focuses on the preservation of distance. The distance $d_{ij}$ is the Euclidean distance between the data points $x_i$ and $x_j$ in the n-dimensional space,

\[ d_{ij} = \lVert x_i - x_j \rVert, \quad x_i \in \mathbb{R}^n, \quad i, j \in \{1, 2, \dots, N\}. \]

Multi-dimensional scaling attempts to reconstruct the distances between the data points in the n-dimensional space by determining dissimilarity vectors $\delta_{ij}$ for the subvector space. With non-linear MDS the relationship between the distance vectors and the dissimilarity vectors is not proportional. To determine the dissimilarity vector $\delta_{ij}$ we need to apply a monotone transformation $D(\cdot)$, which results in the disparity matrix $D_{ij} = D(\delta_{ij})$ [Walter02].

An extensively used multi-dimensional scaling algorithm is Sammon's algorithm [Walter02]. Sammon came up with the following equation:

\[ E(\{x_i\}) = \sum_{i=1}^{N} \sum_{j>i} w_{ij} \, (d_{ij} - D_{ij})^2 \]

This formula describes a minimization problem: the cost is a sum over the weighted squares of the differences between distances and disparities. The values of $w_{ij}$ serve to normalize the cost function as well as to weigh the different disparities; they depend on the normalization technique (local, intermediate or global normalization). To minimize the cost, or stress, $E$, Sammon recommends iterative methods such as Newton's method.

A different approach to the problem of preserving distance is hyperbolic multi-dimensional scaling (H-MDS). Before explaining the basic idea of H-MDS I would like to give an overview of hyperbolic space. When people think of geometry and spaces, they usually assume Euclidean geometry, which is flat (zero curvature). Spherical geometry has positive curvature, which results in spherical surfaces like that of the moon. Just as there is geometry with positive curvature, there is also geometry with negative curvature; its two-dimensional space is called the hyperbolic plane, H2. The hyperbolic plane can be characterized by two important equations [Walter02]. The area $a$ and the circumference $c$ of a circle of radius $r$ in H2 are given by

\[ a(r) = 4\pi \sinh^2\left(\frac{r}{2}\right) \quad \text{and} \quad c(r) = 2\pi \sinh(r). \]
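The behaviour of these two expressions can be checked numerically by comparing them with the Euclidean values $\pi r^2$ and $2\pi r$. The sketch below is only an illustration; the function names and the chosen radii are arbitrary.

```python
import math

def hyperbolic_area(r):
    """Area of a circle of radius r in H2: 4*pi*sinh^2(r/2)."""
    return 4 * math.pi * math.sinh(r / 2) ** 2

def hyperbolic_circumference(r):
    """Circumference of a circle of radius r in H2: 2*pi*sinh(r)."""
    return 2 * math.pi * math.sinh(r)

for r in (0.01, 1.0, 5.0, 10.0):
    area_ratio = hyperbolic_area(r) / (math.pi * r ** 2)
    circ_ratio = hyperbolic_circumference(r) / (2 * math.pi * r)
    print(f"r = {r:5.2f}: area ratio = {area_ratio:10.3f}, "
          f"circumference ratio = {circ_ratio:10.3f}")
```

A ratio close to 1 means that the hyperbolic circle is practically indistinguishable from a Euclidean one; ratios that grow quickly with r reflect the exponential growth of the available space.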

These two equations reveal a remarkable feature of the hyperbolic plane. For small values of $r$, $\sinh^2(r/2) \approx r^2/4$ and $\sinh(r) \approx r$, so that $a(r) \approx \pi r^2$ and $c(r) \approx 2\pi r$. This is identical to the description of circles in Euclidean geometry. For larger values of $r$, though, both the area and the circumference grow exponentially. To picture this exponential growth you can think of H2 as a ball of crumpled paper. If you draw a circle on this wrinkled sheet of paper and unfold it afterwards, the area and the circumference of the resulting object will be much larger than they appeared when drawn. Figure 16 shows the embedding of a patch of H2 into $\mathbb{R}^3$.

Figure 16: Hyperbolic patch embedded into $\mathbb{R}^3$. The circumference and area of the drawn circle grow exponentially. The sum of the angles in the triangle is smaller than 180° [Walter].

There are several approaches to mapping the hyperbolic plane onto a Euclidean surface. I will give a rough overview of the Poincaré model, as it is the most widely used method for this task. Basic features of this projection are its display compatibility, which means that the entire H2 space fits into the Poincaré disk (PD), and the fact that the rim of the circle lies infinitely far away. The first characteristic inspired M. C. Escher to create the picture in Figure 17. Note that in the drawing the white lines are always perpendicular to the rim and represent straight lines in H2. It also clearly illustrates the fish-eye effect: the images in the middle appear larger than the ones in the outer parts of the circle even though they are all the same size.

Coming back to hyperbolic multi-dimensional scaling, we now have to adapt the distance $d_{ij}$ to the new assumptions. The distance according to Riemann [Walter02, cf. 8] is now given by

\[ d_{ij} = 2 \operatorname{arctanh}\left( \frac{|x_i - x_j|}{|1 - x_i \bar{x}_j|} \right), \quad x_i, x_j \in PD. \]

I will not go into detail on how to determine the cost function resulting from this type of distance, but I would like to mention that the non-linearity of the distance $d_{ij}$ influences the transformation matrix in H2 more than it does in Euclidean geometry [Walter02]. This suggests that the choice of $D(\cdot)$ needs a lot more consideration.
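To make the role of the distance function in the cost more concrete, the sketch below evaluates a weighted sum of squared differences between distances and disparities for a given layout, once with the Euclidean distance and once with the Poincaré disk distance. It is a simplified illustration under stated assumptions: all weights are set to the same constant, the disparities are passed in directly, points in the Poincaré disk are represented as complex numbers, the factor 2 in the hyperbolic distance is the one that belongs to curvature -1, and no minimization is performed. All names are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def poincare(p, q):
    """Distance between two points of the Poincare disk (complex, |z| < 1)."""
    return 2 * math.atanh(abs(p - q) / abs(1 - p * q.conjugate()))

def stress(points, disparities, distance, weight=1.0):
    """Sum over all pairs i < j of weight * (d_ij - D_ij)^2."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d_ij = distance(points[i], points[j])
            total += weight * (d_ij - disparities[i][j]) ** 2
    return total

# Hypothetical layout of three points and target disparities.
layout = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
target = [[0, 5, 10], [5, 0, 5], [10, 5, 0]]
print(stress(layout, target, euclidean))

disk_layout = [0j, 0.5 + 0j, 0.5j]
print(stress(disk_layout, target, poincare))
```

An MDS algorithm would repeatedly move the points of the layout so that this value decreases; only the evaluation of the cost is shown here.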

Figure 17: Circle Limit III (1958)

4.2 Self-organizing maps (SOMs)

The second method I would like to familiarise you with is the self-organizing map, a method from the field of artificial neural networks. Neural networks are adapted from neurobiological models [Oellien03]. To reduce dimensionality they use a combination of analytic and graphical techniques to group data while preserving the data structure [Grinstein01]. A neural network consists of several layers that are made up of neurons. The data is presented to the input layer, processed inside the network and then returned from the output layer. During processing, various rules are applied to the given data set; these rules constitute the learning algorithm.

Neural networks can further be divided into supervised and unsupervised nets. Self-organizing maps (SOMs) are the most famous example of unsupervised learning. Teuvo Kohonen developed this method, which is why they are often referred to as Kohonen's SOMs. The term unsupervised refers to the fact that only the input data set is presented to the neural network. The algorithm then determines similarities automatically and returns a map as the output of the process. This output map is characterized by numerous clusters in which similar objects lie close to each other, thereby showing the relationship between the input variables. Since there is no need to know anything about the underlying relationships between the data points, this technique is ideal for data sets whose structure or system is unknown. Figure 18 shows the self-organizing map of the Iris data set.

In the next chapter I will deal with the need to interact with the visualized data and will present the most essential techniques.
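The unsupervised learning idea behind self-organizing maps can be sketched with a very small example. The code below is a simplified illustration, not Kohonen's full algorithm as used for Figure 18: it assumes a one-dimensional chain of units, a Gaussian neighbourhood, linearly decaying learning rate and neighbourhood width, and hypothetical sample data. All names and parameter values are assumptions.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to input x."""
    return min(range(len(weights)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))

def train_som(data, units=4, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D chain of units on the input vectors in `data`."""
    rng = random.Random(seed)
    # Initialise every unit with a randomly chosen input vector.
    weights = [list(rng.choice(data)) for _ in range(units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 towards 0
        lr = lr0 * frac                      # learning rate
        sigma = max(sigma0 * frac, 0.1)      # neighbourhood width
        for x in rng.sample(data, len(data)):
            bmu = best_matching_unit(weights, x)
            for k in range(units):
                # Units close to the winner on the chain are pulled more.
                h = math.exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(len(x)):
                    weights[k][d] += lr * h * (x[d] - weights[k][d])
    return weights

# Hypothetical two-dimensional inputs forming two groups.
data = [(0.0, 0.2), (0.3, 0.0), (0.1, 0.1),
        (9.8, 10.0), (10.0, 9.7), (10.2, 10.1)]
weights = train_som(data)
for x in data:
    print(x, "-> unit", best_matching_unit(weights, x))
```

Only the inputs are presented to the network; no class labels are used. After training, each input is assigned to its best-matching unit, and inputs that are similar tend to end up on the same or on neighbouring units.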

Figure 18: Self-organizing map of the Iris data set [Grinstein01]

5 Interaction

Interaction plays an important role in understanding both the knowledge discovery process and the data itself. The methods I present cover the most basic but crucial techniques that are necessary to become an active data analyst. What distinguishes an active from a passive user is the ability to discuss problems that arise [Thearling02] and to find solutions for them by working on the data set (in the visualization mode). The order in which I present the techniques, and the grouping I chose, is not the only possible one, but it reflects the normal flow of interactive applications.

5.1 Filtering

Filtering is the interaction technique that performs the central task of selecting data points. Selection is used either to eliminate impossible or false data points or to focus on a particular cluster that needs further consideration. As an example of impossible data points, consider the following case: the data set is the number of crimes plotted against the age of the population. If the plot indicates crimes committed by three-year-old children or by people who are 150 years old, we can assume that these points have been entered incorrectly into the system and should be eliminated. Within the filtering techniques we distinguish between browsing and querying. Browsing is the direct selection of data points. It is a suitable method for identifying clusters and, for instance, choosing them for further investigation. Querying, however, offers the opportunity to directly enter specifications that the data points are required to meet. This could eliminate the unrealistic data points mentioned in the example above.
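A query of this kind can be sketched as a simple predicate over the records. The example below is only an illustration: the record type `CrimeRecord`, the invented sample values and the plausibility bounds of 10 and 110 years are all assumptions, not data from this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrimeRecord:
    offender_age: int
    crimes: int


def query(records, min_age, max_age):
    """Split records into those meeting the age specification and the rest."""
    kept = [r for r in records if min_age <= r.offender_age <= max_age]
    rejected = [r for r in records if not (min_age <= r.offender_age <= max_age)]
    return kept, rejected


# Invented records, including the two implausible ages from the example.
records = [
    CrimeRecord(offender_age=3, crimes=12),
    CrimeRecord(offender_age=19, crimes=240),
    CrimeRecord(offender_age=34, crimes=310),
    CrimeRecord(offender_age=150, crimes=4),
    CrimeRecord(offender_age=61, crimes=45),
]

kept, rejected = query(records, min_age=10, max_age=110)
```

Returning the rejected records as well, rather than silently dropping them, lets the analyst inspect what the query removed before committing to it.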

5.2 Linking and brushing

Linking and brushing are further selection tools. Even though these procedures can also be used to delete data points completely, this is not their primary goal. Linking refers to the connection between different plots. In multiple views (refer to Geometrical methods/multiple views) the manipulation of data in one plot automatically affects the corresponding data in the other (linked) graphs. Highlighting data points is called brushing. It is often used together with linking so that the user can observe the effect that highlighting data in one graph has on the other views. Consider the following example. The data set is three-dimensional with the dimensions product type, territory and number of sales. As a visualization technique we use scatterplots in multiple views. The first scatterplot shows the distribution of the number of sales over the different product types, whereas the second one presents the number of sales over the territories. Because the graphs are linked, highlighting the data points for the first product in the first graph also highlights the corresponding data points in the second plot. The highlighted cluster in the linked graph may now indicate, for example, that the chosen product is mostly sold in a particular territory. This connection between the two attributes might not have been seen if the graphs had not been linked. This is a very simple example, but the principle can easily be applied to higher-dimensional data sets. It is not limited to multiple views of the same visualization technique; it is equally useful for multiple views of different plot types.

5.3 Zooming

Zooming is the method of showing a particular part of the data set in detail. The problem with this way of focusing on portions of the data is that you might lose the big picture. One remedy is non-linear magnification.
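A one-dimensional fish-eye transform gives a feel for what non-linear magnification does. The sketch below uses the distortion function g(t) = (d + 1) t / (d t + 1) known from Sarkar and Brown's graphical fisheye views. This is one common formulation and not necessarily the one used in the sources cited here. The function name `fisheye` and the distortion value 3 are assumptions.

```python
def fisheye(x: float, focus: float, distortion: float = 3.0) -> float:
    """Map a position x in [0, 1] so that the region around `focus` is
    magnified while both ends of the axis stay where they are."""
    if not (0.0 <= x <= 1.0 and 0.0 <= focus <= 1.0):
        raise ValueError("x and focus must lie in [0, 1]")
    if x == focus:
        return x
    # Distance from the focus to the boundary on the side where x lies.
    span = (1.0 - focus) if x > focus else focus
    t = abs(x - focus) / span  # normalised distance from the focus in [0, 1]
    g = (distortion + 1.0) * t / (distortion * t + 1.0)
    return focus + g * span if x > focus else focus - g * span


# Eleven evenly spaced positions, viewed with the focus in the middle.
positions = [i / 10 for i in range(11)]
warped = [fisheye(p, focus=0.5) for p in positions]
gaps = [b - a for a, b in zip(warped, warped[1:])]
```

The gaps between neighbouring positions grow near the focus and shrink towards the edges, so the detail is enlarged without any point leaving the display.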
Methods that fall within this category are also known as focus+context (F+C) techniques [Sahling03]. This more advanced form of zooming expands the selected area while still showing the original context. Examples of non-linear magnification are the fish-eye lens and the hyperbolic space. Data viewed through a fish-eye lens appears as if seen through a wide-angle camera lens [Sahling03]. The example in figure 19 shows the tram network in Washington D.C., USA. Due to its properties (see also 4.1 Multi-dimensional scaling (MDS)/HMDS), the hyperbolic space is particularly well suited to zooming interactions.

5.4 Manipulation of data

Manipulating data, mainly input data, or removing outliers is a very important part of the whole knowledge discovery process. Observing the consequences that a change in the input data has on the output data (or the neural network) helps to get a feel for the basics. It also allows the study of what-if questions on the particular data set. None of these methods is tied to a particular stage of the visualization or interaction process in KDD; they are applicable at any time and are therefore indispensable.
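To make the interplay of the techniques in this chapter more concrete, the sales example from Section 5.2 can be sketched as two views that share one selection. This is a minimal sketch under stated assumptions: the class `LinkedViews`, its methods and the sales figures are invented for illustration and do not describe any particular tool.

```python
from collections import Counter

# Invented sales records: (product type, territory, number of sales).
sales = [
    ("A", "North", 120),
    ("A", "North", 95),
    ("A", "South", 10),
    ("B", "South", 80),
    ("B", "West", 60),
    ("C", "North", 30),
]


class LinkedViews:
    """Two views over the same records that share one brushed selection."""

    def __init__(self, records):
        self.records = records
        self.selected = set()  # indices of brushed records, shared by all views

    def brush_product(self, product):
        """Brush (highlight) every record of one product type."""
        self.selected = {i for i, r in enumerate(self.records) if r[0] == product}

    def product_view(self):
        """(product, sales, highlighted) triples for the first scatterplot."""
        return [(r[0], r[2], i in self.selected) for i, r in enumerate(self.records)]

    def territory_view(self):
        """(territory, sales, highlighted) triples for the linked scatterplot."""
        return [(r[1], r[2], i in self.selected) for i, r in enumerate(self.records)]


views = LinkedViews(sales)
views.brush_product("A")

# Read the linked view: where do the highlighted points concentrate?
highlighted_sales = Counter()
for territory, n, highlighted in views.territory_view():
    if highlighted:
        highlighted_sales[territory] += n
top_territory = highlighted_sales.most_common(1)[0][0]
```

Keeping the selection as a set of record indices, rather than storing it per plot, is what makes the views linked: any plot type that draws the same records can consult the same set.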

Figure 19: Tram network F+C technique [Keahey]

6 Software and Applications

Several software packages are currently available. Most of them are commercial, but there are a few public-domain tools as well. I would like to give an overview of two packages as examples. XGobi is a data visualization tool developed by Deborah F. Swayne, Di Cook and Andreas Buja. It presents multivariate data using scatterplots. The software also offers projection techniques to reduce the dimensionality of data sets and supports basic interaction methods such as brushing and zooming. A very important feature of this tool is its handling of missing values. GGobi is an extension and further development of XGobi. It provides the user with a new and clearer interface and supports newer technologies such as XML and database systems. Xmdv is public-domain software. Its main focus is the interaction between user and machine. It handles multidimensional data sets by applying scatterplots, star glyphs, parallel coordinates and dimensional stacking (all presented in this paper). Since this software concentrates on interaction, it supports the methods I dealt with in the last chapter. Furthermore, it offers the opportunity to mask dimensions in order to study their impact on the clustering in the data set. A unique feature of this tool is that it clusters data sets in order to visualize them without overloading the screen and therefore the user. This benefits clarity, but it excludes the user from the process. Both of the above-mentioned software packages can be used in many branches of industry. There are often modules that adapt the general idea of the tool to these different areas of interest. Xmdv, for instance, offers modules for fields such as finance and geochemistry (cf. Xmdv).
A very interesting application for such tools is investigative visualization in the medical field. On the one hand, graphical representations of statistics such as the number of breast cancer patients over the last twenty years are essential to reveal trends. On the other hand, and especially useful for research, the graphs can be used to study the causes of diseases. The occurrence of certain illnesses such as breast cancer depends on many variables. In order to find out which combination of these variables leads to the actual disease, a graphical representation is crucial. I would like to conclude my paper with a glance at future research.

7 Outlook and Conclusion

Even though there has been a large amount of research on human-machine integration, it is still an area that needs further improvement and development in order to use the power that results from this symbiosis more effectively and efficiently. Keim [Keim02] is of the opinion that it is not only computers in general that need to be integrated with the human. He states that visualization techniques and the well-known methods applied in areas such as statistics and operations research have to be brought together more extensively. From this point of view, research on human perception should also be increased [Keahey99]. Methods that do not reduce dimensionality in particular can overload the user and therefore have the opposite effect, as the user might not be able to see the overall picture. Another question that arises [Brodbeck97] is whether to present data mainly in 2D space, to use three-dimensional visualization techniques, or perhaps even to use animations. Animation is a very functional visualization technique. It is applicable to many kinds of data sets, as the additional dimension allows the visualization of a flow in a variable, usually time. An advantage of 3D is that the human world is three-dimensional, so this visualization seems more natural to the human user than any of the others. The problem, though, is again the overload that 3D data produces. Humans still have to break down the visualized data into two dimensions and compare it in their minds. Intensive interaction, such as rotation of the data set, as well as the tools already presented could be helpful.
Until now 2D representation has been the most extensively used form of visualization. Hyperbolic planes are another area that visualization tools should exploit further. Their infinite representation space and focus+context capabilities are very important features that will be greatly needed in the future. A hindrance could be the fact that hyperbolic planes are geometries we are not familiar with. This might meet with users' displeasure, but I think that professional education and information about this view of the world will improve the chance of a smooth introduction of hyperbolic planes as a basis for data representation. To conclude this paper, I can say that the existing techniques take very distinct approaches to the problems. Together they offer a choice, since, as mentioned before, different data types need different graphical representations. As I pointed out above, a lot of research still needs to be done, but I think the need has been identified, and we can therefore look forward to a large number of new and innovative techniques for the visualization of data and information in the future.

References

[Brodbeck97] Brodbeck, D. et al.: Domesticating Bead: Adapting an Information Visualization System to a Financial Institution. matthew/papers/infovis97.pdf, 1997, printed

[Docherty02] Docherty, P., Beck, A. (edt): A Visual Metaphor for Knowledge Discovery. In: Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, San Francisco (CA) 2002

[Fayyad02] Fayyad, U., Grinstein, G. (editors): Introduction. In: Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, San Francisco (CA) 2002

[Grinstein01] Grinstein, G., Trutschi, M., Cvek, U.: High-Dimensional Visualizations. mtrutsch/research/high- Dimensional Visualizations-KDD2001-color.pdf, 2001, printed

[Grinstein02a] Grinstein, G., Hoffmann, P., Pickett, R. (edt): Benchmark Development for the Evaluation of Visualization for Data Mining. In: Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, San Francisco (CA) 2002

[Grinstein02b] Grinstein, G., Ward, M. (edt): Introduction to Data Visualization. In: Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, San Francisco (CA) 2002

[Hoffmann99] Hoffmann, P. E.: Table Visualizations: A Formal Model and Its Applications. peh2.hoffman/tablevizx.pdf, 1999, printed

[Hoffmann02] Hoffmann, P., Grinstein, G. (edt): A Survey of Visualizations for High-Dimensional Data Mining. In: Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, San Francisco (CA) 2002

[Kaidi00] Kaidi, Z.: Data visualization. kzhao/papers/ 00 course Data visualization.pdf, 2000, printed

[Keahey99] Keahey, T.A.: Visualization of High-Dimensional Clusters Using Nonlinear Magnification. spie.pdf, 1999, printed

[Keahey] Keahey, T.A.: A Brief Tour of Nonlinear Magnification. tkeahey/research/nlm/nlmtour.html

[Keim02] Keim, D. A.: Information Visualization and Visual Data Mining. arnumber=981847&isnumber=21152, 2002, printed

[Kontkanen99] Kontkanen, P. et al.: Supervised model-based visualization of high-dimensional data, printed

[Koua03] Koua, E.L.: Using Self-Organizing Maps for Information Visualization and Knowledge Discovery in Complex Geospatial Datasets. /art proc/koua.pdf, 2003, printed

[Rhodes02] Rhodes, P. (edt): Discovering New Relationships. In: Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, San Francisco (CA) 2002

[Mihalisin02] Mihalisin, T. (edt): Data Warfare and Multidimensional Education. In: Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, San Francisco (CA) 2002

[Oellien03] Oellien, F.: Data Mining und Datenvisualisierung (ch 5 of Algorithmen und Applikationen zur interaktiven Visualisierung und Analyse chemiespezifischer Datensätze). Oellien/diss/index.html, 2003, printed

[Sahling03] Sahling, G. N.: Interactive 3D Scatterplots From High Dimensional Data to Insight. NSahling/masterthesis.html, 2003

[Thearling02] Thearling, K. et al. (edt): Visualizing Data Mining Models. In: Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, San Francisco (CA) 2002

[Walter] Walter, J. A.: Interactive Visualization and Navigation using the Hyperbolic Space. walter/h2vis/

[Walter02] Walter, J. A., Ritter, H.: On Interactive Visualization of High-dimensional Data using the Hyperbolic Plane. walter/pub/walter02-kdd.pdf, 2002, printed

[Ward99] Ward, M. O.: A Taxonomy of Glyph Placement Strategies for Multidimensional Data Visualization. matt/courses/glyphs/, 1999

[Wezel03] Wezel, M. C. van, Kosters, W.A.: Nonmetric multidimensional scaling: Neural networks versus traditional techniques. kosters/ida00191.pdf, 2003, printed

Graphical Representation of Multivariate Data

Graphical Representation of Multivariate Data Graphical Representation of Multivariate Data One difficulty with multivariate data is their visualization, in particular when p > 3. At the very least, we can construct pairwise scatter plots of variables.

More information

Data Exploration Data Visualization

Data Exploration Data Visualization Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

More information

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode Iris Sample Data Set Basic Visualization Techniques: Charts, Graphs and Maps CS598 Information Visualization Spring 2010 Many of the exploratory data techniques are illustrated with the Iris Plant data

More information

The Value of Visualization 2

The Value of Visualization 2 The Value of Visualization 2 G Janacek -0.69 1.11-3.1 4.0 GJJ () Visualization 1 / 21 Parallel coordinates Parallel coordinates is a common way of visualising high-dimensional geometry and analysing multivariate

More information

Information Visualization Multivariate Data Visualization Krešimir Matković

Information Visualization Multivariate Data Visualization Krešimir Matković Information Visualization Multivariate Data Visualization Krešimir Matković Vienna University of Technology, VRVis Research Center, Vienna Multivariable >3D Data Tables have so many variables that orthogonal

More information

Visualization Techniques in Data Mining

Visualization Techniques in Data Mining Tecniche di Apprendimento Automatico per Applicazioni di Data Mining Visualization Techniques in Data Mining Prof. Pier Luca Lanzi Laurea in Ingegneria Informatica Politecnico di Milano Polo di Milano

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 8/05/2005 1 What is data exploration? A preliminary

More information

Clustering & Visualization

Clustering & Visualization Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Topics Exploratory Data Analysis Summary Statistics Visualization What is data exploration?

More information

Visualization methods for patent data

Visualization methods for patent data Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.

More information

Big Data: Rethinking Text Visualization

Big Data: Rethinking Text Visualization Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important

More information

Section 1.1. Introduction to R n

Section 1.1. Introduction to R n The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to

More information

Common Core Unit Summary Grades 6 to 8

Common Core Unit Summary Grades 6 to 8 Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

Data Exploration and Preprocessing. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)

Data Exploration and Preprocessing. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) Data Exploration and Preprocessing Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information

Glencoe. correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 3-3, 5-8 8-4, 8-7 1-6, 4-9

Glencoe. correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 3-3, 5-8 8-4, 8-7 1-6, 4-9 Glencoe correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 STANDARDS 6-8 Number and Operations (NO) Standard I. Understand numbers, ways of representing numbers, relationships among numbers,

More information

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3 COMP 5318 Data Exploration and Analysis Chapter 3 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping

More information

USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS

USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS Koua, E.L. International Institute for Geo-Information Science and Earth Observation (ITC).

More information

High-dimensional labeled data analysis with Gabriel graphs

High-dimensional labeled data analysis with Gabriel graphs High-dimensional labeled data analysis with Gabriel graphs Michaël Aupetit CEA - DAM Département Analyse Surveillance Environnement BP 12-91680 - Bruyères-Le-Châtel, France Abstract. We propose the use

More information

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

More information

Data Mining and Visualization

Data Mining and Visualization Data Mining and Visualization Jeremy Walton NAG Ltd, Oxford Overview Data mining components Functionality Example application Quality control Visualization Use of 3D Example application Market research

More information

INTERACTIVE DATA EXPLORATION USING MDS MAPPING

INTERACTIVE DATA EXPLORATION USING MDS MAPPING INTERACTIVE DATA EXPLORATION USING MDS MAPPING Antoine Naud and Włodzisław Duch 1 Department of Computer Methods Nicolaus Copernicus University ul. Grudziadzka 5, 87-100 Toruń, Poland Abstract: Interactive

More information

What is Visualization? Information Visualization An Overview. Information Visualization. Definitions

What is Visualization? Information Visualization An Overview. Information Visualization. Definitions What is Visualization? Information Visualization An Overview Jonathan I. Maletic, Ph.D. Computer Science Kent State University Visualize/Visualization: To form a mental image or vision of [some

More information

Visualization of Multivariate Data. Dr. Yan Liu Department of Biomedical, Industrial and Human Factors Engineering Wright State University

Visualization of Multivariate Data. Dr. Yan Liu Department of Biomedical, Industrial and Human Factors Engineering Wright State University Visualization of Multivariate Data Dr. Yan Liu Department of Biomedical, Industrial and Human Factors Engineering Wright State University Introduction Multivariate (Multidimensional) Visualization Visualization

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

VISUALIZING HIERARCHICAL DATA. Graham Wills SPSS Inc., http://willsfamily.org/gwills

VISUALIZING HIERARCHICAL DATA. Graham Wills SPSS Inc., http://willsfamily.org/gwills VISUALIZING HIERARCHICAL DATA Graham Wills SPSS Inc., http://willsfamily.org/gwills SYNONYMS Hierarchical Graph Layout, Visualizing Trees, Tree Drawing, Information Visualization on Hierarchies; Hierarchical

More information

Visualization of Breast Cancer Data by SOM Component Planes

Visualization of Breast Cancer Data by SOM Component Planes International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian

More information

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express

More information

Interactive Data Mining and Visualization

Interactive Data Mining and Visualization Interactive Data Mining and Visualization Zhitao Qiu Abstract: Interactive analysis introduces dynamic changes in Visualization. On another hand, advanced visualization can provide different perspectives

More information

Chapter 3 - Multidimensional Information Visualization II

Chapter 3 - Multidimensional Information Visualization II Chapter 3 - Multidimensional Information Visualization II Concepts for visualizing univariate to hypervariate data Vorlesung Informationsvisualisierung Prof. Dr. Florian Alt, WS 2013/14 Konzept und Folien

More information

11.1. Objectives. Component Form of a Vector. Component Form of a Vector. Component Form of a Vector. Vectors and the Geometry of Space

11.1. Objectives. Component Form of a Vector. Component Form of a Vector. Component Form of a Vector. Vectors and the Geometry of Space 11 Vectors and the Geometry of Space 11.1 Vectors in the Plane Copyright Cengage Learning. All rights reserved. Copyright Cengage Learning. All rights reserved. 2 Objectives! Write the component form of

More information

Topic Maps Visualization

Topic Maps Visualization Topic Maps Visualization Bénédicte Le Grand, Laboratoire d'informatique de Paris 6 Introduction Topic maps provide a bridge between the domains of knowledge representation and information management. Topics

More information

Analyzing The Role Of Dimension Arrangement For Data Visualization in Radviz

Analyzing The Role Of Dimension Arrangement For Data Visualization in Radviz Analyzing The Role Of Dimension Arrangement For Data Visualization in Radviz Luigi Di Caro 1, Vanessa Frias-Martinez 2, and Enrique Frias-Martinez 2 1 Department of Computer Science, Universita di Torino,

More information

Multi-Dimensional Data Visualization. Slides courtesy of Chris North

Multi-Dimensional Data Visualization. Slides courtesy of Chris North Multi-Dimensional Data Visualization Slides courtesy of Chris North What is the Cleveland s ranking for quantitative data among the visual variables: Angle, area, length, position, color Where are we?!

More information

Visualization of large data sets using MDS combined with LVQ.

Visualization of large data sets using MDS combined with LVQ. Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk

More information

Connecting Segments for Visual Data Exploration and Interactive Mining of Decision Rules

Connecting Segments for Visual Data Exploration and Interactive Mining of Decision Rules Journal of Universal Computer Science, vol. 11, no. 11(2005), 1835-1848 submitted: 1/9/05, accepted: 1/10/05, appeared: 28/11/05 J.UCS Connecting Segments for Visual Data Exploration and Interactive Mining

More information

GeoGebra. 10 lessons. Gerrit Stols

GeoGebra. 10 lessons. Gerrit Stols GeoGebra in 10 lessons Gerrit Stols Acknowledgements GeoGebra is dynamic mathematics open source (free) software for learning and teaching mathematics in schools. It was developed by Markus Hohenwarter

More information

Hierarchical Data Visualization

Hierarchical Data Visualization Hierarchical Data Visualization 1 Hierarchical Data Hierarchical data emphasize the subordinate or membership relations between data items. Organizational Chart Classifications / Taxonomies (Species and

More information

Icon and Geometric Data Visualization with a Self-Organizing Map Grid

Icon and Geometric Data Visualization with a Self-Organizing Map Grid Icon and Geometric Data Visualization with a Self-Organizing Map Grid Alessandra Marli M. Morais 1, Marcos Gonçalves Quiles 2, and Rafael D. C. Santos 1 1 National Institute for Space Research Av dos Astronautas.

More information

Information Visualization. Ronald Peikert SciVis 2007 - Information Visualization 10-1

Information Visualization. Ronald Peikert SciVis 2007 - Information Visualization 10-1 Information Visualization Ronald Peikert SciVis 2007 - Information Visualization 10-1 Overview Techniques for high-dimensional data scatter plots, PCA parallel coordinates link + brush pixel-oriented techniques

More information

NEW MEXICO Grade 6 MATHEMATICS STANDARDS

NEW MEXICO Grade 6 MATHEMATICS STANDARDS PROCESS STANDARDS To help New Mexico students achieve the Content Standards enumerated below, teachers are encouraged to base instruction on the following Process Standards: Problem Solving Build new mathematical

More information

EVERY DAY COUNTS CALENDAR MATH 2005 correlated to

EVERY DAY COUNTS CALENDAR MATH 2005 correlated to EVERY DAY COUNTS CALENDAR MATH 2005 correlated to Illinois Mathematics Assessment Framework Grades 3-5 E D U C A T I O N G R O U P A Houghton Mifflin Company YOUR ILLINOIS GREAT SOURCE REPRESENTATIVES:

More information
