Tutorial on Geographic and Spatial Data Mining


Tutorial on Geographic and Spatial Data Mining
5th Italian Symposium on Advanced Database Systems - SEBD 7, Torre Canne, Italy, June 7th

Fraunhofer Society
- Joseph von Fraunhofer, German physicist and entrepreneur
- Fraunhofer mission: do state-of-the-art research and use it in challenging customer projects
- Funding is 33% research grants, 33% customer projects, 33% institutional funding
- 57 institutes, 4 locations, 2. employees, bill. annual volume
- Best-known invention: MP3

Fraunhofer IAIS: Intelligent Analysis and Information Systems
- From sensor data to business intelligence, from media analysis to visual information systems: our research allows companies to do more with data
- New name, long-standing experience: founded in 2006 as a merger of the Fraunhofer institutes AIS and IMK
- 23 people: scientists, project engineers, technical and administrative staff
- Located on Fraunhofer Campus Schloss Birlinghoven/Bonn
- Joint research groups and cooperation with Univ. Bonn

Fraunhofer IAIS: research and projects
Core research areas:
- Machine learning and adaptive systems
- Data mining and business intelligence
- Automated media analysis
- Interactive access and exploration
- Autonomous systems

Objectives
- Although it covers statistical concepts, algorithms and data structures, the tutorial has a practical, application-oriented focus
- Integration of various technologies and algorithms: how do they combine?
- Covers a broad range
- I do not assume familiarity with spatial concepts, only some basic familiarity with data mining approaches
Three objectives:
- to stimulate research on issues related to spatial data mining
- to stimulate development of more efficient spatial databases tailored for data mining applications
- to stimulate real-world applications

A main message
Spatial data mining is not an esoteric research topic; it is a practically and commercially very important, and sometimes business-critical, field. Later I give an example where the value of several dozen companies directly depends on the predictions given by our spatial data mining algorithms.

Spatial vs. Geographic Data Mining
- Geographic data is data related to the earth
- Spatial data mining deals with physical space in general, from the molecular to the astronomical level
- Geographic data mining is a subset of spatial data mining
- Almost all geographic data mining algorithms can work in a general spatial setting (with the same dimensionality)
- This tutorial focuses on geographic data in 2D, but most algorithms work on spatial data in general
- I do not talk about the specifics of molecular data, face detection, etc.

Agenda
Introduction: Spatial and Geographic Data Mining
Part I: Basic Concepts
- Spatial Databases and GIS
- Spatial Data Types
- Spatial Queries
- Construction of Complex Features
Part II: Exploratory Analysis of Spatial Data
Part III: Spatial and Geographic Data Mining Methods
- Autocorrelation
- Mining Point Data: Clustering, Kriging
- Mining Points, Lines, Areas: Clustering, Subgroup Discovery, Association Rules
- Mining Networks: A practical case study
- Mining Tracks in Space and Time: Mining from GPS data
Challenges
Summary

Introduction: Spatial Data Mining

A classical example of spatial analysis
- Disease cluster
- Dr. John Snow investigating the causes of a cholera epidemic, London, September 1854
- Infected water pump?
- A good representation is often the key to solving a problem

Good representation because...
- Represents the spatial relation of objects of the same type
- Represents the spatial relation of objects to other objects
- Shows only relevant aspects and hides irrelevant ones
- It is not only important where a cluster is, but also what else is there (e.g. a water pump)!

Goals of Spatial Data Mining
- Identifying spatial patterns
- Identifying spatial objects that are potential generators of patterns
- Identifying information relevant for explaining the spatial pattern (and hiding irrelevant information)
- Presenting the information in a way that is intuitive to the analyst and supports further analysis

Spatial Data Mining
Data Mining + Geographic Information Systems = Spatial Mining

Basic Concepts: Spatial Databases and GIS

Public sector:
- Are there clusters of a certain disease?
- Is there a relationship between poverty and death rate?
- Are there crime hot spots or patterns?
Commercial:
- Where to build a new supermarket?
- Where are the customers that want to buy new product X?
- How many cars pass the main road per hour?
- Does it pay to install new antennas?
- What percentage of young females sees a billboard located in Ripley Avenue?

Layers: Buildings, Streets, Schools, Hospitals, Rivers, Factory
Attribute data: persons per household, no. of cars, long-term illness, age, profession, ethnic group, unemployment, education, migrants, medical establishment, shopping areas, ...

Elements of a spatial database (examples from Oracle Spatial)
- Spatial query language
- Spatial operators
- Spatial data types
- Metadata
- Spatial indexes

Example query:

    SELECT c.holding_company, c.location
    FROM competitor c, bank b
    WHERE b.site_id = 64
      AND SDO_WITHIN_DISTANCE(c.location, b.location,
                              'distance=2 unit=mile') = 'TRUE'

Spatial Datatypes

Two basic types of representation: Fields and Discrete Objects
- Fields: raster data
- Discrete objects: vector data model (point, line, area)

Vector Data: Data Structure
- Ordered sets of xy-coordinates defining points, lines, or polygons; 3D or 4D also possible
- Straight lines between points
- For an area, draw a line from the last to the first coordinate
- Easy to scale (linear transformation)
- Storage efficient
- Relationships between objects (e.g. overlap) are not explicitly represented
- Also known as the spaghetti model
Data structure examples:
- Point: (5,)
- Line (Polyline): ((5,),(9,6),(2,7))
- Area (Polygon): ((5,),(9,6),(2,7), )

Two Main Types of Vector Data
- non-regular tessellations: closed polylines that partition the space
- discrete isolated objects: point, line, area (polygon)
Tessellations are very useful for the aggregation of discrete objects and for feature extraction.

Example: UK, Greater Manchester, Stockport
[Figure: one table per layer (Buildings, Factory, Hospitals, Schools, Streets, Rivers). Each table has the columns ID and Geometry plus layer-specific attributes such as Address, Name, Type, Phone, #Beds; sample rows include Gladstone Street 5, Islington Road, Ripley Avenue 23 and the hospitals Stepping Hill and Great Moore.]
- Descriptions of objects are organized in relations (database tables)
- Each row in a table describes one object
- Different categories of objects are organized in separate relations, each having its own set of attributes

Hierarchy
- Often data are organized in spatial hierarchies
- Hierarchies may overlap
[Figure: example hierarchies with the levels Country, State, Zip Area, Voting District, District, Parcel; UK census data: County, District 1 ... District n, Ward 1 ... Ward n]

Representation of data in a spatial database
A set of relations $R_1, \dots, R_n$ such that each relation $R_i$ has a geometry attribute $G_i$ or an identifier $A_i$ such that $R_i$ can be linked (joined) to a relation $R_k$ having a geometry attribute $G_k$
- Geometry attributes $G_i$ consist of ordered sets of x,y-pairs defining points, lines, or polygons
- Different types of spatial objects are organized in different relations $R_i$ (geographic layers), e.g. streets, rivers, enumeration districts, buildings
- Each layer can have its own set of attributes $A_1, \dots, A_n$ and at most one geometry attribute $G$

Representation of data in a spatial database (continued)
The multi-relational, layer-based representation does not fit well with standard data mining approaches. This is where the specific research challenge for geographic data mining comes from!

Raster Data
How to represent phenomena conceived as fields?
- Divide the world into square cells
- No variation within cells
- The cell value may be the average, max, min, sum, value at the central point, ...
- Represent discrete objects as collections of one or more cells
- Represent fields by assigning attribute values to cells
[Figure: raster representation; each color represents a different value of a nominal-scale field. Legend: mixed conifer, Douglas fir, oak savannah, grassland. Source: Longley et al (2)]

Raster and Vector: Comparison
Raster model
- Advantages: simple data structure; simple logical and algebraic structures
- Disadvantages: large data volumes; imprecise geometry; expensive transformations of coordinates; implicit coordinates
Vector model
- Advantages: geometry specified by coordinates; topological relationships; high geometric accuracy; storage efficient
- Disadvantages: complex data structure; compute-intensive logical and algebraic operations
Remember: "Raster is vaster and vector is correcter"

Spatial Queries

Spatial Queries
- Problem: the vector data model does not explicitly capture relationships among objects; they have to be inferred using spatial predicates
- Spatial predicates evaluate to true or false for given objects
- A query returns the set of objects for which the statement is true or, using aggregates, the [minimum, maximum, sum, average, ...] over the objects for which the statement is true
- Queries are evaluated using a spatial join among different relations (layers)
- Here is where database technology and spatial indexing come in to do the job efficiently. Still, queries can be extremely time consuming!

Spatial Predicates: Egenhofer's 9-intersection model
- Each object has an interior (i), an exterior (e) and a boundary (b)
- This results in a 9-intersection matrix for the relation between two spatial objects A and B, with rows b, i, e for A and columns b, i, e for B
- A cell contains a 1 iff the intersection of the two point sets is non-empty
[Figure: 9-intersection matrices for "A meets B", "A overlaps B" and "A contains B"]

Spatial Predicates
9-intersection model for 2 regions (Egenhofer 99):
- A disjoint B, B disjoint A
- A meets B, B meets A
- A overlaps B, B overlaps A
- A equals B, B equals A
- A covers B, B covered-by A
- A covered-by B, B covers A
- A contains B, B inside A
- A inside B, B contains A

Spatial Queries: Distance
Metric spaces:
- Symmetry: $d(i,j) = d(j,i)$
- Triangle inequality: $d(i,k) \le d(i,j) + d(j,k)$
- Euclidean distance: $d_e(i,j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$
Distance relation between polygons: minimum distance between any 2 points of the polygons
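The distance definitions above can be illustrated with a minimal Python sketch. It is not taken from the tutorial: the function names are hypothetical, polygons are assumed to be lists of (x, y) vertices, and the polygon distance assumes the two polygons do not overlap, so the minimum is attained between their boundaries.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def point_segment_distance(p, a, b):
    """Distance from point p to the straight segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:            # degenerate segment: a == b
        return euclidean(p, a)
    # Project p onto the segment and clamp to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return euclidean(p, (a[0] + t * dx, a[1] + t * dy))

def edges(polygon):
    """Closed ring: the last vertex is joined back to the first."""
    return [(polygon[i], polygon[(i + 1) % len(polygon)])
            for i in range(len(polygon))]

def polygon_distance(poly_a, poly_b):
    """Minimum distance between two non-overlapping polygons (assumption)."""
    best = math.inf
    for a1, a2 in edges(poly_a):
        for b1, b2 in edges(poly_b):
            # For non-crossing segments the minimum involves an end point.
            best = min(best,
                       point_segment_distance(a1, b1, b2),
                       point_segment_distance(a2, b1, b2),
                       point_segment_distance(b1, a1, a2),
                       point_segment_distance(b2, a1, a2))
    return best

square_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_b = [(3, 0), (4, 0), (4, 1), (3, 1)]
gap = polygon_distance(square_a, square_b)
```

The brute-force loop over all edge pairs is quadratic; a spatial database would typically prune candidate pairs with a spatial index before computing exact distances.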

Spatial Queries: Distance and Proximity
- Select the nearest neighbor in space
- Select all objects within a certain distance
Example (Oracle Spatial): select all competitors and locations within 2 miles distance from the bank with id 64

    SELECT c.holding_company, c.location
    FROM competitor c, bank b
    WHERE b.site_id = 64
      AND SDO_WITHIN_DISTANCE(c.location, b.location,
                              'distance=2 unit=mile') = 'TRUE'

[Figure: point X with distances to Hospital #1 on Main Street and Hospital #2]

Distance: non-metric spaces
- Asymmetry: $d(i,j) \ne d(j,i)$
- The triangle inequality does not hold
- Examples: drive time, driving distance, costs

Stockport Database Schema
[Figure: database schema. Geographical layers (River, Water, Shopping Region, Building, Street, Vegetation) are linked by spatial joins ("spatially interacts", "inside") to the enumeration districts (ED); the ED layer is linked by standard joins (=zone_id) to the attribute tables TAB... with census data. Spatial hierarchy: County, District, Wards, Enumeration district.]
- Attribute data: 95 tables with census data, ~8 attributes
- 85 tables
- Relations between objects are implicit: very flexible and storage efficient, but compute intensive

Implementation of Spatial Databases
Many popular databases have spatial extensions by now:
- Oracle Spatial
- PostgreSQL
- MySQL (since 4.1)

Construction of Complex Features

Spatial Functions (example: Oracle Spatial)
- Construct new geometry objects from existing ones using point set theory
- Efficient implementation using computational geometry
Functions that return a geometry:
- Union
- Difference
- Intersect
- XOR
- Buffer
- CenterPoint
- ConvexHull
Functions that return a number:
- Length
- Area
- Distance
[Figure: two original geometries and their Union, Difference, XOR and Intersect]

Constructing Cells: Buffer
- How many competitors are in the catchment area of my shop? = How many shops are within the buffer?
- Simplistic approximation
- Does not take barriers (rivers, highways) into account
- Does not take the road system into account

Voronoi diagram
- Which are my nearest competitors? = Find the Voronoi neighbors
- What is the cover of my radio antenna?
- Decompose space into regions around each point in a set of points S such that all the points in the region around $p_i$ are closer to $p_i$ than to any other point in S
- Complexity: $O(n \log n)$
- Related data structure: Delaunay triangulation (graph of Voronoi neighbors)
- Approximation: does not take barriers (rivers, highways) or the road system into account
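Both constructions above can be approximated with a few lines of brute-force Python. This is only an illustrative sketch with hypothetical function names: it counts points inside a circular buffer and finds the Voronoi cell a location falls into by searching for the nearest site, rather than building the diagram in $O(n \log n)$.

```python
import math

def count_within_buffer(center, radius, points):
    """Number of points inside a circular buffer around `center`."""
    return sum(1 for p in points if math.dist(center, p) <= radius)

def voronoi_cell(location, sites):
    """Index of the site whose Voronoi cell contains `location`.

    By definition of the Voronoi diagram this is simply the nearest site.
    """
    return min(range(len(sites)), key=lambda i: math.dist(location, sites[i]))

# Hypothetical shop and competitor coordinates.
my_shop = (0, 0)
competitors = [(3, 4), (6, 0), (1, 1)]
in_catchment = count_within_buffer(my_shop, 5, competitors)

antennas = [(0, 0), (10, 0)]
serving_antenna = voronoi_cell((2, 1), antennas)
```

As the slides note, both are straight-line approximations that ignore barriers and the road network.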

Drive-Time Zone (Dijkstra)
- How many competitors are in the catchment area of my shop?
- All street segments within a drive-time distance <= d from a given starting point
- Use Dijkstra's algorithm
- Complexity: $O(|V|^2)$ or $O(|V| \log |V| + |E|)$, depending on the data structures used for the implementation
- Realistic approximation: takes barriers (rivers, highways), the road system and the maximum speed on each road into account

Pre-processing
Several of the feature extractions are computationally quite expensive (at least for large data sets), and there is often a combinatorial explosion of features that might be constructed. Several strategies are used in spatial warehouse design:
- Selective pre-processing: materializing important joins in advance (storage requirements!)
- Approximate precomputing: e.g. using the minimum bounding rectangle to approximate a polygon
- Schema design (e.g. star schema with selective materialization): Han J., Stefanovic N., Koperski K. Selective Materialization: An Efficient Method for Spatial Data Cube Construction. PAKDD,
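The drive-time zone described above can be sketched as a bounded Dijkstra search. The code below is an illustration under assumptions, not the tutorial's implementation: the road network is a hypothetical adjacency dictionary mapping each node to (neighbor, travel time) pairs, and a binary heap is used, which gives $O((|V| + |E|) \log |V|)$ rather than the Fibonacci-heap bound quoted on the slide.

```python
import heapq

def drive_time_zone(graph, start, max_time):
    """Return {node: travel_time} for all nodes reachable within max_time.

    graph: dict mapping node -> list of (neighbor, travel_time) pairs,
    with non-negative travel times (required by Dijkstra's algorithm).
    """
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        time, node = heapq.heappop(heap)
        if time > best.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was found already
        for neighbor, cost in graph.get(node, []):
            new_time = time + cost
            # Stop expanding once the drive-time budget is exceeded.
            if new_time <= max_time and new_time < best.get(neighbor, float("inf")):
                best[neighbor] = new_time
                heapq.heappush(heap, (new_time, neighbor))
    return best

# Hypothetical road network with travel times in minutes.
roads = {
    "shop": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 1)],
    "c": [],
}
zone = drive_time_zone(roads, "shop", 4)
```

Counting the competitors located on the nodes of `zone` then answers the catchment-area question with the road system taken into account.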

Spatial Database of Vector Objects: Discussion
- Relations between objects are implicit
- Very flexible: depending on the analysis task, different relationships can be constructed
- Storage efficient: no overhead for storing relationship information
- Compute intensive (thus spatial indexing is very important)
- Consider what and when to materialize
- Very rich possibilities to create new, non-trivial objects from existing ones
- Makes feature extraction an important topic for data mining
- Inherently multi-relational setting (but not first-order)
- Could also be formulated in a deductive database setting

Interactive Visualization of Spatial Data: Exploratory Data Analysis

Interactive Visualization of Spatial Data: Exploratory Data Analysis
(work by G. Andrienko & N. Andrienko, H. Voss and others at Fraunhofer IAIS)
For the theory behind CommonGIS, see the book: Andrienko, N. and Andrienko, G.: Exploratory Analysis of Spatial and Temporal Data - A Systematic Approach, Springer,

Geographic Information Systems and CommonGIS
Many commercial tools are available:
- ESRI ArcGIS
- MapInfo
- Intergraph
- Manifold
But CommonGIS is different and unique:
- Map-based exploratory data analysis
- Stresses interactive visualization and manipulation of statistical data in space
- Elaborate facilities for time-series visualization
CommonGIS can be acquired for non-commercial use by educational institutions for no fee. See web page

CommonGIS = Fraunhofer IAIS tool for map-based exploratory data analysis
- Combines interactive cartography and statistics
- Multi-dimensional: time-series visualization and analysis; combines vector-raster transformation
- Decision support: weighted sums; ideal point analysis
- Multivariate: similarity analysis; dominant attribute; integration with Weka (clustering, decision trees)

CommonGIS: Visual analysis of spatial data
- Interactive spatial search for geographic objects and recognition of spatial patterns: dynamic choropleth maps, pie charts, bar charts, etc. with dynamic removal of outliers and dynamic queries
- Comparison of attribute values of geographic objects (relations and correlations) and comparison of spatial patterns (spatial correlations): (linked) dynamic maps and interactive diagrams; multiple (linked) dynamic maps

CommonGIS: Visual analysis of spatio-temporal data
- CommonGIS as an interactive browser to study how a spatial pattern evolves over time: time-aware maps (animations), time series charts
- CommonGIS as an interactive browser for temporal behaviours of objects: set of controls for analysing time intervals (object animations)
- CommonGIS as an interactive browser of discrete space-time events to find spatio-temporal clusters: space-time cube

Time Series: Sales per Shop and Product Category

Time Series: Sales per Shop and Product Category
- Categories shown: bakery, stand-up café, sit-down café, terrace
- Different time hierarchies (year, quarter, month, day)

CommonGIS: Data transformation
Transformation of data for further analysis.
Attribute transformations:
- calculate statistical indices
- transform and combine attribute data arithmetically
- dynamic classifiers (linked with dynamic choropleth map)
- cross classifiers (linked with dynamic choropleth map)
Geographic transformations:
- query, transform, combine, derive raster data
- illumination model
- raster -> vector transformations (i.e. raster -> area aggregation)
- point/line -> raster transformations

CommonGIS: Combination of vector and image data

Geographic and Spatial Data Mining Methods

Autocorrelation

Spatial Variation
- How are variables distributed in space?
- Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things."
- The distribution of variables depends on space
- Variables are autocorrelated
[Figure: field of soil moisture. Source: Franke, diploma thesis, Leipzig Univ.]

Spatial Autocorrelation: Binary Example
- Binary attribute (blue, white)
- Autocorrelation to the four immediate neighbors
- Moran index (here): $I = \dfrac{n_{equal} - n_{change}}{n_{equal} + n_{change}}$, where $n_{equal}$ counts neighbor pairs with equal values and $n_{change}$ counts neighbor pairs with different values
[Figure: four example grids with I = .86, I = .39, I = ., I = -. Source: Goodchild, CATMOG, GeoBooks, Norwich]
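The binary index from the example above can be computed by counting neighbor pairs. The sketch below is an assumption-laden illustration: the grid is a list of rows with values 0 or 1, each pair of horizontally or vertically adjacent cells is counted once, and the function name is hypothetical.

```python
def binary_moran(grid):
    """(n_equal - n_change) / (n_equal + n_change) over 4-neighbor pairs."""
    rows, cols = len(grid), len(grid[0])
    n_equal = n_change = 0
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so every neighbor pair is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    if grid[r][c] == grid[rr][cc]:
                        n_equal += 1
                    else:
                        n_change += 1
    return (n_equal - n_change) / (n_equal + n_change)

uniform = [[1, 1], [1, 1]]        # perfectly clustered
split = [[0, 0], [1, 1]]          # half of the joins change
checkerboard = [[0, 1], [1, 0]]   # every join changes
```

On these small grids the index takes its extreme values: 1 when all neighbors agree, -1 for a checkerboard, and 0 when agreements and changes balance.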

Moran's I
- Moran's I is a measure of spatial autocorrelation. It is a weighted product-moment correlation coefficient, where the weights reflect geographic proximity, used to detect departures from spatial randomness.
- Departures from randomness indicate spatial patterns such as clusters and geographic trends.
- Values of I larger than 0 indicate positive spatial autocorrelation; values smaller than 0 indicate negative spatial autocorrelation.
- Notation: $z$ attribute of interest; $w$ weight; $n$ number of areal objects

$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(z_i - \bar{z})(z_j - \bar{z})}{\sum_{i=1}^{n} (z_i - \bar{z})^2}$$

Example: n = 4 areal objects A, B, C, D
[Figure: map of the four areas and the 4 x 4 weight matrix $w_{ij}$]

Spatial Autocorrelation
- Similarity in location indicates similarity in attribute value
- Differs from temporal autocorrelation:
  - 1-dimensional autocorrelation in time series; spatial autocorrelation spreads in 2 or 3 dimensions
  - only forward causality in time series; the direction of causality is not restricted in space
- Depends on scale
[Figure: sunspot time series (number of sunspots per year) and temperature of sunspots]
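A direct transcription of the Moran's I formula into Python may help to check small examples by hand. This is a sketch, not the tutorial's code: the weight matrix of the original n = 4 example was lost, so the code assumes four areas A, B, C, D on a 2 x 2 grid with binary weights for shared edges (A-B, A-C, B-D, C-D).

```python
def morans_i(values, weights):
    """Moran's I for attribute values z_i and a weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [z - mean for z in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations from the mean.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# Assumed layout:  A B
#                  C D   (order of rows/columns: A, B, C, D)
w = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
alternating = morans_i([1, 0, 0, 1], w)   # A and D high, B and C low
banded = morans_i([1, 1, 0, 0], w)        # top row high, bottom row low
```

With this assumed layout the alternating pattern yields I = -1 (perfect negative autocorrelation), while the banded pattern yields I = 0 because agreeing and disagreeing neighbor pairs cancel.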

Effects of Autocorrelation
- Makes spatial abstraction possible
- Makes standard approaches of analysis impossible: most statistics assume iid data
- Makes local inference attractive: kriging, kNN, ...
- Makes the choice of sampling interval hard: autocorrelation depends on scale
- Makes interpolation easier than extrapolation
[Figure: spatial autocorrelation (+/-) plotted against distance; zero autocorrelation = independence of location]

Problem types for Spatial Data Mining
Spatial data mining := partially automated search for patterns and models in large spatial databases
Classification of methods along the following hierarchy:
- Points
- Points, lines and areas
- Networks
- Tracks in space and time

Handling spatial data in Data Mining: Basic Options
- Treat as ordinary variables: no special algorithms needed, but spatial properties are ignored, e.g. discontiguous areas
- Make spatial relationships explicit, e.g. infer topological relationships: expensive, but allows normal algorithms to be used; can be done as pre-processing or dynamically (the latter requires specialized algorithms)
- Specialized algorithms: neighborhood methods, kriging, Gaussian processes, density-based clustering
Use a proper combination of data, preprocessing, algorithms, and interaction software!

Mining Point Data

Mining Point Data
[Figure: problem types arranged by time complexity and space complexity; current topic: Points]

Clustering spatial point data
- Point data conceived as discrete objects
- Many approaches exist for clustering spatial point data
- In statistics, measures of spatial randomness or non-randomness have been developed (e.g. Ripley 99, Cressie 1993):
  - Ripley's K function, measuring deviation from complete spatial randomness (as exemplified by a Poisson process)
  - Moran's I, which measures autocorrelation
- Bayesian approaches, often coming from image analysis (cf. Lawson et al 2002)
- In geography, spatial clustering algorithms have been developed (Openshaw, GAM, 99)

Density-Based Clustering: a KDD approach [Ester et al. 1996]
- Suitable for large databases
- Discovers areas of high density and turns them into clusters
- Discovers clusters of arbitrary shape
- Can handle noise
- Algorithm: DBSCAN
Note: a relatively straightforward extension to vector data is possible (GDBSCAN); it requires a more complex definition of some key concepts (neighborhood and MinPts)

Clustering spatial data
- Distance-based clustering is inherently spatial
- But the assumption of convex clusters (e.g. k-means) is inappropriate for many geographical tasks
[Figure: point sets forming non-convex clusters. Source: Ester et al 1997]

Definitions 1
- Eps-neighborhood of a point p: $N_\varepsilon(p) := \{q \in D \mid dist(p, q) \le \varepsilon\}$
- Eps is a crucial parameter!
- A point p is directly density-reachable from q iff
  1. $p \in N_\varepsilon(q)$
  2. $|N_\varepsilon(q)| \ge MinPts$ (q is a core object)
- Not necessarily symmetric
[Figure: p is a border object, q is a core object; p is directly density-reachable from q, q is not directly density-reachable from p]

Definitions 2
- Density-reachable: p is density-reachable from a point q wrt Eps and MinPts iff there is a chain of points $p_1, \dots, p_n$ with $p_1 = q$, $p_n = p$ such that $p_{i+1}$ is directly density-reachable from $p_i$. Transitive, not symmetric.
- Density-connected: p is density-connected to q iff there is a point o such that both p and q are density-reachable from o wrt Eps and MinPts. Symmetric.
[Figure: p and q density-connected to each other by o; p density-reachable from q, q not density-reachable from p]

Density-connected clustering
A cluster C wrt Eps and MinPts is a non-empty subset of the database D, where
(1) $\forall p, q$: if $p \in C$ and q is density-reachable from p wrt Eps and MinPts, then $q \in C$
(2) $\forall p, q \in C$: p is density-connected to q wrt Eps and MinPts
- Non-covered points are noise
- Each cluster contains at least MinPts points
- Exactly one clustering

Algorithm DBSCAN: Basic Idea
- Check the Eps-neighborhood of every unclassified point in the database
- If the neighborhood of p contains at least MinPts points, a new cluster with p as core object is built
- Collect directly density-reachable objects from this set, merging clusters as necessary
- Terminate when no new point can be added to any cluster
- Complexity: $O(n \log n)$ when a spatial index is used, otherwise $O(n^2)$
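The basic idea above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the neighborhood query is a linear scan (so the whole run is $O(n^2)$, the case without a spatial index), a point counts as a core object when its Eps-neighborhood, including the point itself, holds at least MinPts points, and all names are hypothetical.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points in the Eps-neighborhood of points[i]."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: cluster id 0, 1, ... or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster            # i is a core object: start a cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core object
            if labels[j] is not None:
                continue               # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)   # j is a core object: keep expanding
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

Replacing `region_query` with a lookup in a spatial index (for example an R-tree) is what brings the cost down to the $O(n \log n)$ quoted above.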

Kriging: Spatial Interpolation

Kriging
- Developed by G. Matheron in the 1960s, based on work of D. Krige
- Geostatistical method of interpolation
- Point data conceived as samples from a continuous surface
- Results are smoothly varying surfaces
- Provides optimality given its assumptions (best linear unbiased estimate)
- Variety of methods, e.g. Ordinary Kriging, Universal Kriging, Co-Kriging, Block Kriging, Stratified Kriging, Indicator Kriging, ...
[Figure: map of measurements with unknown values (?) to be interpolated]
Good introduction: Burrough, P., McDonnell, R

Spatial Variation
- Problem: the spatial variation of a continuous attribute is often too irregular to be modelled by a simple, smooth mathematical function
- Solution: the variation can be described by a stochastic surface
- A stochastic process is a family of random variables $Z(x)$ over the index set $D \subseteq \mathbb{R}^n$: $\{Z(x) : x \in D\}$, where $x$ is a location in n-dimensional space and $Z(x)$ is the random variable of interest, e.g. soil moisture
- A Gaussian process is a stochastic process for which any finite set of Z-variables has a joint multivariate Gaussian distribution

Components of Spatial Variation
- Structural component $m(x)$, having a constant mean or trend
- Random, but spatially correlated component $\varepsilon'(x)$ (regionalized variable)
- Spatially uncorrelated random noise term $\varepsilon''$

$$Z(x) = m(x) + \varepsilon'(x) + \varepsilon''$$

The value at location x is a random variable.
[Figure: Z(x) decomposed into trend, autocorrelation and random noise]

Stationarity
- Problem: a spatial data set is a single realization of a random process; inference is impossible without further restrictions on the spatial variation
- Intrinsic stationarity (stationarity under translation):
  - constant mean ($E[\dots] = 0$) or trend ($E[\dots] > 0$): $E[Z(x) - Z(x+h)] = \text{const.}$
  - the variance of differences at lag h is independent of location: $E[\{Z(x) - Z(x+h)\}^2] = 2\gamma(h)$
- Isotropy (stationarity under rotation): the spatial process evolves the same in all directions

Ordinary Kriging
Assumptions: intrinsic stationarity with a constant mean
- Constant mean value in the sampling area: $E[Z(x) - Z(x+h)] = 0$
- The variance of differences depends only on the distance h between sites:

$$Var[Z(x) - Z(x+h)] = E[\{Z(x) - Z(x+h)\}^2] = E[\{\varepsilon'(x) - \varepsilon'(x+h)\}^2] = 2\gamma(h)$$

where $\gamma(h)$ is the semivariance.
Once structural effects have been accounted for, the remaining variation is homogeneous in variance, so that differences between sites are merely a function of the distance between them.

Ordinary Kriging
Procedure:
1. Estimate the semivariance $\gamma(h)$ from the data sample
2. Plot the experimental variogram
3. Fit a theoretical model to the experimental variogram
4. Estimate unknown values as a weighted sum of neighboring measurements; determine the optimal weights from the variogram

Semivariance and Experimental Variogram
- The semivariance depends only on the distance (lag) h
- Estimate the semivariance between all pairs of measurements with distance h (repeat for all possible h):

$$\hat{\gamma}(h) = \frac{1}{2n} \sum_{i=1}^{n} \{z(x_i) - z(x_i + h)\}^2$$

where n is the number of pairs of measurements separated by lag h.
[Figure: experimental variogram, $\gamma(h)$ plotted against lag h]
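Steps 1 and 2 of the procedure can be sketched as follows. The sketch makes assumptions beyond the slides: samples are ((x, y), z) tuples, pairs are grouped by rounding their distance to the nearest multiple of a hypothetical `lag_width` parameter, and direction is ignored (isotropy).

```python
import math
from collections import defaultdict

def experimental_variogram(samples, lag_width):
    """Return {lag: semivariance} estimated from all pairs of samples.

    samples: list of ((x, y), z) tuples; lag_width: bin size for distances.
    """
    sq_diff_sum = defaultdict(float)
    pair_count = defaultdict(int)
    for a in range(len(samples)):
        for b in range(a + 1, len(samples)):
            (pa, za), (pb, zb) = samples[a], samples[b]
            # Assign the pair to the nearest lag bin.
            k = round(math.dist(pa, pb) / lag_width)
            sq_diff_sum[k] += (za - zb) ** 2
            pair_count[k] += 1
    # gamma_hat(h) = 1/(2n) * sum of squared differences over the n pairs.
    return {k * lag_width: sq_diff_sum[k] / (2 * pair_count[k])
            for k in sorted(sq_diff_sum)}

# Hypothetical transect with a linear increase in the measured value.
transect = [((0, 0), 0.0), ((1, 0), 1.0), ((2, 0), 2.0), ((3, 0), 3.0)]
variogram = experimental_variogram(transect, lag_width=1.0)
```

For this transect the semivariance grows with the lag and never levels off, which indicates a trend rather than a stationary process with a sill.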

Variogram

$$\hat{\gamma}(h) = \frac{1}{2n} \sum_{i=1}^{n} \{z(x_i) - z(x_i + h)\}^2$$

- Nugget:
  - $\gamma(0) = 0$ (by definition)
  - the nugget effect represents small-scale variation and measurement errors
  - estimate of the noise term $\varepsilon''$
- Range:
  - spatial dependency
  - here, the variance of differences increases with distance
  - two points are more similar the closer they are
- Sill:
  - the semivariance levels off
  - the variance of differences is independent of distance
[Figure: variogram, $\gamma(h)$ against lag h, with nugget, range and sill marked]

Variogram Models
- The experimental variogram must be fitted to an appropriate variogram model
- Most commonly used are the spherical, exponential, linear or Gaussian models
[Figure: $\gamma(h)$ against lag h for the spherical, exponential, linear and Gaussian models]

Interpolation of unknown values
- The unknown value at location $x_0$ is estimated as a weighted sum of neighboring measurements:

$$Z^*(x_0) = \sum_{i=1}^{n} w_i Z(x_i)$$

- The weights $w_i$ are determined according to two restrictions:
  - $Z^*(x_0)$ is an unbiased estimate of $Z(x_0)$
  - $Z^*(x_0)$ is an optimal estimate
- A system of n+1 linear equations in semivariances and weights has to be solved

Equation System
- The unbiasedness restriction on the weights ($\sum_i w_i = 1$) introduces a Lagrange parameter $\varphi$
- A system of (n+1) equations must be solved to obtain the optimal weights for each $x_0$:

$$\begin{pmatrix} \gamma(x_1, x_1) & \cdots & \gamma(x_1, x_n) & 1 \\ \vdots & \ddots & \vdots & \vdots \\ \gamma(x_n, x_1) & \cdots & \gamma(x_n, x_n) & 1 \\ 1 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_n \\ \varphi \end{pmatrix} = \begin{pmatrix} \gamma(x_1, x_0) \\ \vdots \\ \gamma(x_n, x_0) \\ 1 \end{pmatrix}$$

- Ordinary Kriging is an exact interpolator, i.e. the interpolated value at a sample location is identical to the measurement taken there
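The equation system can be assembled and solved in a few lines. The following is a hedged sketch rather than a production implementation: it assumes a variogram model is already fitted and passed in as a function `gamma(h)` (here a hypothetical linear model), uses all samples as neighbors, and solves the (n+1) x (n+1) system with plain Gaussian elimination to stay dependency-free.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(samples, target, gamma):
    """Estimate Z*(x0) at `target`; samples are ((x, y), z) tuples."""
    n = len(samples)
    pts = [p for p, _ in samples]
    # Left-hand side: semivariances between samples, bordered by ones.
    a = [[gamma(math.dist(pts[i], pts[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Right-hand side: semivariances to the target, plus the constraint.
    b = [gamma(math.dist(p, target)) for p in pts] + [1.0]
    solution = solve(a, b)
    weights = solution[:n]           # solution[n] is the Lagrange parameter
    estimate = sum(w * z for w, (_, z) in zip(weights, samples))
    return estimate, weights

linear_model = lambda h: h           # assumed variogram model, gamma(h) = h
data = [((0, 0), 1.0), ((2, 0), 3.0)]
midpoint_value, midpoint_weights = ordinary_kriging(data, (1, 0), linear_model)
at_sample, _ = ordinary_kriging(data, (0, 0), linear_model)
```

In this toy case the midpoint receives equal weights from both samples, and predicting at a sample location reproduces the measurement, illustrating the exact-interpolator property.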

Variants of Kriging
- Universal Kriging: the structural component may contain an external trend
- Co-Kriging: interpolation for one attribute incorporates information from another, correlated attribute; sparse measurements of an expensive variable are supported by plentiful measurements of a cheap variable
- Stratified Kriging: interpolation within sub-areas; equations are adjusted to avoid discontinuities on boundaries
More details: Burrough, P., McDonnell, R

Mining Points, Lines, and Areas

Points, Lines and Areas
[Figure: problem types arranged by time complexity and space complexity; current topic: Points, Lines, and Areas (above Points)]

Points, Lines and Areas
- Requirements: point data, polygons, aggregations
- Applications: customer segmentation, catchment areas, location planning, radio network analysis
- Examples: GDBSCAN clustering, spatial subgroup mining, spatial association rules, spatial model trees

Clustering of Vector Data: GDBSCAN [Sander et al 1998]
Extension of DBSCAN. Sample instantiations:
- Neighborhood predicate: dist < ε; intersects/meets; neighbor
- Minimum weight of a neighborhood S: |S| ≥ MinCard; areas ≥ MinArea; f(S) ≥ MinF

Spatial Subgroup Mining

Typical Data Mining representation
- Spreadsheet data
- Exactly 1 table
- Atomic values
Data mining for spatial data: very different from this representation

Subgroup Discovery Search (Klösgen 1996, Wrobel 1997)
- Subgroup discovery searches for deviation patterns: subgroups with a disproportionately high share of the target value (or mean of the target variable)
- Top-down search from the most general to the most specific subgroups, exploiting the partial ordering of subgroups ($S_1$ more general than $S_2$)
- Beam search expands only the n best ones at each level
- Hypotheses are evaluated according to a quality function:

$$q = \frac{p(T \mid C) - p(T)}{\sqrt{p(T)\,(1 - p(T))}} \cdot \sqrt{n} \cdot \sqrt{\frac{N}{N - n}}$$

where N = total population, n = subgroup size, $p(T)$ = target share in the total population, $p(T \mid C)$ = target share in the subgroup
- Extension to multi-relational representation in Wrobel (1997)
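The quality function can be evaluated directly, and a beam step then amounts to keeping the best-scoring candidates. The sketch below uses hypothetical names and made-up counts purely for illustration; the candidate descriptions and their statistics are not from the tutorial.

```python
import math

def subgroup_quality(p_subgroup, p_total, n, N):
    """Quality of a subgroup of size n in a population of size N.

    p_subgroup: target share in the subgroup, p(T|C)
    p_total:    target share in the total population, p(T)
    """
    deviation = (p_subgroup - p_total) / math.sqrt(p_total * (1 - p_total))
    return deviation * math.sqrt(n) * math.sqrt(N / (N - n))

def beam_step(candidates, p_total, N, beam_width):
    """Keep the `beam_width` best candidates by quality.

    candidates: list of (description, subgroup size, target share) tuples.
    """
    scored = [(subgroup_quality(share, p_total, size, N), desc)
              for desc, size, share in candidates]
    scored.sort(reverse=True)
    return [desc for _, desc in scored[:beam_width]]

# Hypothetical candidates in a population of 1000 with p(T) = 0.2.
candidates = [
    ("unemployment=high", 100, 0.40),
    ("crossed_by_main_road", 300, 0.25),
    ("age=young", 50, 0.20),
]
q = subgroup_quality(0.40, 0.20, 100, 1000)
beam = beam_step(candidates, p_total=0.20, N=1000, beam_width=2)
```

The quality grows both with the deviation of the target share and with the subgroup size, so a large subgroup with a moderate deviation can outrank a small one with a strong deviation.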

Translating Multirelational Subgroups to Object-relational SQL
Domain: relational database schema $D = \{R_1, \dots, R_n\}$ having geometry attributes $G_i$
Hypothesis language: multirelational subgroups are represented by
- a concept set $C = \{C_i\}$, where each $C_i$ consists of a set of attribute-value pairs $\{A_1 = v_1, \dots, A_n = v_n\}$ from a relation in D
- a set of links $L = \{L_i\}$ linking concepts $C_i$, $C_k$ via their attributes $A_m$, $A_n$, of the form ($C_i/A_m$ {= | inside | overlaps | ... | spatially_interact} $C_k/A_n$)
- a target attribute that can be non-numeric ($A = v$) or a numeric aggregate (avg(A) = n)
Example:
  C = {{district.long_term_illness=high, district.unemployment=high}, {street.name='Manchester Road'}}
  L = {{district.geometry spatially_interact street.geometry}}
  Meaning: enumeration districts with a high rate of long-term illness and unemployment crossed by Manchester Road
Testing satisfaction of subgroup descriptions: the number of tuples in D that satisfy a subgroup description is evaluated using SQL select statements including joins over multiple relations.

Approach: Translation of Spatial Subgroup Mining to SQL (Klösgen, May 2002)
- Representing subgroups in object-relational SQL, i.e. a multi-relational representation
- Using a representation for spatial geometry based on the spatial database
- Division of work between RDBMS and search manager
- Combining visualization in abstract and physical space

Division of labour between RDBMS and Search Manager (May, Savinov 2003)
- Database server (database integration): efficiently organizes mining queries; a mining query delivers statistics (aggregations) sufficient for evaluating many hypotheses
- Mining server (search algorithm): search in hypothesis space; generation and evaluation of hypotheses (subgroup patterns)
[Figure: the search algorithm sends mining queries to the database server and receives statistics]

SPIN! Spatial Data Mining System
[Figure: screenshot showing Workspace, Property Editor, Flowchart Tool, Subgroup Result List and Subgroup Viewer]

49 Interactive Exploratory Analysis
[Figure panels: Parallel Coordinate Plot, Choropleth Maps, Scatter Plot]
- Combination of spatial and non-spatial visualization
- User selects and manipulates variables
- Powerful for analysis in low dimensions (3-4)
- Displays are dynamically linked

Visualization of spatial subgroups
Example: high long-term illness in districts crossed by M6
[Figure panels: Subgroup Overview (p(T|C) vs. p(C)), Spatial Venn Diagram, Subgroup Linked Display]

50 Radio Network Planning in Telecommunication
SPIN! Mapviewer (CommonGIS)
Example subgroup: high cut-off call ratio in mountainous regions crossed by highways having a certain technical configuration
[Map legend: blue = motorway, brown = high altitude, black = subgroup]

Other commercial applications of Subgroup Discovery
- How are my customers characterized? Are there interesting profiles?
- Where to open the next supermarket? Does it create competition for my other supermarkets?
- Should I invest in UMTS in rural areas?

51 Spatial Association Rules
(work and slides by Donato Malerba et al., Univ. Bari)

Spatial association rules
- An association pattern P (s%) is a spatial association pattern if it contains at least one spatial relation
  Example: a large town intersects a road and is adjacent to water (62%)
- An association rule $Q \rightarrow R$ (s%, c%) is a spatial association rule if $Q \cup R$ is a spatial association pattern
  Example: IF a large town intersects a road THEN it is also adjacent to water (62%, 89%)
- Seminal work by Koperski & Han 1995
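The support s% and confidence c% attached to such a rule can be computed as below. This is a minimal sketch under the assumption that every reference object (here a town) has already been described by a set of spatial facts; the facts and the five towns are invented and do not reproduce the 62% / 89% figures of the slide.

```python
def support_count(objects, pattern):
    """Number of reference objects whose facts contain the whole pattern."""
    return sum(pattern <= facts for facts in objects)


# Hypothetical pre-computed spatial facts, one set per large town.
towns = [
    {"intersects_road", "adjacent_to_water"},
    {"intersects_road", "adjacent_to_water"},
    {"intersects_road"},
    {"adjacent_to_water"},
    {"intersects_road", "adjacent_to_water"},
]

Q = {"intersects_road"}      # rule body
R = {"adjacent_to_water"}    # rule head

# Support: share of towns satisfying body and head together.
support = support_count(towns, Q | R) / len(towns)
# Confidence: among towns satisfying the body, share also satisfying the head.
confidence = support_count(towns, Q | R) / support_count(towns, Q)
```

In a spatial database the facts themselves would come from spatial queries (intersection, adjacency), which is typically the expensive part.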

52 The problem
Given
- a spatial database (SDB)
- a set of reference objects S
- some sets $R_k$, $1 \le k \le m$, of task-relevant objects
- some spatial hierarchies $H_k$ involving objects in $R_k$
- M granularity levels in the descriptions
- a set of granularity assignments $\psi_k$ which associate each object in $H_k$ with a granularity level
- a pair of thresholds minsup[l] and minconf[l] for each granularity level
- domain knowledge
Find strong multiple-level spatial association rules.

The solution
Solution (Appice et al., IDA Journal, 2003) based on an Inductive Logic Programming (ILP) approach
- Spatial relations are easily handled: a spatial pattern is a conjunction of first-order logic atoms
- θ-subsumption orders the space of spatial patterns
- Monotonicity of support w.r.t. θ-subsumption: pruning of patterns at the same granularity level in the candidate generation phase
- Monotonicity of pattern frequency w.r.t. granularity level: pruning of patterns at different granularity levels in the candidate generation phase
Implemented in SPADA (Spatial Pattern Discovery Algorithm)
European project SPIN! (Spatial Mining for Data of Public Interest)
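The pruning that monotonicity of support makes possible can be illustrated with a level-wise search over plain item sets. This is a deliberately simplified, propositional stand-in: SPADA searches first-order patterns ordered by θ-subsumption and handles granularity levels, none of which is modelled here. The function name, the facts and the threshold are assumptions for illustration.

```python
from itertools import combinations


def frequent_patterns(objects, minsup):
    """Level-wise search; a pattern is kept only if all its subsets are frequent."""
    items = sorted({item for facts in objects for item in facts})
    level = [frozenset([item]) for item in items]
    frequent = {}
    while level:
        support = {c: sum(c <= facts for facts in objects) / len(objects)
                   for c in level}
        survivors = {c: s for c, s in support.items() if s >= minsup}
        frequent.update(survivors)
        # Join step: combine surviving patterns that differ in one item.
        candidates = {a | b for a, b in combinations(list(survivors), 2)
                      if len(a | b) == len(a) + 1}
        # Prune step: support can only drop when a pattern is specialized,
        # so a candidate with an infrequent subset is never evaluated.
        level = [c for c in candidates
                 if all(frozenset(sub) in survivors
                        for sub in combinations(c, len(c) - 1))]
    return frequent


objects = [
    {"intersects_road", "adjacent_to_water", "near_airport"},
    {"intersects_road", "adjacent_to_water"},
    {"intersects_road"},
    {"adjacent_to_water"},
    {"intersects_road", "adjacent_to_water"},
]
patterns = frequent_patterns(objects, minsup=0.5)
```

Because `near_airport` is infrequent on its own, no pattern containing it is ever generated or counted.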

53 Extensions of initial solutions
- Efficiency improvement of pattern evaluation by caching support objects for each stored pattern
- Definition of a declarative bias to filter out rules on the basis of users' preferences (efficiency improvement is a byproduct)
  - In real-world applications a large number of spatial patterns can be generated even for a few hundred spatial objects.
  - Most of the discovered patterns are useless for the application at hand.
  - Urban accessibility application: only spatial patterns involving some sociological factor (household with no car) are interesting.
- Integration of SPADA in the ARES system that interfaces a spatial DB (Oracle Spatial)

Mining Network Data

54 Networks
[Figure: data types arranged by time complexity and space complexity: Points; Points, Lines, and Areas; Networks]

Points and Networks
- Requirements: point data, polygons, aggregations, spatial dependencies and relations, networks
- Examples: traffic frequency prediction
- Method: kNN

55 Case Study: Outdoor Advertising - Frequency Atlas
- Customer: Fachverband für Außenwerbung (FAW; German Outdoor Advertising Association)
- Task: performance value assessment of advertising media
- Traffic volume forecast, separate for private cars, public transport, pedestrians

Determining reach of a poster board
Gesellschaft für Konsumforschung
Frequency + Media factories = poster reach

56 The project in numbers
- Complete model for all German cities with more than 5. inhabitants (92 cities) = ca.. street segments!
- The complete model includes, for each segment:
  - car frequency
  - pedestrian frequency
  - public transport frequency
- The model is presently being extended to all cities with between. and 5. inhabitants

Basic Data: traffic measurements
- Manual traffic measurement at selected poster locations
  - 4 times 6 minutes at four days of the week at four times of day
- Additional empirical model of day totals
- Properties
  - Well-defined measurements
  - Extended measurement period, so concept drift cannot be excluded
- Total of 96. manual measurements

57 Secondary data
[Diagram: street network, sociodemographics + socioeconomics, points of interest (POI), frequency measurements and public transport network are the inputs to DATA MINING, which yields frequency classes]

How Spatial Autocorrelation helps
- Local measurements
- Inhomogeneous measurements on the same street

58 Spatial kNN
- Attributes of street segments:
  - name, type, class
  - points of interest
  - spatial coordinates
- Locations with measurement values
- Distance between two segments $x_a$, $x_b$:
  $d(x_a, x_b) = \sum_{m=1}^{M} |x_{am} - x_{bm}|$
- Selection of the k closest segments $x_1, \ldots, x_k$
- Prediction for a new segment $x_q$:
  $\hat{y}_q = \frac{\sum_{i=1}^{k} w_i\, y_i}{\sum_{i=1}^{k} w_i}$ with $w_i = \frac{1}{d(x_q, x_i)}$
(The project has actually used a specially adapted distance measure.)

Spatial kNN - Properties
- kNN captures well the autocorrelation inherent in the data
- Allows bringing in background knowledge by fine-tuning the distance function
- Database integrated (Oracle Spatial)
- Performs dynamic spatial queries (minimum distances among polygons)
Performance improvements
- Spatial queries use index structures (R-tree) but are still relatively costly (i.e. they dominate overall runtime)
- Partial evaluation of the distance function based on lower bounds for the distance, to minimize the number of spatial queries
- Can handle data sets that do not fit into main memory
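The distance-weighted kNN prediction can be sketched as follows. The sketch assumes a plain sum of absolute attribute differences as distance and inverse-distance weights; the project itself used a specially adapted distance measure that is not described in the slides, so `manhattan`, `knn_predict` and the sample data are illustrative only.

```python
def manhattan(a, b):
    """Sum of absolute attribute differences between two segments."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(query, samples, k):
    """Distance-weighted kNN estimate for `query`.

    samples: list of (attribute_vector, measured_value) pairs
    """
    nearest = sorted(samples, key=lambda s: manhattan(query, s[0]))[:k]
    for attributes, value in nearest:
        if manhattan(query, attributes) == 0:
            # Identical segment measured: use its value, avoid division by zero.
            return value
    weights = [1 / manhattan(query, attributes) for attributes, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Hypothetical measured segments: (x, y) coordinates and a frequency value.
measured = [((0, 0), 100), ((2, 0), 200), ((10, 10), 1000)]
prediction = knn_predict((1, 0), measured, k=2)
```

With k = 2 the far-away third segment is ignored, and the two equally distant neighbours contribute equally.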

59 Smoothing based on flow constraints
- Measurement errors lead to inconsistencies
- Need a plausible assignment of frequencies
- Solution: use Kirchhoff's law as constraint
  - Sum of inputs = sum of outputs
- Smoothing algorithm finds a locally optimal solution using constraint relaxation

Explaining frequencies
- Problem: customer wants transparent values, not a black box => problem for spatial kNN
- Solution: fit an explanatory model to the predicted values
  - Allows understanding why predictions are as they are
  - Allows identifying potential outliers and areas of high uncertainty
- Use model trees
- Geographic space encoded in x-y coordinates
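The flow-constraint idea (sum of inputs = sum of outputs at every junction) can be illustrated with a simple iterative relaxation. This is only a toy scheme chosen for brevity, not the smoothing algorithm used in the project: each junction in turn rescales its incoming and outgoing flows to their common mean, and sweeps are repeated until the network settles. All names and numbers are assumptions, and flows are assumed to be strictly positive.

```python
def smooth_flows(flows, junctions, sweeps=50):
    """Repeatedly rebalance each junction so inflow and outflow agree.

    flows:     dict mapping edge name to measured frequency (> 0)
    junctions: list of (incoming_edges, outgoing_edges) pairs
    """
    flows = dict(flows)  # work on a copy
    for _ in range(sweeps):
        for incoming, outgoing in junctions:
            total_in = sum(flows[e] for e in incoming)
            total_out = sum(flows[e] for e in outgoing)
            target = (total_in + total_out) / 2
            # Distribute the correction proportionally to the measured values.
            for e in incoming:
                flows[e] *= target / total_in
            for e in outgoing:
                flows[e] *= target / total_out
    return flows


# Hypothetical street chain: a -> junction 1 -> b -> junction 2 -> c
measured = {"a": 100.0, "b": 120.0, "c": 100.0}
junctions = [(["a"], ["b"]), (["b"], ["c"])]
smoothed = smooth_flows(measured, junctions)
```

Fixing one junction disturbs its neighbours, which is why several sweeps are needed before all constraints hold at once.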

60 Numerical prediction with model trees
[Figure: model tree for ORTSTEIL = INNENSTADT (district = city centre). Split attributes: Straßenkategorie (street category: Fussgängerzone = pedestrian zone, Nebenstr. = side street, Hauptstr. = main street), Bahnhof (railway station: Nein/Ja = no/yes), Distanz_zu_Bahnhof (distance to station), Anzahl_Restaurants (number of restaurants), X-Koordinate, Y-Koordinate. The leaves hold linear models LM1 to LM6, each expressing FREQUENZ (frequency) as a linear function of attributes such as X, ANZAHL_EINKAUF and MESSE.]

Improving model by spotting outliers based on model tree prediction
- Points with great prediction error are checked
  - Visual inspection
  - Getting additional empirical input by taking new measurements
- Corrected values are the basis for the next round of model building, leading to improved results

61 Final Result: Frequency Map
[Figure: frequency maps for cars, pedestrians, and public transport]

Final result: frequency atlas (cars, public transport, pedestrians)
- ~ Million street segments predicted based on measurements
- Used for determining poster prices in Germany since 2006
- Rare instance of a spatial data mining problem that has become business critical

62 Spatial Model Trees [Malerba, Appice, Ceci 2005]
- Standard model trees (e.g. M5') can do spatial mining by splitting along x and y coordinates
- Mrs-SMOTI (Malerba et al. 2004) is a variant of model trees that
  - allows regression nodes as interior nodes
  - handles autocorrelation directly: spatial regression model with dependencies in response variables (spatially lagged response)
- It takes as input spatial objects, possibly belonging to separate thematic layers, stored in a spatial database S
  - target objects (main subject of analysis)
  - non-target objects (relevant for the task in hand)
- and outputs a spatial model tree T by
  - partitioning training spatial data according to intra-layer and inter-layer relationships
  - associating different regression models to disjoint spatial areas
- Integrates spatial database queries (see Subgroup Discovery)
[Figure: example spatial model tree T with regression nodes (e.g. Y = a + bX), split nodes on X_2, X_3, X_4 with thresholds α, β, γ, and linear models at the leaves]

Mining Tracks in Space and Time

63 Tracks in Space and Time
[Figure: data types arranged by time complexity and space complexity: Points; Points, Lines, and Areas; Networks; Tracks in Space and Time]

Tracks in space and time
- Requirements: point data, polygons, aggregations, networks, tracks, GPS/RFID/sensor measurements
- Applications: traffic prediction, mobility analysis
- Examples: sampling, event analysis, non-linear optimization

64 Mobility analysis based on GPS tracks
- Introduction of a new pricing model for poster sites based on GPS tracks
- Registration of contact frequencies with poster sites
- Contact extrapolation for target groups:
  - socio-demographic characteristics
  - residential areas
(Source: Media Trend Journal, Nov. 2006)

Time patterns
Patterns / Questions
- How long (days) does it take till x% of objects visit all locations?
- How long does it take till x% of objects visit at least one location twice?
Applications
- determine mobility of a group of people
- reach of poster networks
- find popularity of locations (theatres, supermarkets, hospitals)
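The first time-pattern question (how many days until x% of objects have visited all locations) can be answered from a visit log along these lines. The log format `(day, object_id, location)` and the function name are assumptions; the population is taken to be the set of objects that appear in the log.

```python
import math


def days_until_all_visited(visits, locations, fraction):
    """First day by which `fraction` of the objects have seen every location.

    visits:    iterable of (day, object_id, location) tuples, any order
    locations: set of locations that must all be visited
    fraction:  required share of objects, between 0 and 1
    Returns the day number, or None if the share is never reached.
    """
    visits = list(visits)
    objects = {obj for _, obj, _ in visits}
    needed = math.ceil(fraction * len(objects))
    seen = {obj: set() for obj in objects}
    finished = 0
    for day, obj, location in sorted(visits):
        if location in locations and location not in seen[obj]:
            seen[obj].add(location)
            if len(seen[obj]) == len(locations):
                finished += 1
                if finished >= needed:
                    return day
    return None


# Hypothetical GPS-derived contacts of three test persons with two poster sites.
log = [(1, "a", "P1"), (2, "a", "P2"), (1, "b", "P1"),
       (5, "b", "P2"), (3, "c", "P1")]
sites = {"P1", "P2"}
```

The second question (at least one location visited twice) would follow the same pattern with a per-object visit counter instead of a set.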

65 Modelling tasks
- Modelling mobility for cities with GPS measurements for the overall population
- Predicting mobility for cities without measurements (hard task!)
- Extrapolating predictions in time

GeoPKDD - FET Project IST-495
Geographic Privacy-aware Knowledge Discovery and Delivery
December 2005 - November 2008
Project Leader: Fosca Giannotti
General Project Idea
- extracting user-consumable forms of knowledge from large amounts of raw geographic data referenced in space and in time
- knowledge discovery and analysis methods for trajectories of moving objects, which change their position in time, and possibly also their shape or other significant features
- devising privacy-preserving methods for data mining from sources that typically contain personal sensitive data

66 The Consortium
ID | Acronym | Partner | Country
1 | KDDLAB | Knowledge Discovery and Delivery Laboratory, ISTI-CNR, Istituto di Scienza e Tecnologie dell'Informazione, Pisa (jointly with Univ. Pisa, Dept. of Computer Science) | I
2 | LUC | Univ. Limburg, Theoretical Computer Science Group | B
3 | EPFL | EPFL, Lab. DB, Lausanne | CH
4 | FAIS | Fraunhofer Institute for Autonomous Intelligent Systems, Sankt Augustin | D
5 | WUR | Wageningen UR, Centre for GeoInformation | NL
6 | CTI | Research Academic Computer Technology Institute, Research and Development Division (jointly with Univ. Piraeus, Dept. of Informatics) | GR
7 | UNISAB | Sabanci University, Faculty of Engineering and Natural Sciences | TK
8 | WIND | WIND Telecomunicazioni SpA, Direzione Reti Wind Progetti Finanziati & Technology Scouting | I

Geographic Privacy-aware Knowledge Discovery Process
[Diagram labels: telecommunication company (WIND); trajectory reconstruction; trajectories warehouse; privacy-aware data mining; ST patterns warehouse; interpretation and visualization; GeoKnowledge; public administration or business companies; privacy enforcement]
Application areas: aggregative location-based services, bandwidth/power optimization, mobile cells planning, traffic management, accessibility of services, mobility evolution, urban planning

67 GeoPKDD Specific Goals
- models for moving objects, and data warehouse methods to store their trajectories
- knowledge discovery and analysis methods for moving objects and trajectories
- techniques to make such methods privacy-preserving
- techniques for reasoning on spatio-temporal knowledge and on background knowledge
- techniques for delivering the extracted knowledge within the geographic framework

From Traces to Trajectories: the Source Data (Source: Pedreschi & Giannotti, 2005)
- Streams of log data of mobile phones, e.g. cells in the GSM/UMTS network
  - Entering the cell, e.g. (UserID, time, IDcell, in)
  - Exiting the cell, e.g. (UserID, time, IDcell, out)
  - Movements inside the cell? E.g. (UserID, time, X, Y, IDcell)
- Real trajectories are continuous functions
- Logs are discrete samplings of real trajectories, dependent on the wireless network technology
  - irregular granularity in time and space
  - possible imperfection/imprecision
- An approximate reconstruction of the real trajectory from its log traces is needed
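One simple way to approximate a continuous trajectory from discrete log samples is linear interpolation between consecutive positions. This is a minimal sketch under the assumption that each log entry has already been reduced to `(time, x, y)`; the slides do not state which reconstruction method the project uses, and `position_at` is a hypothetical helper.

```python
from bisect import bisect_right


def position_at(samples, t):
    """Estimate the position at time t by linear interpolation.

    samples: list of (time, x, y) tuples sorted by strictly increasing time
    Returns (x, y), or None if t lies outside the sampled period.
    """
    if not samples or t < samples[0][0] or t > samples[-1][0]:
        return None
    times = [s[0] for s in samples]
    i = bisect_right(times, t)
    if i == len(samples):
        # t coincides with the last timestamp.
        return samples[-1][1:]
    t0, x0, y0 = samples[i - 1]
    t1, x1, y1 = samples[i]
    share = (t - t0) / (t1 - t0)
    return (x0 + share * (x1 - x0), y0 + share * (y1 - y0))


# Hypothetical trace with irregular sampling intervals.
trace = [(0, 0.0, 0.0), (10, 10.0, 0.0), (30, 10.0, 20.0)]
```

Straight-line interpolation ignores the street network and measurement imprecision, which is one reason the slides call the reconstruction approximate.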

68 Movement patterns (Source: Pedreschi & Giannotti, 2005)
- Clustering
  - Group together similar trajectories
  - For each group produce a summary
- Frequent patterns
  - Discover frequently followed (sub)paths
- Classification
  - Extract behaviour rules from history
  - Use them to predict behaviour of future users

Why emphasis on privacy?
- The more and better data are gathered, the greater the vulnerability from correlation
- On the other hand, more and new data bring new opportunities
- Need to maintain privacy without giving up opportunities
- Need to obtain social acceptance through demonstrably trustworthy solutions

Privacy in GeoPKDD
- ... is a technical issue, besides ethical, social and legal, in the specific context of ST data
- How to formalize privacy constraints over ST data and ST patterns?
  - E.g., anonymity threshold on clusters of individual trajectories
- How to design DM algorithms that, by construction, only yield patterns that meet the privacy constraints?
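An anonymity threshold on trajectory clusters can be stated very compactly. The sketch below only checks the constraint after clustering; it does not address the harder question raised on the slide of algorithms that satisfy the constraint by construction. The data layout and the name `anonymous_clusters` are assumptions.

```python
def anonymous_clusters(clusters, k):
    """Keep only clusters that contain trajectories of at least k distinct users.

    clusters: dict mapping cluster id to a list of (user_id, trajectory) pairs
    """
    return {
        cluster_id: members
        for cluster_id, members in clusters.items()
        if len({user for user, _ in members}) >= k
    }


# Hypothetical clustering result; trajectories are lists of cell ids.
clusters = {
    "commute": [("u1", ["c1", "c2"]), ("u2", ["c1", "c2"]), ("u3", ["c1", "c3"])],
    "detour": [("u4", ["c7", "c9"]), ("u4", ["c7", "c8"])],
}
released = anonymous_clusters(clusters, k=3)
```

Counting distinct users rather than trajectories matters: the `detour` cluster holds two trajectories but only one person.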

69 Challenges

Causal Inference from Statistical Spatio-Temporal Data
- Current project at IAIS for a newspaper publisher: sales prediction for individual shops
- What happens if a shop closes or is sold out? Predict to which alternative shop customers go.
- Components:
  - Spatio-temporal clustering of shops
  - Time series prediction
  - Modelling customer behaviour
  - Causal inference about customer behaviour
- Example: if shop A closes, n% of A's customers go to B, m% to C

70 Sales data per day per shop for several years available
- Use similarity of time series over some period for determining anomalies in behaviour
[Figure legend: closed shop, alternative shops, other shops; link strength: strong, weak]
- Use spatial structure to infer potential alternative shops
- People went from A to B when A is closed and B shows an anomaly in behaviour that cannot be explained otherwise

71 [Figure legend: closed shop, alternative shops, other shops; link strength: strong, weak]
- Diagrams such as this one can be generated automatically for historic cases
- Challenge: based on historic examples, come up with a predictive model

Ubiquitous Knowledge Discovery
- Ubiquitous Knowledge Discovery (embedded data mining; mobile and/or distributed micro processors)
- Grid Mining (distributed architecture, grid computing)
- Knowledge discovery in mobile systems (robots, RFID, GPS, mobile phones, cars, ...)
- Static and dynamic sensor networks (Reality Mining)
- Privacy-preserving data mining
KDUbiq Coordination Action (EU, 2005-2008)

72 Ubiquitous Knowledge Discovery
Characteristics of ubiquitous knowledge discovery systems
- objects are distributed in time and space
- dynamic infrastructure (moving objects, appear and disappear)
- analysis takes place in real time, models evolve incrementally
- objects have access to local information only and never see the global picture: only knowledge of the local spatial environment
- typically, objects exchange information with other objects
Spatial Data Mining is a key issue here!
KDUbiq reflects the future research challenges involved in this area

Summary
- Spatial data form a rich environment for analysis
- Feature extraction and construction (spatial queries & functions, Voronoi, ...) play a very important role
- Efficiency is often a big concern
- A variety of approaches to spatial data mining exist, coming from statistics, databases, machine learning
- We have seen examples for density-based clustering, kriging, subgroup discovery, association rules, model trees, kNN, survival analysis
- Methods differ in the data types they can handle
- Real-world applications are feasible today
- Many more challenges in the future due to ubiquitous environments!

73 Literature (1)
Andrienko, N., Andrienko, G.: Exploratory Analysis of Spatial and Temporal Data - A Systematic Approach. Springer, 2005.
Appice, A., Ceci, M., Lanza, A., Lisi, F.A., Malerba, D.: Discovery of Spatial Association Rules in Georeferenced Census Data: A Relational Mining Approach. Intelligent Data Analysis, 7(6), 2003.
Burrough, P., McDonnell, R.: Principles of Geographical Information Systems. OUP, 1998.
Cressie, N.: Statistics for Spatial Data. Wiley, 1993.
Egenhofer, M.: Reasoning about binary topological relations. In Gunther, O. and Schek, H.-J. (eds), Second Symposium on Large Spatial Databases, volume 525 of LNCS. Springer, 1991.
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, 1996.
Giannotti, F., Nanni, M., Pedreschi, P.: Efficient Mining of Temporally Annotated Sequences. SDM 2006.
Goodchild, M.F.: Spatial Autocorrelation. CATMOG 47, Geobooks, Norwich UK, 1986.
Han, J., Stefanovic, N., Koperski, K.: Selective Materialization: An Efficient Method for Spatial Data Cube Construction. PAKDD, 1998.
Klösgen, W.: Explora: A multipattern and multistrategy discovery assistant. In Fayyad, Advances in Knowledge Discovery and Data Mining. MIT Press, 1996.
Klösgen, W., May, M.: Spatial Subgroup Mining Integrated in an Object-Relational Spatial Database. PKDD 2002.
Klösgen, W., May, M., Petch, J.: Mining census data for spatial effects on mortality. Intelligent Data Analysis, 7(6), 2003.

Literature (2)
Koperski, K., Han, J.: Discovery of Spatial Association Rules in Geographic Information Databases. Proc. 4th Int. Symp. Advances in Spatial Databases, SSD, 1995.
Koperski, K., Adhikary, J., Han, J.: Spatial Data Mining: Progress and Challenges. SIGMOD'96 Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'96), Montreal, Canada, June 1996.
Lawson, A.B., Denison, D. (eds): Spatial Cluster Modelling. Chapman & Hall CRC, London, 2002.
Lisi, F.A., Malerba, D.: Inducing Multi-Level Association Rules from Multiple Relations. Machine Learning, 55:175-210, 2004.
Longley, P., Goodchild, M., MacGuire, D., Rhind, D.: Geographic Information Systems and Science. Wiley, 2001.
Malerba, D., Appice, A., Ceci, M.: Mining Model Trees from Spatial Data. LNCS, PKDD 2005.
May, M., Ragia, L.: Spatial Subgroup Discovery Applied to the Analysis of Vegetation Data. PAKM 2002, LNCS 2569.
May, M., Savinov, A.: SPIN! - An Enterprise Architecture for Spatial Data Mining. Knowledge-Based Intelligent Information and Engineering Systems, LNCS 2773, 2003.
Openshaw, S., Craft, A.: Using geographical analysis machines to search for evidence of cluster and clustering in childhood leukaemia and non-Hodgkin lymphomas in Britain. In G. Draper (ed), The Geographical Epidemiology of Childhood Leukaemia and non-Hodgkin Lymphomas in Great Britain, Studies in Medical and Population Subjects No 53, OPCS, London, HMSO, 1991.
Ripley, B.: Statistical Inference for Spatial Processes. CUP, 1988.
Sander, J., Ester, M., Kriegel, H.-P., Xu, X.: Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications. Data Mining and Knowledge Discovery, 2(2):169-194, 1998.
Wrobel, S.: An Algorithm for Multi-relational Discovery of Subgroups. PKDD 1997.

74 Thanks!
Contact: Fraunhofer IAIS, Knowledge Discovery, Schloss Birlinghoven, Sankt Augustin
Tel: 224 / / [email protected]


Oracle Platform GIS & Location-Based Services. Fred Louis Solution Architect Ohio Valley Oracle Platform GIS & Location-Based Services Fred Louis Solution Architect Ohio Valley Overview Geospatial Technology Trends Oracle s Spatial Technologies Oracle10g Locator Spatial Oracle Application

More information

Spatial Data Mining Methods and Problems

Spatial Data Mining Methods and Problems Spatial Data Mining Methods and Problems Abstract Use summarizing method,characteristics of each spatial data mining and spatial data mining method applied in GIS,Pointed out that the space limitations

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Introduction to Spatial Data Mining

Introduction to Spatial Data Mining Introduction to Spatial Data Mining 7.1 Pattern Discovery 7.2 Motivation 7.3 Classification Techniques 7.4 Association Rule Discovery Techniques 7.5 Clustering 7.6 Outlier Detection Introduction: a classic

More information

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

More information

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,

More information

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool. International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm

More information

Scalable Cluster Analysis of Spatial Events

Scalable Cluster Analysis of Spatial Events International Workshop on Visual Analytics (2012) K. Matkovic and G. Santucci (Editors) Scalable Cluster Analysis of Spatial Events I. Peca 1, G. Fuchs 1, K. Vrotsou 1,2, N. Andrienko 1 & G. Andrienko

More information

Spatial Data Warehouse and Mining. Rajiv Gandhi

Spatial Data Warehouse and Mining. Rajiv Gandhi Spatial Data Warehouse and Mining Rajiv Gandhi Roll Number 05331002 Centre of Studies in Resource Engineering Indian Institute of Technology Bombay Powai, Mumbai -400076 India. As part of the first stage

More information

Data Visualization Techniques and Practices Introduction to GIS Technology

Data Visualization Techniques and Practices Introduction to GIS Technology Data Visualization Techniques and Practices Introduction to GIS Technology Michael Greene Advanced Analytics & Modeling, Deloitte Consulting LLP March 16 th, 2010 Antitrust Notice The Casualty Actuarial

More information

An Overview of Database management System, Data warehousing and Data Mining

An Overview of Database management System, Data warehousing and Data Mining An Overview of Database management System, Data warehousing and Data Mining Ramandeep Kaur 1, Amanpreet Kaur 2, Sarabjeet Kaur 3, Amandeep Kaur 4, Ranbir Kaur 5 Assistant Prof., Deptt. Of Computer Science,

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort [email protected] Motivation Location matters! Observed value at one location is

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information

Tracking Groups of Pedestrians in Video Sequences

Tracking Groups of Pedestrians in Video Sequences Tracking Groups of Pedestrians in Video Sequences Jorge S. Marques Pedro M. Jorge Arnaldo J. Abrantes J. M. Lemos IST / ISR ISEL / IST ISEL INESC-ID / IST Lisbon, Portugal Lisbon, Portugal Lisbon, Portugal

More information

An Introduction to Point Pattern Analysis using CrimeStat

An Introduction to Point Pattern Analysis using CrimeStat Introduction An Introduction to Point Pattern Analysis using CrimeStat Luc Anselin Spatial Analysis Laboratory Department of Agricultural and Consumer Economics University of Illinois, Urbana-Champaign

More information

Fuzzy Spatial Data Warehouse: A Multidimensional Model

Fuzzy Spatial Data Warehouse: A Multidimensional Model 4 Fuzzy Spatial Data Warehouse: A Multidimensional Model Pérez David, Somodevilla María J. and Pineda Ivo H. Facultad de Ciencias de la Computación, BUAP, Mexico 1. Introduction A data warehouse is defined

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

Algorithms and Applications for Spatial Data Mining

Algorithms and Applications for Spatial Data Mining Published in Geographic Data Mining and Knowledge Discovery, Research Monographs in GIS, Taylor and Francis, 2001. Algorithms and Applications for Spatial Data Mining Martin Ester, Hans-Peter Kriegel,

More information

GEOENGINE MSc in Geomatics Engineering (Master Thesis) Anamelechi, Falasy Ebere

GEOENGINE MSc in Geomatics Engineering (Master Thesis) Anamelechi, Falasy Ebere Master s Thesis: ANAMELECHI, FALASY EBERE Analysis of a Raster DEM Creation for a Farm Management Information System based on GNSS and Total Station Coordinates Duration of the Thesis: 6 Months Completion

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

What is GIS? Geographic Information Systems. Introduction to ArcGIS. GIS Maps Contain Layers. What Can You Do With GIS? Layers Can Contain Features

What is GIS? Geographic Information Systems. Introduction to ArcGIS. GIS Maps Contain Layers. What Can You Do With GIS? Layers Can Contain Features What is GIS? Geographic Information Systems Introduction to ArcGIS A database system in which the organizing principle is explicitly SPATIAL For CPSC 178 Visualization: Data, Pixels, and Ideas. What Can

More information

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])

More information

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

More information

Data Exploration Data Visualization

Data Exploration Data Visualization Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

More information

Draft Martin Doerr ICS-FORTH, Heraklion, Crete Oct 4, 2001

Draft Martin Doerr ICS-FORTH, Heraklion, Crete Oct 4, 2001 A comparison of the OpenGIS TM Abstract Specification with the CIDOC CRM 3.2 Draft Martin Doerr ICS-FORTH, Heraklion, Crete Oct 4, 2001 1 Introduction This Mapping has the purpose to identify, if the OpenGIS

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du [email protected] University of British Columbia

More information

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express

More information

Quality Assessment in Spatial Clustering of Data Mining

Quality Assessment in Spatial Clustering of Data Mining Quality Assessment in Spatial Clustering of Data Mining Azimi, A. and M.R. Delavar Centre of Excellence in Geomatics Engineering and Disaster Management, Dept. of Surveying and Geomatics Engineering, Engineering

More information

10. Creating and Maintaining Geographic Databases. Learning objectives. Keywords and concepts. Overview. Definitions

10. Creating and Maintaining Geographic Databases. Learning objectives. Keywords and concepts. Overview. Definitions 10. Creating and Maintaining Geographic Databases Geographic Information Systems and Science SECOND EDITION Paul A. Longley, Michael F. Goodchild, David J. Maguire, David W. Rhind 005 John Wiley and Sons,

More information

Public Transportation BigData Clustering

Public Transportation BigData Clustering Public Transportation BigData Clustering Preliminary Communication Tomislav Galba J.J. Strossmayer University of Osijek Faculty of Electrical Engineering Cara Hadriana 10b, 31000 Osijek, Croatia [email protected]

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical

More information

Visualization methods for patent data

Visualization methods for patent data Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes

More information

Data Mining mit der JMSL Numerical Library for Java Applications

Data Mining mit der JMSL Numerical Library for Java Applications Data Mining mit der JMSL Numerical Library for Java Applications Stefan Sineux 8. Java Forum Stuttgart 07.07.2005 Agenda Visual Numerics JMSL TM Numerical Library Neuronale Netze (Hintergrund) Demos Neuronale

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 [email protected] 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Cell Phone based Activity Detection using Markov Logic Network

Cell Phone based Activity Detection using Markov Logic Network Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel [email protected] 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer [email protected] Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

Joseph Twagilimana, University of Louisville, Louisville, KY

Joseph Twagilimana, University of Louisville, Louisville, KY ST14 Comparing Time series, Generalized Linear Models and Artificial Neural Network Models for Transactional Data analysis Joseph Twagilimana, University of Louisville, Louisville, KY ABSTRACT The aim

More information

Marketing Mix Modelling and Big Data P. M Cain

Marketing Mix Modelling and Big Data P. M Cain 1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

More information

Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

More information

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

More information

Mapping Linear Networks Based on Cellular Phone Tracking

Mapping Linear Networks Based on Cellular Phone Tracking Ronen RYBOWSKI, Aaron BELLER and Yerach DOYTSHER, Israel Key words: Cellular Phones, Cellular Network, Linear Networks, Mapping. ABSTRACT The paper investigates the ability of accurately mapping linear

More information

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined

More information

A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases

A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases Published in the Proceedings of 14th International Conference on Data Engineering (ICDE 98) A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases Xiaowei Xu, Martin Ester, Hans-Peter

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

SPATIAL ANALYSIS IN GEOGRAPHICAL INFORMATION SYSTEMS. A DATA MODEL ORffiNTED APPROACH

SPATIAL ANALYSIS IN GEOGRAPHICAL INFORMATION SYSTEMS. A DATA MODEL ORffiNTED APPROACH POSTER SESSIONS 247 SPATIAL ANALYSIS IN GEOGRAPHICAL INFORMATION SYSTEMS. A DATA MODEL ORffiNTED APPROACH Kirsi Artimo Helsinki University of Technology Department of Surveying Otakaari 1.02150 Espoo,

More information