The Art of Spatial Data Mining A Review of A New Algorithm for Discovery of Spatial Association Rules




Expert System, Prof. Glenn Shafer, Fall 00
Anonymous

Abstract

This paper is a literature review of a new algorithm for mining association rules in a spatial database. Spatial data mining, or knowledge discovery in large spatial databases, is the process of extracting implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial databases. Recent research in data mining has produced a set of interesting techniques, including methods for mining strong association and dependency rules and attribute-oriented induction for mining characteristic and discriminant rules. These studies lay a foundation and provide useful methods for exploring promising spatial data mining techniques. Building on previous studies of spatial data mining and of mining association rules in transaction-based databases, this paper introduces and studies a method for mining strong spatial association rules in large spatial databases [6]. Discovering spatial association rules may disclose interesting relationships among spatial and non-spatial data in a large spatial database, and thus represents a new and promising direction in spatial data warehousing and spatial data mining.

The method presented in this paper explores efficient mining of spatial association rules at multiple approximation and abstraction levels. It first performs less costly, approximate spatial computation to obtain approximate spatial relationships at a high abstraction level, and then refines the spatial computation only for those data or predicates whose refined computation may contribute to the discovery of strong association rules. This two-step spatial mining algorithm facilitates mining strong spatial association rules at multiple concept levels by a top-down, progressive deepening technique [6].

The method assumes that the user has reasonably good knowledge of what kind of rules he wants to find in the database, and that good knowledge, such as concept or operation hierarchies, exists for spatial or non-spatial generalization. These assumptions may rule out naive users and complex spatial databases with poorly understood structures.

This paper is related to my current research topics of spatial databases, spatial data warehouse modeling, and spatial database mining techniques, under the supervision of Prof. Adam and Prof. Atluri. I am currently doing a comprehensive survey to understand and organize the most recent theories and techniques in this area, and in my opinion the algorithm presented in this paper is one of those that deserve more research effort.

1. Introduction to spatial data mining

a. General introduction

A spatial database stores a large amount of space-related data, such as maps, preprocessed remote sensing or medical imaging data, and VLSI chip layout data. Spatial databases have many features that distinguish them from relational databases. They carry topological or distance information, are usually organized by sophisticated, multidimensional spatial indexing structures that are accessed by spatial data access methods, and often require spatial reasoning, geometric computation, and spatial knowledge representation techniques.

Spatial data mining refers to the extraction of knowledge, spatial relationships, or other interesting patterns not explicitly stored in spatial databases. Such mining demands an integration of data mining with spatial database technologies. It can be used for understanding spatial data, discovering spatial relationships and relationships between spatial and non-spatial data, constructing spatial knowledge bases, reorganizing spatial databases, and optimizing spatial queries. It is expected to have wide applications in geographic information systems (GIS), geomarketing, remote sensing, image database exploration, medical imaging, navigation, traffic control, environmental studies, and many other areas where spatial data are used. However, extracting interesting and useful patterns from spatial databases is much more difficult than extracting the corresponding patterns from traditional numeric and categorical data, because of the complexity of spatial data types, spatial relationships, and spatial autocorrelation [8].

b. How to categorize spatial data mining based on the kinds of rules?

Current spatial data mining is divided into three common fields: the first examines the classification of spatial datasets, the second generalizes association rules to spatial co-location patterns, and the third focuses on detecting spatial outliers.

More specifically, spatial data mining can also be categorized by the kinds of rules to be discovered in spatial databases. A spatial characteristic rule is a general description of a set of spatial-related data. For example, the description of the general weather patterns in a set of geographic regions is a spatial characteristic rule. A spatial discriminant rule is a general description of the features that contrast or discriminate one class of spatial-related data from other classes. For example, the comparison of two weather patterns in two geographic regions is a spatial discriminant rule. A spatial association rule, which is the subject of this paper, is a rule that describes the implication of one or a set of features by another set of features in spatial databases. For example, a rule like "most cities in Canada are close to the Canada-US border" is a spatial association rule [6].

There have been some interesting studies related to the mining of spatial databases. In this paper, I study the extension of the techniques for mining association rules in transaction-based databases.

c. What is a spatial association rule?

A spatial association rule is a rule of the form A → B, where A and B are sets of predicates, some of which are spatial. In a large database, many association relationships may exist, but some of them may occur rarely or may not hold in most cases. People are interested only in association rules that are strong, i.e., that occur frequently and hold in most cases. For this reason, the concepts of minimum support and minimum confidence are introduced. Informally, the support of a pattern A in a set of spatial objects S is the probability that a member of S satisfies pattern A, and the confidence of A → B is the probability that pattern B occurs if pattern A occurs. A user or an expert may specify thresholds to confine the rules to be discovered to strong ones. For example, we may find that 9% of cities within British Columbia (BC) and adjacent to water are close to the USA, which associates the predicates is_a, within, and adjacent_to with the spatial predicate close_to in the following format:

  is_a(x, city) ∧ within(x, BC) ∧ adjacent_to(x, water) → close_to(x, USA). (9%)

Although such rules are usually not 100% true, they carry nontrivial and valuable knowledge about spatial associations, and thus it is interesting to discover them from large spatial databases. Various kinds of spatial predicates can constitute a spatial association rule. Examples include distance information such as close_to and far_away, topological relations such as intersect, overlap, and disjoint, and spatial orientation such as left_of and west_of.

Since spatial association mining needs to evaluate multiple spatial relationships among a large number of spatial objects, the process can be quite costly. This paper studies an efficient method for mining spatial association rules that uses a top-down, progressive deepening search technique.
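The informal definitions of support and confidence above can be illustrated with a minimal Python sketch. The predicate names and the four toy cities are hypothetical and chosen only for illustration; they are not data from [6].

```python
from typing import FrozenSet, Iterable, List

Predicate = str


def support(objects: List[FrozenSet[Predicate]], pattern: Iterable[Predicate]) -> float:
    """Fraction of the objects in S that satisfy every predicate of the pattern."""
    wanted = frozenset(pattern)
    if not objects:
        return 0.0
    hits = sum(1 for preds in objects if wanted <= preds)
    return hits / len(objects)


def confidence(objects: List[FrozenSet[Predicate]],
               antecedent: Iterable[Predicate],
               consequent: Iterable[Predicate]) -> float:
    """Estimate of P(consequent | antecedent) over the objects in S."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    sup_a = support(objects, a)
    if sup_a == 0.0:
        return 0.0
    return support(objects, both) / sup_a


# Hypothetical toy data: each set lists the predicates one city satisfies.
cities = [
    frozenset({"within_BC", "adjacent_to_water", "close_to_USA"}),
    frozenset({"within_BC", "adjacent_to_water", "close_to_USA"}),
    frozenset({"within_BC", "adjacent_to_water"}),
    frozenset({"within_BC"}),
]

sup = support(cities, ["within_BC", "adjacent_to_water"])
conf = confidence(cities, ["within_BC", "adjacent_to_water"], ["close_to_USA"])
```

In this toy set three of the four cities satisfy the antecedent and two of those three also satisfy the consequent, so a rule would be kept only if the user's thresholds are at or below these two ratios.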
The top-down, progressive deepening technique first searches at a high concept level for strong patterns and implication relationships among the large patterns at a coarse resolution scale. Then, only for those large patterns, it deepens the search to lower concept levels (i.e., their lower-level descendants). This deepening search continues until no large patterns can be found. An important optimization is that the search for large patterns at high concept levels may apply efficient spatial computation algorithms at a coarse resolution scale, for example computing a generalized close_to with approximate spatial computation such as R-trees or plane-sweep techniques operating on minimum bounding rectangles (MBRs). Only the candidate spatial predicates that are worth detailed examination are then computed by refined spatial techniques. Such a multiple-level approach saves much computation, because it is very expensive to perform detailed spatial computation for all possible spatial association relationships [6].

2. Some existing methods related to spatial data mining

Since statistical analysis is widely used for data mining, it is reasonable to consider statistical techniques for spatial data mining. In fact, statistical spatial data analysis has been a popular approach to analyzing spatial data. The approach handles numerical data well and usually proposes realistic models of spatial phenomena [].

However, it typically assumes statistical independence among the spatially distributed data, which is not true in reality since spatial objects are inter-related. This assumption violates Tobler's first law of geography: everything is related to everything else, but nearby things are more related than distant things. In other words, the values of attributes of nearby spatial objects tend to systematically affect each other. In spatial statistics, an area within statistics devoted to the analysis of spatial data, this property is called spatial autocorrelation, and researchers have created, adapted, and applied statistical techniques to spatial data. For example, in image processing and vision, the Markov Random Field (MRF) is a popular model for incorporating context into image segmentation and classification.

Another major approach in spatial data mining is to apply generalization techniques to spatial and non-spatial data, generalizing detailed spatial data to a certain high level and studying the general characteristics and data distribution at this level. The attribute-oriented induction method is quite popular; it generalizes data to high-level concepts and describes general relationships between spatial and non-spatial data. Two algorithms were proposed: nonspatial-dominant generalization and spatial-dominant generalization.

The nonspatial-dominant generalization algorithm first performs attribute-oriented generalization on the task-relevant nonspatial data describing the properties of spatial objects. In this step, numerical data can be generalized to ranges or descriptive high-level concepts, and symbolic values to higher-level concepts. As a result, distinct low-level values may be generalized to identical high-level values, and tuples with identical high-level values can be merged, with their spatial pointers clustered into one slot of the spatial attribute. Finally, the map consists of a small number of regions with high-level descriptions.
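The nonspatial-dominant generalization just described can be sketched as follows. This is a minimal illustration under stated assumptions: the region names, the temperature values, and the cut-offs that map numbers to the concepts "cold", "mild", and "hot" are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical tuples: (spatial pointer, average temperature in degrees C).
regions: List[Tuple[str, float]] = [
    ("region_a", 3.0),
    ("region_b", 4.5),
    ("region_c", 17.0),
    ("region_d", 26.0),
    ("region_e", 28.5),
]


def generalize_temperature(value: float) -> str:
    """Map a numeric value to a high-level concept (assumed cut-offs)."""
    if value < 10.0:
        return "cold"
    if value < 22.0:
        return "mild"
    return "hot"


def nonspatial_dominant_generalization(
    rows: List[Tuple[str, float]]
) -> Dict[str, List[str]]:
    """Merge tuples with identical generalized values.

    The spatial pointers of merged tuples are clustered into one slot,
    so each high-level value ends up describing one merged region.
    """
    merged: Dict[str, List[str]] = defaultdict(list)
    for pointer, value in rows:
        merged[generalize_temperature(value)].append(pointer)
    return dict(merged)


generalized = nonspatial_dominant_generalization(regions)
```

Five detailed tuples collapse into three generalized ones, which mirrors the statement that the final map consists of a small number of regions with high-level descriptions.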
By contrast, the spatial-dominant generalization algorithm first generalizes the query-related spatial data. Data are generalized using spatial data hierarchies, such as geographic or administrative regions provided by users, or hierarchical data structures such as quadtrees or R-trees. The generalized spatial entities cluster the related nonspatial data together. After the spatial data have been generalized, every region can be described at a high concept level by one or a set of predicates.

Knowledge mining in image databases, which can be treated as a major type of spatial database, has also been studied recently. Methods for the classification of sky objects and for the recognition of volcanoes on the surface of Venus have been studied, in which classification trees were used to make the final decisions.

Finally, spatial data mining techniques are closely related to traditional data mining methods for relational databases. In most cases, mining algorithms are first studied for the traditional case and then applied to spatial data to see whether they are feasible or need further modification.

3. A new method for mining spatial association rules

a. Deep insights into spatial association rules

Various kinds of spatial predicates can be involved in spatial association rules. They may represent topological relationships between spatial objects, such as disjoint, intersects, inside/outside, adjacent_to, covers/covered_by, equal, etc. They may also represent spatial orientation or ordering, such as left, right, north, east, etc., or distance information, such as close_to, far_away, etc. To look more deeply into the mining of spatial association rules, let us first introduce a formal definition.

Definition: A spatial association rule is a rule of the form

  P1 ∧ ... ∧ Pm → Q1 ∧ ... ∧ Qn (c%)

where at least one of the predicates P1, ..., Pm, Q1, ..., Qn is a spatial predicate, and c% is the confidence of the rule, which indicates that c% of the objects satisfying the antecedent of the rule also satisfy the consequent of the rule.

Most people are interested only in patterns that occur relatively frequently (with large support) and rules that have strong implications (with high confidence). Rules with large support and high confidence are strong rules. Accordingly, two kinds of thresholds, minimum support and minimum confidence, are introduced and set in advance by users or experts. Moreover, since many predicates and concepts may have strong association relationships only at a relatively high concept level, the thresholds should be defined separately for different concept levels. For example, it is difficult to find regular association patterns between a particular house and a particular beach, but there may be a strong association between many expensive houses and luxurious beaches. It is therefore expected that many spatial association rules are expressed at a relatively high concept level.
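The observation that associations tend to be strong only at higher concept levels can be illustrated with a small sketch. It assumes a hypothetical concept hierarchy stored as child-to-parent links and a hypothetical list of six towns; none of these values come from [6].

```python
from collections import Counter
from typing import Dict, List

# Hypothetical child -> parent links of a concept hierarchy for water bodies.
PARENT: Dict[str, str] = {
    "fraser_river": "large_river",
    "okanagan_lake": "large_lake",
    "georgia_strait": "sea",
    "large_river": "water",
    "large_lake": "water",
    "sea": "water",
}


def climb(concept: str, steps: int) -> str:
    """Generalize a concept by moving `steps` levels up the hierarchy."""
    for _ in range(steps):
        concept = PARENT.get(concept, concept)  # the root maps to itself
    return concept


def support_counts(values: List[str], steps: int) -> Counter:
    """Count how many objects fall under each concept after generalization."""
    return Counter(climb(v, steps) for v in values)


# Hypothetical data: the water body each of six towns is adjacent to.
adjacent_water = [
    "fraser_river", "fraser_river", "okanagan_lake",
    "georgia_strait", "georgia_strait", "georgia_strait",
]

lowest = support_counts(adjacent_water, 0)   # individual water bodies
middle = support_counts(adjacent_water, 1)   # sea, large_river, large_lake
top = support_counts(adjacent_water, 2)      # water
```

With a minimum count of 4, no individual water body and no middle-level concept qualifies, while the top-level concept "water" covers all six towns. This is why a threshold is specified for each level rather than once for the whole hierarchy.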
To facilitate the specification of the primitives for spatial data mining, an SQL-like spatial data mining query interface, designed on the basis of a spatial SQL, has been proposed. The following example, which is studied thoroughly in [6], illustrates it.

Example 1: Let the spatial database to be studied adopt an extended-relational data model and a SAND (spatial-and-nonspatial database) architecture; that is, it consists of a set of spatial objects and a relational database describing the nonspatial properties of these objects. This example is confined to British Columbia (BC), a province of Canada, with the following database relations (tables) for organizing and representing spatial objects:

1. town (name, type, population, geo, ...)
2. road (name, type, geo, ...)
3. water (name, type, geo, ...)
4. mine (name, type, geo, ...)
5. boundary (name, type, admin_region_1, admin_region_2, geo, ...)

Here the attribute geo represents a spatial object (a point, line, area, etc.) whose spatial pointer is stored in a tuple of the relation (a row of the table) and points to a geographic map. The attribute type of a relation categorizes the types of spatial objects in the relation. For example, the types of road could be (national highway, local highway, street), and the types of water could be (ocean, sea, inlets, lakes, rivers, bay, creeks). The boundary relation specifies the boundary between two administrative regions. The omitted fields could hold other pieces of information, such as the area of a lake or the flow of a river.

Suppose a user is interested in finding, within the map of British Columbia (BC), the strong spatial association relationships between large towns and other nearby objects, including mines, country boundaries, water, and major highways. The query could be written as follows:

  discover spatial association rules
  inside BC
  from road R, water W, mines M, boundary B
  in relevance to town T
  where g_close_to(T.geo, X.geo) and X in {R, W, M, B}
    and T.type = large and R.type = divided_highway
    and W.type in {sea, ocean, large_lake, large_river}
    and B.admin_region_1 in BC and B.admin_region_2 in USA

In this query, the relation variable X stands for one of the four variables {R, W, M, B}; the predicate close_to(A, B) says that a spatial object A is close to another spatial object B; and g_close_to is a predefined generalized predicate that covers a set of spatial predicates: intersect, adjacent_to, contains, close_to. Moreover, close_to is a condition-dependent predicate defined by a set of knowledge rules. For example, if X is a town and Y is a country, then X is close to Y if their distance is within 80 miles, whereas close_to between a town and a road is defined by a smaller distance, such as 5 miles.

To facilitate mining multiple-level association rules and efficient processing, concept hierarchies are provided for both data and spatial predicates, defined as follows:

  (town (large_town (big_city, medium_city), small_town (...), ...))
  (water (sea (strait (georgia_strait, ...), inlet (...), ...), river (large_river (fraser_river, ...), ...), lake (large_lake (okanagan_lake, ...), ...), ...))
  (road (national_highway (route1, ...), provincial_highway (highway_3, ...), city_drive (Hastings St., Kingsway, ...), city_street (e_ st, Ave, ...), ...))

Spatial predicates (topological relations) should also be arranged into a hierarchy, so that approximate spatial relations can be computed with a coarse resolution at a high concept level and the computation can be refined once it is confined to a set of more focused candidate objects.

[Figure: hierarchy of spatial predicates rooted at g_close_to; its nodes include not_disjoint, close_to, intersects, inside, contains, equal, adjacent_to, covered_by, and covers.]

b. A new algorithm for mining spatial association rules

We examine how the data mining query posed in Example 1 is processed, which intuitively illustrates the method for mining spatial association rules.

Firstly, the set of relevant data is retrieved by executing the data retrieval methods of the data mining query, which extracts the following data sets whose spatial portion is inside BC:

(1) towns: only large towns;
(2) roads: only divided highways;
(3) water: only seas, oceans, large lakes, and large rivers;
(4) mines: any mines;
(5) boundary: only the boundary of BC and USA.

Secondly, the generalized close_to (g_close_to) relationship between large towns and the other four classes of entities is computed at a relatively coarse resolution level using a less expensive spatial algorithm, such as the MBR (minimum bounding rectangle) data structure with a plane-sweep algorithm, or R-trees and other approximation methods. The derived spatial predicates are collected in a g_close_to table (Table 1), which follows an extended relational model: each slot of the table may contain a set of entries. The support of each entry is then computed, and those whose support is below the minimum support threshold, such as the column mine, are removed from the table. From the computed g_close_to relation, interesting large item sets can be discovered at different concept levels, and the spatial association rules can be presented accordingly.

Town           Water                Road                   Boundary  Mine
Victoria       Juan_de_Fuca_Strait  Highway_1, Highway_17  US        -
Saanich        Juan_de_Fuca_Strait  Highway_1, Highway_17  US        -
Prince_George  -                    Highway_97             -         -
Penticton      Okanagan_Lake        Highway_97             US        Alalla

Table 1: The computed g_close_to relation
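The coarse computation step can be sketched with minimum bounding rectangles. This is a simplified illustration, not the algorithm of [6]: it compares every pair of rectangles directly instead of using a plane sweep or an R-tree, it uses a single distance threshold although close_to is condition dependent, and all object names and coordinates are hypothetical. Because the distance between two MBRs never exceeds the distance between the objects they enclose, the filter cannot discard a truly close pair; it can only admit false candidates, which the later refinement step removes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def mbr_distance(a: MBR, b: MBR) -> float:
    """Smallest distance between two rectangles (0 if they touch or overlap)."""
    dx = max(a.xmin - b.xmax, b.xmin - a.xmax, 0.0)
    dy = max(a.ymin - b.ymax, b.ymin - a.ymax, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def coarse_g_close_to(towns: Dict[str, MBR],
                      classes: Dict[str, Dict[str, MBR]],
                      threshold: float) -> Dict[str, Dict[str, List[str]]]:
    """Build the coarse table: town -> class -> names of nearby objects."""
    return {
        town: {
            cls: [name for name, box in objs.items()
                  if mbr_distance(town_box, box) <= threshold]
            for cls, objs in classes.items()
        }
        for town, town_box in towns.items()
    }


def classes_with_support(table: Dict[str, Dict[str, List[str]]],
                         min_support: float) -> set:
    """Keep the classes whose column is nonempty for enough towns."""
    counts: Dict[str, int] = {}
    for row in table.values():
        for cls, names in row.items():
            if names:
                counts[cls] = counts.get(cls, 0) + 1
    return {cls for cls, c in counts.items() if c / len(table) >= min_support}


# Hypothetical rectangles in arbitrary map units.
towns = {"town_a": MBR(0, 0, 1, 1), "town_b": MBR(10, 0, 11, 1)}
classes = {
    "water": {"lake": MBR(1.5, 0, 3, 1), "strait": MBR(11.2, 0, 14, 2)},
    "mine": {"mine_1": MBR(50, 50, 51, 51)},
}

table = coarse_g_close_to(towns, classes, threshold=1.0)
kept = classes_with_support(table, min_support=0.5)
```

In this toy input the mine column is empty for every town and is dropped, in the same way that the column mine is removed from the g_close_to table above.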

Since many users will not be satisfied with approximate spatial relationships such as g_close_to, more detailed spatial computation must be performed to find the refined, precise spatial relationships in the spatial predicate hierarchy. This refined computation is performed on the large predicate sets, i.e., those retained in the g_close_to table. Each g_close_to predicate is replaced by one or a set of concrete predicates such as intersects, adjacent_to, close_to, inside, etc. This process results in Table 2.

Town           Water                          Road                                               Boundary
Victoria       <adjacent_to, J.Fuca_Strait>   <intersects, highway_1>, <intersects, highway_17>  <close_to, US>
Saanich        <adjacent_to, J.Fuca_Strait>   <intersects, highway_1>, <intersects, highway_17>  <close_to, US>
Prince_George  -                              <intersects, highway_97>                           -
Penticton      <adjacent_to, Okanagan_Lake>   <intersects, highway_97>                           <close_to, US>

Table 2: Detailed spatial relationships for large sets

Table 2 forms the base for computing detailed spatial relationships at multiple concept levels. Based on it, the large predicates and the corresponding association rules are computed level by level. The computation starts at the top-most concept level and computes the large predicates at this level. For example, for each row of Table 2, i.e., for each large town, if the water attribute is nonempty, the count of water is incremented by one. This count accumulation forms the 1-predicate rows (with k = 1) of Table 3, where the support counts are registered. If the support count of a row is smaller than the minimum support threshold, the row is removed from the table. Suppose the minimum support is set to 50% at level 1; for the 40 large towns considered, a row whose count is less than 20 is removed. Similarly, the 2-predicate rows (with k = 2) are formed by pairwise combination of the large 1-predicates, with their support counts accumulated by checking against Table 2, and the rows whose count is smaller than the minimum support are removed.
The same procedure applies to the computation of 3-predicates. Finally, the computation of large k-predicates results in Table 3.

k  large k-predicate set                                                  count
1  <adjacent_to, water>                                                   32
1  <intersects, highway>                                                  29
1  <close_to, highway>                                                    29
1  <close_to, us_boundary>                                                28
2  <adjacent_to, water>, <intersects, highway>                            25
2  <adjacent_to, water>, <close_to, us_boundary>                          23
2  <close_to, us_boundary>, <intersects, highway>                         26
3  <adjacent_to, water>, <close_to, us_boundary>, <intersects, highway>   22

Table 3: large k-predicate sets at the top concept level (for 40 large towns in BC)

Thirdly, spatial association rules can be extracted directly from Table 3. For example, since <intersects, highway> has a support count of 29, and <adjacent_to, water>, <intersects, highway> has a count of 25, and 25/29 = 86%, we get the following association rule:

  is_a(x, large_town) ∧ intersects(x, highway) → adjacent_to(x, water). (86%)

Since we are dealing only with large towns, is_a(x, large_town) is added to the antecedent of the rule. If the minimum confidence threshold were set at 90%, this rule would be removed from the list of association rules to be generated.

Finally, after the rules at the highest level of the concept hierarchy have been mined, large k-predicates can be computed in the same way at the lower concept levels, giving Table 4 and Table 5. Similarly, spatial association rules can be derived directly from these tables for levels 2 and 3.

k  large k-predicate set
1  <adjacent_to, sea>
1  <adjacent_to, large_river>
1  <close_to, us_boundary>
1  <intersects, provincial highway>
1  <close_to, provincial highway>
2  <adjacent_to, sea>, <close_to, us_boundary>
2  <close_to, us_boundary>, <intersects, provincial highway>
2  <adjacent_to, sea>, <close_to, provincial highway>
2  <close_to, us_boundary>, <close_to, provincial highway>
3  <adjacent_to, sea>, <close_to, us_boundary>, <close_to, provincial highway>
count column as it survives in the source (row alignment lost): 8 4 5 9 0

Table 4: large k-predicate sets at the second level (for 40 large towns in BC)

k  large k-predicate set
1  <adjacent_to, Georgia strait>
1  <adjacent_to, fraser_river>
1  <close_to, us_boundary>
2  <adjacent_to, Georgia strait>, <close_to, us_boundary>
count column as it survives in the source: 9 0 8 7

Table 5: large k-predicate sets at the third level (for 40 large towns in BC)
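The level-wise counting and the rule extraction described above can be sketched as follows. This is a minimal illustration in the style of Apriori on a hypothetical refined table of five towns; the predicate names, the rows, and both thresholds are assumptions, and the counts are not those of the tables above.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

PredSet = FrozenSet[str]


def large_predicate_sets(rows: List[PredSet], min_count: int) -> Dict[PredSet, int]:
    """Find all predicate sets satisfied by at least min_count rows."""
    # k = 1: count each single predicate.
    current: Dict[PredSet, int] = {}
    for p in {p for row in rows for p in row}:
        c = sum(1 for row in rows if p in row)
        if c >= min_count:
            current[frozenset([p])] = c

    large: Dict[PredSet, int] = {}
    k = 2
    while current:
        large.update(current)
        # Join large (k-1)-sets into candidate k-sets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune candidates that have a (k-1)-subset which is not large.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {}
        for cand in candidates:
            c = sum(1 for row in rows if cand <= row)
            if c >= min_count:
                current[cand] = c
        k += 1
    return large


def derive_rules(large: Dict[PredSet, int],
                 min_conf: float) -> List[Tuple[PredSet, PredSet, float]]:
    """Emit (antecedent, consequent, confidence) for every strong rule."""
    rules = []
    for itemset, count in large.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                a = frozenset(ante)
                conf = count / large[a]  # every subset of a large set is large
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules


# Hypothetical refined table: the predicates satisfied by each of five towns.
rows = [
    frozenset({"adjacent_to_water", "intersects_highway", "close_to_us"}),
    frozenset({"adjacent_to_water", "intersects_highway", "close_to_us"}),
    frozenset({"intersects_highway"}),
    frozenset({"adjacent_to_water", "intersects_highway", "close_to_us"}),
    frozenset({"adjacent_to_water"}),
]

large = large_predicate_sets(rows, min_count=3)
strong = derive_rules(large, min_conf=0.7)
```

The confidence of a rule is the count of the whole predicate set divided by the count of its antecedent, which is the same ratio of table counts that the text uses when it derives a rule from the large k-predicate table.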

For example, the following two rules can be derived from these tables:

  is_a(x, large_town) → adjacent_to(x, sea) (52.5%: 21/40 towns), level 2
  is_a(x, large_town) ∧ adjacent_to(x, georgia_strait) → close_to(x, US). (78%), level 3

Notice that only the descendants of the large 1-predicates are examined at a lower concept level, and that the mining process stops at the lowest level of the hierarchies or when an empty large 1-predicate set is derived. The rule mining process above can be summarized in the following algorithm.

Algorithm: mining the spatial association rules, as defined in the Definition given earlier, in a large spatial database.

Input: a spatial database, a mining query, and a set of thresholds:
1. a database consisting of three parts: a spatial database SDB containing a set of spatial objects; a relational database RDB describing the nonspatial properties of the spatial objects; and a set of concept hierarchies.
2. a query consisting of three parts: a reference class S; a set of task-relevant classes for spatial objects C1, ..., Cn; and a set of task-relevant spatial relations.
3. two thresholds: minimum support and minimum confidence for each level l of description.

Output: strong multiple-level spatial association rules for the relevant sets of objects and relations.

Method: mining spatial association rules proceeds as follows:
Step 1: Task_relevant_DB := extract_task_objects(SDB, RDB);
Step 2: Coarse_predicate_DB := coarse_spatial_computation(Task_relevant_DB);
Step 3: Large_coarse_predicate_DB := filtering_with_minimum_support(Coarse_predicate_DB);
Step 4: Fine_predicate_DB := refined_spatial_computation(Large_coarse_predicate_DB);
Step 5: Find_large_predicates_and_mine_rules(Fine_predicate_DB).

Pseudo code: here LL[l] is the large predicate set table at level l, and L[l, k] is the large k-predicate set table at level l. The syntax of the procedure is similar to Pascal.
    procedure find_large_predicates_and_mine_rules(DB);
    for (l := 1; L[l,1] ≠ ∅ and l < max_level; l++) do begin
        L[l,1] := get_large_1_predicate_sets(DB, l);
        for (k := 2; L[l,k-1] ≠ ∅; k++) do begin
            P_k := get_candidate_set(L[l,k-1]);
            foreach object s in S do begin
                P_s := get_subsets(P_k, s);    {candidates satisfied by s}
                foreach candidate p ∈ P_s do p.support++;
            end;
            L[l,k] := {p ∈ P_k | p.support ≥ minsup[l]};
        end;
        LL[l] := ∪_k L[l,k];
        output := generate_association_rules(LL[l]);
    end
    end

c. A discussion of the algorithm

Firstly, we discuss the correctness of this method, as is usual when evaluating algorithms. The method discovers the correct and complete set of association rules through the following steps. At the beginning, query processing extracts all data relevant to the spatial data mining task, so this step is as complete and correct as the query processing itself. The method then applies a coarse spatial computation to the whole set of relevant data, which preserves completeness and correctness. After that, it filters out those 1-predicates whose support is smaller than the minimum support. It then applies a fine spatial computation that computes predicates from the set of derived coarse predicates, which again preserves completeness and correctness. Finally, the method finds the complete set of association rules at multiple concept levels, based on previous studies of mining multiple-level association rules. Each step therefore preserves the correct and complete set of association rules above the minimum support threshold.

Secondly, a theorem is presented on the time complexity of the method. Let the average costs of computing each spatial predicate at the coarse and fine resolution levels be Cc and Cf respectively. The worst-case time complexity of Steps 1-5 is O(Cc × Nc + Cf × Nf + Cnonspatial), where Nc is the number of predicates to be coarsely computed in the relevant spatial data sets, Nf is the number of predicates to be finely computed from the coarse predicate database, and Cnonspatial is the total cost of rule mining in a predicate database, which is not discussed in the paper.

Thirdly, the spatial data mining algorithm developed above has the following major strengths for mining spatial association rules, as stated in [6]:

1). Focused data mining guided by the user's query
The data mining process is directed by a user's query that specifies the relevant objects and the spatial association relationships to be explored. This not only confines the mining process to a relatively small set of data and rules for efficient processing but also leads to desirable results.

2). User-controlled interactive mining
Users may control the minimum support and confidence thresholds at each abstraction level, usually via a graphical user interface, adjusting them interactively based on the mining results returned so far.

3). Approximate spatial computation: substantial reduction of the candidate set
A less costly but approximate spatial computation is performed first, at an abstraction level, on a relatively large set of data. This substantially reduces the set of candidate data to be examined later.

4). Detailed spatial computation: performed once and used for knowledge mining at multiple levels
The computation of support counts at each level can be performed by scanning through the same computed spatial predicate table.

5). Optimization of the computation of k-predicate sets and of multiple-level mining
These two optimization techniques are shared with the techniques for mining other, nonspatial multiple-level association rules. First, the method uses the large (k-1)-predicate sets to derive the candidate k-predicate sets at each level, which is similar to the Apriori algorithm. Second, it starts at the top-most concept level and applies a progressive deepening technique, examining at a lower level only the descendants of the large 1-predicates.

Furthermore, many variations and extensions of the method can be explored to enhance the power and performance of spatial association rule mining, as follows:

1). Integration with nonspatial attributes and predicates
The relevant predicates are mainly spatial ones, such as close_to and inside. Such a process can be integrated with the generalization and association of nonspatial data.

2). Mining spatial association rules in multiple thematic maps
In principle, the method developed here can be applied to spatial databases with multiple thematic maps. The rule mining process will be similar to the one presented above, since the judgment of g_close_to(X, Y) or intersect(X, Y) can be performed by an approximate or detailed map overlay. The mining algorithm itself remains intact.

3). Multiple and dynamic concept hierarchies
This method can also deal with cases in which there are multiple concept hierarchies or in which the concept hierarchies need to be adjusted dynamically based on data distributions. For example, towns can be classified as large or small according to an existing hierarchy, as coastal or inland according to their distance to the ocean, or as southwest or southeast according to their geographic areas.
Different characteristics will be discovered based on different hierarchies or their adjustments, which is similar to executing the same algorithm with different knowledge bases.

4. Conclusions

The algorithm presented in this paper provides an efficient procedure for mining spatial association rules, exploring techniques at multiple approximation and abstraction levels. It proposes first to perform a less costly, approximate spatial computation to obtain approximate spatial relationships at a high abstraction level, and then to refine the spatial computation only for those data or predicates whose refined computation may contribute to the discovery of strong association rules. Such a two-step spatial mining algorithm facilitates mining strong spatial association rules at multiple concept levels by a top-down, progressive deepening technique.

This method is based on the assumption that a user has reasonably good knowledge of what kind of rules they want to find in the database, and that good knowledge, such as concept or operation hierarchies, exists for spatial or nonspatial generalization. Such assumptions may rule out naive users and complex spatial databases with poorly understood structures or knowledge, which calls for more study in the future.

References:

[1] Tom Barclay, Jim Gray and Don Slutz, "Microsoft TerraServer: a spatial data warehouse," Proceedings of the 2000 ACM SIGMOD on Management of Data, pages 307-318.
[2] Peter Baumann, "Web-enabled Raster GIS Services for Large Image and Map Databases," Proceedings of the ACM DEXA 2001, pages 870-874.
[3] Wendolin Bosques, Ricardo Rodriguez, Angelica Rondon and Ramon Vasquez, "A Spatial Data Retrieval and Image Processing Expert System for the World Wide Web," 21st International Conference on Computers and Industrial Engineering, 1997, pages 433-436.
[4] Volker Coors, Volker Jung, "Using VRML as an Interface to the 3D Data Warehouse," Proceedings of the Third Symposium on Virtual Reality Modeling Language, 1998, pages 121-129.
[5] Martin Ester, Hans-Peter Kriegel, Jörg Sander, "Knowledge Discovery in Spatial Databases," invited paper at the 23rd German Conf. on Artificial Intelligence (KI '99), Bonn, Germany, 1999.
[6] Jiawei Han, Krzysztof Koperski, "Discovery of Spatial Association Rules in Geographic Information Databases," Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, 1998.
[7] Shashi Shekhar, Sanjay Chawla, Siva Ravada, Andrew Fetterer, Xuan Liu and Chang-tien Lu, "Spatial Databases: Accomplishments and Research Needs," IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 1, 1999.
[8] N. Widmann, P. Baumann, "Towards Comprehensive Database Support for Geoscientific Raster Data," Proceedings of ACM-GIS'97, Las Vegas, USA, November 1997.