MODELING, VISUALIZING, AND MINING HYDROLOGIC SPATIAL HIERARCHIES FOR WATER QUALITY MANAGEMENT


Michael P. McGuire
Center for Urban Environmental Research and Education
Aryya Gangopadhyay
Department of Information Systems
University of Maryland, Baltimore County
Baltimore, MD

ABSTRACT

Water quality managers analyze data collected in the field to assess environmental conditions and enact policy based on the water quality impairments identified in this analysis. Water quality is often based on measures of water chemistry and the health of biological communities. Many factors spread across the landscape contribute to water quality, which adds a spatial dimension to the problem. Furthermore, data are often analyzed based on aggregations of site level data to multiple hierarchies of watersheds. This paper presents a multidimensional data model which incorporates hydrological spatial hierarchies for the purpose of analyzing water quality data at multiple resolutions. The data model was implemented in a relational database management system and linked with a geographic information system to provide visual exploration of data across multiple levels within the spatial hierarchy. Data mining techniques such as classification and association rule generation were applied to data at multiple levels of the hydrologic spatial hierarchy. Classification was applied to predict the health of fish communities based on site habitat characteristics and measures of water chemistry. Association rules were developed to determine relationships among site characteristics, water quality variables, and fish community health. The results of the classification and association rules were then compared across two levels of the hydrographic spatial hierarchy.

INTRODUCTION

The Federal Clean Water Act requires state governments to assess and identify impaired water bodies. The identification of impaired water bodies is very much a data-driven process.
Environmental managers collect data on various measures which are used to determine water quality. Commonly used measures include water chemistry (nitrogen, pH, dissolved oxygen, etc.) and biological community health (fish index of biotic integrity and benthic index of biotic integrity). Data are typically collected at monitoring sites located on a stream reach. When a site is identified as impaired, the causes of the impairment may be numerous and can be located throughout the watershed which drains to the monitoring point. Thus, when determining water quality, an environmental manager must take into consideration all characteristics of the monitoring site and its watershed. This creates an analysis problem where the manager must evaluate data at multiple spatial dimensions. These dimensions can be represented by a hierarchy of geographic primitives. For example, the monitoring site is a point located along a stream, which is a line located within a stream network. The area which drains into a stream network is represented by a polygon. Furthermore, watersheds can be grouped to form larger watersheds. There is a need to give managers easy access to water quality data at various levels within the hierarchy and to provide novel techniques for knowledge discovery across multiple spatial dimensions.

Data warehouse and online analytical processing (OLAP) systems allow users to explore data across a large number of dimensions. Since water quality managers typically analyze data on multiple spatial and temporal dimensions, the application of OLAP in this domain is very much needed. Furthermore, there is a need for tools to mine water quality data at multiple spatial hierarchical levels such as monitoring site, stream, and watershed.
This paper presents a multidimensional data model which incorporates hydrological spatial hierarchies for water quality data, demonstrates how the data warehouse can be implemented and linked with a geographic information system (GIS) to allow the exploration of data at multiple hierarchical levels, and applies classification and association rule mining techniques at different levels within the hydrologic spatial hierarchy to test the impact of water quality variables on the Fish Index of Biotic Integrity.

RELATED WORK

Multidimensional data modeling and OLAP have been widely used for decision support in business applications (Kimball, 2002). Data in an OLAP system is conceptually represented by a multidimensional data cube where the dimensions within the data are represented along the axes of the cube and facts are the measures within the cube (Han and Kamber, 2001). Spatial dimensions such as city, state, or address are common in most applications. One area of study focuses on modeling spatial dimensions in data warehouses. Building on the entity relationship model, a number of studies incorporate extensive spatial dimensions in data warehouses for location-based services (Jensen, et al., 2004; Timko and Pedersen, 2004). Furthermore, it is possible to create a conceptual data warehouse model that includes spatial primitives such as points, lines, and polygons and the relationships and hierarchies that are associated with each (Malinowski and Zim, 2004). A number of studies focus on designing a spatial data warehouse schema where spatial dimensions are most commonly represented using a star schema and, in cases where spatial hierarchies are normalized, a star/snowflake schema (Han and Kamber, 2001). A star schema gets its name from the fact that the dimension tables surround a single fact table, which when modeled forms the shape of a star. The snowflake schema is similar to the star schema, but its normalized dimension tables more closely resemble the shape of a snowflake. The cascaded star schema (Adam, et al., 2002) is based on a model that treats spatial dimensions as having multiple dimensions; thus each dimension forms a star of its own.
A number of studies develop methods to integrate nonspatial OLAP with spatial databases to allow users to roll up, drill down, slice, and dice measures along multiple spatial dimensions (Stefanovic, 1997; Shekhar, et al., 1999). Another area of research focuses on methods to aggregate spatial hierarchies (Zhou, et al., 1999; Prasher and Zhou, 2004) and the selective materialization of spatial data cubes with regard to performance in a relational database (Han, et al., 1997). One particular study proposes a method to improve system performance for the aggregation of spatial data in a data warehouse by using spatial index trees (Rao, et al., 2003).

Another area of study is the integration of OLAP systems and geographic information systems (GIS) for the visualization of spatial data cubes. Spatial OLAP (SOLAP), the integration of GIS and OLAP, can improve knowledge discovery from spatial distributions and relationships by allowing users to explore multidimensional data through spatial visualization. A number of studies apply SOLAP to decision support in a specific domain such as transportation planning (Shekhar, et al., 2001 and 2002) and public health (Scotch and Parmanto, 2005). However, there has not yet been any work that provides this functionality in the domain of water quality management.

A great deal of work has been done in the field of data mining, particularly in the areas of classification and association rules. It has been demonstrated that a number of these data mining techniques can be applied in data warehouse environments (Han, 1998). A number of studies develop methods for mining spatial data (Koperski and Han, 1995; Han, et al., 1997; Chawla, et al., 2000). More specific to this study, techniques have also been developed for the spatio-temporal assignment of the environmental factors which control the distribution of a living organism (Su, et al., 2004).
This study builds on the related work by presenting a multidimensional data model for water quality monitoring which focuses on spatial dimensional hierarchies of hydrologic features and allows water quality managers to explore the data warehouse through spatial visualization. In addition, classification and association rule data mining techniques are tested at multiple levels within the hydrologic spatial hierarchy.

HYDROLOGIC SPATIAL HIERARCHIES

The hydrologic spatial hierarchy represented in the multidimensional data model for water quality management is depicted in Figure 1. This representation shows a nested hierarchy of spatial primitives. The lowest level of the hierarchy is represented by a point on a stream. The next level up is represented by a stream network. The highest level in the hierarchy is the watershed, or the land area which drains into a stream network. The watershed can be further scaled up to larger watersheds or scaled down to smaller watersheds depending on the level of detail required for analysis.

Figure 1. Hydrologic Spatial Hierarchy.

There are many spatial datasets that represent hydrographic features at various scales. The United States Geological Survey's National Hydrography Dataset (NHD) Watershed is a comprehensive dataset for the conterminous United States containing stream features and watersheds at a number of hierarchical levels. The NHD will be used as a model for the hydrographic spatial hierarchy included in the multidimensional data model for water quality management presented in this paper. The NHD spatial hierarchical levels are shown in Table 1. In the NHD, the hydrographic spatial hierarchy is represented by stream reaches and various levels of watersheds termed hydrologic units. Hydrologic units are scalable so that it is possible to perform watershed analysis from national level scales to local level scales. Each hydrologic unit is identified by a hydrologic unit code (HUC). As the resolution of the hydrologic unit increases, the HUC increases by two digits. Thus each level within the hierarchy is often referred to by the number of digits present in the HUC.

Table 1. NHD Hydrologic Units Hierarchy
Name            Digits in HUC   Example
Region          2               02 Chesapeake Bay
Sub-region      4               0205 Susquehanna River
Basin           6               Lower Susquehanna River
Sub-basin       8               Patuxent River
Watershed       10              Patuxent River
Sub-watershed   12              Little Patuxent River

The highest level in the hierarchy is the two-digit HUC, termed the region level. For example, the drainage area for the Chesapeake Bay is HUC 02. The next level down in the hierarchy, the four-digit HUC, is called the sub-region level. The four-digit HUC is typically the drainage area for a major river such as the Susquehanna River (HUC 0205). The sub-region is further divided into the basin level identified by the six-digit HUC. The six-digit HUC represents individual basins associated with a major river such as the Lower Susquehanna River (HUC ).
The next level down in the hierarchy is the sub-basin level identified by the eight-digit HUC. An example of the sub-basin level watershed is the Patuxent River (HUC ). The sub-basin level is further subdivided into watersheds identified by a ten-digit HUC. An example of a watershed is represented by a subdivision of the Patuxent River (HUC ). The lowest watershed level represented in the NHD is the sub-watershed level, which is identified by the twelve-digit HUC. An example of the twelve-digit watershed is the Little Patuxent River (HUC ). The stream network is also included as a feature in the NHD dataset, and the individual reaches that make up the network represent the lowest level of the spatial hierarchy underlying the hydrologic units. The hierarchy presented here gives the water quality manager the ability to aggregate monitoring data and drill down to individual monitoring sites or roll up to progressively larger watershed levels. The NHD hydrological unit hierarchy will be implemented in the multidimensional data model presented in the next section.

A MULTIDIMENSIONAL MODEL FOR WATER QUALITY MANAGEMENT

The multidimensional data model presented in this section is designed to provide multi-resolution analysis for Maryland Biological Stream Survey (MBSS) data. The MBSS is an ongoing effort by the Maryland Department of Natural Resources to assess biological community health in Maryland's streams and address adverse effects on stream biology due to water chemistry conditions. Data for the MBSS are collected at predetermined monitoring sites located in streams throughout the state of Maryland. Data are collected on water chemistry, aquatic organisms, habitat, and land use. The implementation presented here focuses on data collected to evaluate fish communities. A multidimensional data model for water quality management must include a number of spatial and temporal dimensions for multi-resolution analysis.
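Because each level of the NHD hierarchy extends the hydrologic unit code by two digits, the parent units of any HUC can be recovered by truncating the code. The following is a minimal sketch of that idea, not part of the implementation described in this paper. The names `HUC_LEVELS`, `huc_level`, and `huc_ancestors` are hypothetical, and the twelve-digit code in the usage lines is a made-up placeholder rather than a real hydrologic unit.

```python
# Hypothetical helpers: map a hydrologic unit code (HUC) to its level
# and to its ancestors by truncating the code two digits at a time.
HUC_LEVELS = {2: "region", 4: "sub-region", 6: "basin",
              8: "sub-basin", 10: "watershed", 12: "sub-watershed"}

def huc_level(huc: str) -> str:
    """Return the hierarchy level name implied by the number of digits."""
    if not (huc.isascii() and huc.isdigit()) or len(huc) not in HUC_LEVELS:
        raise ValueError(f"not a valid HUC: {huc!r}")
    return HUC_LEVELS[len(huc)]

def huc_ancestors(huc: str) -> dict:
    """Map each coarser level name to the truncated code for that level."""
    huc_level(huc)  # validates the code before truncating it
    return {HUC_LEVELS[n]: huc[:n] for n in sorted(HUC_LEVELS) if n < len(huc)}

# HUC 0205 is the Susquehanna River sub-region named in the text.
print(huc_level("0205"))
print(huc_ancestors("0205"))

# Placeholder twelve-digit code (made up, not a real sub-watershed).
print(huc_ancestors("123456789012"))
```

Storing the truncated codes as foreign keys is one way the parent-child links of a normalized spatial dimension could be populated, under the assumption that every monitoring site already carries a twelve-digit code.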
Multidimensional data models are typically designed using a star schema where dimension tables are related to a central fact table. Sometimes, when dimensions can be normalized, a star/snowflake schema is used. Spatial dimensional hierarchies typically can be normalized and can be represented using snowflaking. Figure 2 depicts a star/snowflake schema model of a data warehouse for the purpose of analyzing MBSS fish monitoring data.

Figure 2. Data Warehouse for Maryland DNR Biological Stream Survey Data.

The fact table in this model consists of the primary keys of the dimension tables and various measurements of water chemistry, stream characteristics, indices of biological integrity, and fish species counts. The fact table includes measurements of water chemistry that are taken in the field and in the laboratory, including water temperature, dissolved oxygen, laboratory pH, field pH, laboratory conductance, field conductance, acid neutralizing capacity, dissolved organic carbon, nitrate nitrogen, and sulfate. The fact table also contains quantitative habitat information that is collected when the site is surveyed. The habitat variables include instream habitat structure, epifaunal substrate, velocity/depth diversity, pool/glide/eddy quality, riffle/run quality, channel alteration, bank stability, embeddedness, channel flow status, shading, remoteness, aesthetic rating, number of woody debris, number of rootwads, riparian buffer width, maximum depth, stream gradient, average wetted width, average thalweg depth, average velocity, and stream flow. The fact table also includes information on fish sampling results such as the total number of fish species, total nongame fish weight, total gamefish weight, percent of fish with anomalies, and total counts for each species of fish. Finally, the fact table has three indices of biotic integrity (IBI): Fish IBI, Benthic IBI, and Hilsenhoff IBI. These indices are calculated based on the number of native fish and benthic invertebrate species found at a site. The Fish IBI and Benthic IBI range from 1 (very poor) to 5 (good). The Hilsenhoff IBI ranges from 0 (good) to 10 (very poor). There are a number of spatial and temporal dimensions in the schema, including the NHD spatial hierarchy.
The NHD spatial hierarchy can be normalized based on one-to-many relationships between parent and child tables, and therefore snowflaking is used to represent this dimension. The reach dimension, the lowest level in the NHD spatial hierarchy, ties the fact table to a particular section of stream. The FROM_ID and TO_ID attributes allow for the creation of relationships between connected stream segments. Using these attributes, it is possible to connect monitoring point locations along the stream network and calculate the approximate distance between them. The next level in the hydrographic spatial hierarchy is the sub-watershed, which represents the land area that drains to a particular point along a stream segment and is represented by a polygon feature. Since this is the lowest level areal unit in the spatial dimensional hierarchy, other spatial distributions such as land cover, which have a major impact on water quality impairment, are summarized at this level. The remainder of the hierarchy consists of progressively larger aggregations of sub-watersheds to watersheds to sub-basins and so on. In this model, the NHD hierarchy does not extend to the sub-region and region levels because they are beyond the spatial boundaries needed by environmental managers in Maryland.

The site characteristics dimension contains attributes representing the site location and the habitat conditions surrounding the monitoring site. MBSS sites are sampled twice, once in the spring and once in the summer. Because of this, there are two sample date dimensions: a spring sample date dimension and a summer sample date dimension. These two dimensions provide the ability for temporal analysis. However, because this paper focuses on analysis at multiple spatial hierarchies, the temporal dimension is not explored. Three other spatial dimensions are included in the schema. The physiographic province dimension represents the physiographic regions (e.g. coastal plain, piedmont plateau, blue ridge, etc.) that comprise the landscape of Maryland. The county dimension represents the twenty-three counties in Maryland. The MBSS study divides Maryland into east, central, and west regions; this is represented by the region dimension.

BUILDING THE DATA WAREHOUSE

The test implementation described here is based on a dataset extracted from the MBSS for the Patuxent River watershed (Figure 3). Before the data was ready to be loaded into the multidimensional schema, a number of steps had to be taken to 1) assign spatial hierarchical relationships to fact and dimension tables and 2) summarize spatial layers such as land use and impervious surfaces at the sub-watershed level. All spatial data preprocessing was performed in ArcGIS from Environmental Systems Research Institute (ESRI). In order to build the hydrologic spatial hierarchy, each monitoring site had to be assigned the ID of the corresponding stream reach in the NHD stream layer. This was done by performing a spatial join operation similar to that described by Orenstein (1986).
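The site-to-reach assignment can be sketched as a nearest-feature search. The following is an illustrative brute-force version written under stated assumptions, not the ArcGIS function used in this study: coordinates are assumed to be planar (projected), each stream is assumed to be a simple polyline, and the function names, reach IDs, site IDs, and coordinates are all made up for the example.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:                      # degenerate segment (a == b)
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto a-b, clamped so it stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_stream(site_xy, streams):
    """Return (stream_id, distance) of the polyline closest to the site."""
    best_id, best_dist = None, math.inf
    for stream_id, vertices in streams.items():
        for a, b in zip(vertices, vertices[1:]):
            d = point_segment_distance(site_xy, a, b)
            if d < best_dist:
                best_id, best_dist = stream_id, d
    return best_id, best_dist

# Made-up coordinates and IDs, for illustration only.
streams = {
    "reach_a": [(0.0, 0.0), (10.0, 0.0)],
    "reach_b": [(0.0, 5.0), (10.0, 5.0)],
}
sites = {"site_1": (3.0, 1.0), "site_2": (7.0, 4.5)}

# Each site inherits the ID of its nearest reach.
joined = {site_id: nearest_stream(xy, streams)[0] for site_id, xy in sites.items()}
print(joined)
```

This sketch compares every site against every segment, which is adequate for a few dozen sites. A production spatial join would normally use a spatial index to avoid the exhaustive search.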
Figure 3. Patuxent River Watershed.

Figure 4 illustrates the spatial join function. The function uses a spatial search algorithm to find the closest stream segment to the monitoring point. Once the nearest feature is found, the monitoring site is assigned the attributes of the nearest stream segment. This step is essential to creating the hydrologic spatial hierarchy because the reach, as a part of the NHD, already has the ID of the sub-watershed included in its attribute table.

Figure 4. Spatial Join. (Labels: Monitoring Site Location (point), Stream (line); attributes SITE_ID, STREAM_ID.)

Other steps were taken to calculate the percentage of land use/land cover and impervious surfaces in each sub-watershed. This was necessary because of the importance of land use/land cover and impervious surfaces in water quality analysis (Sponseller, et al., 2001). The land use/land cover information was derived for each watershed by using the intersect function found in ArcGIS. The intersect function combines two GIS layers, in this case two polygon layers, so that the resulting layer is the product of the two layers and contains the attributes of both. Figure 5 shows a conceptual illustration of the intersect function. Once the intersect is completed, land use/land cover values can be summarized for each watershed.

Figure 5. Intersect Function. (Panels: Watershed, Land Use, Intersection.)

Another variable that needed to be added to each sub-watershed was the percentage of impervious surface. It has been shown that impervious surfaces have a negative impact on stream habitat (Wang, et al., 2001). Thus, the percentage of impervious surfaces in each watershed is an important factor to include in any analysis of water quality. The impervious surface layer used in this application was downloaded from the Chesapeake Bay Program. The Chesapeake Bay Program impervious surface layer is in raster format. Because of this, a zonal statistics function was used to calculate the percent imperviousness for each watershed (Shekhar and Chawla, 2003). The zonal statistics function calculates summary statistics for pixel values within a zone. In this application, the zone is the sub-watershed boundary.

Once the spatial relationships and layer information were added, the data was ready to be loaded into the data warehouse. Oracle 10g was used to implement the data warehouse model. Each table was converted into CSV format and imported into the schema using the SQL*Loader utility in Oracle 10g. Then all relationship constraints were added. Once the relationships were in place, aggregations were created using the AVG and SUM functions with GROUP BY clauses in SQL*Plus. Figure 6 shows an example GROUP BY query. This query aggregates the monitoring point data to the sub-watershed level. For the purposes of this pilot implementation, the aggregations were only computed to the watershed level.
    CREATE TABLE AGG_SUBSHED AS
    SELECT NHD_SUBWATERSHED.SUBSHED_ID,
           AVG(TEMP_FLD) AS AVG_TEMP_FLD,
           AVG(DO_FLD) AS AVG_DO_FLD,
           AVG(PH_LAB) AS AVG_PH_LAB,
           AVG(PH_FLD) AS AVG_PH_FLD
    FROM FISH_FACT, REACH, NHD_SUBWATERSHED
    WHERE FISH_FACT.REACH_ID = REACH.REACH_ID
      AND REACH.SUBSHED_ID = NHD_SUBWATERSHED.SUBSHED_ID
    GROUP BY NHD_SUBWATERSHED.SUBSHED_ID

Figure 6. Example Aggregation Query.

VISUALIZATION OF SPATIAL HIERARCHIES USING GIS

A data warehouse can be a very powerful tool for visualizing data across multiple dimensions. Combining a data warehouse with GIS can provide visualization of data within a spatial context. Therefore, when there are multiple spatial hierarchies in a data warehouse that can be linked to spatial data, the integration of the data warehouse with GIS tools provides a very effective tool for knowledge discovery. This section focuses on integrating the data warehouse with GIS and drilling down into data aggregated at multiple hydrologic spatial hierarchies.

Water quality data is typically characterized and viewed by managers at the watershed level. A need exists for a tool that allows water quality managers to interactively view monitoring data at multiple levels of resolution. The following example integrates the data warehouse for water quality management with ESRI's ArcGIS. The data warehouse in Oracle 10g was linked to ArcGIS via the Oracle ODBC Driver. Figure 7 shows the Patuxent River NHD sub-basin, which is divided into seven watersheds. This first view is the entry point into the exploratory analysis. Each watershed is color coded to show the Fish Index of Biotic Integrity (Fish IBI). At the watershed level, the water quality manager can see that watershed 1 has a poor Fish IBI. To further investigate watershed 1 and gain a more in-depth understanding of why the Fish IBI is rated as poor, the water quality manager can zoom in and display the data at the sub-watershed level.

Figure 7. OLAP and GIS Watershed Level.

Figure 8 shows the sub-watersheds in watershed 1 along with the monitoring stations in each watershed. The water quality manager can learn a lot from this visualization. First, it is easy to locate the sub-watersheds within watershed 1 that have a poor Fish IBI and to see where the monitoring sites with poor Fish IBI are located within the sub-watersheds. There are five monitoring sites in watershed 1 (two in sub-watershed 4, two in sub-watershed 3, and one in sub-watershed 1) that are causing the poor average Fish IBI in watershed 1. This allows the water quality manager to focus on one site to determine what may be causing the fish population to be low in that area. It is also evident that there are no monitoring sites in sub-watershed 2. This might suggest to the water quality manager that, because the streams in sub-watersheds 1 and 3 are degraded, a new monitoring site should be placed in sub-watershed 2.

Figure 8. OLAP and GIS Sub-watershed Level.

GIS also provides the capability to visualize other spatial datasets that may have an impact on water quality. For example, it has been demonstrated that urban land uses often degrade water quality and fish habitat. The water quality manager can overlay the data derived from the data warehouse with land use data to visualize what land use or combination of land uses is contributing to the degradation of water quality at a particular monitoring site. Figure 9 shows the Buzzard Island Circle monitoring site and the surrounding land uses. There are a number of urban land uses that may be affecting the water quality of this site. For example, the site is located near a low density residential use and an institutional use.
The activities associated with these land uses might be contributing factors to a very poor Fish IBI. This capability gives the water quality manager the ability to target specific causes of the water quality impairment. This can be very useful for pinpointing areas for more detailed monitoring and for enforcement of environmental permits to make sure that individual land owners are not causing water quality degradation.

Figure 9. Integration with Other Relevant Spatial Datasets: Land Use.
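The roll-up and drill-down views described in this section amount to grouping the same site-level facts by different levels of the snowflaked hierarchy. The sketch below illustrates this with Python's built-in `sqlite3` module rather than Oracle 10g. The table names FISH_FACT, REACH, and NHD_SUBWATERSHED and the key columns REACH_ID and SUBSHED_ID follow the aggregation query in Figure 6; the columns SITE_ID, SHED_ID, and FIBI, the function `average_fibi`, and all row values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NHD_SUBWATERSHED (SUBSHED_ID INTEGER PRIMARY KEY, SHED_ID INTEGER);
CREATE TABLE REACH (REACH_ID INTEGER PRIMARY KEY,
                    SUBSHED_ID INTEGER REFERENCES NHD_SUBWATERSHED(SUBSHED_ID));
CREATE TABLE FISH_FACT (SITE_ID TEXT PRIMARY KEY,
                        REACH_ID INTEGER REFERENCES REACH(REACH_ID),
                        FIBI REAL);
-- Made-up rows for illustration only.
INSERT INTO NHD_SUBWATERSHED VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO REACH VALUES (10, 1), (11, 1), (20, 2), (30, 3);
INSERT INTO FISH_FACT VALUES
  ('s1', 10, 2.0), ('s2', 11, 3.0), ('s3', 20, 4.0), ('s4', 30, 1.0);
""")

def average_fibi(level_column):
    """Average site-level Fish IBI grouped by one level of the hierarchy."""
    if level_column not in {"SUBSHED_ID", "SHED_ID"}:  # whitelist column names
        raise ValueError(level_column)
    sql = f"""
        SELECT w.{level_column}, AVG(f.FIBI)
        FROM FISH_FACT f
        JOIN REACH r ON f.REACH_ID = r.REACH_ID
        JOIN NHD_SUBWATERSHED w ON r.SUBSHED_ID = w.SUBSHED_ID
        GROUP BY w.{level_column}
        ORDER BY w.{level_column}
    """
    return conn.execute(sql).fetchall()

by_subshed = average_fibi("SUBSHED_ID")   # drill-down view
by_shed = average_fibi("SHED_ID")         # roll-up view
print(by_subshed)
print(by_shed)
```

In this sketch the watershed average is computed directly from the site-level facts, so it weights every site equally. Averaging the sub-watershed averages instead would give a different number whenever sub-watersheds contain different numbers of sites, which is a design choice worth making explicit when materializing aggregates.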

MINING DATA AT MULTIPLE HIERARCHICAL LEVELS

The data warehouse is very effective in providing visualizations of data at multiple spatial hierarchies. Additionally, mining the data warehouse at multiple spatial hierarchies could provide opportunities for knowledge discovery to water quality managers. This section compares results of data mining experiments performed at the site and sub-watershed levels in the hydrographic spatial hierarchy. A classifier was developed at the two hierarchical levels in an attempt to predict the Fish IBI based on site characteristics and water chemistry data. Then association rules were developed for the hierarchical levels to determine relationships between site characteristics, water chemistry, and Fish IBI. All experiments were performed using the Oracle Data Miner.

Classification

The goals of the classification experiment were, first, to see if the Fish IBI rating could be predicted based on site characteristics and water quality measures and, second, to compare the results of the classification at the site and sub-watershed hierarchy levels. A Naive Bayesian Classifier was first trained on the site level monitoring data. The model was configured to use Fish IBI Rating as the target variable; the active variables are listed in Table 2. The model used a pairwise threshold of 0 and a singleton threshold of 0. Because of the small number of samples (82), the model was tested on the training dataset using cross-validation. The same settings were used to build a Naive Bayesian Classifier on the sub-watershed level data. The sub-watershed level data was also tested using cross-validation. The classifier that was trained and tested on the monitoring site level data had an accuracy of 53.6%. The confusion matrix for the site level classification is shown in Table 3.
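The general shape of this experiment can be sketched as a categorical Naive Bayes classifier scored with cross-validation. The code below is a from-scratch illustration, not the Oracle Data Miner model: it assumes attributes have already been binned into categories, uses Laplace smoothing, uses leave-one-out as the form of cross-validation, and does not model the pairwise and singleton thresholds. The function names and the six toy samples are made up.

```python
from collections import Counter, defaultdict
import math

def train_nb(rows, labels):
    """Fit a categorical Naive Bayes model from binned attribute values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)     # (attr_index, label) -> value counts
    values_seen = defaultdict(set)          # attr_index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict_nb(model, row):
    """Return the label with the highest posterior log-probability."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        n = class_counts[label]
        score = math.log(n / total)                      # class prior
        for i, value in enumerate(row):
            k = len(values_seen[i] | {value})            # Laplace smoothing
            score += math.log((value_counts[(i, label)][value] + 1) / (n + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def leave_one_out_accuracy(rows, labels):
    """Hold out each sample once, train on the rest, and score the guess."""
    hits = 0
    for i in range(len(rows)):
        model = train_nb(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict_nb(model, rows[i]) == labels[i]
    return hits / len(rows)

# Toy, made-up samples: (instream habitat bin, dissolved oxygen bin).
rows = [("high", "high"), ("high", "high"), ("high", "low"),
        ("low", "low"), ("low", "low"), ("low", "high")]
labels = ["good", "good", "good", "poor", "poor", "poor"]

model = train_nb(rows, labels)
print(predict_nb(model, ("high", "high")))
print(leave_one_out_accuracy(rows, labels))
```

The same two functions could be run once on site-level rows and once on sub-watershed averages (after binning) to compare accuracy across hierarchy levels, which is the comparison this section reports.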
While slightly more than half of the instances were classified correctly, the confusion matrix suggests that the site characteristic and water quality variables are somewhat random in their ability to predict Fish IBI. When a Naive Bayesian Classifier based on the same target and active variables was trained and tested on the sub-watershed level data, the predictive accuracy decreased slightly to 52.6%. This suggests that applying a classifier to data aggregated at a higher level in the hydrologic spatial hierarchy can predict the average Fish IBI for that watershed with a level of accuracy that is comparable to lower levels in the hierarchy. Table 4 shows the confusion matrix for the sub-watershed level classification. It is evident that there is still a level of randomness in the data in terms of its ability to predict Fish IBI. In both the monitoring site level and sub-watershed level classifications, this appears to stem from the randomness in the data, the complexity of predicting Fish IBI, and the number of possible factors that control fish species richness.

Table 2. Active variables in Naive Bayesian Classifier
Variable    Description
AESTHET     aesthetic rating
ANC_LAB     acid neutralizing capacity
BANKSTAB    bank stability
CHAN_ALT    channel alteration
CH_FLOW     channel flow
COND_FLD    in-situ conductance
COND_LAB    lab conductance
DOC_LAB     dissolved organic carbon
DO_FLD      dissolved oxygen
EMBEDDED    embeddedness
EPI_SUB     epifaunal substrate
EPT_TAXA    Ephemeroptera, Plecoptera, and Trichoptera taxa richness
INSTRHAB    instream habitat
MAXDEPTH    maximum depth
NG_WT       total nongame fish weight
NO3_LAB     nitrate nitrogen
NSPECFISH   total number of fish species
NUMROOT     number of rootwads
PER_ANOM    percent fish with anomalies
PH_FLD      in-situ pH
PH_LAB      lab pH
POOLQUAL    pool/glide/eddy quality
REMOTE      remoteness
RIFFQUAL    riffle/run quality
RIP_WID     riparian width
SHADING     shading
SO4_LAB     sulfate
TEMP_FLD    in-situ temperature
TGAM_WT     total game fish weight
VEL_DPTH    velocity/depth diversity

Table 3. Confusion Matrix for Site Level Data (rows = actual, columns = predicted; Fish IBI Rating classes: fair, good, poor, very poor)

Table 4. Confusion Matrix for Sub-watershed Level Data (rows = actual, columns = predicted; Fish IBI Rating classes: fair, good, poor, very poor)

Association Rules

Association rules were generated for the monitoring site level data and the sub-watershed level data. The association rule model was built using the same attributes as the classification model shown in Table 2. Both models were configured to use a minimum support of 0.1 and a minimum confidence of 0.5. The maximum number of items per rule was set to 2. To determine the attributes and values that are associated with Fish IBI, all rules between Fish IBI and any other attribute were retrieved with a minimum confidence of 75%. For the monitoring site level, only two association rules were generated with a confidence of 75% or higher. The association rules for the site level are shown in Table 5. In mining the site level data, the water quality manager would only develop associations between Fish IBI and the total number of fish species.

Table 5. Association Rules for Site Level Data
If (condition)                           Then (association)       Confidence   Support
total number of fish species =           FIBI_RATING=good
total number of fish species = 0-4.6     FIBI_RATING=very poor

Table 6. Association Rules for Sub-watershed Level Data
If (condition)                           Then (association)       Confidence   Support
channel flow =                           FIBI_RATING=GOOD
embeddedness =                           FIBI_RATING=POOR
instream habitat =                       FIBI_RATING=FAIR
instream habitat = 2-4.6                 FIBI_RATING=VERY POOR
total nongame fish weight =              FIBI_RATING=GOOD
NO3 laboratory = 3.4-4                   FIBI_RATING=FAIR
percent fish with anomalies =            FIBI_RATING=GOOD
pool quality = 6                         FIBI_RATING=VERY POOR
pool quality = 8                         FIBI_RATING=POOR
remoteness = 3                           FIBI_RATING=POOR
remoteness = 8                           FIBI_RATING=GOOD
riffle/run quality = 8                   FIBI_RATING=GOOD
riffle/run quality = 13                  FIBI_RATING=FAIR
number of rootwads = 2.6-3               FIBI_RATING=FAIR
total number of fish species =           FIBI_RATING=GOOD
percent fish with anomalies =            FIBI_RATING=GOOD
percent fish with anomalies = 0-1.4      FIBI_RATING=VERY POOR

The association rules generated from data aggregated at the sub-watershed level were more numerous and had higher confidence levels. Table 6 shows the association rules generated for the sub-watershed level data that have a confidence of 75% or higher. In fact, fourteen out of a total of eighteen rules had a confidence of 100%.
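The support and confidence thresholds used above can be made concrete with a small sketch of two-item rule mining. This is an illustration under stated assumptions, not the Oracle Data Miner algorithm: each sample is treated as a set of "attribute=bin" items, the function `two_item_rules` is hypothetical, and the four toy samples and their bin labels are made up (only the attribute names NSPECFISH and FIBI_RATING come from this paper).

```python
from collections import Counter
from itertools import permutations

def two_item_rules(transactions, min_support=0.1, min_confidence=0.5):
    """Mine rules 'A -> B' with exactly one item on each side.

    support    = share of transactions containing both A and B
    confidence = count(A and B) / count(A)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = set(items)
        item_counts.update(unique)
        # Ordered pairs, so both A -> B and B -> A are counted.
        pair_counts.update(permutations(sorted(unique), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        confidence = count / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)

# Toy, made-up samples; bin labels are placeholders.
transactions = [
    {"NSPECFISH=high", "FIBI_RATING=good"},
    {"NSPECFISH=high", "FIBI_RATING=good"},
    {"NSPECFISH=low", "FIBI_RATING=very poor"},
    {"NSPECFISH=low", "FIBI_RATING=poor"},
]

rules = two_item_rules(transactions, min_support=0.1, min_confidence=0.5)

# Keep only rules that predict Fish IBI with confidence of at least 75%.
fibi_rules = [r for r in rules
              if r[1].startswith("FIBI_RATING=")
              and not r[0].startswith("FIBI_RATING=")
              and r[3] >= 0.75]
print(fibi_rules)
```

With these toy samples only the rule from a high species count to a good rating survives the 75% filter, since the low species count is split evenly between two ratings and reaches a confidence of only 0.5.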
The higher confidence levels at the sub-watershed level suggest that when data is aggregated to that level, stronger associations can be determined between the site characteristics/water quality variables and Fish IBI. A possible explanation for this is that aggregating the data at the sub-watershed level removes noise from the dataset. Therefore, association rule mining at multiple levels within the hydrologic spatial hierarchy could prove to be very fruitful for knowledge discovery. Water quality managers can use the results from this type of analysis to generate and test hypotheses as to what factors control fish populations.

CONCLUSIONS AND DISCUSSION

The study and management of water quality can benefit greatly from data warehousing and data mining technologies. Modeling hydrographic spatial hierarchies in a multidimensional data model can be a very powerful tool for water quality managers. Furthermore, linking the data warehouse with GIS can provide spatial visualizations for rolling up and drilling down within the spatial hierarchy. The implementation described in this study was somewhat limited by the spatial extent of the test dataset, and data was only aggregated to the sub-watershed level. Further steps should include data from a much larger area such as the state of Maryland or the entire Chesapeake Bay region. This would allow for aggregation at higher levels within the hydrologic spatial hierarchy. Also, the star/snowflake schema model presented in this paper included data collected by one monitoring effort. The schema could be extended to create a fact constellation schema where data from other monitoring efforts are integrated into the data warehouse as multiple fact tables. These fact tables would share the dimensions in the hydrologic spatial hierarchy. This would provide a very powerful knowledge discovery tool and allow water quality managers to ask questions that span multiple monitoring efforts.

It was suggested that predicting the Fish IBI rating based on site characteristics and water quality measures could be confounded by randomness in the dataset or by the complexity of the factors that determine fish species distributions. However, the classifier performed with similar accuracy at both the monitoring site and sub-watershed levels. This suggests that predictive models can be applied at multiple levels within the spatial hierarchy without a significant degradation in accuracy. Some limitations may exist because of the small number of samples in the test dataset. Further experimentation is therefore needed to characterize classifier accuracy at multiple levels within the hydrographic spatial hierarchy. Data smoothing techniques such as the discrete wavelet transform could also be tested as a way to improve overall classifier accuracy.

Association rule mining provided insight into possible attributes and their values that are associated with certain Fish IBI ratings. Association rules developed at the sub-watershed level were much more fruitful than those developed at the monitoring site level. A possible explanation is that aggregating data in the form of averages might remove noise from the dataset and thereby generate more interesting association rules. Again, the size of the test dataset is a limiting factor in association rule mining. Further study should be undertaken to mine association rules from a much larger sample dataset across larger hydrologic spatial hierarchies.

ACKNOWLEDGEMENTS

This material is based upon work partly supported by the National Science Foundation under Grant No. BES to C. Welty and M. McGuire and by the United States Environmental Protection Agency under grants R to A.J. Miller and CR to C. Welty.
Although the research described in this article has been funded wholly or in part by the United States Environmental Protection Agency, it has not been subjected to the Agency's required peer and policy review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred. Special thanks to Matthew Rowe of the Maryland Department of the Environment and Martin Hurd of the Maryland Department of Natural Resources for their help in providing a water quality management foundation for this analysis.

REFERENCES

Adam, N., V. Atluri, S. Yu, and Y. Yesha. (2002). Efficient Storage and Management of Environmental Information. Proc. of the 19th IEEE Symposium on Mass Storage Systems and Technologies (MSST'02), Maryland.
Bedard, Y. (2003). Integrating GIS components with knowledge discovery technology for environmental health decision support. International Journal of Medical Informatics, 70(1).
Chawla, S., S. Shekhar, and W. L. Wu. "Modeling spatial dependencies for mining geospatial data: A statistical approach." University of Minnesota, Twin Cities.
Han, J., K. Koperski and N. Stefanovic. (1997). GeoMiner: a system prototype for spatial data mining. Proceedings ACM SIGMOD International Conference on Management of Data, Tucson, Arizona.
Han, J. (1998). Toward on-line analytical mining in large databases. ACM SIGMOD Record, 27(1).
Han, J. and M. Kamber. (2001). Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann Publishers.
Jensen, C., A. Kligys, T. Pedersen, and I. Timko. (2004). "Multidimensional data modeling for location-based services." The VLDB Journal - The International Journal on Very Large Data Bases 13(1).
Kimball, R. (2002). The Data Warehouse Toolkit. New York: John Wiley & Sons.
Koperski, K. and J. Han. (1995). "Discovery of Spatial Association Rules in Geographic Information Databases." Proceedings of the 4th International Symposium on Advances in Spatial Databases.
Timko, I. and T.B. Pedersen. (2004). Capturing complex multidimensional data in location-based data warehouses. GIS '04: Proceedings of the 12th annual ACM international workshop on Geographic information systems, Washington, DC.
Malinowski, E. and E. Zimányi. (2004). "Representing spatiality in a conceptual multidimensional model." GIS '04: Proceedings of the 12th annual ACM international workshop on Geographic information systems, Washington, DC.
Orenstein, J. (1986). Spatial query processing in an object-oriented database system. Proceedings of the ACM SIGMOD, Washington D.C.
Prasher, S., and X. Zhou. (2004). "Multiresolution amalgamation: dynamic spatial data cube generation." Proceedings of the 15th Australian Database Conference, Dunedin, New Zealand.
Rao, F., L. Zhang, X. L. Yu, Y. Li, and Y. Chen. (2003). Spatial hierarchy and OLAP-favored search in spatial data warehouse. DOLAP '03: Proceedings of the 6th ACM international workshop on Data warehousing and OLAP, New Orleans, Louisiana.
Scotch, M. and B. Parmanto. (2005). SOVAT: Spatial OLAP Visualization and Analysis Tool. Hawaii International Conference on System Sciences, Waikoloa Village, Hawaii.
Shekhar, S., C. T. Lu, X. Tan and S. Chawla. (1999). Map Cube: A visualization tool for spatial data warehouses. Proceedings of the NSF workshop on Data Mining in GIS.
Shekhar, S., C. Lu, S. Chawla and P. Zhang. (2001). "Data Mining and Visualization of Twin-Cities Traffic Data." University of Minnesota, Department of Computer Science and Engineering.
Shekhar, S., C. T. Lu, R. Liu and C. Zhou. (2002). "CubeView: A System for Traffic Data Visualization." accessed from
Shekhar, S. and S. Chawla. (2003). Spatial Databases: A Tour. Upper Saddle River, NJ: Prentice Hall.
Sponseller, R. A., E. F. Benfield and H. M. Valett. (2001). Relationships between land use, spatial scale and stream macroinvertebrate communities. Freshwater Biology 46(10).
Stefanovic, N. (1997). Design and implementation of on-line analytical processing (OLAP) of Spatial Data. Simon Fraser University.
Su, F., C. Zhou, V. Lyne, Y. Du and W. Shi. (2004). "A data-mining approach to determine the spatio-temporal relationship between environmental factors and fish distribution." Ecological Modelling.
Wang, L., J. Lyons, and P. Kanehl. (2001). Impacts of urbanization on stream habitat and fish across multiple scales. Environmental Management 28(2).
Zhou, X., D. Truffet and J. Han. (1999). "Efficient polygon amalgamation methods for spatial OLAP and spatial data mining." Lecture Notes in Computer Science.


More information

Part 22. Data Warehousing

Part 22. Data Warehousing Part 22 Data Warehousing The Decision Support System (DSS) Tools to assist decision-making Used at all levels in the organization Sometimes focused on a single area Sometimes focused on a single problem

More information

Towards a Logical Multidimensional Model for Spatial Data Warehousing and OLAP Marcus Costa Sampaio André Gomes de Sousa Cláudio de Souza Baptista

Towards a Logical Multidimensional Model for Spatial Data Warehousing and OLAP Marcus Costa Sampaio André Gomes de Sousa Cláudio de Souza Baptista Towards a Logical Multidimensional Model for Data Warehousing and OLAP Marcus Costa Sampaio André Gomes de Sousa Cláudio de Souza Baptista Information System Laboratory - LSI, Federal University of Campina

More information

1.7.0 Floodplain Modification Criteria

1.7.0 Floodplain Modification Criteria 1.7.0 Floodplain Modification Criteria 1.7.1 Introduction These guidelines set out standards for evaluating and processing proposed modifications of the 100- year floodplain with the following objectives:

More information

Understanding Raster Data

Understanding Raster Data Introduction The following document is intended to provide a basic understanding of raster data. Raster data layers (commonly referred to as grids) are the essential data layers used in all tools developed

More information

Implementing GIS in Optical Fiber. Communication

Implementing GIS in Optical Fiber. Communication KING FAHD UNIVERSITY OF PETROLEUM AND MINERALS COLLEGE OF ENVIRONMENTAL DESIGN CITY & RIGINAL PLANNING DEPARTMENT TERM ROJECT Implementing GIS in Optical Fiber Communication By Ahmed Saeed Bagazi ID# 201102590

More information

Develop and Implement a Pilot Status and Trend Monitoring Program for Salmonids and their Habitat in the Wenatchee and Grande Ronde River Basins.

Develop and Implement a Pilot Status and Trend Monitoring Program for Salmonids and their Habitat in the Wenatchee and Grande Ronde River Basins. Project ID: 35019 Title: Develop and Implement a Pilot Status and Trend Monitoring Program for Salmonids and their Habitat in the Wenatchee and Grande Ronde River Basins. Response to ISRP Comments A. This

More information

Business Intelligence and Process Modelling

Business Intelligence and Process Modelling Business Intelligence and Process Modelling F.W. Takes Universiteit Leiden Lecture 2: Business Intelligence & Visual Analytics BIPM Lecture 2: Business Intelligence & Visual Analytics 1 / 72 Business Intelligence

More information

What is OLAP - On-line analytical processing

What is OLAP - On-line analytical processing What is OLAP - On-line analytical processing Vladimir Estivill-Castro School of Computing and Information Technology With contributions for J. Han 1 Introduction When a company has received/accumulated

More information

Week 3 lecture slides

Week 3 lecture slides Week 3 lecture slides Topics Data Warehouses Online Analytical Processing Introduction to Data Cubes Textbook reference: Chapter 3 Data Warehouses A data warehouse is a collection of data specifically

More information

Vulnerability Assessment of New England Streams: Developing a Monitoring Network to Detect Climate Change Effects

Vulnerability Assessment of New England Streams: Developing a Monitoring Network to Detect Climate Change Effects Vulnerability Assessment of New England Streams: Developing a Monitoring Network to Detect Climate Change Effects National Water Quality Monitoring Council 2012 Meeting Britta Bierwagen, National Center

More information

The UCC-21 cognitive skills that are listed above will be met via the following objectives.

The UCC-21 cognitive skills that are listed above will be met via the following objectives. Master Syllabus Department of Geography GEOG 265: Introduction to Geographic Information Systems Course Description Fundamentals of geographic information systems (GIS). How to visualize geographic information

More information

CHAPTER 5: BUSINESS ANALYTICS

CHAPTER 5: BUSINESS ANALYTICS Chapter 5: Business Analytics CHAPTER 5: BUSINESS ANALYTICS Objectives The objectives are: Describe Business Analytics. Explain the terminology associated with Business Analytics. Describe the data warehouse

More information

Fort Dodge Stormwater Master Planning. Prepared By: Ralph C. Stark, Jr., P.E., C.F.M. Joel N. Krause, P.E., C.F.M.

Fort Dodge Stormwater Master Planning. Prepared By: Ralph C. Stark, Jr., P.E., C.F.M. Joel N. Krause, P.E., C.F.M. Fort Dodge Stormwater Master Planning Prepared By: Ralph C. Stark, Jr., P.E., C.F.M. Joel N. Krause, P.E., C.F.M. Project Location Project Background Flooding History Localized flooding and storm sewer

More information

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

More information

Using D2K Data Mining Platform for Understanding the Dynamic Evolution of Land-Surface Variables

Using D2K Data Mining Platform for Understanding the Dynamic Evolution of Land-Surface Variables Using D2K Data Mining Platform for Understanding the Dynamic Evolution of Land-Surface Variables Praveen Kumar 1, Peter Bajcsy 2, David Tcheng 2, David Clutter 2, Vikas Mehra 1, Wei-Wen Feng 2, Pratyush

More information

A Business Intelligence Training Document Using the Walton College Enterprise Systems Platform and Teradata University Network Tools Abstract

A Business Intelligence Training Document Using the Walton College Enterprise Systems Platform and Teradata University Network Tools Abstract A Business Intelligence Training Document Using the Walton College Enterprise Systems Platform and Teradata University Network Tools Jeffrey M. Stewart College of Business University of Cincinnati stewajw@mail.uc.edu

More information

Research On The Classification Of High Resolution Image Based On Object-oriented And Class Rule

Research On The Classification Of High Resolution Image Based On Object-oriented And Class Rule Research On The Classification Of High Resolution Image Based On Object-oriented And Class Rule Li Chaokui a,b, Fang Wen a,b, Dong Xiaojiao a,b a National-Local Joint Engineering Laboratory of Geo-Spatial

More information

Spatial Hierarchy & OLAP-Favored Search in Spatial Data Warehouse

Spatial Hierarchy & OLAP-Favored Search in Spatial Data Warehouse Spatial Hierarchy & OLAP-Favored Search in Spatial Data Warehouse Fangyan Rao IBM China Research Lab Nov 7, 23 DOLAP 23 Outline Motivation Spatial hierarchy OLAP-favored search Heuristic OLAP-favored search

More information

2002 URBAN FOREST CANOPY & LAND USE IN PORTLAND S HOLLYWOOD DISTRICT. Final Report. Michael Lackner, B.A. Geography, 2003

2002 URBAN FOREST CANOPY & LAND USE IN PORTLAND S HOLLYWOOD DISTRICT. Final Report. Michael Lackner, B.A. Geography, 2003 2002 URBAN FOREST CANOPY & LAND USE IN PORTLAND S HOLLYWOOD DISTRICT Final Report by Michael Lackner, B.A. Geography, 2003 February 2004 - page 1 of 17 - TABLE OF CONTENTS Abstract 3 Introduction 4 Study

More information

Remote Sensing and Land Use Classification: Supervised vs. Unsupervised Classification Glen Busch

Remote Sensing and Land Use Classification: Supervised vs. Unsupervised Classification Glen Busch Remote Sensing and Land Use Classification: Supervised vs. Unsupervised Classification Glen Busch Introduction In this time of large-scale planning and land management on public lands, managers are increasingly

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Objectives. Raster Data Discrete Classes. Spatial Information in Natural Resources FANR 3800. Review the raster data model

Objectives. Raster Data Discrete Classes. Spatial Information in Natural Resources FANR 3800. Review the raster data model Spatial Information in Natural Resources FANR 3800 Raster Analysis Objectives Review the raster data model Understand how raster analysis fundamentally differs from vector analysis Become familiar with

More information