Tracking System for GPS Devices and Mining of Spatial Data

AIDA ALISPAHIC, DZENANA DONKO
Department for Computer Science and Informatics
Faculty of Electrical Engineering, University of Sarajevo
Zmaja od Bosne bb, Kampus Univerziteta, 71000 Sarajevo
BOSNIA AND HERZEGOVINA
aida.alispahic@gmail.com, ddonko@etf.unsa.ba

Abstract: - This paper presents an implementation of a tracking system for GPS devices and describes solutions to issues that arose during its design, such as detecting the current position, communicating with databases, and storing the data. Once the warehouse is created, a multidimensional data model (a cube) is produced and OLAP operations are performed on it in order to obtain different statistics. Mining of the spatial data then makes it possible to produce predictive data important for decision support. One of the results is the identification of congested areas and support for selecting the fastest route.

Key-Words: - location based services; tracking system; data cube; data mining

1 Introduction
One of the main problems facing today's IT-oriented business world is the overload of data in information systems. The issue is reflected in the inability to recognize useful knowledge hidden in these data. Advanced technology makes it possible to address this problem through data mining. With the goal of presenting information in a manner familiar to the user, visual presentation of data is used more and more often. Visual presentation is mainly associated with virtual space, and the digitization of space is based on spatial data. It has its roots in a relatively new technology: location-based services (LBS). LBS are sometimes wrongly identified with geographical information systems (GIS), but they are two separate technologies.
LBS became possible only recently, thanks to the fast development and wide acceptance of technologies such as mobile phones, the Internet, and the global positioning system (GPS), which had not yet been developed when GIS first appeared [1]. One of the main aspects of LBS is finding the location of users by means of their mobile devices. Users' locations are considered the spatial context in LBS applications. In addition, spatial data warehousing results from the convergence of two technologies: spatial data handling and multidimensional data analysis. In business data warehouses, the spatial dimension is increasingly considered of strategic relevance for the analysis of enterprise data [2]. Navigation and tracking are likewise very common features that improve business processes. Presenting data from overloaded information systems visually, identifying useful knowledge in these systems, and navigation and tracking are all applications of data mining and spatial data. Considerable research has addressed the proper design and implementation of applications that track GPS devices [3].

In this paper we use spatial data from two databases: the publicly available database used by the OpenStreetMap application, and a database that contains data collected by the system for tracking GPS devices. From these we create a spatial data warehouse and a multidimensional model of the spatial data through Extraction, Transformation and Loading (ETL) processes, and we manipulate the resulting model with On-Line Analytical Processing (OLAP). The system is implemented with the use of mobile devices with integrated GPS. In this paper we address the following:
- the system for sending information about the current position of a mobile device
- recording and manipulation of spatial data through the storage of data
- data mining of spatial data

The second section of this paper gives a detailed description of the problem.
The third section presents the model, shows the implementation of the solution, and lists some ideas for possible improvement. The conclusion is given in the fourth and final section of the paper.

2 Description of the Problem
In the business world there is a need to determine the most popular, busiest, and most congested locations in a given area. The solution to this problem is based on
ISBN: 978-960-474-317-9 100
tracking systems. Tracking systems make it possible to locate an object on the Earth and to send and store data about its position. The emergence of context-aware mobile environments, and especially the implementation of LBS in those environments, brings the requirement to identify the user's current position and to assign a geographical context in which a specific service is made available to the end user. In recent years spatial data, digital road maps, traffic routes, and vehicle speeds have been integrated and stored in spatio-temporal database systems. There are many different problems in this area, such as the vehicle routing problem in transportation research and the use of data mining algorithms such as k-means clustering under constraints on the visiting sequence [4]. There is an increasing need to combine GIS data with data from other disciplines for survey data analysis and multidisciplinary studies. Spatial intelligence enables efficient integration of spatial and non-spatial data, including fast and accurate location analysis [5].

This paper presents solutions for the following problems:
- detection of the object's position
- communication with the database
- sending the detected geographic coordinates and receiving a response confirming successful reception and insertion of the data

After the location data are stored, we face the following issues:
- extraction of statistics from the corresponding data
- use of data mining algorithms to produce predictive data important for decision support

Mining of spatial data is a concern of many authors. Spatial data mining is a new technique that extracts implicit knowledge or other interesting patterns from large amounts of spatial data. Many data mining systems work with data stored in flat files. It has been shown that mining in a data warehouse yields more useful information, mainly because the data are cleansed before they are stored in the warehouse.
In addition, a data warehouse provides users with data at different levels of summarization, which leads to more fruitful data mining. However, current data warehouse techniques cannot handle spatial data well. For that reason other models have been proposed, such as a spatial data cube for the data warehouse that can answer queries efficiently through selective materialization [6].

3 Model
This section presents the proposed solution to the problem. Its conceptual realization requires mobile devices with integrated GPS and support for an Internet connection. The design is based on two implementations:
- enabling a mobile device to function as a detector and emitter of GPS data
- data mining of archived spatial data

Enabling a mobile device to detect and transmit location data involves the implementation of:
- detection
- connection to the database
- transfer of the spatial data to the database

Data mining of archived spatial data takes the following into consideration:
- storing the data detected by the GPS tracking system
- use of data mining algorithms
- presentation of real and predicted statistics

To address the problem of storing data, the data must be submitted to the following processes:
- cleansing
- integration with other spatial data important for statistics
- creation of a multidimensional data model
- extraction of statistics using OLAP (Online Analytical Processing) operations

These processes form the basic structure of the data storage technology. The process of spatial data warehousing is the foundation on which real and predictive statistics are created. A prerequisite for data mining is the selection of the databases on which it will be performed. These are the databases already mentioned: the publicly available database of street names and the so-called GPSdb database, which contains the geographical coordinates of the located mobile devices (GPS devices), the length of their stay at those coordinates, etc.
The system for tracking GPS devices was created in order to populate the GPSdb database with valid data. The system is implemented as an Android mobile application with the following options:
- detection of the current location of the mobile device and its conversion to geographical coordinates (latitude, longitude)
- connection to the online service that allows transmission of the coordinates to the GPSdb database
- sending the coordinates to the database through the service
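The client-side steps listed above (obtain a latitude/longitude pair, send it to the service, and check the reply) might be sketched as follows. This is only an illustration in Python rather than the Android code of the actual system; the payload fields, the function names, and the `{"inserted": true}` reply format are assumptions and are not taken from the paper.

```python
import json
from datetime import datetime, timezone


def build_position_payload(device_id, latitude, longitude, when=None):
    """Validate a GPS fix and serialize it as JSON (hypothetical message format)."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "device_id": device_id,
            "latitude": latitude,
            "longitude": longitude,
            "timestamp": when.isoformat(),
        },
        sort_keys=True,
    )


def is_acknowledged(response_text):
    """Return True if the service confirms the insert.

    Assumes (hypothetically) a JSON reply such as {"inserted": true}.
    """
    try:
        return json.loads(response_text).get("inserted") is True
    except (ValueError, TypeError, AttributeError):
        # Malformed, missing, or non-object replies count as "not acknowledged".
        return False
```

Validating the coordinate ranges before transmission keeps obviously invalid fixes out of GPSdb, which reduces the amount of cleansing needed later.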
After the GPSdb database is populated with a sufficient amount of data for the extraction of statistics, data mining procedures can be applied. The data mining procedure comprises the following steps:
- cleansing of the GPSdb database and of the publicly available database used by OpenStreetMap applications (ETL processes)
- selection of an appropriate storage architecture
- integration of data from GPSdb and from the publicly available database
- creation of a cube
- use of OLAP operations
- use of data mining algorithms
- presentation of real and predictive statistics

The warehouse-creation steps above constitute a two-layer warehouse architecture. Cleansing is applied in order to remove noise and correct inconsistencies in the data. Data cleansing was conducted on the data taken from the publicly available databases. Several frameworks and tools have been proposed for data quality assessment and cleansing of spatial data that integrate spatial data visualization and analysis capabilities [7][8]. During cleansing, affricates are replaced to prevent them from being converted into illegible symbols. During the cleansing of both databases, the corresponding data are adjusted so that they can be compared and the individual tables can be joined. One particular cleansing step that avoids data inconsistency is rounding decimal (real) values to the same number of decimal places; latitude and longitude data belong to this type. After cleansing, the databases can be integrated into the appropriate warehouse, from which data will be drawn for display and analysis. When creating the warehouse it is necessary to include an optimal amount of data, which requires the exclusion of certain tables, columns, or even specific rows.
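The rounding step can be illustrated with a small sketch. It assumes that both databases expose latitude and longitude as decimal numbers and that five decimal places are sufficient; the precision, the field names, and the sample coordinates are hypothetical and not taken from the paper.

```python
from decimal import Decimal, ROUND_HALF_UP


def normalize_coordinate(value, places=5):
    """Round a coordinate to a fixed number of decimal places (assumed: 5)."""
    quantum = Decimal(1).scaleb(-places)  # 5 places -> 0.00001
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


def join_on_position(gps_rows, street_rows, places=5):
    """Join GPSdb detections to street records whose rounded coordinates match."""
    streets = {}
    for row in street_rows:
        key = (normalize_coordinate(row["lat"], places),
               normalize_coordinate(row["lon"], places))
        streets[key] = row["street"]

    joined = []
    for row in gps_rows:
        key = (normalize_coordinate(row["lat"], places),
               normalize_coordinate(row["lon"], places))
        if key in streets:
            joined.append({"device_id": row["device_id"], "street": streets[key]})
    return joined


# Made-up sample rows: the two sources store the same point at different precision.
gps_rows = [{"device_id": "a", "lat": 43.8562591, "lon": 18.4130763}]
street_rows = [{"street": "Zmaja od Bosne", "lat": 43.85626, "lon": 18.41308}]
matches = join_on_position(gps_rows, street_rows)
```

Using `Decimal` instead of binary floats makes the rounded values compare exactly, so they are safe to use as join keys.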
The result of data pre-processing is shown in Figure 1, which shows the input from the two existing databases before processing, the databases used in the integration, and the warehouse as the end product of the integration.

Fig. 1. Existing input and analysed warehouse

Once the warehouse has been created, it is possible to create a multidimensional data model, the so-called cube, and to perform OLAP operations on it in order to obtain statistics. The cube enables modeling and observation of the data in multiple dimensions (street, time, and date) on the basis of facts (the number of detections of the same device at the same GPS coordinates).

Fig. 2. Multidimensional data model: three-layer cube

Using OLAP operations on the cube in Fig. 2, it is possible to extract statistical data such as the following:
- Drill-down / drill-up and roll-down / roll-up operations provide insight into congestion, trafficability, and the number of visits in streets, cities, and municipalities daily, monthly
or annually (aggregation of data from larger to smaller, or from smaller to larger, levels of the hierarchy); an example MDX query is given below.
- Slice operations provide insight into congestion, trafficability, and the number of visits for a particular street, city, or district (selection on one dimension).
- Dice operations provide insight into congestion, trafficability, and attendance for individual streets, cities, or municipalities (selection on two or more dimensions).

Information is obtained from the multidimensional data model using a language specially designed for this purpose: MDX (Multidimensional Expressions), a multidimensional query language for OLAP. The MDX query below selects the streets with the heaviest traffic between 08:00 and 10:00.

    WITH MEMBER [Measures].[Max] AS
        'Max([Street].[Name].[Name], [Measures].[Statistics Count])'
    SELECT {[Measures].[Max]} ON COLUMNS,
        Filter([Street].[Name].[Name],
               ([Measures].[Statistics Count] = [Measures].[Max])) ON ROWS
    FROM [Warehouse]
    WHERE {[Time].[Time].&[08:00:00] : [Time].[Time].&[10:00:00]}

In addition to the real statistics that can be obtained using OLAP operations, appropriate data mining algorithms can be applied to obtain predictive statistics, which help in making strategic decisions. The algorithms are executed on the data contained in the multidimensional model.
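As a rough illustration of the roll-up, slice, and dice operations, and of what the MDX query above computes, the following sketch works on a tiny in-memory fact table. The street names, dates, and counts are made up, and the functions are simplified stand-ins for an OLAP engine, not the system described in this paper.

```python
from collections import defaultdict

# Hypothetical fact rows: (street, date, hour of day, number of detections).
FACTS = [
    ("Street A", "2013-03-01", 8, 12),
    ("Street A", "2013-03-01", 9, 20),
    ("Street B", "2013-03-01", 9, 15),
    ("Street B", "2013-03-02", 9, 30),
    ("Street A", "2013-03-02", 17, 40),
]


def roll_up(facts, key):
    """Aggregate detection counts to the level chosen by key(street, date, hour)."""
    totals = defaultdict(int)
    for street, date, hour, count in facts:
        totals[key(street, date, hour)] += count
    return dict(totals)


def slice_cube(facts, street):
    """Slice: fix a single value on one dimension (here, the street)."""
    return [f for f in facts if f[0] == street]


def dice_cube(facts, streets, first_hour, last_hour):
    """Dice: restrict two dimensions at once (streets and an hour range)."""
    return [f for f in facts
            if f[0] in streets and first_hour <= f[2] <= last_hour]


def busiest_streets(facts, first_hour, last_hour):
    """Streets with the highest total count in the hour range (cf. the MDX query)."""
    window = [f for f in facts if first_hour <= f[2] <= last_hour]
    per_street = roll_up(window, lambda street, date, hour: street)
    if not per_street:
        return []
    top = max(per_street.values())
    return sorted(s for s, total in per_street.items() if total == top)


daily = roll_up(FACTS, lambda street, date, hour: date)        # day level
monthly = roll_up(FACTS, lambda street, date, hour: date[:7])  # month level
morning_peak = busiest_streets(FACTS, 8, 9)  # hours starting at 08:00 and 09:00
```

Rolling up from days to months only changes the grouping key, which mirrors how drill-up and drill-down move along the date hierarchy of the cube.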
The data placed in the previously created cube enable the following:
1) by applying a decision tree data mining algorithm:
- determining the number of visits to a particular area in a particular period of time
- determining the trafficability of a particular area in a particular period of time
- determining the congestion in a particular area in a particular period of time
2) by applying linear regression data mining algorithms:
- predicting the route to a particular position with minimal travel time during a specific period of the day
- predicting the position at which a specific device should be at a given time (if the mobile phone with GPS is used to monitor official vehicles)
3) by applying clustering data mining algorithms:
- determining fuel consumption in relation to the time and the vehicle (again, if the mobile phone with GPS is used to monitor official vehicles)

4 Conclusion
This paper presents the implementation of a mobile application, describing how to enable tracking, how to store spatial data, and how to manipulate and analyze them. Data collected and stored using the mobile application are combined with data from publicly available spatial databases when the warehouse is created. From the data warehouse a multidimensional data model, the cube, is created, on which OLAP operations are performed and various data mining algorithms are applied. The processes applied to the cube are of paramount importance for managers, planners, and analysts: they help them make strategic decisions, determine directions for further development, and produce relevant analyses, statistics, and conclusions with the aim of creating reports and improving the business.
References:
[1] Allan Brimicombe and Chao Li, Location-Based Services and Geo-Information Engineering, John Wiley & Sons, 2009.
[2] David Taniar, Progressive Methods in Data Warehousing and Business Intelligence: Concepts and Competitive Analytics, IGI Global, 2009.
[3] Manoharan, S., On GPS Tracking of Mobile Devices, ICNS '09, Fifth International Conference on Networking and Services, April 2009, pp. 415-418.
[4] Kawano, H., Applicability of multi-vehicle scheduling problem based on GPS tracking records, 18th International Conference on Geoinformatics, June 2010, pp. 1-4.
[5] Bing She, Spatial data integration and analysis with spatial intelligence, 18th International Conference on Geoinformatics, June 2010.
[6] Yuanzhi Zhang et al., Spatial data cube: provides better support for spatial data mining, IEEE International Geoscience and Remote Sensing Symposium, IGARSS '05, July 2005, Volume 2.
[7] Tadakaluru, A., Mostafa, M., Andrew, K., Ernest, A., GeoExpert - A Framework for Data Quality in Spatial Databases, International Conference on Computational Intelligence for Modelling, Control and Automation, and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, 2005, pp. 557-561.
[8] Xi-Qian Chen, Zhong-Xian Chi, Xiu-Kun Cao, Applying DP to ETL of spatial data warehouse, International Conference on Machine Learning and Cybernetics, Proceedings of 2004, pp. 1616-1619, vol. 3.