An approach for fusing data from multiple sources to support construction productivity monitoring

icccbe 2010 Nottingham University Press
Proceedings of the International Conference on Computing in Civil and Building Engineering
W Tizani (Editor)

Anu Raj Pradhan, Drexel University, USA
Burcu Akinci, Carnegie Mellon University, USA

Abstract

Existing research studies in the construction management domain, such as those on productivity monitoring, cost estimation and project controls, have utilized multi-source data fusion to support various types of project management decisions. Multi-source data fusion refers to the process of fusing data from multiple heterogeneous data sources. The data fusion approaches incorporated in existing research studies within the construction management domain are specific to a given task, such as labour productivity monitoring or defect detection. As a result, these approaches do not necessarily support tasks other than the ones they were designed for. In this paper, the authors describe a new multi-source data fusion approach to support construction productivity monitoring. The developed approach addresses the challenges associated with capturing dynamic user queries, identifying relevant data sources from a given set of available data sources, and generating and executing the correct sequence of steps to fuse the relevant data sources to answer a given user query. A prototype system was developed to validate the generality of the approach using representative queries from construction project engineers, which were identified from previous research studies.

Keywords: construction, management, intelligent site, productivity monitoring, planning, data fusion

1 Introduction

Construction productivity monitoring requires frequent analyses of ongoing construction activities at a job-site. Such monitoring assists in assessing a project's performance and helps to identify opportunities for improvement.
Construction productivity monitoring is a challenging task because job-site conditions evolve over time. Different factors that can affect the performance of a construction activity need to be assessed for productivity monitoring (Motwani et al., 1995). Such factors can be associated with the construction site, the construction process and the design (Kiziltas and Akinci, 2009). For an excavation activity, the construction-site related factors could be soil type, weather and haul road characteristics (Smith, 1999). The construction-process related factors can be crew and material types, while the design-related factors can be the depth of cut and the area of the excavated region (Kiziltas and Akinci, 2009). Data related to these productivity-related factors need to be acquired and assessed to monitor construction productivity.

It is often difficult to acquire data related to different factors from a single data source, since a single data source can only provide a subset of the data items associated with those factors. For example, design-related information can be obtained from Computer-Aided Design (CAD) drawings, weather-related information can be acquired from an online weather database, and construction-site related information can be obtained from a foreman's time cards. In order to obtain integrated information covering the different productivity-related factors, there is a need to fuse multiple data sources. This process of fusing data from multiple data sources is called multi-source data fusion (Hall and Llinas, 1997).

Multi-source data fusion has been leveraged in diverse domains, such as engineering, medicine and the military (Hall and Llinas, 1997). In the construction engineering domain, multi-source data fusion has been used to monitor earth-moving operations (Kannan and Vorster, 2000) and labor productivity (Navon, 2005), identify possible defects and deviations during the construction phase (Akinci et al., 2006), improve the safety of construction crews (Teizer et al., 2007) and track construction materials at a construction site (Song et al., 2006). Data fusion research studies within the construction engineering domain have so far focused on addressing a specific research task (e.g., defect detection or tracking of materials), and they have not formalized a generic multi-source data fusion approach that can be used to support different project management tasks.

On the other hand, there have been research efforts to propose generic data fusion frameworks that can be applied in multiple diverse domains. One of the pioneering generic data fusion frameworks is the data fusion model proposed by the Joint Directors of Laboratories (JDL) at the United States Department of Defense. Subsequent generic data fusion models (e.g., Dasarathy's fusion model and the Omnibus fusion model) attempted to either improve on or extend the JDL data fusion model (Dasarathy, 1997; Hall and Llinas, 1997). These generic fusion models do not provide specific details on the processes that could be directly applied to support construction management tasks.
Also, it is yet to be determined to what extent such generic data fusion models support the data fusion needs of the construction management domain (Haas, 2006).

In the research described in this paper, the authors have developed a formalized multi-source data fusion approach to support construction productivity monitoring. The developed formalism can potentially be extended to support other project management tasks besides construction productivity monitoring. Due to space constraints, this paper provides an overview of the four main components, and readers are encouraged to consult the references for a detailed description.

2 Motivation

In this section, the authors discuss some of the challenges associated with multi-source data fusion for construction productivity monitoring. These challenges were identified based on a highway excavation case study conducted by the authors. The case study was conducted on a forty-month construction project with an estimated cost of $23 million. The project contained eight million cubic yards of bulk excavation. The authors interviewed a senior project engineer and fused different types of data sources, such as on-board instrumentation (OBI) payload, construction schedule, time cards, online weather data and United States Geological Survey (USGS) soil data, based on productivity-related queries from the project engineer. A detailed discussion of the case study can be found in Pradhan (2009). The identified challenges associated with multi-source data fusion are as follows:

2.1 Users have different types of dynamic queries related to productivity monitoring.

The project engineer had a predetermined set of queries in mind when monitoring the productivity at a job-site. Additional queries were posed dynamically, based on the results obtained from the predetermined set of queries.
The authors noticed that productivity analysis and monitoring is a dynamic process in which project engineers' queries are difficult to determine ahead of time. In addition to being dynamic, the queries related to productivity monitoring differ in whether they require analyses over time or over space. For example, one of the user queries is related to understanding the hourly variation of payload productivity with respect to night and day shifts, while another user query is related to understanding the variation of payload productivity with respect to soil condition, which has a spatial aspect. Hence, a general data fusion process for construction productivity analyses should support analyses over both time and space.

2.2 Different user queries need data from different combinations of multiple data sources.

The data sources to be fused and the final fused data differ depending on users' queries. For instance, in the case study, in order to understand the impact of soil condition on payload productivity for the duration of a specific activity, it was necessary to obtain and fuse data from the schedule, GIS soil database, time-card and equipment OBI payload data sources. The final fused database contained the payload data item, the soil data item and the activity start and finish dates. In another query, in order to understand the hourly weather effect on payload productivity for a specific activity, it was necessary to fuse data from the weather, OBI payload and schedule data sources. The required data items in the fused data were the payload data item, a weather item (e.g., temperature) and activity-related data items. Given that the time-card contained daily weather and payload information, the time-card and schedule data sources would have been sufficient if the daily weather effect on payload productivity, instead of the hourly effect, had been desired. Thus, depending on users' queries and the specific details needed for the corresponding analysis, such as daily vs. hourly variations, the required data sources and the final fused data will differ.

2.3 Multi-source data fusion process is conducted in multiple steps.

To conduct multi-source data fusion, multiple steps are required. During the motivating case study, three major steps were conducted when fusing data once a user query was captured and analyzed. First, the data sources applicable to a given query were identified, since the other data sources are not relevant and hence do not need to be fused.
Second, relevant data from each applicable data source were extracted and, when necessary, transformed (i.e., when the level of detail required by a query did not match the level of detail at which a specific data item is stored). Finally, once all the relevant data items were extracted and transformed, they were merged to perform the final analyses and answer the project engineer's query. These observations indicate that multi-source data fusion requires a sequence of steps that might be further decomposed into subtasks. The need for such a multi-step approach to data fusion was also highlighted by other research studies (Hall and Llinas, 1997). The next section discusses the developed data fusion formalism, which attempts to address the above challenges associated with productivity monitoring.

3 Data Fusion Approach

The developed data fusion approach has three main components in order to: (a) capture dynamic user queries related to productivity monitoring, (b) identify applicable data sources based on user queries, and (c) identify and execute data fusion steps to fuse the applicable data sources. Figure 1 depicts an IDEF0 diagram of the approach. The approach takes a user query related to productivity monitoring as an input. Different user queries are captured with the help of a newly developed domain-specific query capture language (Pradhan, 2009). The query language incorporates vocabulary items that are used to capture different aspects of user queries related to productivity monitoring (e.g., maximum productivity, average productivity and factors affecting productivity). The reasoning mechanism automatically analyses a user query to determine the data items required to answer it and their levels of detail (i.e., the granularity of the data). The list of required data items and their levels of detail is used to identify the applicable data sources that need to be fused.
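As a concrete illustration of the extraction, transformation and merging steps identified in Section 2.3, the following minimal Python sketch fuses time-stamped payload readings with hourly weather observations. It is not the authors' implementation: the record layouts, field names (`time`, `payload`, `hour`, `temperature`), function names and all values are hypothetical and chosen only to show how a difference in level of detail (time-stamped vs. hourly) can be resolved before merging.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical extracted records; all values are made up for illustration.
payload_records = [  # time-stamped OBI payload readings
    {"time": datetime(2009, 6, 1, 8, 5), "payload": 40.0},
    {"time": datetime(2009, 6, 1, 8, 40), "payload": 42.0},
    {"time": datetime(2009, 6, 1, 9, 15), "payload": 38.0},
]
weather_records = [  # hourly weather observations
    {"hour": datetime(2009, 6, 1, 8), "temperature": 61.0},
    {"hour": datetime(2009, 6, 1, 9), "temperature": 64.0},
]


def to_hour(timestamp):
    """Truncate a time stamp to the start of its hour."""
    return timestamp.replace(minute=0, second=0, microsecond=0)


def transform_payload_to_hourly(records):
    """Transformation step: aggregate time-stamped payloads to hourly totals."""
    totals = defaultdict(float)
    for record in records:
        totals[to_hour(record["time"])] += record["payload"]
    return dict(totals)


def merge_on_hour(hourly_payload, weather):
    """Merging step: join hourly payload totals with hourly temperature."""
    temperature_by_hour = {w["hour"]: w["temperature"] for w in weather}
    fused = []
    for hour in sorted(hourly_payload):
        if hour in temperature_by_hour:  # keep only hours present in both sources
            fused.append({
                "hour": hour,
                "payload": hourly_payload[hour],
                "temperature": temperature_by_hour[hour],
            })
    return fused


fused = merge_on_hour(transform_payload_to_hourly(payload_records), weather_records)
for row in fused:
    print(row["hour"].isoformat(), row["payload"], row["temperature"])
```

The sketch hard-codes one particular sequence of steps; in the approach described in this paper, such a sequence would instead be generated for each query.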

Figure 1 Research Approach for Generating Fused Data

A data fusion ontology has been developed to capture the characteristics of data sources, such as representation, reference system, level of detail and data items (described later). The developed reasoning mechanism, which is based on a graph-theoretic approach, utilizes the data fusion ontology to identify applicable data sources based on the data items and their levels of detail derived from a user query. After the identification of applicable data sources, the relevant data items contained in those data sources need to be extracted (and transformed, if required) before data merging can be performed. The motivating case study showed that data fusion processes are conducted in a sequence of steps (e.g., extraction, transformation and data merging). Automated planning algorithms from the field of artificial intelligence were developed and adopted to generate a sequence of steps to fuse data sources. A sequence of such steps is termed a data fusion plan. Once a data fusion plan is generated, the plan needs to be executed to generate fused data. A library of reasoning mechanisms related to data extraction, transformation and data merging is needed to assist in plan execution.

3.1 Capturing, Representing and Reasoning about Dynamic User Queries

In general, there are two main categories of computer languages: (a) domain-specific and (b) domain-independent (e.g., Java and C) (Parr, 2007). The primary difference between the two is that a domain-specific language utilizes the knowledge and terminology of a given domain, making a user from that domain comfortable in using such a language (Parr, 2007). Constructing a computer language requires defining a formal language specification, which is known as a grammar. A grammar, which describes the syntax of a language, is a set of rules that are used to generate syntactically correct sentences.
Backus-Naur Form (BNF) and its extensions, such as Extended BNF (EBNF) and Augmented BNF (ABNF), are commonly used to define and represent grammars for domain-specific languages. The authors have developed a domain-specific language to capture productivity-related queries using EBNF. A standard software tool, ANother Tool for Language Recognition (ANTLR), was used for lexing and parsing user queries, instead of developing a customized lexer and parser (Parr, 2007). The details of the query language are discussed in Pradhan (2009).
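To make the idea of a grammar-driven query language more tangible, the sketch below defines a small, hypothetical EBNF grammar in a comment and recognizes it in Python. The grammar, its keywords and the function `parse_query` are assumptions made for illustration; they are not the query language of Pradhan (2009), and the sketch does not use ANTLR. Because this toy grammar is not recursive, a regular expression is enough to recognize it, whereas a realistic grammar would call for a generated lexer and parser.

```python
import re

# Hypothetical EBNF, for illustration only:
#   query       = "show", aggregate, "productivity", "for", activity,
#                 [ "by", factor ], [ "per", granularity ] ;
#   aggregate   = "average" | "maximum" ;
#   activity    = letter, { letter | "_" } ;
#   factor      = "soil" | "weather" | "shift" ;
#   granularity = "hour" | "day" ;

QUERY_PATTERN = re.compile(
    r"^show\s+(?P<aggregate>average|maximum)\s+productivity\s+for\s+"
    r"(?P<activity>[a-z][a-z_]*)"
    r"(?:\s+by\s+(?P<factor>soil|weather|shift))?"
    r"(?:\s+per\s+(?P<granularity>hour|day))?$"
)


def parse_query(text):
    """Return the parts of a query as a dict, or raise ValueError if it is malformed."""
    match = QUERY_PATTERN.match(text.strip().lower())
    if match is None:
        raise ValueError(f"query does not conform to the grammar: {text!r}")
    return match.groupdict()


parsed = parse_query("show average productivity for excavation by weather per hour")
print(parsed)
```

The parsed dictionary exposes the requested data items (here, productivity and a weather factor) and the level of detail (hour), which is the kind of information the reasoning mechanism needs when identifying applicable data sources.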

3.2 Representation of and Reasoning about Available Data Sources

An ontology is defined as an explicit specification of a conceptualization (Gruber, 1993). Generally, it represents a set of concepts and the relationships among them within a domain. An ontology specification contains a vocabulary of terms, together with a definition of the meaning of each term. Ontologies have been used in various research areas, such as knowledge management, semantic web analysis and data integration (Buccella et al., 2003).

The developed ontology was created using an object-oriented modelling approach. In the ontology, a data source is represented as a DataSource class with the following attributes: (a) name (as String data type) to represent the name of a data source, (b) dataitem (as Vector data type) to capture a set of data items, and (c) fusiontype (as FusionType class) to capture the representation, reference system and level of detail of a data source.

The representation defines the data structure used to capture data. For example, a spatial data source, such as USGS soil, uses geometric features (e.g., point, polygon) to represent different geographic regions (e.g., regions having different soil types). In a temporal data source, time can be represented as either a point (e.g., the time-stamp of a payload pickup) or an interval (e.g., the start and end dates of an activity). The reference system defines the co-ordinate system of a given data source. A spatial data source can use either a geographic (e.g., WGS 84) or a projected co-ordinate system (e.g., Lambert conformal conic). Similarly, a temporal data source can use systems such as Greenwich Mean Time or Eastern Standard Time, depending on the time zone. The level of detail defines the granularity of data. Temporal granularity can be minute, hour or day, while spatial granularity can be defined in terms of geographic scale units (e.g., yard, foot and mile). A reasoning mechanism that uses the ontology has been developed to identify applicable data sources.
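The DataSource representation described above can be sketched in Python as follows. The class and attribute names `DataSource`, `name`, `dataitem` and `fusiontype` follow the text; everything else is an assumption made for illustration, including the field names inside `FusionType`, the granularity ranking, the example sources and the function `applicable_sources`. The selection rule is a simple filter standing in for the authors' graph-theoretic reasoning mechanism, which is not reproduced here: a source is treated as applicable if it offers at least one required data item at a level of detail at least as fine as the one requested.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FusionType:
    representation: str      # e.g. "point" or "interval" for temporal sources
    reference_system: str    # e.g. "EST" for temporal, "WGS 84" for spatial sources
    level_of_detail: str     # e.g. "minute", "hour" or "day"


@dataclass
class DataSource:
    name: str
    fusiontype: FusionType
    dataitem: List[str] = field(default_factory=list)


# Assumed ordering of temporal granularity, from finest to coarsest.
GRANULARITY_RANK = {"minute": 0, "hour": 1, "day": 2}


def applicable_sources(sources, required_items, required_lod):
    """Return names of sources that offer a required item at a fine enough granularity."""
    needed_rank = GRANULARITY_RANK[required_lod]
    selected = []
    for source in sources:
        fine_enough = GRANULARITY_RANK[source.fusiontype.level_of_detail] <= needed_rank
        offers_item = bool(set(source.dataitem) & set(required_items))
        if fine_enough and offers_item:
            selected.append(source.name)
    return selected


# Hypothetical sources loosely modelled on the case study description.
sources = [
    DataSource("obi_payload", FusionType("point", "EST", "minute"), ["payload"]),
    DataSource("online_weather", FusionType("point", "EST", "hour"), ["temperature"]),
    DataSource("time_card", FusionType("interval", "EST", "day"), ["payload", "temperature"]),
]

hourly = applicable_sources(sources, {"payload", "temperature"}, "hour")
daily = applicable_sources(sources, {"payload", "temperature"}, "day")
print(hourly)
print(daily)
```

For an hourly query the daily time card is excluded, which mirrors the observation in Section 2.2 that the time card suffices only when daily effects are of interest. The filter does not attempt to choose a minimal set of sources, so for a daily query it returns every source that could contribute.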
A detailed discussion of the developed ontology and reasoning mechanisms can be found in Pradhan (2009).

3.3 Generation and Execution of Data Fusion Plan

The authors developed and tested two types of planning algorithms, namely domain-independent (i.e., GraphPlan (Blum and Furst, 1995)) and domain-dependent (i.e., Hierarchical Task Network (Nau et al., 1999)), to generate a data fusion plan for a given query. The primary difference between the two is that a domain-dependent algorithm uses a hierarchical structure, explicitly defined by a domain expert, to decompose a complex task into simpler tasks, while a domain-independent algorithm does not require a domain expert to define such a decomposition structure (Ghallab et al., 2004).

With regard to executing a data fusion plan for a given query, the authors found that existing spatial and temporal reasoning mechanisms are limited in dealing with data sources that have different levels of detail. Thus, several spatial and temporal reasoning mechanisms have been developed, with a primary focus on handling the data sources with different levels of detail that are needed to support construction productivity monitoring. Details of the plan generation algorithms and of the temporal and spatial reasoning mechanisms can be found in Pradhan (2009).

3.4 Validation

The authors developed a prototype system to validate the data fusion formalism. Different productivity-related queries, identified from previous research studies and from the motivating case study, were used to test the generality of the overall research approach. The test queries were related to different types of construction activities and different types of factors (i.e., design-related, construction-site related and construction-process related) impacting productivity. The developed data fusion formalism was able to generate fused data for the queries used in the validation study.

4 Conclusion

In the research described in this paper, the authors developed a data fusion formalism to fuse multiple data sources based on user queries to support construction productivity monitoring. The developed approach comprises: (a) a computer language to capture dynamic user queries, (b) a data fusion ontology to capture the characteristics of a data source, and a reasoning mechanism to identify applicable data sources, (c) planning algorithms to generate a data fusion plan, and (d) a library of spatial and temporal algorithms to execute a data fusion plan.

Acknowledgements

National Science Foundation support (CMS#0448170) is gratefully acknowledged. Any opinions, findings, conclusions or recommendations presented in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

AKINCI, B., BOUKAMP, F., GORDON, C., HUBER, D., LYONS, C., and PARK, K., 2006. A Formalism for Utilization of Sensor Systems and Integrated Project Models for Active Construction Quality Control, Automation in Construction, 15(2), 124-138.
BLUM, A., and FURST, M., 1995. Fast Planning Through Planning Graph Analysis, Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI 95), 1636-1642.
BUCCELLA, A., CECHICH, A., and BRISABOA, N. R., 2003. An Ontology Approach to Data Integration, Journal of Computer Science & Technology, 3(2), 62-68.
DASARATHY, B. V., 1997. Sensor fusion potential exploitation-innovative architectures and illustrative applications, Proceedings of the IEEE, 85(1), 24-38.
GHALLAB, M., NAU, D., and TRAVERSO, P., 2004. Automated Planning: Theory & Practice. Morgan Kaufmann Publishers Inc.
GRUBER, T. R., 1993. A Translation Approach to Portable Ontology Specifications, Knowledge Acquisition, 5(2), 199-220.
HAAS, C. T., 2006. A Model for Data Fusion in Civil Engineering, 13th EG-ICE Workshop 2006, Lecture Notes in Computer Science, Springer 2006, Ascona, Switzerland.
HALL, D. L., and LLINAS, J., 1997. An Introduction to Multisensor Data Fusion, Proceedings of the IEEE, 85(1), 6-23.
KANNAN, G., and VORSTER, M., 2000. Development of an Experience Database for Truck Loading Operations, Journal of Construction Engineering and Management, 126(3), 201-209.
KIZILTAS, S., and AKINCI, B., 2009. Contextual Information Requirements of Cost Estimators from Past Construction Projects, Journal of Construction Engineering and Management, 135(9), 841-852.
MOTWANI, J., KUMAR, A., and NOVAKOSKI, M., 1995. Measuring construction productivity: a practical approach, Work Study, 44(8), 18-20.
NAU, D. S., CAO, Y., LOTEM, A., MU, H., and AVILA, O., 1999. SHOP: Simple Hierarchical Ordered Planner, Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 968-975.
NAVON, R., 2005. Automated Project Performance Control of Construction Projects, Automation in Construction, 14(4), 467-476.
OGLESBY, C. H., PARKER, H. W., and HOWELL, G. A., 1989. Productivity Improvement in Construction, McGraw-Hill.
PARR, T., 2007. The Definitive ANTLR Reference: Building Domain-Specific Languages, The Pragmatic Bookshelf, 1st Edition.
PRADHAN, A. R., 2009. An Approach for Fusing Data from Multiple Sources to Support Construction Productivity Analyses, PhD thesis, Department of Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA.
SMITH, S. D., 1999. Earthmoving Productivity Estimation Using Linear Regression Techniques, Journal of Construction Engineering and Management, 125(3), 133-141.
SONG, J., HAAS, C. T., and CALDAS, C. H., 2006. Tracking the Location of Materials on Construction Job Sites, Journal of Construction Engineering and Management, 132(9), 911-918.
TEIZER, J., CALDAS, C. H., and HAAS, C. T., 2007. Real-Time Three-Dimensional Occupancy Grid Modeling for the Detection and Tracking of Construction Resources, Journal of Construction Engineering and Management, 133(11), 880-888.