Proceedings of Software Technology (SoftTech 2012), May 29-31, 2012, Waterfront Airport Hotel and Casino, Cebu, Philippines


Volume Editors:
Chin-Chen Chang, Feng Chia University, Taiwan
Yvette E. Gelogo, Hannam University, Korea
Ronnie D. Caytiles, Hannam University, Korea

Copyright Science and Engineering Research Support Society (SERSC). All rights reserved.

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Other copying and reprint requests should be addressed to: SERSC Copyrights Manager, Head Office: 20 Virginia Court, Sandy Bay, Tasmania, Australia. Phone no.: , Fax no.: , ISSN

Foreword

Software Technology is an area that has attracted many academic and industry professionals to research and development. The goal of this conference is to bring together researchers from academia and industry, as well as practitioners, to share ideas, problems and solutions relating to the multifaceted aspects of Software Technology. We would like to express our gratitude to all of the authors of submitted papers and to all attendees for their contributions and participation. We believe in the need for continuing this undertaking in the future. We acknowledge the great effort of all the Chairs and the members of the Editorial Committee of the above-listed event. Special thanks go to SERSC (Science & Engineering Research Support Society) for supporting this conference. We are grateful in particular to the following speaker who kindly accepted our invitation and, in this way, helped to meet the objectives of the conference: Prof. Chin-Chen Chang, Feng Chia University, Taiwan.

May 2012
Chairs of SoftTech 2012

Preface

We would like to welcome you to the Regular Papers proceedings of Software Technology (SoftTech 2012), held on May 29-31, 2012, at the Waterfront Airport Hotel and Casino, Cebu, Philippines. SoftTech 2012 focused on various aspects of advances in Information Science and Industrial Applications. It provided a chance for academic and industry professionals to discuss recent progress in the related areas. We expect that the conference and its publications will be a trigger for further related research and technology improvements in this important subject. We would like to acknowledge the great effort of all the Chairs and members of the Editorial Committee. We would like to express our gratitude to all of the authors of submitted papers and to all attendees for their contributions and participation. We believe in the need for continuing this undertaking in the future.

May 2012
Chin-Chen Chang
Ronnie D. Caytiles
Yvette E. Gelogo

Organization

General Chair
Haengkon Kim, Catholic University of Daegu, Korea

Program Chair
Javier Garcia-Villalba, Universidad Complutense of Madrid, Spain

Editorial Committee
Muhammad Fiaz, NWPU, Xi'an, China
Ruay-Shiung Chang, National Dong Hwa University, Taiwan
Jin Wang, Nanjing University of Information Science & Technology, China
Hsi-Ya Chang (Jerry), National Center for High Performance Computing, Taiwan
J. H. Abawajy, Deakin University, Australia
Sabah Mohammed, Lakehead University, Canada
Wai-chi Fang, National Chiao Tung University, Taiwan
Adrian Stoica, NASA JPL, USA
Aboul Ella Hassanien, Cairo University, Egypt
Dominik Slezak, Warsaw University & Infobright, Poland
Gongzhu Hu, Central Michigan University, USA
Kirk P. Arnett, Mississippi State University, USA
Sankar Kumar Pal, Indian Statistical Institute, India
Martin Drahansky, BUT, Faculty of Information Technology, Czech Republic
Filip Orsag, BUT, Faculty of Information Technology, Czech Republic
Samir Kumar Bandyopadhyay, University of Calcutta, Kolkata, India
Tadashi Dohi, Hiroshima University, Japan
Tatsuya Akutsu, Kyoto University, Japan
Carlos Ramos, GECAD and ISEP, Portugal
Muhammad Khurram Khan, King Saud University, KSA
Hideo Kuroda, FPT University, Vietnam
Wenbin Jiang, Huazhong University of Science & Technology, China
Tao Gong, Donghua University, China
Byeongho Kang, University of Tasmania, Australia

Table of Contents

Construction of DBMS for the Warehouse of Textile Enterprise Based on RFID Technology ... 1
Ruru Pan, Jihong Liu, Weidong Gao, Hongbo Wang, Jianli Liu

Reliability Research of Software System for Subsea Blowout Preventers ... 7
Baoping Cai, Yonghong Liu, Zengkai Liu, Xiaojie Tian, Shilin Yu

Adaptive User Interface Modeling Design for Web-based Terminal Middleware
Sunghan Kim and Seungyun Lee

Study on Improving Energy Saving for Old Buildings with Daily Energy Conserving Index
Yan-Chyuan Shiau, Chi-Hong Chen

Method of Unified Communications and Collaboration Service in Open Service Platform Based on RESTful Web Services
Sunhwan Lim, Hyunjoo Bae

Original-page Small File Oriented EXT3 File Storage System
Zhang Weizhe, Hui He, Zhang Qizhen

An NMS Architecture for Large-scale Networks
Byungjoon Lee, Youhyeon Jeong, Hoyoung Song, Youngseok Lee

Efficient Data Transmission Technique for Ubiquitous Healthcare Systems
Yoon Hyun Kim, Jin Young Kim

Specification and Detection of Code Smells using OCL
Tae-Woong Kim, Tae-Gong Kim, Jai-Hyun Seu

Temporal Distance based Particle Domain Selection Method for Large Scale Streaming System
Jun Pyo Lee

Establishment of Fire Control Management System in Building Information Modeling Environment
Yan-Chyuan Shiau, Chong-Teng Chang

Development of Computer-Aided Medication Education for Drug Abuse Prevention
Seong-Ran Lee

Concurrent Computation for Genetic Algorithms
Kittisak Kerdprasop, Nittaya Kerdprasop

A Practical Context Awareness Information System for VANET based on IEEE
Tzu-Kai Cheng, Jenq-Shiou Leu, Ing-Xiang Chen, Jean-Lien C. Wu, Fellow, IEEE, Zhe-Yi Zhu

Evaluation of Assessment Tools for High-care Student Groups in Vocational High Schools
Chen-Feng Wu, Chun-Ta Lin, Pei-Min Wang and Pei-Ru Wang

Generating Test Cases for Cyber Physical Systems from Formal Specifications
Lichen Zhang, Jifeng He and Wensheng Yu

Reliable Integration of Exact and Approximated Arithmetic with Three-Valued Logic in Python
Reeseo Cha, Wonhong Nam, Jin-Young Choi

Experiment Study of Android Software Test using Moment Invariants Algorithm
Won Shin, Chun-Hyon Chang

Parallel Simulation Testbed for Swarming MAVs
Ge Li

Formal Specification for Transportation Cyber Physical Systems
Lichen Zhang, Jifeng He and Wensheng Yu

Linking Data for an Information Support System in Traditional Korean Medicine
Hyunchul Jang, Yong-Taek Oh, Sang-Kyun Kim, Anna Kim, Sang-Jun Ye, Chul Kim, Mi-Young Song

A Proof-of-Concept of D3 Record Mining using Domain-Dependent Data
Yeong Su Lee, Michaela Geierhos, Sa-Kwang Song, and Hanmin Jung

Web Taxonomy Fusion using Topic Maps-driven Ontological Concepts and Relationships
Ing-Xiang Chen and Cheng-Zen Yang

A Memory Management Scheme for Hybrid Memory Architecture in Mission Critical Computers
Soohyun Yang and Yeonseung Ryu

NC-based Interoperability Synergy Assessment Model
Hyun-sik Son and Tae-gong Lee

Performance Evaluation of Recursive Network Architecture for Fault-tolerance
Minho Shin, Raheel Ahmed Memon, Yeonseung Ryu, Jongmyung Rhee and Dongho Lee

Implementation of Buffer Cache Simulator for Hybrid Main Memory and Flash Memory Storages
Soohyun Yang and Yeonseung Ryu

Critical Health Monitoring with Unreliable Mobile Devices
Minho Shin

Confidence Metric in Critical Systems
Minho Shin

Survey on Simulation Framework for Intelligent Transportation Systems
Hyungsoo Kim, Beomseok Nam, and Minho Shin

Current State of Capability-based System of Systems Engineering in Korea Ministry of National Defense
Jae-Hong Ahn, Yeonseung Ryu and Doo-Kwon Baik

Distributed and Self-adaptive Cluster-head Selection Algorithm for Hierarchical Wireless Sensor Networks
Sai Ji, Liping Huang, Chang Tan, Jin Wang

An Architecture Description Method for Acknowledged System of Systems based on Federated Architecture
Jae-Hong Ahn, Yeunseung Ryu and Doo-Kwon Baik

Scenario Generation for Model Checking Operating Systems
Nahida Sultana Chowdhury and Yunja Choi

Detecting Accesses of First Data Races in Parallel Programs with Random Synchronization
Hee-Dong Park and Yong-Kee Jun

Visualizing Data Races in Concurrent Signal Handlers
Lin Gan, Guy Martin Tchamgoue, and Yong-Kee Jun

An Automatic Parallelization Scheme for Simulink-based Real-Time Multicore Systems
Minji Cha, Seong Kyun Kim, and Kyong Hoon Kim

Efficient Data Race Detection for Structured Fork-join Parallelism
Ok-Kyoon Ha and Yong-Kee Jun

Tracing Logical Concurrency for Dynamic Race Detection in OpenMP Programs
In-Bon Kuh, Ok-Kyoon Ha, and Yong-Kee Jun

Construction of DBMS for the Warehouse of Textile Enterprise Based on RFID Technology

Ruru Pan, Jihong Liu, Weidong Gao, Hongbo Wang, Jianli Liu
School of Textile and Clothing, Jiangnan University, Lihu Street 1800, Wuxi, Jiangsu, China

Abstract. The complex production processes and intermediate products make it difficult to use a traditional DBMS (Database Management System) to precisely manage the materials and products in the warehouse of a textile enterprise. To overcome the shortcomings of the traditional DBMS, RFID (Radio Frequency Identification) technology is proposed to construct the management system for the textile enterprise. The RFID tag and handheld RFID reader/writer used in the system are first introduced. Then the framework and functions of the system are discussed in detail. The development requirements of the main modules in the system and the advantages of the new system are explained at last.

Keywords: Textile enterprise, Warehouse management, Radio frequency, Reader/writer

1 Introduction

The production processes, beginning with the procurement of raw materials and including opening and cleaning, roving, spinning, and weaving, are very complex in the textile enterprise. The raw materials, semi-finished products and final products in storage, such as cotton, yarn and fabric, all need to be managed. In the traditional textile enterprise, a trained worker registers the materials and products every day to roughly manage the warehouse. With the development of computer technology, some textile enterprises have begun to establish a DBMS for managing the storage. However, it still cannot satisfy the needs of the enterprises, as the information must be manually input into the DBMS with a keyboard. From this discussion, it can be seen that the existing management methods for materials and products cannot meet the automated production requirements of the textile enterprise. To solve this problem, a new DBMS based on RFID technology is proposed in this paper, which realizes automated management of the textile warehouse.

2 RFID Technology

RFID is short for radio frequency identification. It is a non-contact automatic identification technology that was developed in the 1980s [1-2]. RFID is based on radio communications and combines the technologies of intelligent control, identification and networking. Recently, the technology has matured and is applied in many industries, especially for system management. RFID is a revolutionary breakthrough for ERP (enterprise resource planning) and SCM (supply chain management) systems. Its precise management reaches into every aspect of business activities: production, storage, transportation, distribution, retail and other aspects of management. It makes management convenient in ways that could not be imagined in the past. It was hard to track a single material or product in the past, but with the help of RFID, precise management of every aspect and every component can be easily realized. Quality control, automation management and product life cycle management become very effective and convenient.

A typical RFID system consists of RFID tags, an RFID reader/writer and a management system [3-4]. The encoded data stored in the RFID tag is used to label objects. The information is transmitted between the RFID tags and the reader/writer with radio frequency signals. Compared with the bar code, the advantages of RFID are as follows:

(1) The electronic information can be transmitted from a five-meter distance;
(2) The electronic information can be read within the range of the electromagnetic wave even when there are obstacles;
(3) A number of RFID tags can be identified at one time;
(4) The information in the RFID tag can be rewritten;
(5) The memory capacity of the RFID tag is sufficient for storing the information of the labeled objects;
(6) The RFID tag is resistant to pollution and damage;
(7) The RFID tag can use encryption technology to protect information.

The DBMS proposed in this paper uses non-contact two-way data communication to identify and register the raw materials, semi-products and final products with RFID technology. Database technology is adopted to manage the information in the textile enterprise. The flow of materials and products can be monitored with the shared data in real time.

3 Hardware of the System

The hardware of the system includes RFID tags and handheld RFID readers/writers. The two kinds of hardware are discussed briefly in the following paragraphs.
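To make the tag-labeling idea concrete, the sketch below shows one way warehouse information for a bag of material could be packed into and recovered from a 96-bit payload, the tag capacity quoted in Section 3.1 below. The field layout (item type, batch, weight, date) is our own assumption for illustration; the paper does not specify an encoding.

```python
import struct
from datetime import date

# Hypothetical 96-bit (12-byte) payload layout for a warehouse tag:
#   1 byte  item type (0=cotton, 1=yarn, 2=fabric)
#   4 bytes batch id
#   4 bytes weight in grams
#   3 bytes days since 2000-01-01 (stock-in date)

def encode_tag(item_type: int, batch: int, grams: int, day: date) -> bytes:
    days = (day - date(2000, 1, 1)).days
    return struct.pack(">BII", item_type, batch, grams) + days.to_bytes(3, "big")

def decode_tag(payload: bytes):
    item_type, batch, grams = struct.unpack(">BII", payload[:9])
    days = int.from_bytes(payload[9:], "big")
    return item_type, batch, grams, date.fromordinal(date(2000, 1, 1).toordinal() + days)

payload = encode_tag(1, 20120529, 25_000, date(2012, 5, 29))
assert len(payload) == 12  # fits the 96-bit tag memory
print(decode_tag(payload))
```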

3.1 RFID Tag

The RFID tags are installed in the bags to label the materials, semi-products and final products in the textile enterprise. The corresponding information is then encoded into RFID data and written into the RFID tags with the RFID writer [5-6].

Fig. 1 RFID tags

Handheld RFID readers are used to read the tags through the various stages and processes to track the materials and products. It is convenient to find, check and compare the information. The workers can obtain the information of the materials and products by organizing the data through the DBMS. When searching for raw materials and products, the workers just take the RFID reader, scan through the relevant regions and easily find the needed things.

The RFID tag used is shown in Figure 1. The type of the tag is DU9021 and the frequency is 80-90 MHz. The memory capacity is 96 bits. The information stored in the tag can be read within 0.5 meter, and multiple tags can be read in one second.

3.2 Handheld RFID Reader/Writer

In the production processes, the trained workers use the handheld RFID reader/writer shown in Figure 2 to read, register and write the electronic information of the materials and products.

Fig. 2 Handheld RFID reader/writer

It can be used for checking or finding a certain material or product. The program of the RFID reader/writer is simple, and each function is assigned a corresponding fast-operation key. The workers can use the reader/writer after a little training.

Because of the noise in the textile enterprise, it is hard to confirm that the information has been obtained. To overcome this, a red LED is installed on the side of the reader/writer. When the information stored in the RFID tag is read out by the reader, the LED lights up red. Figure 3 shows the side of the reader/writer after modification. With the help of the LED, it is convenient for the workers to confirm that the RFID codes stored in the tag have been successfully read out or that the codes have been successfully written into the tag.

Fig. 3 One side of the RFID reader/writer after modification

4 System Structure and Functions

4.1 System Structure

A new warehouse management system for the textile enterprise is established with RFID technology. The designed system includes the electronic tag reading system and the database management software. The architecture of the system is shown in Figure 4.

Fig. 4 Diagram of system structure

The electronic tag reading system is composed of three parts: the RFID tag, the RFID reader/writer (including the antenna) and the microcomputer. The RFID tag is installed in the bags of the materials and products. The information of the materials and products is then encoded into RFID codes, which are stored into the tags by the RFID writer. The codes can be used to label the materials and products. The RFID reader uses an external antenna to form an effective region to identify raw materials and products with electronic tags. The RFID reader/writer communicates with the microcomputer through an RS232 serial port. Through the middleware, the RFID codes in the tags can be automatically read out.

The database management system software is composed of the modules of storage management, stock-out operation and inventory management. It realizes the management of raw materials and products. It can also be used to track needed information.

4.2 System Function

In the production process, the workers use the RFID readers/writers to register the information of the materials and products into the RFID tags as RFID codes. When the materials or products are stocked in or out of the warehouse, the workers use the handheld RFID reader/writer to read the codes and write the new codes into the tag. This operation updates the information in real time. At the same time, the updated information is stored in the memory of the handheld reader/writer.

Every day, before leaving work, a specialized worker uses the developed system to check and manage the materials and products in the warehouse. The updated information of the materials and products stored in the reader/writer is uploaded into the DBMS through the RS232 serial port. The worker can then use the DBMS to check and manage the existing stock in the warehouse. The result can be printed and sent to the relevant administrative personnel.

5 Requirements of the Main Modules

The proposed system combines software and hardware to achieve the desired functions. The reading and writing of the RFID codes of the tags is realized by the RFID reader/writer. The management of the materials and products is carried out by the DBMS installed on the microcomputer. The need for recognition speed puts forward higher requirements for developing the software on the microcomputer. The main demands, illustrated by the sketch after this list, are explained as follows:

(1) Real-time: the information stored in the RFID tags should be read or written by the RFID reader/writer in real time, and the updated information should be uploaded into the system in real time. When reading a number of tags at one time, the data needs to be processed in real time, otherwise some data may be lost.
(2) Reliable: the information stored in the tags should be correctly read every time, and the writer should correctly update the new codes into the tags.
(3) Universal: although the DBMS is written for a particular purpose, the system should be easy to adapt to other situations without too many changes; we just need to modify certain modules in a new situation.
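As a concrete illustration of the real-time demand (1), here is a minimal sketch of how the microcomputer side might poll the handheld reader/writer over the RS232 serial port mentioned in Section 4.1. It assumes the pyserial library and an ASCII line-per-tag framing; the actual wire protocol of the DU9021 reader is not given in the paper.

```python
import serial  # pyserial

# Assumed framing: the reader/writer sends one ASCII line per tag read,
# e.g. b"DU9021,20120529,25000\n". The real protocol is device-specific.
def poll_reader(port: str = "/dev/ttyS0", baud: int = 9600) -> None:
    with serial.Serial(port, baudrate=baud, timeout=1.0) as link:
        while True:
            line = link.readline()          # returns b"" on timeout
            if not line:
                continue                    # nothing read this interval
            fields = line.decode("ascii", "replace").strip().split(",")
            upload_to_dbms(fields)          # push the update in real time

def upload_to_dbms(fields: list[str]) -> None:
    # Placeholder for the storage-management module of the DBMS.
    print("stock update:", fields)

if __name__ == "__main__":
    poll_reader()
```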

Borland C++ Builder 6.0 is used as the tool to construct the software on the platform of Microsoft Windows XP SP3. The database platform is Microsoft Access.

6 Conclusions

RFID technology is used to establish a DBMS for the textile enterprise. By analyzing the needs of the enterprise, a practical system is proposed. Compared with the traditional management method, the new system based on RFID has the following advantages: (1) non-contact identification technology; (2) multi-tag identification technology improving data collection efficiency; (3) the RFID tags can be affixed in any location; (4) it facilitates finding or checking the information of the materials and products in real time; (5) it manages the warehouse information in real time and improves production efficiency.

Acknowledgments. The authors are grateful for the financial support of the Fundamental Research Funds for the Central Universities (No. JUSRP21105), the National Natural Science Foundation of China (No. ) and the National Natural Science Foundation of Jiangsu Province (No. BK).

References

1. Y. M. Xiang and S. H. Fan, Journal of Nanchang University (Nature Science), 29, 190 (2005).
2. J. Tan, Z. C. Zhao, W. He, X. H. Zhang and Z. Wang, Application Research of Computers, 23, 7 (2006).
3. Y. M. Wang, C. Li and J. M. Ma, Oil Field Equipment, 36, 71 (2007).
4. J. H. Liu, R. R. Pan, W. D. Gao and J. H. Liu, Cotton Textile Technology, 38, 551 (2010).
5. J. H. Liu, W. D. Gao, H. B. Wang, H. X. Jiang and Z. X. Li, Journal of the Textile Institute, 101, 925 (2010).
6. J. H. Liu, R. R. Pan, W. D. Gao, H. X. Jiang and H. B. Wang, Industria Textila, 61, 203 (2010).

Reliability Research of Software System for Subsea Blowout Preventers

Baoping Cai, Yonghong Liu, Zengkai Liu, Xiaojie Tian, Shilin Yu
College of Mechanical and Electronic Engineering, China University of Petroleum, Dongying, Shandong, China

Abstract. In order to meet the high reliability requirement of subsea drilling, a redundant software system for subsea Blowout Preventers (BOP), including control logics, HMI programs, remote access and redundant databases, is developed. The Bayesian networks for the control logics, HMI programs and redundant databases are built, and then the whole Bayesian network is established. Quantitative reliability assessments are performed using the Netica software. The results show that the probability of software failure is 0.04%, which meets the requirement of subsea drilling. Triple common cause failure should be paid more attention in order to improve the software performance. In addition, the control logics have the most important influence on software safety; the HMI programs have the least important influence; and the redundant databases are in between.

Keywords: Software; Reliability; Bayesian networks; Subsea blowout preventers

1 Introduction

The subsea Blowout Preventer (BOP) stack plays an extremely important role in providing safe working conditions for drilling activities in the ultra-deepwater region [1]. The Programmable Logic Controller (PLC) based triple modular redundancy system GE Fanuc Genius Modular Redundancy (GMR) is chosen to provide supervisory control and data acquisition because the system can tolerate single component failures [2]. The operations of the subsea BOP stack are performed entirely by the software systems, including control logics, Human-Machine Interface (HMI) programs, remote access and redundant databases. The reliability of the control software is of vital importance to the safety of subsea operations. Recently, Bayesian networks have been used more and more in the performance assessment of software, because the model can perform forward (predictive) analysis as well as backward (diagnostic) analysis [3]. This work researches the reliability of the software system for subsea BOP using Bayesian network models. The paper is structured as follows: Section 2 describes the software modules of the subsea BOP,

including control logics, Human-Machine Interface (HMI) programs, remote access and redundant databases. Section 3 presents the Bayesian network models for reliability analysis. Section 4 gives the analysis results, and Section 5 summarizes the paper.

2 Software Development

2.1 Subsea BOP Control System

A typical architecture of the subsea BOP control system is shown in Fig. 1. A triple GMR system, consisting of three Series PLCs, is the kernel of the multiplex control system, which runs the control logics for the subsea functions. The driller's computer, toolpusher's computer and work station provide full control of the subsea BOP stack functions, and serve as the primary, secondary and third control stations, respectively. The three stations run the user-friendly HMI programs, which are full of useful graphics and report tools. The database servers, Virtual Private Network (VPN) server and control stations are connected to the PLCs via dual redundant Ethernet. Dual Ethernet cards run in each device. The PLCs are connected to the blue and yellow Subsea Electronic Modules (SEM) via the Genius Bus. The two SEMs contain two sets of independent input and output subsystems. They control the blue pod and yellow pod, respectively. The VPN server is connected to the Internet through a third Ethernet card. Authorized operators in engineering offices, who have a tunnel name, tunnel password, user name and user password, are permitted to remotely access the subsea BOP control processes through the VPN.

2.2 Control Logics

The control logics are developed using ladder language in Proficy Machine Edition Logic Developer (v. 5.90), and all of the logics are downloaded to the three redundant PLCs. The control logics of the subsea BOP system work when at least one set of the logics works. Therefore, the three sets of control logics can be considered as parallel. The main logics are shown in Fig. 2. The first three rungs of the control logics, when activated, ensure that only one PLC communicates with the HMI programs. Submodules of the control logics, such as the Emergency Disconnect System (EDS) modules given in the fourth and fifth rungs, can be added subsequently.

2.3 HMI Programs and Remote Access

The driller, toolpusher and manager can monitor and control the subsea BOP stack system with the HMI programs running in the driller's computer, toolpusher's computer and work station, respectively. The HMI is developed using the Proficy Cimplicity HMI/SCADA (v. 7.50) software. Similar to the control logics, the operators can control the subsea BOP when at least one set of HMI programs works. Therefore, the three sets of HMI programs can also be considered as parallel.

The WebView function of Cimplicity HMI/SCADA enables authorized users to remotely view read-only points and alarm data for a project that is broadcast to the web server through the Internet. The broadcast session provides the means to broadcast a Cimplicity WebView screen to an unlimited number of users, who can view it from remote locations. Therefore, the engineers in engineering offices can monitor the states and data of the subsea BOP system remotely. For example, the subsea BOP stack screen and readback screen can be read through the Internet using the IE browser.

2.4 Redundant Databases

All the vital information during drilling should be saved in the database, which is created using Microsoft SQL Server. The redundant database servers, involving a primary monitoring server and a secondary hot-standby server, are configured using the Cimplicity server redundancy function within the Workbench on the primary server. Each primary server has one secondary server, which is essentially a mirror image of the primary server. The secondary server cannot be a primary configuration node and does not support any configuration functions. The operator accesses the database of the primary server normally. Upon detection of a failure of the primary server, the secondary server assumes control of data collection automatically and allows user access with minimal loss of continuity. When the primary server comes back online, control can be transferred back, and the secondary server resumes its backup role. Obviously, the two databases can also be considered as parallel.

Fig. 1. Architecture of subsea BOP control system

Fig. 2. Main logics of BOP control system

3 Bayesian Networks Modelling for Reliability Analysis

3.1 Bayesian Networks Modelling for Control Logics

For redundant software, common cause failure (CCF) has a significant influence on software performance. CCF is defined as the failure of more than one hardware or software component due to the same cause in redundant systems. Experience has shown that it has a dominant impact on accidents [4]. In the Bayesian network shown in Fig. 3(a), different sources of shock are distinguished to model the CCF of the control logics. A shock from source A destroys logic A, a shock from source AB destroys logics A and B, and a shock from source ABC destroys logics A, B and C. Therefore, the failure of logic A is the series combination of sources A, AB and ABC. The system state of the whole logics is the parallel combination of logics A, B and C due to the redundancy. The conditional probability tables are given in Fig. 3(a). Note that the values 1 and 0 denote logic failure and logic working, respectively. The prior probabilities of logic shocks from the sources are obtained based on the experience of operators.

3.2 Bayesian Networks Modelling for HMI Programs

The Bayesian networks of the HMI programs are similar to those of the control logics except that they have different prior probabilities, as shown in Fig. 3(b). This is because both the control logics and the HMI programs have triple redundant software structures, as described above. Obviously, the HMI programs have lower prior probabilities than the control logics. Therefore, the failure probability of the HMI programs (0.006%) is lower than that of the control logics (0.024%).

3.3 Bayesian Networks Modelling for Redundant Databases

The Bayesian networks of the redundant databases are shown in Fig. 3(c). Although the failure of the redundant databases has less influence on the safety of subsea drilling than the control logics and HMI programs, the whole software is considered failed once the redundant databases fail, because the control logics, HMI programs and redundant databases are integrated into a whole.

3.4 The Whole Bayesian Networks

According to the description above, if any of the control logics, HMI programs or redundant databases fails, the whole software fails. Therefore, the three parts can be considered to be in series. After establishing the whole Bayesian network, the quantitative reliability assessments of the subsea BOP software are performed using the Netica software. The software reliability is evaluated via forward analysis, and the posterior probability of each event given software failure is evaluated via backward analysis. The mutual information is also researched in order to assess the importance degree of each event.
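To make the shock-source construction concrete, the sketch below enumerates the seven independent shock sources of a triple redundant module and computes the module failure probability by forward analysis, plus the posterior probability of the triple shock given module failure by backward analysis. It uses the control-logic priors quoted in Table 1 (3.35%, 0.10%, 0.01%) and reproduces the 0.024% module failure probability stated in Section 3.2; note it conditions the posterior on failure of this module alone, whereas Table 1 conditions on whole-software failure, so the posterior values differ. This is plain enumeration, not the Netica model itself.

```python
from itertools import product

# Shock-source priors for the control logics, from Table 1:
# single-logic 3.35%, double-logic 0.10%, triple-logic 0.01%.
priors = {"A": 0.0335, "B": 0.0335, "C": 0.0335,
          "AB": 0.0010, "AC": 0.0010, "BC": 0.0010,
          "ABC": 0.0001}

def logic_fails(logic: str, shocks: dict) -> bool:
    # Logic X fails if any occurred shock source's name contains X.
    return any(occurred and logic in src for src, occurred in shocks.items())

p_fail = 0.0
p_fail_and_abc = 0.0
for states in product([False, True], repeat=len(priors)):
    shocks = dict(zip(priors, states))
    p = 1.0
    for src, occurred in shocks.items():
        p *= priors[src] if occurred else 1.0 - priors[src]
    if all(logic_fails(l, shocks) for l in "ABC"):   # parallel redundancy
        p_fail += p
        if shocks["ABC"]:
            p_fail_and_abc += p

print(f"P(control logics fail)        = {p_fail:.6f}")   # ~0.00024 (0.024%)
print(f"P(source ABC | module failed) = {p_fail_and_abc / p_fail:.4f}")
```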

Fig. 3. Bayesian networks for (a) control logics, (b) HMI programs, and (c) redundant databases

4 Results and Discussions

The graphical representation of software failure with prior probabilities is shown in Fig. 4(a). It can be seen that the probability of software failure is only 0.04%. The posterior probabilities of all the events given software failure are shown in Fig. 4(b), and the values are given in the 4th column of Table 1. The mutual information and importance degree sequence of all the parent nodes with respect to the child node Software failure are given in the 5th and 6th columns of Table 1. It can be seen that Source of logic ABC, Source of logic A (B, C), Source of HMI ABC and Source of redundant database AB have significant influences on the probability of whole software failure. Therefore, the triple CCF for all of the control logics, HMI programs and redundant databases should be paid more attention when developing and running the software. The average values of mutual information and the importance degree sequence for each category of software are shown in the 7th column of Table 1. It can be seen that the control logics have the most important influence on software safety; the HMI programs have the least important influence; and the redundant databases are in between. Therefore, the control logics should be paid the most attention when developing the software for the subsea BOP system.

Fig. 4. Graphical representations of (a) software failure with prior probabilities and (b) posterior probabilities given software failure

Table 1. Mutual information of prior and posterior probabilities for each event

  Software module       Event                Prior Pro.   Posterior Pro.   Mutual Inform.   Imp. Deg.
  Control logics        S_Logic_A(B,C)       3.35%        20.70%
                        S_Logic_AB(AC,BC)    0.10%        9.06%
                        S_Logic_ABC          0.01%        25.10%
  HMI programs          S_HMI_A(B,C)         1.28%        3.90%
                        S_HMI_AB(AC,BC)      0.07%        2.43%
                        S_HMI_ABC            0.00%        7.53%
  Redundant databases   S_RD_A(B)            0.85%        18.80%
                        S_RD_AB              0.00%        6.28%

5 Conclusions

A redundant software system for subsea BOP is developed, and the control logics, HMI programs, remote access and redundant databases are described in detail. The Bayesian networks for the software system are established, and quantitative reliability assessments are performed. (1) The probability of software failure for the subsea BOP is 0.04%, which meets the requirement of subsea drilling. (2) The triple CCF for all of the control logics, HMI programs and redundant databases should be paid more attention in order to improve the software performance. (3) The control logics have the most important influence on software safety; the HMI programs have the least important influence; and the redundant databases are in between.

References

1. API 16D: Specification for Control Systems for Drilling Well Control Equipment and Control Systems for Diverter Equipment. 2nd ed. API, Washington, DC (2004)
2. Cai, B., Liu, Y., Liu, Z., Wang, F., Tian, X., Zhang, Y.: Development of an Automatic Subsea Blowout Preventer Stack Control System Using PLC Based SCADA. ISA Trans. 51 (2012)
3. Bobbio, A., Portinale, L., Minichino, M., Ciancamerla, E.: Improving the Analysis of Dependable Systems by Mapping Fault Trees into Bayesian Networks. Reliab. Eng. Syst. Saf. 71 (2001)
4. Hoepfer, V.M., Saleh, J.H., Marais, K.B.: On the Value of Redundancy Subject to Common-Cause Failures. Reliab. Eng. Syst. Saf. 94 (2009)

Adaptive User Interface Modeling Design for Web-based Terminal Middleware

Sunghan Kim and Seungyun Lee
Standard Research Center, ETRI, Daejeon, Korea
{sh-kim, syl}@etri.re.kr

Abstract. This paper presents ongoing research results on how user interfaces can be modeled in an adaptive way with terminal middleware for TV services. Current TV terminals comprise diverse, multiple types of devices, and services are also supporting advanced functionality for users. In this situation, user interface design skills are becoming more important to UI designers, manufacturers and service providers alike. Therefore, we review current terminal middleware that allows a UI module to work in the terminal, and then present a relevant use scenario for the service requirements. From the requirements, this paper proposes an experimental UI model to support adaptive UI between different types of TV devices, e.g. mobile and set-top, for a seamless-like TV UI service.

Keywords: User Interface, Terminal Middleware, TV

1 Introduction

TV services are now launching in several types, from traditional TV services, e.g. satellite, terrestrial and cable, to WebTV on the PC. Recently, a new kind of TV service, e.g. IPTV, has appeared alongside traditional TV service domains. Such a service may become a diverging point between traditional TV and a new era of TV services in standardization respects. It deploys new kinds of emerging TV services, which usually provide high-quality, differentiated data services to customers compared with traditional TV services. The new TV service can support user interfaces and interactive data services between the TV service provider and users. Moreover, current TV devices are diverging from the traditional TV model to mobile and pad devices. Against this background, since TV screens now come in different sizes and capabilities, we need to consider how to design and support the user interface in an adaptive way to enable a seamless user experience. Even though the user interface is not a new technology domain, given the performance of current advanced hardware interfaces, new approaches are being researched and experimented with on several systems to support advanced user interfaces. Relevant standardization activity is also developing in W3C, ISO/IEC JTC1 SC29 WG11 (a.k.a. MPEG), etc. In W3C, the MBUI WG (Model-Based User Interface Working Group) has relevant activity on the issue of user interfaces, and MPEG also has standardization achievements in the scope of the user interaction domain. In this paper, section 2 describes general web-based terminal middleware for supporting a UI module

in the terminal, and section 3 shows the adaptive user interface modeling design and the mechanism for supporting the UI in an adaptive way along the use scenario.

2 Web-based Terminal Middleware

Generally, web-based terminal middleware is terminal middleware whose characteristic is that it has one central middleware which orchestrates various applications. This orchestrating middleware, generally called a browser or user agent in W3C terminology, processes a structured document and an interpretive language, usually called script, to enable various services. Web-based terminal middleware enables basic and advanced interactive TV services on the TV terminal device. It is required to review the general TV service requirements and architecture, as well as the TV terminal device models. Web-based TV terminal middleware (WBTM) needs to define the interfaces on the IPTV terminal functional architecture and the structure of the presentation engine. Among the interfaces, a user interface module is necessary to support the UI on the device in an adaptive way. The presentation engine basically supports markup, script processing and document object processing. TV terminal middleware generally needs to be hardware-agnostic. Being one type of implementation, the WBTM in the TV terminal device should provide various integrated functional blocks and APIs for the high-level services, which could be programmed in WBTM script or by other methods, to implement TV services with the integration of services such as VOD, linear TV and so on. TV services need to support seamless capabilities to users regardless of the device type or model, so that users can access the same TV service independently of different types of TV devices. Because of this, TV users and mobile users can share the same services/applications over different TV devices for some types of services, e.g. a Facebook service, updated content notification, etc. So the TV service needs to support interoperability for web-based applications among different devices. Therefore, a user interface module is required in the web-based core engine to enable a coherent user experience for such web applications over different TV devices. Figure 1 gives the layered view of the WBTM architecture.

Fig. 1. Generic framework of web-based terminal middleware

2.1 Scenario of Adaptive User Interface

An example adaptive user interface scenario is explained in figure 2. This figure illustrates how a user demands a coherent and seamless UI service across his/her own devices. The user has several devices around him/her, and the service provider supports a similar, coherent UI among the devices. The terminal devices are assumed to be loaded with the web-based terminal middleware. Mr. Kim is relaxing and watching a TV program through the TV terminal UI at home. Then he receives an urgent call from the office to attend a meeting in another city, so he has to go out for the meeting after stopping the movie immediately. It takes one hour to move from home to the meeting place, so he hopes to continue watching the content while moving on the bus with his mobile phone. When he gets on the bus, he opens the mobile phone and connects to the service provider for the movie content again. He then finds that the mobile UI is familiar to him, because the service provider supports a seamless UI screen for the TV service. Therefore he enjoys and finishes watching the video to the end.

Fig. 2. Case for necessity of different user interfaces

Figure 3 shows the flow for the use case of different devices with media processing together. Here the two kinds of devices are an STB and a mobile phone, and the content provider supports the composition of services.

Fig. 3. UI service flow between different terminals

3 Model Design of Adaptive User Interface

3.1 Analysis and Model

Research on UI is still being studied and developed on the issue of user convenience. UI technology enables web applications to be used together with user preference information; therefore the user can find services more easily through a comfortable UI, and devices are required to design the UI with a user-centric philosophy. UI research is related to the modeling of context processing, which means the user has interactions with devices. However, previous research is deficient regarding sequential procedures in abnormal situations. It needs to define clearly the description of the interaction between user and device in a specific procedure, and to overcome the limits of time-flow processing in abnormal completion. Recent research is concentrated on building user profiles with an XML-based model and allowing an available user interface according to user preference. The AUI model is a model based on XML and user interaction. The AUI model supports asynchronous update of the UI and change with the Interactor class and Selection class for context and data processing. Some features of the model: the user interface is divided into an abstract level and a concrete level; the abstract level is independent of platform, and the concrete level can work on various platforms such as graphical touch-based smart phones and multi-modal devices. As well, the UI considers both normal users and handicapped users from the user perspective, and a multiple-device environment. Therefore, the model makes it possible to use an appropriate web application model in the TV service environment. To support the adaptive user interface model, it is necessary to define a class structure diagram, which shows the relationships among entities on the device. The class structure is shown in figure 4.

Fig. 4. Class structure of the adaptive UI model

Table 1. Class names and roles

  Class name              Description
  Connections             Indicate the next presentation after a user interaction:
                          elementary, complex and conditional connections
  Interactor              Target of a user interaction, or target of output only;
                          two types of user interaction components
  Selection               Selection from a user-predefined list
  Edit                    Manual editing on an Interactor: text (TextEdit), number
                          (NumberEdit), position (PositionEdit), generic object (ObjectEdit)
  Control                 Navigator, activator
  Grouping                Interactor element group
  Relation                Group whose members have relationships with each other
  Composite Description   Group display for a mix of Description and Navigator elements
  Repeater                Content repeated for a general data source
  Data Model              Data types for the interface and Interactor status updates;
                          defined with XML Schema Definition
  Event Model             UI update status for each Interactor
  Dialog Model            Presentation interaction for events in time; CTT operators
                          for time relationships
  DataInteractor          Interactor for UI input/output: Selection, Input, Output
  TriggerInteractor       Interactor for UI commands: Command, Navigator

3.3 Client-Server Mechanism

Figure 5 shows the flow diagram between client and server. The user interface receives an input signal from the main display, in fact through the internal Input class. After receiving the data input, the DataInteractor class updates the Interactor's status, or various UI statuses, by changing data element values. Through this processing, the server saves the updated user profile information and continues to process the Interactor, Presentation and Interface classes in order, then displays on the user device. Other procedures can also be modeled in the client-server model.
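As a rough illustration of this flow, the sketch below mimics the user-data-registration sequence of Fig. 5: an input feeds a DataInteractor, which updates its status, pushes the new value to a (here in-memory) server-side user profile, and then refreshes the presentation. All class and method names are simplified stand-ins for the model's classes, not an implementation of the paper's middleware.

```python
# Simplified stand-ins for the AUI model's classes (Fig. 5 flow):
# Input -> DataInteractor -> server-side profile -> presentation refresh.

class ProfileServer:
    """Server side: stores the updated user profile information."""
    def __init__(self) -> None:
        self.profiles: dict[str, dict[str, str]] = {}

    def save(self, user: str, field: str, value: str) -> None:
        self.profiles.setdefault(user, {})[field] = value

class DataInteractor:
    """Client side: updates its status and syncs the change to the server."""
    def __init__(self, name: str, server: ProfileServer, user: str) -> None:
        self.name, self.server, self.user = name, server, user
        self.status = ""

    def on_input(self, value: str) -> None:
        self.status = value                            # update Interactor status
        self.server.save(self.user, self.name, value)  # server saves the profile
        self.render()                                  # Interactor -> Presentation

    def render(self) -> None:
        print(f"[{self.user}] {self.name} = {self.status}")

server = ProfileServer()
nickname = DataInteractor("nickname", server, user="mr_kim")
nickname.on_input("Kim")  # user registers data via the main display
```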

Fig. 5. Example of client-server interaction: user data registration

4 Summary

This paper described ongoing research results on UI for TV services in an environment where the user has multiple devices. General terminal middleware is described as the platform for UI modeling, targeting a seamless UI service, and UI modeling is analyzed for adaptive UI design in a client-server architecture. Further research is ongoing and necessary.

Acknowledgements. This research was supported by the ICT Standardization program of MKE (The Ministry of Knowledge Economy).

References

1. ITU-T IPTV
2. ATIS
3. OIPF, Open IPTV Forum
4. ISO/IEC 23004, Information technology - MPEG Multimedia Middleware (MPEG-E)
5. A. Puerta and J. Eisenstein: A Representational Basis for User-Interface Transformations. CHI Workshop on Transforming the UI for Anyone, Anywhere, Seattle, WA, March (2001)
6. A. Puerta and J. Eisenstein: Towards a General Computational Framework for Model-Based Interface Development Systems. Knowledge-Based Systems, Vol. 12 (1999)
7. D. Thevenin and J. Coutaz: Plasticity of User Interfaces: Framework and Research Agenda. In: Proceedings of INTERACT 99, Edinburgh: IOS Press (1999)
8. J. Vanderdonckt and P. Berquin: Towards a Very Large Model-Based Approach for User Interface Development. Proceedings of UIDIS 99, Los Alamitos: IEEE Press (1999)

Study on Improving Energy Saving for Old Buildings with Daily Energy Conserving Index

Yan-Chyuan Shiau 1, Chi-Hong Chen 2
1,2 Department of Construction Management, Chung Hua University, 707, Wu-Fu Rd., Sec. 2, Hsinchu, 300 Taiwan
1 ycshiau@ms22.hinet.net, 2 m @chu.edu.tw

Abstract. Energy conservation and carbon reduction have become an emphasis of governance in countries around the world as global warming increases quickly. This study assessed building energy efficiency, proposed plans for enhancing building energy efficiency, and assessed the effectiveness of the enhancement, in order to comply with the green building specification and save energy for sustainable development. The target building was assessed according to the major items in the Energy Conservation Target of Taiwan's system for Ecology, Energy saving, Waste reduction, and Health (EEWH) and Ecotect. These included the Efficiency of Lighting (EL), Efficiency of Air-Conditioning (EAC), and building Enclosure Energy-efficient Value (EEV). Buildings must pass these three items in order to pass the energy conservation assessment.

Keywords: Green Building, EEWH, Ecotect, Energy Saving, BIM

1 Introduction

The condition of the earth has been deteriorating in recent years, as witnessed by continuing extreme climates, ice meltdown in the Arctic and Antarctic, forest fires, and ozone layer damage. Global warming and high oil prices are a threat to the whole human race. One solution is to reduce the energy consumption of buildings [1]. Realizing the future impact of climate change, the Taiwanese Government began advocating energy conservation and carbon reduction and launched the green building specification. It is important to evaluate Building Life Cycle Energy Consumption, which is the key problem of building energy saving [2]. About 97% of buildings in Taiwan are older buildings, and most did not pass the Green Building Mark (GB mark) accreditation. Throughout the lifespan of buildings, as energy consumption when buildings are in use is about 93% [3], energy conservation can help enhance the substantive economic efficiency of old buildings. The most important decisions regarding a building's sustainable features are made during the design and preconstruction stages [4]. In order to obtain the GB Mark, buildings must pass at least four of the nine targets, and energy conservation and water conservation are required targets. This study mainly investigated the energy

conservation target. The main issues of this target include building enclosure energy-saving design, air-conditioner energy-saving design, and lighting efficiency. These issues are closely related to energy consumption. In this study, the College of Architecture and Planning of Chung Hua University was selected as the subject to assess whether the building qualified under the energy conservation assessment of Taiwan's GB mark accreditation. The research objectives of this study are: (1) to investigate methods for enhancing energy conservation in green buildings; (2) to build the BIM and apply Ecotect to simulate and analyze the assessment items of energy conservation; and (3) to propose improvement plans for non-conforming items and estimate the projected benefits of the improvements.

2 Literature Review

2.1 Energy Conservation

Energy-efficient design for buildings is a vital step towards building energy saving [5]. Buildings have a long lifespan, and when a building is in use, air-conditioning, lighting, elevator service, etc. consume most of the energy over that lifespan. In summer, air-conditioning consumes about 40-50% of energy, and lighting about 30-50%. Therefore, discussing building energy conservation in terms of air-conditioning and lighting is the most effective approach [6]. The construction industry is increasingly interested in designing green buildings that can provide high performance while saving on costs [7].

2.2 Building Information Modeling

Building information modeling. Building information modeling (BIM) is a three-dimensional value technology. BIM can provide a range of information about a building [8]. The concept of BIM, necessary for the design of green buildings, can provide the characteristics and performance of design concepts [9]. It integrates the data of all related information of a construction project; that is to say, it is the quantitative expression of a construction project. BIM is also a valuation method applied to design, construction, and management. By applying it to the central management of construction projects, constructors can enhance overall project efficiency and reduce project risk [10]. In general, BIM is considered a digital 3D geometric model of buildings. BIM provides a cooperative work platform for building designers and engineers, which helps them effectively achieve energy saving, pollution reduction and cost savings for a project [11]. The information is expressed in different dimensions of a building, such as the layout, façade, cross section, detailed drawing, 3D view, perspective, BOM, the lighting effect of individual rooms, the required ventilation of air-conditioning, material price, and purchasing information. Therefore, BIM builds a

virtual building in the computer by means of digitization and provides a complete, consistent, and logical building database [12].

Ecotect. Ecotect uses a flexible and perceptual 3D information model that simplifies complex geometric conceptualization and completely abandons the complex modeling method of conventional CAD. It includes six analysis functions: thermal environment, light environment, sound environment, daylight, economic and environmental effects, and visibility analyses. These analyses cover the major factors affecting architectural design [13] and can accurately simulate the corresponding surroundings of a building in the four seasons. It also includes interior environment analysis of buildings. This package allows us to spend the least time to simulate and test an architectural design in the greatest detail. For energy conservation, it further analyzes information relating to daylight length, thermal environment, air-conditioning, and lighting equipment to reduce the energy consumption of buildings.

3 Present Status and Analysis of the Target Building

3.1 Assessment of Energy Conservation Indicators of the Target Building

The College of Architecture and Planning of Chung Hua University was the target of this study, which aimed to investigate whether the building qualified for energy conservation. In Table 1, the failed items are the horizontal light penetration window solar radiation shade and the building envelope energy efficient value (EEV). As EEV is a required assessment item, these results show that the target building is not a qualified green building.

Table 1. Energy conservation indicator assessment sheet of the target building

  Item                                                  Standard           Result               Pass/Fail
  (a) Glass visible light refraction rate               Gri < 0.25         Gri = 0.09           Pass
  (b) Horizontal light penetration window               HWs < HWsc         HWs = 0.71, HWsc =   Fail
      solar radiation shade
  (c) Roof average penetration rate                     Ur < 1.0 (W/m²K)   Ur =                 Pass
  (d) Building envelope energy efficient value (EEV)    EEV ≤ 0.8          EEV = 1.24           Fail
  (e) Unit capacity efficiency (HSC)                    HSC ≤ 1.35         HSC =                Pass
  (f) Efficiency of Air Conditioning System (EAC)       EAC ≤ 0.8          EAC = 0.75           Pass
  (g) Efficiency of Lighting (EL)                       EL ≤ 0.7           EL =                 Pass

  The building fails the energy conservation assessment when any of EEV, HSC, EAC or EL fails.
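A compact way to express Table 1's pass/fail rule (failing any of EEV, HSC, EAC or EL fails the whole energy conservation indicator) is sketched below. The threshold directions are inferred from the values quoted in the table, and the HSC and EL inputs are illustrative stand-ins, so treat both as assumptions.

```python
# Thresholds inferred from Table 1 (direction assumed: lower is better).
LIMITS = {"EEV": 0.8, "HSC": 1.35, "EAC": 0.8, "EL": 0.7}

def energy_conservation_pass(results: dict[str, float]) -> bool:
    """Fails if any required indicator exceeds its limit."""
    verdicts = {k: results[k] <= lim for k, lim in LIMITS.items()}
    for item, ok in verdicts.items():
        print(f"{item}: {results[item]:.2f} vs limit {LIMITS[item]} ->",
              "pass" if ok else "FAIL")
    return all(verdicts.values())

# EEV = 1.24 and EAC = 0.75 are quoted in the paper; the HSC and EL
# values here are illustrative only.
print(energy_conservation_pass({"EEV": 1.24, "HSC": 1.30,
                                "EAC": 0.75, "EL": 0.65}))
```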

3.2 Analysis of Building Enclosure Energy-Efficient Value

The calculation of the energy conservation efficiency of the horizontal, vertical, and grid sunshades of the windows was the focus of the EEV analysis. When assessing enclosure energy conservation efficiency, we usually calculate the average window solar gain (AWSG). The AWSG of the target building is 198, which is higher than the standard of 160; therefore it failed. For this reason, the enclosure energy conservation efficiency of the target building should be reviewed. Most openings of the target building face east and west, including 47% facing east and 36.3% facing west. Although grid sunshades are installed on the building, they are too shallow (approx. cm), and there are too many openings on the east and west, so the AWSG value is much higher than the standard EEV value for green buildings. Therefore, we suggest the owner increase the depth of the sunshades and change all sunshades into grid sunshades. This way, there will be a greater chance of passing the assessment.

4 Energy Efficiency Analysis

This study applied Ecotect to simulate and analyze lighting energy conservation and enclosure energy conservation. After creating a 3D model based on the target building's present status, related values were input for space energy analysis. Classrooms A307 and A324 on the third floor of the target building were the targets of the spatial simulation. The following energy conservation items were analyzed:

4.1 Verification of Lighting Energy Conservation

We verified the lighting energy conservation of Classrooms A307 and A324. T8 (Tube 8/8 inch) fluorescent lamps were installed in room A324. The equipment capacity included 12 sets of T8 fluorescent lamps, each 40 W x 3 tubes. In room A307, T5 (Tube 5/8 inch) fluorescent lamps were installed. The equipment capacity included 12 sets of T5 fluorescent lamps, each 28 W x 2 tubes. As the luminous efficacy of the T8 fluorescent lamp is 52.5 lm/W per tube, the radiant flux of each set is 6,300 lm. Also, as the luminous efficacy of the T5 fluorescent lamp is 89.29 lm/W per tube, the radiant flux of each set is 5,000 lm. After Ecotect simulation and analysis of Classrooms A324 and A307, Fig. 1 shows that the illuminance of artificial lighting using T8 lamps in A324 meets the CNS school indoor lighting requirements, and Fig. 2 shows that the illuminance without natural lighting using T5 lamps in A307 also meets the CNS requirements. For energy consumption, assuming these lamps are used 10 hours a day, the cost for the summer months is the consumption multiplied by $3.5/kWh, giving $512.4, and for the non-summer months the consumption multiplied by $2.62/kWh gives $764; the total amount is $1,276 x 12 sets = $15,312/year.

Fig. 1. Illuminance of T8 lamps (room A324)
Fig. 2. Illuminance of T5 lamps (room A307)
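The arithmetic behind these figures can be reconstructed as follows: each T8 set draws 40 W x 3 tubes = 120 W, runs 10 h/day, is billed at $3.5/kWh in summer and $2.62/kWh otherwise, and there are 12 sets. A minimal sketch, with the season lengths (122/243 days) as our assumption, reproduces the quoted $15,312/year for T8 and applies the same computation to the T5 sets (28 W x 2 tubes):

```python
SUMMER_DAYS, OTHER_DAYS = 122, 243   # assumed season split (days/year)
SUMMER_RATE, OTHER_RATE = 3.5, 2.62  # tariff in $/kWh, from the paper

def annual_lighting_cost(watts_per_set: float, sets: int,
                         hours_per_day: float = 10.0) -> float:
    kwh_per_day = watts_per_set / 1000.0 * hours_per_day
    cost_per_set = (kwh_per_day * SUMMER_DAYS * SUMMER_RATE
                    + kwh_per_day * OTHER_DAYS * OTHER_RATE)
    return cost_per_set * sets

t8 = annual_lighting_cost(40 * 3, 12)   # ~ $15,312/year, as quoted
t5 = annual_lighting_cost(28 * 2, 12)   # ~ $7,152/year, as quoted
print(f"T8: ${t8:,.0f}/yr  T5: ${t5:,.0f}/yr  saving: {1 - t5/t8:.0%}")
```

Running this prints a saving of about 53%, matching the figure given in Section 4.2 below.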

4.2 Cost Analysis of Lamp Replacement

As the results show, the illuminance difference between the T8 and T5 lamps was very small. In energy consumption, however, the difference is significant, from $15,312/year down to $7,152/year. That is to say, T5 lamps can save up to 53% of the energy cost. After replacing part of the lamps on the first to fifth floors in June 2011, the changes in energy consumption of the target building from June to December 2011 are shown in Fig. 3. The results show that the T5 lamps reduced energy consumption by up to 31.7% compared with the same period in the previous year. These results also suggest that the energy conservation performance will be more significant after replacing the lamps of the entire building with T5 lamps. After replacing part of the T8 lamps with T5 lamps in the target building, the total capacity saved is 15,762 W. Assuming a consumption time of 3,650 hours, a total of 57,531 kWh and a sum of NT$172,593 in electricity bills will be saved per year. The budget for the lamp replacement is NT$254,945, and the estimated payback period is 15 months.

Fig. 3. Energy consumption statistics for 2010 and 2011 (unit: kWh)

5 Conclusions

Based on the investigation results of the target building, the effect of energy

conservation is obtained after replacing part of the T8 lamps with T5 lamps. For the energy consumed by air-conditioning, conservation should begin with the enclosure design: a good building enclosure design can maximize the lighting, sun-shading, and heat-insulating performance of a building, and consequently reduce the energy consumed by lighting and air-conditioning. The author suggests replacing all T8 fluorescent lamps in the entire building with T5 fluorescent lamps to reduce energy consumption. As for the building enclosure, since most openings face east and west, the achievable energy conservation efficiency is limited; the author therefore suggests installing 45 cm horizontal sunshades on all openings to achieve the optimal energy conservation effect.

References

1. Yoon, S.H., Park, N.H., Choi, J.W.: A BIM-based design method for energy-efficient building. NCM: International Joint Conference on INC, IMS, and IDC (2009)
2. Xie, J., Yuan, J., Yuan, Y.: A study on optimization of Building Life Cycle energy consumption. International Conference on Consumer Electronics, Communications and Networks (2011)
3. Yang, H.C.: An Analysis on Energy Consumption and Environmental Impact of Buildings. Unpublished master's dissertation, Department of Architecture, National Cheng Kung University, Taiwan (1996)
4. Azhar, S., Carlton, W.A., Olsen, D., Ahmad, I.: BIM for sustainable design and LEED rating analysis. Automation in Construction, vol. 20, no. 2 (2011)
5. Ma, Z., Zhao, Y.: Model of Next Generation Energy-Efficient Design Software for Buildings. Tsinghua Science and Technology, vol. 13 (2008)
6. Lin, H.T.: Green Building Explanation and Assessment Handbook. Architecture and Building Research Institute, Ministry of the Interior, Taipei (2005)
7. Jalaei, F., Jrade, A.: Integrating sustainability with BIM at the conceptual design stage of building projects. Proceedings, Annual Conference - Canadian Society for Civil Engineering, vol. 4 (2011)
8. Kotwal, T., Ponoum, R., Brodrick, J.: BIM for energy savings. ASHRAE Journal, vol. 53 (2011)
9. Bernstein, P.G.: Green building information modeling. Construction Specifier, vol. 58, no. 2 (2005)
10. Tang, C.P.: The Application of Building Information Modeling (BIM) Technology Integrated with the Construction Information. Unpublished master's dissertation, Graduate Institute of Architecture and Sustainable Planning, National Yi-Lan University, Taiwan (2008)
11. Ren, Q., Tan, D., Tan, C.: Research of sustainable design based on technology of BIM. International Conference on Remote Sensing, Environment and Transportation Engineering (2011)
12. Chang, Y.H.: Study on Building Information Model Based on Cadastre Data. Master's dissertation, Dept. of Real Estate & Built Environment, National Taipei University, Taiwan (2010)
13. Peng, Y.: Architectural and Environmental Design Program. Chan's Arch Publishing Co., Ltd, Taipei (2006)

Method of Unified Communications and Collaboration Service in Open Service Platform based on RESTful Web Services

Sunhwan Lim and Hyunjoo Bae
Future Communications Research Laboratory, ETRI, Daejeon, Korea
{shlim,

Abstract. In this paper, we design a functional architecture for third party call, short messaging, directory, and discussion RESTful web services that enables IT developers to create applications using telecommunications network elements. In modeling third party call, short messaging, directory, and discussion, we propose resource definitions and the HTTP verbs applicable to each of these resources, and we measure the TPS of the open service platform that includes these RESTful web services. Using this model, an example service (a unified communications and collaboration service) composed of basic communication services (e.g., a third party call service and a short messaging service) and social networking services (e.g., a directory service and a discussion service) was created. Through the third party call, short messaging, directory, and discussion processes, the feasibility of creating a new service with the proposed architecture and resources was confirmed.

Keywords: RESTful Open API, Third Party Call, Short Messaging, Directory, Discussion

1 Introduction

From the service viewpoint, integration between wired and wireless services is a current issue. This integration gives subscribers the opportunity for a new level of services that couples the broadband capability of wired services with the mobility of wireless ones. An open API (Application Programming Interface) can easily be used to implement or provide such integration. An open API is a set of open, standardized interfaces between an application and a telecommunications network [1], [2]. This technology can provide a range of services for the integration of wired and wireless systems independently of network infrastructures, operating systems, or programming languages. In this paper, the functional architecture for third party call, short messaging, directory, and discussion RESTful web services was designed. The architecture was implemented with Eclipse Galileo and tested on Apache Geronimo. In modeling the functional architecture, resource definitions and the HTTP verbs applicable to each of these resources were proposed, and the TPS (Transaction

Per Second) of the open service platform including these RESTful web services was measured. Also, using this model, the functional architecture for an example service was designed, implemented, and tested.

2 Open API

The OMA (Open Mobile Alliance) and Parlay groups develop open, technology-independent APIs that enable the development of applications capable of operating across converged networks [1], [2], [4]. The OMA group defines open APIs (the OMA network APIs) based on REST (REpresentational State Transfer), and the Parlay group defines open APIs (the Parlay X APIs) based on SOAP, both of which enable third party applications to make use of network functionality [7], [8], [9]. SOAP (Simple Object Access Protocol) based APIs are considered complex because of their message encoding and decoding and the many related stacks (e.g., WS security) [3]; by comparison, RESTful APIs using the HTTP protocol are lightweight [5], [6].

3 Designed Architecture and Resources for RESTful Web Services

3.1 Functional Architecture

[Fig. 1 depicts the open service platform: applications (AS) issue REST web service calls to the web service module, which creates instances and invokes the SCF module via RMI; the SCF module (main TPC/SM/DIR/DIS manager, notification subscription manager, client notification manager, protocol adaptor) connects to the MSC, SMS-C, and LDAP server. Abbreviations: AS: Application Server; DIR: DIRectory; DIS: DIScussion; LDAP: Light-weight Directory Access Protocol; MSC: Mobile Switching Center; REST: REpresentational State Transfer; RMI: Remote Method Invocation; SCF: Service Capability Feature; SM: Short Messaging; SMS-C: Short Message Service Center; TPC: Third Party Call.]

Fig. 1. Functional architecture of the third party call, short messaging, directory, and discussion RESTful web services

The functional modules of the third party call, short messaging, directory, and discussion RESTful web services are illustrated in Fig. 1. This architecture is composed of a web service module and an SCF (Service Capability Feature) module. The main

reason for separating the web service module from the SCF module is to effectively support stateful services such as third party call. The web service module only publishes the API, while the SCF module implements the service logic of both stateful services (those with state information) and stateless ones. Alternatively, we could use only a web service module; however, to process stateful services we would then have to implement the service logic using a DB holding all state information, or using request messages that carry all state information as parameters, which results in low system performance.

3.2 Resources for Third Party Call RESTful Web Services

Currently, to perform a third party call in telecommunication networks, we have to write applications using specific protocols to access the call control functions provided by network elements (specifically, operations to initiate a call from applications). This approach requires a high degree of network expertise. Alternatively, it is possible to use an open API approach based on web services, invoking standard interfaces to gain access to call control capabilities.

Table 1. Resources summary for the third party call RESTful web services (Base URL: .../thirdpartycall/{apiversion})
- All call sessions -- callsessions
  GET: get a list of all call sessions; PUT: no; POST: create a new call session (callsessionid assigned); DELETE: no
- Individual call session -- callsessions/{callsessionid}
  GET: get information on an individual call session; PUT: no; POST: no; DELETE: terminate the call session

3.3 Resources for Short Messaging RESTful Web Services

Currently, to programmatically send and receive SMS, it is necessary to write applications using specific protocols to access the SMS functions provided by network elements (e.g., the SMS-C). This approach requires a high degree of network expertise. Alternatively, it is possible to use an open API approach based on web services, invoking standard interfaces to gain access to SMS capabilities.

Table 2. Resources summary for the short messaging RESTful web services (Base URL: .../messaging/{apiversion})
- SMS message requests -- /requests
  GET: return all message requests; PUT: no; POST: create a new message request (requestid assigned); DELETE: no
- Individual SMS message request -- /requests/{requestid}
  GET: return one message request; PUT: no; POST: no; DELETE: no
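To make the resource tables concrete, the sketch below exercises the two resources with plain HTTP calls. The host name and the JSON field names are hypothetical illustrations; only the resource paths and verbs come from Tables 1 and 2:

```python
# Hypothetical client for the Table 1/Table 2 resources; the host and the
# payload field names are illustrative assumptions, not part of the paper.
import requests

BASE = "http://osp.example.com"   # hypothetical open service platform host
API = "v1"                        # {apiversion}

# Table 1: POST on callsessions creates a session (callsessionid assigned).
r = requests.post(f"{BASE}/thirdpartycall/{API}/callsessions",
                  json={"callingParty": "tel:+821000000001",
                        "calledParty": "tel:+821000000002"})
session = r.json()["callSessionId"]

# GET on the individual resource returns the session's information.
info = requests.get(f"{BASE}/thirdpartycall/{API}/callsessions/{session}").json()

# DELETE terminates the call session.
requests.delete(f"{BASE}/thirdpartycall/{API}/callsessions/{session}")

# Table 2: POST on /requests sends an SMS (requestid assigned).
r = requests.post(f"{BASE}/messaging/{API}/requests",
                  json={"address": "tel:+821000000002", "message": "hello"})
print(r.json()["requestId"])
```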

3.4 Resources for Directory RESTful Web Services

A directory service within an enterprise should support the following functionalities: a polling mechanism for getting the part list within the enterprise through a part search keyword; a polling mechanism for getting the user list through a user search keyword; a polling mechanism for getting a part profile through a part ID; a polling mechanism for getting a user profile through a user ID; and contact list management (e.g., get contact list, add contact, modify contact, delete contact).

Table 3. Resources summary for the directory RESTful web services (Base URL: .../directory/{apiversion})
- Parts -- /parts: GET: search part profiles; PUT, POST, DELETE: no
- Users -- /users: GET: search user profiles; PUT, POST, DELETE: no
- Individual part -- /parts/{partid}: GET: get part profile; PUT, POST, DELETE: no
- Individual user -- /users/{targetid}: GET: get user profile; PUT, POST, DELETE: no
- Contact list -- /contactlist: GET: get contact list; PUT: no; POST: add contact (contactid assigned); DELETE: no
- Individual contact -- /contactlist/{contactid}: GET: get contact info; PUT: modify contact; POST: no; DELETE: delete contact

3.5 Resources for Discussion RESTful Web Services

A discussion service within an enterprise should support the following functionalities: a polling mechanism for getting the list of all discussion groups; a polling mechanism for getting discussion group information; and discussion group post management (e.g., get the list of all posts in a group, add a post, delete a post).

Table 4. Resources summary for the discussion RESTful web services (Base URL: .../discussion/{apiversion})
- Discussion groups -- /groups: GET: get the list of all discussion groups; PUT, POST, DELETE: no
- Individual discussion group -- /groups/{groupid}: GET: get discussion group info; PUT, POST, DELETE: no
- Discussion group posts -- /groups/{groupid}/posts: GET: get the list of all posts in the group; PUT: no; POST: add a discussion group post (postid assigned); DELETE: no
- Individual discussion group post -- /groups/{groupid}/posts/{postid}: GET: get discussion group post info; PUT: no; POST: no; DELETE: delete the discussion group post

4 Designed Architecture for Example Service

4.1 Functional Architecture for Unified Communications and Collaboration

The functional architecture for UC&C (Unified Communications and Collaboration) is illustrated in Fig. 2. The UC&C applications comprise a web-based UC&C and a mobile UC&C. Enterprise members connect to the web-based UC&C via a web browser in the office, and to the mobile UC&C via a smart phone out of the office. The web-based UC&C interacts with the back end servers directly. The mobile UC&C, however, interacts with the back end servers via the RESTful service components on the open service platform, because the mobile UC&C has limitations on service usage given the capacity of smart phones, security considerations, and so on.

[Fig. 2 shows end users reaching UC&C and third party applications through a web browser; the applications use the open service platform's RESTful web services (third party call, short messaging, presence, directory, discussion, board, direct message, and service access control), which in turn control the back end servers (IP-PBX, directory, IM) over SIP/LDAP/XMPP.]

Fig. 2. Functional architecture for UC&C

The open service platform publishes service components that provide the functionality of the back end servers, and third party applications use these service components. The open service platform also includes a service access control function which performs authentication and authorization of service access. The service components are published as RESTful web services.

5 Implementation of the Prototype Function

5.1 Environments and Testing

The third party call, short messaging, directory, and discussion RESTful web services were implemented using Eclipse Galileo and tested on Apache Geronimo. These RESTful web services are composed of a web service module and an SCF module; the web service module interacts with the SCF module using RMI. These RESTful web services also interact with a MySQL DB using JDBC (Java Database Connectivity) to record transaction history, with the MSC to set up call sessions between two terminals, with the SMS-C to send SMS messages to terminals, and with the LDAP server to manage the enterprise organization chart and contact lists. For the TPS measurement of the open service platform including these RESTful web services, the above four and four additional RESTful web services were tested using SOAP UI Pro with the following load profile:

- User: 20, Count Per Second: 200, Duration (Second): 86,400

Table 5. TPS for the open service platform

Service component   API                RESTful   TPS (average: )
Third Party Call    makecall           O         168
                    endcall            O
Short Messaging     sendsms            O         327
Directory           getpartprofile     O         311
                    getcontactlist     O         307
                    addcontact         O         286
                    deletecontact      O         279
Discussion          addpostreply       O         295
                    deletepostreply    O         282
Mail                sendmail           O         321
Presence            setuserpresence    O         363
                    getuserpresence    O         322
Board               getboardpostlist   O         287
Direct Message      getnewdmcount      O

6 Conclusion

Regarding new market growth, a range of new intelligent services is on the horizon. Potential subscribers must be introduced to these services, but it is currently not feasible to bring third party service providers and developers into the vertical architecture of current telecommunications networks. Thus, open, technology-independent APIs that enable the development of applications that operate across converged networks are necessary. In this paper, the functional architecture for third party call, short messaging, directory, and discussion RESTful web services was designed to enable IT developers to create applications using telecommunications network elements. The architecture was implemented with Eclipse Galileo and tested on Apache Geronimo. In modeling the functional architecture, resource definitions and the HTTP verbs applicable to each of these resources were proposed, and the TPS of the open service platform including these RESTful web services was measured. Also, using this model, the functional architecture for an example service was designed, implemented, and tested. Through this process, the feasibility of creating a new service with the proposed architecture and resources was confirmed.

Acknowledgement. This research was supported by the KCC (Korea Communications Commission), Korea, under the R&D program supervised by the KCA (Korea Communications Agency).

References

1. 3GPP, Third Generation Partnership Project
2. OMA (Open Mobile Alliance), openmobilealliance.org/
3. W3C, World Wide Web Consortium

4. Roy Fielding, Architectural Styles and the Design of Network-based Software Architectures, Doctoral dissertation, Information and Computer Science, University of California, Irvine (2000)
5. Leonard Richardson and Sam Ruby, RESTful Web Services, O'Reilly Media (2007)
6. Cesare Pautasso, REST vs. SOAP: Making the Right Architectural Decision, 1st International SOA Symposium (2008)
7. Parlay X Working Group, Parlay X Web Services White Paper v1.0 (2002)
8. Web Services Working Group, Parlay Web Services WSDL Style Guide (2002)
9. Parlay X Working Group, Parlay X Web Services Specification v1.0 (2003)

Original-page small file oriented EXT3 file storage system

Zhang Weizhe, Hui He, Zhang Qizhen
School of Computer Science and Technology, Harbin Institute of Technology, Harbin

Abstract. This paper analyzes the disadvantages of the existing EXT3 file system in accessing small files and designs an original-page-oriented large-file organization structure together with a related read-write query tree, based on the characteristics of small files: they are small, numerous, and rarely modified after being written.

Keywords: search engine, small file storage, storage time, storage space

1 Introduction

The access speed and the utilization ratio of storage space are two essential performance indicators of search engine storage. If the write speed is too low, storage becomes the bottleneck of search engine performance, and the crawling speed of the crawlers is limited because they fill up with fetched pages. If the read speed is too low, it slows down the analysis stage of the search engine; and if the utilization ratio is too low, the cost of data storage rises sharply and resources are wasted. The content stored by a search engine comprises original pages, content pages, and indexes. Of these, the original pages account for by far the largest data volume, while the numbers of content pages and indexes are much smaller; the data volumes of content pages and indexes are roughly equal, and the proportion of the three data volumes is almost 100:1:1. Original pages are the web pages crawled by web crawlers, whose sizes range from several KB to several hundred KB, typically dozens of KB. Content pages are extracted from original pages and are smaller, generally about half the size. The establishment, update, storage, and locating of indexes are handled by third-party software, and the storage system only needs to provide a storage directory to this software. Thus, original-page small files are the main objects that storage nodes handle, and the existing file system has great defects in accessing them. In this paper, we first compare common compression algorithms and choose a proper one to compress and store the page data. Then we design a large-file storage format for original pages, together with a snapshot query tree for the large-file storage, to optimize the speed of reading

snapshots. These measures reduce the access response time of storage nodes and the disk space usage.

2 The Strategy for Storing Small Files in the EXT3 File System

In this paper, based on the EXT3 file system of the Linux operating system and drawing on ideas from log-structured file systems, we cache a large number of write operations on original pages in the storage node's memory and organize the original pages into a large file in the cache. The large file is then written to disk, which greatly reduces disk seeks and data modification operations while also reducing disk fragmentation and the disk space occupied by metadata.

2.1 File Format Design

We design a compact large-file format consisting of small files, named the LOG_COMPACT file. The large file is divided into three sections, as shown in Fig. 1. The first part is the file header, recording information about the whole file, such as the number of web documents, the length of the large file in bytes, and other information. The second part is the web document information unit array, used to quickly locate web document positions; each unit records the web document's URL (or, if the URL is too long, the hash value of the URL), its offset in the large file, and its length in bytes. The third part is the web document array, in which each element is a web page that can be located rapidly through the second part. Every web document is compressed to save storage space. This storage method minimizes metadata storage and can still quickly locate content in the large file. Because the web information unit array is generally only dozens of KB, when reading a file, the web information unit array of the large file is read first; through this array, the offset and data length for a given URL can be found, and the file data can then be read out.

Fig. 1. The file layout inside a LOG_COMPACT file.
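A minimal sketch of this three-section layout follows. The paper does not fix any byte widths, so the field widths and the 16-byte URL-hash slot below are illustrative assumptions:

```python
# Illustrative LOG_COMPACT writer; all field widths and the 16-byte URL-hash
# slot are assumptions for the sketch, not values fixed by the paper.
import gzip, hashlib, struct

def pack_log_compact(pages):                   # pages: list of (url, html bytes)
    blobs = [gzip.compress(body) for _, body in pages]   # part 3: compressed docs
    header_len = 8                                       # part 1: count + total size
    units, offset = [], header_len + len(pages) * (16 + 4 + 4)
    for (url, _), blob in zip(pages, blobs):             # part 2: info unit array
        digest = hashlib.md5(url.encode()).digest()      # 16-byte URL hash
        units.append(struct.pack("<16sII", digest, offset, len(blob)))
        offset += len(blob)
    body = b"".join(units) + b"".join(blobs)
    return struct.pack("<II", len(pages), header_len + len(body)) + body

# Reading needs only the header and the unit array (dozens of KB), not the
# whole file: scan the units for the URL hash, then seek to (offset, length)
# and decompress that single document.
```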

2.2 Large-File Read-Write Design

When a storage node receives a snapshot fetch request, it gets a URL and a date, and it needs a mechanism to quickly map that URL and date to the position in the large files where the original page is stored, and then read the page snapshot from the large file. One option is to map URL, date, and large-file path into a database table. The table has three columns (URL, date, and path), of which URL and date together make up the unique identity of each row. When pages are written, rows are added to the database; when reading a file, the large file is located through URL and date, and the page is then read from it. Because of the huge amount of data, the table grows with the number of stored pages, and since rows are appended constantly, query speed becomes very slow; this method is therefore not suitable for page retrieval. Referencing the TRIE tree, we design a snapshot retrieval mechanism in which the correspondence of URL, date, and file path is organized as a tree. The date is the first division level; the tree then stretches down to the leaves following the natural path structure of the URL. Each leaf node stores the path of the large file that holds the original page identified by the corresponding date and URL. In this way, when querying a snapshot, we only need to follow the query tree through the date and URL to a leaf node; the path stored at that leaf is the path of the large file, from which the page can be read. With a huge number of pages, the tree cannot be stored entirely in memory. To solve this, the upper layers are kept in memory and the lower layers are stored on the hard disk; when a lower-layer path is needed, the required subtree is read into memory and operated on there.

Fig. 2. The storage structure of raw pages on a storage machine.

Although the query tree can finish modifications and queries quickly, its operations are complicated and the number of layers varies in depth, which makes the tree very uneven: seek and modification costs differ remarkably between files. To reduce the complexity of the operations while still being able to query page snapshots quickly, we perform a necessary simplification on the

query tree, leaving only two layers. The first layer is the storage unit layer, whose information is always kept in memory. The second layer is the index table of each storage unit, whose information is stored on the hard disk. In the previous chapter, we chose the day channel as the load distribution unit, so the day channel is also the unit of storage division in the storage node. Because the data amount of a day channel is moderate and the number of day channels is small, the storage unit layer of the query tree can always be kept in memory. When a query arrives, the storage unit is found through the query tree using the URL and date; through the storage unit, the index table of that storage unit is acquired; and through the index table, the large file where the page is stored can be quickly located, so the page can be extracted from the large file.

3 Experiment

To compare the access efficiency and storage volume of different storage schemes for original pages, we designed and implemented five schemes.

1) Document-by-document compressed storage, Raw_Store for short. Pages are compressed when written, and pages belonging to the same day channel are stored in the same directory. For a snapshot query, the day channel directory is located first, the page file is found in that directory, and the page is finally decompressed and returned. Because the EXT3 file system uses hash lookup to handle large directories, the file search within a directory can be very fast. This scheme is simple to implement, but it leaves a huge number of small files in the system.

2) Archive-then-compress storage, AC_Store for short. Original pages are archived in memory and assembled into a large file when written; the large file is then compressed and stored on disk. For a snapshot query, the large file is found through the query tree, read into memory, and decompressed; the archive item wanted by the query is then read. This scheme remarkably decreases the number of small files in the system, but the whole large file needs to be read into memory.

3) Compress-then-archive storage, CA_Store for short. Original pages are compressed when written; the compressed page files are then archived into a large file and stored on disk. For a snapshot query, the large file is found through the query tree; the data is read into memory archive item by archive item and decompressed until the wanted page file is obtained. This scheme simply reverses the order of compressing and archiving in scheme 2, but the average amount of data read and decompressed per query is smaller than in scheme 2.

4) LOG_COMPACT large-file storage, Log_Store for short. Pages are compressed in memory when written; the compressed files are then organized as a LOG_COMPACT file and written to disk. For a snapshot query, the large file is found through the query tree; the LOG_COMPACT header and web information unit array are read to get the offset and length of the wanted page in the large file; the page is then read and decompressed. This scheme does not need to operate on the whole file when extracting a snapshot, since only a little information from the large file

is needed, so snapshot extraction is faster.

5) Content-deduplication storage, CI_Store for short. A duplication test is performed first when a page is written: if the content of the page appears in the system for the first time, it is compressed and stored. A page file is located by querying the query tree and is decompressed after it is found. Because a reference count is maintained per content item, which helps with deletion upon expiry, this scheme stores pages document by document.

In the experiment, we assume there is no limit on the receiving speed of the network card, in order to test the maximum read-write speed. The previously crawled original-page dataset is read into memory, and the original pages are then sent to the storage module from memory. The experimental dataset is 832 MB, the number of pages is 26,646, and the average page size is 31.2 KB. Because most operations on original pages in the storage system are writes, while snapshot extractions are relatively rare, we set the frequency ratio of the two operations to 100:1, i.e., about one snapshot extraction per 100 write operations. The time for writing pages plus the time for snapshot extraction makes up the total access time of a storage node. Read and write operations on the same storage node must be synchronized, so we choose the total access time as the standard for measuring the access efficiency of a storage node. Compressing data trades access speed for less disk space; we previously compared the access efficiency and disk occupation of different lossless and lossy compression methods, and finally chose gzip as our compression algorithm. From methods (2), (3), and (4) in Table 1, we see that organizing buffered small files into large files remarkably reduces the file-writing time. But the read speeds of methods (2) and (3) are very low, because they have to read most of the content of the archived large files to find the wanted small page file. The read speed of method (4) is lower than that of method (1), because it has to read more data from the large files than method (1). Write operations in method (5) need to perform the duplication test, so the write cost is high, while reads are the same as in method (1), so the read speed is high.

Table 1. The non-compressed file read-write speed and disk occupation.

No.   Storage method   Volume (MB)   Write (ms)   Read (ms)   Read+write (ms)
1     Raw_Store
2     AC_Store
3     CA_Store
4     Log_Store
5     CI_Store
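To make Section 2.2's simplified two-layer retrieval concrete, here is a toy sketch: an in-memory storage-unit map keyed by day channel, with per-unit index tables that would live on disk in the real system. All names and the index-table loader are illustrative assumptions:

```python
# Toy model of the two-layer snapshot lookup from Section 2.2. The in-memory
# layer maps a day channel to its storage unit; each unit's index table
# (URL -> (large-file path, offset, length)) is assumed to live on disk.
import gzip

storage_units = {}   # layer 1, always in memory: day channel -> index-table path

def load_index_table(path):
    """Stand-in for reading a unit's index table from disk."""
    raise NotImplementedError  # returns {url: (large_file_path, offset, length)}

def fetch_snapshot(url, date):
    unit = storage_units[date]         # layer 1: memory lookup by day channel
    index = load_index_table(unit)     # layer 2: index table from disk
    large_file, offset, length = index[url]
    with open(large_file, "rb") as f:  # read only the needed slice of the large file
        f.seek(offset)
        return gzip.decompress(f.read(length))
```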

6 Conclusion

This work aims to improve the access efficiency of storage nodes so as to improve storage throughput, and to reduce disk space occupation so as to reduce system deployment cost. In this paper, we first contrasted the compression ratios and compression speeds of common compression algorithms on web data. We then analyzed the problems the EXT3 file system has in accessing small files, and designed the LOG_COMPACT large-file format and its access process based on the characteristics of original pages: they are written often, read rarely, and almost never modified after being written. Finally, we conducted an experiment on the different access methods combined with the compression algorithm; the results showed that the LOG_COMPACT-based storage methods performed best in the comprehensive evaluation of access efficiency and disk space occupation.

References

1. RFC 1952: GZIP File Format Specification
2. RFC 1950: ZLIB Compressed Data Format Specification
3. Welch, T.A.: A Technique for High-Performance Data Compression. IEEE Computer, 17(6) (1984)
4. Ziv, J., Lempel, A.: A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory, 23(3) (1977)
5. Tweedie, S.C.: Journaling the Linux ext2fs Filesystem. Proceedings of the 4th Annual LinuxExpo, Durham, NC (1998)
6. Namesys web site
7. JFS for Linux project website
8. The SGI XFS project website

An NMS Architecture for Large-scale Networks

Byungjoon Lee 1, Youhyeon Jeong 1, Hoyoung Song 1, and Youngseok Lee 2
1 Electronics and Telecommunications Research Institute, Republic of Korea
{bjlee, yhjeong, hsong}@etri.re.kr
2 Chungnam National University, Republic of Korea
lee@cnu.ac.kr

Abstract. One of the key characteristics of the contemporary network is its scale. The cloud data center network, considered a key component of the current Internet, is normally composed of tens of thousands of devices. Besides, the networks which connect data centers and their users are getting larger and becoming more complex because of growing bandwidth requirements and user demands. This paper suggests a novel network management software architecture suitable for managing such large-scale networks in a scalable and highly available manner. The software architecture is designed based on consistent hashing. In this paper, we provide a formal definition of the software architecture and evaluation results that show its feasibility.

Keywords: NMS, Consistent Hashing, CORD

1 Introduction

One of the key characteristics of the contemporary network is its scale. Normally, a cloud data center network is composed of more than 50,000 elements [1]. The cloud data centers are connected with each other and with end-users through high-performance transport networks, which are also getting larger and becoming more complex. The scale of such networks raises several important questions: (1) does the architecture provide a method of improving performance in step with the size of the managed network? (2) is it possible to quickly recover an NMS from software component failures? The management software for such networks should be able to monitor thousands of network elements within a reasonable time bound and to configure the huge number of network elements simultaneously so as to keep their configurations consistent [2][3]. This means that an NMS should be horizontally scalable: we should be able to enhance its performance by adding more servers, and we should be able to predict the performance gain. Otherwise, it is impossible to calculate the cost of providing reasonable network management performance.

This research is supported by the IT R&D program of KCC/KCA (R&D on Smart Node Technology for Cloud Networking and Contents Centric Networking).

Besides, an NMS for such networks should provide high availability, guaranteeing fast failover from software failures; unwanted downtime of an NMS might lead to loss of revenue for the network service providers. In this paper, we suggest an NMS software architecture that focuses on high scalability and high availability. The original version of the architecture was first publicized in [4]; this paper extends it with more formal definitions and more evaluation results. We implemented the software architecture as a middleware called COre library for Rapid Development of NMSs (CORD). This paper is organized as follows. Section 2 summarizes previous important studies. In Section 3, we present the CORD architecture in detail. Section 4 presents the performance evaluation results of CORD. Finally, concluding remarks and future research directions are presented in Section 5.

2 Related Works

A multi-tier architecture is a very popular software architecture that enhances overall manageability and scalability by splitting software into multiple layers, each consisting of a similar set of functions and following development strategies different from the other layers. The architecture has been adopted in network management systems, including [5]. However, the multi-tier architecture does not allow linear performance enhancement by adding more servers to the NMS server pool: the whole system must be stopped to reconfigure the connections between layers and servers, and there is no guarantee that load is evenly distributed and automatically reconfigured. CORBA-based network management systems [6][7][8][9] model software components and network elements as objects and have them cooperate to manage networks. These systems solve many problems of the traditional multi-tier management architecture; they are scalable with the size of the managed network because NMS software components can be distributed over multiple nodes and made to share management workloads. But the scalability of CORBA-based management systems is limited: although developers can deploy objects across multiple servers, it is not possible to predict the volume of management workload assigned to each object, because CORBA does not support load balancing on objects. P2P-based network management systems [10][11][12][13] mainly focus on attaining network scalability by allowing a management overlay network to be configured automatically. However, these studies did not pay attention to the scalability or availability of the network management system itself: the scalability and availability of an NMS that sits on top of a P2P network do not change by adding more P2P nodes. Although one study delegated some management functionality to P2P peers [14], it did not study how such delegation might enhance the overall scalability and availability of the system.

[Fig. 1 shows a CORD domain: a management user interface, CORD processes running on CORD nodes, a gateway, and the managed network elements.]

Fig. 1. CORD network management architecture

3 CORD: An Alternative NMS Software Architecture

In designing an alternative NMS software architecture and implementing it as a middleware, CORD, we used consistent hashing as our tool [4]. In this paper, we achieve predictable scalability and high availability by using consistent hashing (1) to distribute the management workload evenly over all NMS servers, and (2) to design a fast software component failover mechanism. Fig. 1 is a conceptual diagram of one CORD system. One CORD-based NMS system that manages one network domain is called a CORD domain, which is a set of CORD nodes and CORD processes. A CORD node is a machine on which the CORD middleware is installed, and a CORD process is a software component process that is part of an NMS; CORD processes run on CORD nodes, and every CORD process is assigned a unique identifier. CORD nodes communicate with each other by the CORD protocol. If a network manager sends network management policies to a specific CORD process using a user interface, the process identifies a set of management commands for applying the policies to the network. After this identification, the process distributes the management commands evenly over all CORD nodes, and every CORD node that receives a command delivers it to a network element.

Consistent hash function of CORD. We denote the consistent hash function of CORD as F. F(x) returns the IP address of the CORD node that manages the key x. We use F(x) to determine the CORD node at which we register the location of a CORD process x; we also use F(x) to calculate which CORD node manages a network element x. In implementing F, we assumed that every CORD node knows the IP addresses of all CORD nodes in the same domain; to satisfy this assumption, we use a simple discovery protocol [4].

Thus, F(k) is defined as

F(k) = argmin_{s_i ∈ S} D(k, s_i),

where S is the set of IP addresses of all CORD nodes and D(x, y) is the distance between H(x) and H(y) on a circular hash space H, measured in the counter-clockwise direction. Thus, F(k) maps k to the IP address of k's counter-clockwise adjacent node on H.

Process registration and communication. Let several processes be started at node s_k, and let the set of their identifiers be P_k = {p_{k,1}, ..., p_{k,n}}. The node s_k records the fact that process p_{k,i} is started at node s_k by sending a registration message to the node F(p_{k,i}) (the resolver node of p_{k,i}). For example, let there be three CORD processes with the same identifier value k, which are being started on three different CORD nodes IP_1, IP_2, and IP_3. They send registration messages to the same resolver F(k). After receiving the messages, the resolver node builds the location table shown in Fig. 2. The resolver sends an activation message to the first node that sent a registration message and replies with stand-by messages to the other nodes. A node that receives the activation message changes the status of process k to activated; the other nodes turn their processes to stand-by status. The registration procedure is executed whenever there is a change in F(p), p ∈ P_k. A process p_{m,n} that wants to communicate with process p_{i,j} first queries the resolver F(p_{i,j}) for the location of p_{i,j}; the resolver replies with the IP address of the active p_{i,j}, through which p_{m,n} can communicate with p_{i,j}. Because this query-based communication scheme does not require a priori knowledge of IP addresses, processes can be moved to different locations transparently.

Failover: process and system faults. We implemented a fast process failover mechanism using the resolver nodes. For example, if the active process k in Fig. 2 fails, the CORD node IP_1 detects the failure and sends a deregistration message to its resolver. The resolver removes the entry <k, IP_1> from the location table and sends an activation message for the next registered location of process k (in this case, IP_2); the node IP_2 immediately activates process k.

[Fig. 2 shows three nodes IP_1, IP_2, and IP_3 each hosting a process with identifier k; the resolver F(k) holds the location table entries <k, IP_1>, <k, IP_2>, <k, IP_3> and activates the copy at IP_1.]

Fig. 2. Process registration and replication
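A minimal sketch of the hash function F follows, assuming MD5 as H and IPv4 strings as node identifiers (the paper does not fix the hash function or ring size, so these are illustrative choices):

```python
# Toy version of CORD's consistent hash F: map a key to the node that
# minimizes the counter-clockwise ring distance D(key, node).
# MD5 as H and the 2**128 ring size are illustrative assumptions.
import hashlib

RING = 2 ** 128

def H(x: str) -> int:
    return int.from_bytes(hashlib.md5(x.encode()).digest(), "big")

def D(x: str, y: str) -> int:
    """Counter-clockwise distance from H(x) to H(y) on the ring."""
    return (H(x) - H(y)) % RING

def F(key: str, nodes: list[str]) -> str:
    """Return the node IP that manages `key`."""
    return min(nodes, key=lambda n: D(key, n))

nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
owner = F("process-42", nodes)
print(owner)
# Removing one node only remaps the keys it owned -- the consistent-hashing
# property CORD relies on for re-registration after a node failure:
print(F("process-42", [n for n in nodes if n != owner]))
```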

[Fig. 3 shows node IP_1 backing up the process list of its clockwise neighbor n_cw = IP_2: IP_2 sends duplication messages for its process q to n_ccw = IP_1, whose process list backup table is used to notify the resolver F(q) if IP_2 is lost.]

Fig. 3. Process list backup table

The process failover time is

d_pf + s_d + s_a ≈ d_pf + 2s,   (1)

where d_pf is the time for a node to detect the failure of a process, s_d is the time for the node to deliver a de-registration message to the resolver, and s_a is the time for the resolver to deliver an activation message. Thus, the process failover time is determined by the network delay s, which is very small in contemporary networks. Basically, CORD handles system failures by re-registration: if one CORD node fails, F changes on all CORD nodes, and the change forces every node to re-register its process locations; as a result, new active processes are elected. However, the time complexity of this basic failover scheme is O(N), where N is the number of nodes in the domain, because it takes about N x 2s for an address removal message to be delivered to all CORD nodes. CORD solves this issue using a process list backup table. That is, every node sends its active process identifiers to its counter-clockwise neighbor n_ccw using process list replication messages (Fig. 3); n_ccw saves the process identifiers in its process list backup table. The backup table entries are not used until the loss of n_cw is detected. On detecting the loss of IP_2, IP_1 sends address removal messages for all processes in the backup table to their resolvers. Therefore, F(q) receives an address removal message from IP_1; the resolver F(q) then removes the location table entries for IP_2, determines the new active process locations, sends process activation messages to those locations, and updates F. Therefore, the system failover time is reduced to

d_nf + s_d + s_a ≈ d_nf + 2s,   (2)

where d_nf is the time to detect the loss of a neighbor node.
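A toy model of the resolver's location table and the failover path (de-registration followed by activation of the next registered copy) is sketched below; all class and message names are illustrative, and the real CORD protocol messages are not modeled:

```python
# Toy resolver: keeps the ordered registration list per process identifier
# and activates the next stand-by copy on de-registration.
from collections import defaultdict

class Resolver:
    def __init__(self):
        self.locations = defaultdict(list)  # process id -> node IPs, registration order

    def register(self, proc, node):
        self.locations[proc].append(node)
        # First registrant becomes active; later ones stand by.
        return "activate" if len(self.locations[proc]) == 1 else "stand-by"

    def deregister(self, proc, node):
        """Process failure, or an address removal for a lost node."""
        entries = self.locations[proc]
        was_active = bool(entries) and entries[0] == node
        if node in entries:
            entries.remove(node)
        if was_active and entries:
            return ("activate", entries[0])  # failover to next registered copy
        return None

r = Resolver()
for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    print(r.register("k", ip))            # activate, stand-by, stand-by
print(r.deregister("k", "10.0.0.1"))      # ('activate', '10.0.0.2')
```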

Scalability with even load distribution. To achieve predictable scalability, CORD evenly distributes management workloads (management commands) over all CORD nodes using F, so that we can enhance the management performance linearly. Let D = {e_1, e_2, ..., e_n} be the set of IP addresses of all network elements within a single management domain. A CORD node m manages the set of network elements E such that F(e) = m, e ∈ E, E ⊆ D. By applying the techniques mentioned in [15][16][17], we tuned F so that the number of network elements assigned to each CORD node is close to K/N, where K = |D| and N is the CORD domain size. A process p_{i,j} sends a management command for a network element x to F(x); the receiver node F(x) interacts with x on behalf of p_{i,j} and sends back a reply. Thus, by adding more CORD nodes to a CORD domain, we achieve an almost-linear performance improvement, because the management workload is distributed evenly over all CORD nodes automatically.

4 Performance Evaluation

4.1 Failover recovery time

The process and system failover performance of CORD is provided in [4]. In addition, to verify that the system failover time does not depend on the size of a CORD domain, for every domain size we collected the ping messages between CORD processes whose RTT values were greater than 15 ms; such longer delays (normally below 1 ms) are believed to be introduced by the system failover. For the experiments, we used 128 OpenVZ-virtualized CORD nodes. For each domain size, we emulated 256 sequential node failures by picking 256 CORD nodes in sequence and turning them off and on in turn. As depicted in Fig. 4, the RTT values do not vary much with the size of the CORD domain, which supports the argument that the system failover time of CORD does not depend on the domain size.

[Fig. 4 plots the min, avg, and max of the collected RTT values (msec) against the number of nodes.]

Fig. 4. Average failover time according to CORD domain sizes

[Fig. 5 plots the total response time (sec) against the number of CORD nodes.]

Fig. 5. CORD domain size and management performance

4.2 Load distribution and linear performance improvement

To measure how the management performance of a CORD domain is improved by the consistent hash function, we measured the total response time for executing SNMP GET queries on 20,000 network elements while varying the size of the CORD domain. To prepare the 20,000 network elements, we used a node emulator developed by ETRI; each emulated network element is a Cisco, Juniper, or Nortel router. We also prepared a management command which queries seven different SNMP scalar OIDs; therefore, in our experiment, we perform 140,000 SNMP GET queries in total. The measurement result given in Fig. 5 suggests that the total response time is halved by doubling the size of the CORD domain.

5 Conclusion

The size and complexity of contemporary networks are evolving in an unprecedented way. The advent of cloud computing centers has changed the scale of networks, and the huge traffic between them is modifying the structure of the transport networks connecting the centers and users. Thus we need a novel network management software architecture which enables network management systems to easily deal with the scale of the underlying networks in a highly available manner. To meet this demand, we have proposed a new management software architecture, called CORD, which is defined on a consistent hash function. The scalability of the proposed architecture is predictable, because we can enhance the management performance linearly by adding more servers to the management server pool. The architecture provides sub-50 ms failover performance, which is adequate to support highly available management systems [19]. CORD has been successfully applied to the implementation of several network management systems, including a PBB-TE path configuration and management system (PPS). In the near future, we are planning to apply CORD to inter-cloud traffic management systems.

References

1. A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel, The Cost of a Cloud: Research Problems in Data Center Networks, ACM SIGCOMM Computer Communication Review (CCR), Vol. 39, Issue 1, January 2009.
2. S. Kim, M-J. Choi, and J. W. Hong, Towards Management Requirements of Future Internet, 11th Asia-Pacific Network Operations and Management Symposium (APNOMS).
3. M-J. Choi and J. W. Hong, Towards Manageability in the Next Generation Networks, IEICE Transactions on Communications, Vol. E90-B, No. 11, Nov. 2007.
4. B. Lee, Y. Jeong, H. Song, and Y. Lee, A Scalable and Highly Available Network Management Architecture, GLOBECOM.
5. Web NMS 5 framework.
6. P. Haggerty and K. Seetharaman, The Benefits of CORBA-based Network Management, Communications of the ACM, Vol. 41, No. 10, October 1998.
7. M. Jeong, J. Kim, J. Kwon, and J. Park, Design and Implementation of CORBA-based Network Management Applications within TMN Framework, APNOMS.
8. M. Leppinen, P. Pulkkinen, and A. Rautiainen, Java and CORBA-based Network Management, IEEE Computer, Vol. 30, No. 6, June 1997.
9. J. Pavon and J. Tomas, CORBA for Network and Service Management in the TINA Framework, IEEE Communications Magazine, Vol. 36, No. 3, March 1998.
10. L. Z. Granville, D. M. Rosa, A. Panisson, C. Melchiors, M. J. B. Almeida, and L. M. R. Tarouco, Managing Computer Networks using Peer-to-Peer Technologies, IEEE Communications Magazine, Vol. 43, No. 10, October 2005.
11. R. State and O. Festor, A Management Platform over a Peer-to-Peer Service Infrastructure, ICT.
12. C. Simon, R. Szabo, P. Kersch, B. Kovacs, A. Galis, and L. Cheng, Peer-to-peer Management in Ambient Networks, 14th IST Mobile and Wireless Communications Summit, Dresden, Germany, June 19-23, 2005.
13. A. Fiorese, P. Simoes, and F. Boavida, A P2P-based Approach to Cross-domain Network and Service Management, AIMS.
14. R. Carroll, C. Fahy, E. Lehtihet, S. Meer, N. Georgalas, and D. Cleary, Applying the P2P Paradigm to Management of Large-scale Distributed Networks using a Model Driven Approach, NOMS.
15. D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin, Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, in Proc. ACM Symposium on Theory of Computing.
16. D. Karger, A. Sherman, A. Berkheimer, B. Bogstad, R. Dhanidina, K. Iwamoto, L. Matkins, and Y. Yerushalmi, Web Caching with Consistent Hashing, WWW.
17. I. Stoica, R. Morris, D. Karger, M. Frans Kaashoek, and H. Balakrishnan, Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, ACM SIGCOMM.
18. G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, Dynamo: Amazon's Highly Available Key-value Store, SOSP.
19. O. Bonaventure, C. Filsfils, and P. Francois, Achieving Sub-50 Milliseconds Recovery upon BGP Peering Link Failures, IEEE/ACM Transactions on Networking, Vol. 15, Issue 5, October 2007.

Efficient Data Transmission Technique for Ubiquitous Healthcare Systems

Yoon Hyun Kim and Jin Young Kim
Department of Wireless Communications Engineering, Kwangwoon University, Wolgye-Dong, Nowon-Gu, Seoul, Korea
{yoonhyun,

Abstract. In this paper, we propose an efficient data transmission algorithm for UWB (ultra wideband) signals and analyze the UWB signal detection probability when a digital watermarking sequence is used, for ubiquitous healthcare (u-healthcare) applications. Digital watermarking is employed to enhance the security of bionic data. Because the power level of UWB signals can be kept sufficiently lower than that of other signals, there are no harmful effects on the human body. In the proposed spread spectrum watermarking system, a Kasami sequence is chosen as the spreading sequence due to its good correlation properties. The performance of the proposed scheme is analyzed in terms of detection probability. The results of this paper can be applied to the design of various u-healthcare systems.

Keywords: UWB signal, digital watermarking, u-healthcare system, WBAN

1 Introduction

One of the promising solutions to this is the UWB (ultra wideband) system. The power level of UWB signals can be kept sufficiently lower than that of other signals, so that there are no harmful effects on the human body [1-5]. In a u-healthcare system based on a WBAN (wireless body area network), UWB signals are very efficient thanks to their low power level. In a WBAN, there are many sensors on the body to detect and manage bionic signals. For an efficient implementation of the WBAN, a critical problem is how to exactly detect the desired signal among the various sensors' signals. A server of the WBAN collects the sensors' signals and then informs a hospital or a doctor of the state of the body. Under these conditions, it is very important to accurately detect each sensor's signal without missed detection. In this paper, we propose a novel transmission algorithm for UWB signals based on digital watermarking for the WBAN. The proposed algorithm employs a digital watermarking sequence which spreads the signal for watermarking capability. The primary advantage of watermarking comes from the fact that digital watermarking makes the system more robust and secure in noisy and interference-prone environments. The proposed algorithm is expected to have the following advantageous features. First, the proposed algorithm can achieve a high detection probability,

especially at low SNR (signal-to-noise ratio). Second, the digital watermarking sequence level can be controlled according to the target application. Third, the proposed algorithm can simultaneously transmit simple additional information along with the UWB signals through the spreading and despreading property. As the digital watermarking sequence, a Kasami sequence is employed owing to its good correlation properties. The signal detection probability is derived for various system parameters in WBAN applications. The rest of this paper is organized as follows. In Section 2, the WBAN channel model used for the analysis and simulation of the proposed scheme is introduced. In Section 3, the proposed system model is analyzed and its system block diagram is presented. In Section 4, the signal detection probability is derived and simulation results are presented for various system parameters. In Section 5, application fields of the proposed algorithm are suggested. Finally, in Section 6, some conclusions are drawn.

2 WBAN Channel Model

There are three choices for the location of the communication equipment: in-body, on-body, and off-body. In addition, there are three modes according to transmission speed: low, moderate, and high speed [6]. CM 1 is the channel model inside the body, and its response is determined by internal organs and blood vessels. CM 2 and CM 3 are the channel models between in-body and on-body devices; their responses are affected by skin, muscle, and fat, and CM 3 usually covers a longer distance than CM 2. Finally, CM 4 is the channel model between on-body devices and an external terminal; the distance to external devices is typically considered to be at most 5 m.

In the WBAN channel, the complex channel impulse response h^i(t) for the i-th device is given by [7]

h^i(t) = Σ_{l=0}^{L-1} α_l^i δ(t − τ_l^i),   (1)

where L is the total number of arrival paths, modeled as a Poisson random variable with a mean value of 400, l indexes the l-th arrival path of the signal, and α_l^i is the magnitude of the l-th path, which can be expressed as

α_l^i = Ω_0 exp(−τ_l^i / Γ) F_k^{1−δ(l)} β.   (2)

In (2), Ω_0 is the path loss, Γ is an exponential decay factor, β is a log-normal random variable with zero mean and variance σ², and τ_l^i is the path arrival time, which is modeled as a

Poisson random process with an arrival rate of λ = 1/ ns. Also, in (2), F_k denotes the effect of the K-factor in NLOS (non-line-of-sight) environments and can be calculated as

F_k = k · ln10 / 10,   (3)

where k is the difference between the magnitude of the first impulse response and the average value of the impulse responses.

3 Proposed System Model

It is assumed that each device and access point supports the TH-PAM UWB standard of IEEE 802.15.4a. The transmitted signal s(t) can be described as

s(t) = Σ_{i=−∞}^{∞} √(E_RX) p_0(t − i·T_s − c_i·T_c),   (4)

E_RX = E_TX + E_Wat,   (5)

where E_RX is the power of the received signal, E_TX is the power of the transmitted signal, E_Wat is the power of the digital watermarking sequence, p_0(t) is the normalized pulse signal, T_s is the symbol duration, T_c is the chip time duration, and c_i is the i-th time-hopping code.

Fig. 1 shows the block diagram of the proposed algorithm with a digital watermarking sequence for the UWB signals of the WBAN. The input data are the bionic data from the on-body sensors. Each on-body sensor's data is spread by a direct sequence generator, and the spread data is then modulated by a pulse amplitude modulation (PAM) modulator.
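A simulation sketch of the channel model in Eqs. (1)-(3) follows. Since Eq. (2) is reconstructed above and the paper does not give all parameter values, the arrival rate, decay constant, log-normal variance, and K-factor difference below are illustrative placeholders:

```python
# Sketch of one WBAN channel realization in the spirit of Eqs. (1)-(3);
# lam, gamma, sigma, omega0, and delta_k are illustrative placeholder values.
import numpy as np

rng = np.random.default_rng(0)

def wban_channel(lam=0.5, gamma=30.0, sigma=5.0, omega0=1.0, delta_k=5.0):
    L = rng.poisson(400)                           # number of paths, mean 400
    taus = np.cumsum(rng.exponential(1 / lam, L))  # Poisson arrivals, rate lam (1/ns)
    taus[0] = 0.0                                  # first path at the origin
    f_k = delta_k * np.log(10) / 10                # Eq. (3): K-factor effect
    beta = 10 ** (rng.normal(0, sigma, L) / 10)    # log-normal term, zero mean in dB
    kron = (np.arange(L) == 0).astype(float)       # delta(l): 1 only for l = 0
    alphas = omega0 * np.exp(-taus / gamma) * f_k ** (1 - kron) * beta  # Eq. (2)
    return taus, alphas                            # tap delays (ns) and magnitudes

taus, alphas = wban_channel()
print(len(taus), alphas[:3])
```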

Fig. 1. Block diagram of the proposed algorithm.

Finally, the digital watermarking sequence is added to the modulated data. In this block, the digital watermarking sequence has a power level about -5 dB to -20 dB below that of the transmitted PAM data. In general, the watermarking sequence should have a good autocorrelation property; therefore, we employ a Kasami sequence. At the receiver, the log-likelihood function of the received signal is given by

L(S) = ln p(r(n)),   (6)

where r(n) is the received signal and p(r(n)) denotes the probability density function (PDF) of r(n). From (6), we obtain the maximum likelihood (ML) estimate of S, given by

Ŝ = argmax_S {L(S)} = argmax_S Σ_{n=0}^{2^i − 1} ln p(r(n)) = argmax_S Σ_{n=0}^{2^i − 1} r(n) w(τ − n),   (7)

where w(n) is the watermarking sequence added to the transmitted signal. The ML estimator finds the

4 Simulation Results

In this section, simulation results of the proposed system are presented for varying system parameters. The performance is evaluated in terms of the detection probability P_D. In the simulation, we compare the detection probabilities for various FA (false-alarm) probabilities at specific digital watermarking sequence levels. The digital watermarking sequence levels considered in this paper are -5 dB, -10 dB, -15 dB, and -20 dB. As the digital watermarking sequence, a Kasami sequence is applied for enhanced detection performance. Also, we employ a UWB time-hopping PAM (TH-PAM) signal in the on-body sensor. To assess the detection probabilities, FA probabilities of 6.4%, 8.5%, and 9.5% are used. Finally, in the simulation results, we assume that the channel estimation is perfect.

Fig. 2 shows the detection probability vs. SNR for the proposed algorithm with various FA probabilities when the digital watermarking level is -20 dB. In Figs. 2 and 3, the X-axis represents SNR from -25 dB to 0 dB and the Y-axis represents the signal detection probability over the 0% to 100% range. As shown in Fig. 2, the detection probability increases with the FA probability. Also, because the power level of the digital watermarking sequence is much lower than the power level of the on-body sensor's signal, the detection probability is between 55% and 60% at an SNR of -10 dB.

Fig. 2. Detection probability vs. SNR performance with various FA probabilities (watermarking sequence level is -20 dB).

Fig. 3 presents the detection probability vs. SNR for the proposed algorithm with various FA probabilities when the digital watermarking level is -15 dB. The detection probabilities over the whole SNR range are improved by this change of digital watermarking sequence level compared with the -20 dB results. Fig. 4 shows the BER performance with various digital watermarking sequence levels. As shown in Fig. 4, the BER performance without watermarking is almost the same as the BER with the -20 dB, -15 dB, and -10 dB digital watermarking levels. Because a -5 dB watermarking level may affect the on-body sensor's data, the BER at -5 dB is higher than at the other watermarking levels. Therefore, the watermarking sequence level should be chosen cautiously to avoid degradation of the BER performance. From the results of this paper, it is confirmed that there is a trade-off relationship between detection probability and BER performance, so special attention should be paid to the choice of the digital watermarking sequence level [8].

Fig. 3. Detection probability vs. SNR performance with various FA probabilities (watermarking sequence level is -15 dB).

Fig. 4. BER performance with various digital watermarking sequence levels (-5 dB, -10 dB, -15 dB, -20 dB, and without watermarking).

5 Application for U-Healthcare Systems

Moreover, the proposed algorithm is applicable to medical ICT (information and communication technology) using WBAN, as shown in Fig. 5. Main application examples are the ECG, the pacemaker, and the wireless capsule endoscope. The ECG is a device which records the contraction of the heart over time, while the pacemaker is a device which keeps the patient's heart working normally and is inserted inside human muscle. The wireless capsule endoscope is usually a tablet-sized capsule; when a person swallows the capsule, it sends moving-image data of the internal organs to an external sensor. For the medical applications mentioned above, a main issue is whether the medical system can coexist with other RF systems. This issue can be solved by the proposed algorithm, since a WBAN based on UWB can operate effectively with a very low power spectral density. Therefore, the algorithm and results of this paper are appropriate for medical ICT.

Fig. 5. Application fields of the proposed algorithm based on WBAN.

6 Conclusions

In this paper, we proposed a novel transmission algorithm for UWB signals with a digital watermarking sequence, and we analyzed and simulated the UWB signal detection probability for WBAN applications. From the simulation results, it is confirmed that a detection probability over 90% can be achieved for the WBAN channel model at an SNR of 0 dB. Also, given the trade-off relationship between detection probability and BER performance, it is very important to properly choose the digital watermarking level for the various WBAN applications. By employing digital watermarking, we can make the proposed WBAN system more robust and secure while maintaining system performance. The results of this paper can be applied to various WBAN application fields such as healthcare systems, living assistance for elderly and blind persons, and entertainment services, as well as to other wireless consumer electronics in various home network applications.

Acknowledgements. This work was, in part, supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. ), and, in part, supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-(H )).

References
[1] J. Y. Kim, Ultra Wideband Wireless Communication Systems, GS Intervision Publishers, Seoul, Korea.
[2] M. Z. Win and R. A. Scholtz, "On the robustness of ultra-wide bandwidth signals in dense multipath environments," IEEE Commun. Lett., vol. 2, no. 2, Feb. 1998.
[3] Y. P. Zhang and Q. Li, "Performance of UWB impulse radio with planar monopoles over a human-body propagation channel for wireless body area networks," IEEE Trans. Antennas Propag., vol. 55, no. 10, Oct. 2007.
[4] M. Z. Win and R. A. Scholtz, "Ultra-wide bandwidth time-hopping spread-spectrum impulse radio for wireless multiple-access communications," IEEE Trans. Commun., vol. 48, no. 4, Apr. 2000.
[5] S. Erkucuk and D. I. Kim, "M-ary code shift keying impulse modulation combined with BPPM for UWB communications," IEEE Trans. Wireless Commun., vol. 6, Aug. 2007.
[6] R. Kohno, K. Hamaguchi, H. Li, and K. Takizawa, "R&D and standardization of body area network (BAN) for medical healthcare," in Proc. of IEEE ICUWB '08, vol. 3, pp. 5-8, Sept. 2008.
[7] H. Viittala, M. Hamalainen, J. Iinatti, and A. Taparugssanagorn, "Different experimental WBAN channel models and IEEE models: Comparison and effects," in Proc. of IEEE ISABEL '09, pp. 1-5, Nov. 2009.
[8] FCC, Spectrum Policy Task Force, Rep. ET Docket no. 02-135, Nov. 2002.

Specification and Detection of Code Smells using OCL

Tae-Woong Kim 1, Tae-Gong Kim 2, Jai-Hyun Seu 2,1
1 School of Computer Engineering, Inje University, Obang-dong 607, Gimhae, Gyeong-Nam, Korea, ktw.maestro@gmail.com
2 School of Computer Engineering, Inje University, Obang-dong 607, Gimhae, Gyeong-Nam, Korea, {ktg,jaiseu}@inje.ac.kr

Abstract. The term "code smells" refers to lines of code that cause problems in source code: code in a bad design shape, or code produced by bad coding habits. A few studies can be found in the field of code smell detection, but these studies have their own weaknesses: they detect only particular kinds of code smells and lack the ability to express what the problems are. In this paper, we define code smells using the specification language OCL and use these definitions for automatic detection. To construct our code smell model, we first transform Java source code into XMI document format based on the JavaEAST meta-model, which is expanded from the JavaAST meta-model. The prepared OCL is run through the OCL component based on an Eclipse plug-in. Finally, the effectiveness of the model is verified by applying it to a primitive smell and a derived smell in Java source code.

Keywords: Code Smells, Refactoring, OCL (Object Constraint Language), AST (Abstract Syntax Tree), Meta-model

1 Introduction

Refactoring is a method to improve program qualities, including performance, structure, maintainability, and appearance, without any change to the original functionality. That is, it changes the inner structure of source code to ease maintenance and improve readability while keeping all functionality of the system intact [1]. To reform an existing program design, we should find out what should be improved before applying refactoring to the original design [3]. Martin Fowler and Kent Beck suggested a method to discriminate certain design problems from bad smells in source code [2]. They expressed design problems metaphorically as smells, and explained which refactoring method is suitable to remove a specific smell. A few studies were introduced to determine which refactoring method is appropriate to apply for detecting certain code smells [3,4,5,6,7]. But these studies

1 Corresponding author. E-mail address: jaiseu@inje.ac.kr (J. H. Seu)

have their own weaknesses: they can only detect limited kinds of code smells, and they express the detected code smells insufficiently. In this paper, we define code smells using OCL [8] and use them for automatic detection. To realize this, we first transform Java source code into XMI (XML Metadata Interchange) [10] format based on the JavaEAST (Java Extended Abstract Syntax Tree) meta-model, which is expanded from the JavaAST [9] meta-model. The newly defined code smell model is written in OCL. The prepared OCL code model is run through the OCL component (an API for parsing and evaluating OCL constraints and queries) [11] based on an Eclipse plug-in for automatic detection.

In chapter 2, we look into the kinds of code smells and their definitions. In chapter 3, we introduce the way to detect code smells as proposed and implemented in this paper. In chapter 4, the proposed method is verified through a running example. Finally, in chapter 5, we draw conclusions and carefully suggest future work.

2 Bad Smells in Code

The term "code smells" refers to lines of code at any point in the source that cause problems, other than syntax errors or warnings at compile time. It also refers to code in a bad design shape or code produced by bad coding habits. Code smells increase the expense of software development when adding new functions or changing platforms, and they may also decrease the efficiency of the whole system. Code smells are defined in two kinds as follows.

- Primitive smell: a simple smell that can be detected from one class
- Derived smell: a smell that can be detected from relations between classes

A derived smell is detected from information extracted from the relations between several classes (inheritance, instances, usage of methods or fields, etc.). The following section explains how to detect code smells.

3 Specification and Detection of Code Smells

The following requirements should be satisfied to detect code smells from source code. First, all syntax and grammar information of the source code should be represented. Second, it should be easy to access and extract semantic information from the represented source code model. Finally, it should be easy to extract relational information between classes. To satisfy these requirements, we propose the approach of figure 1, which transforms Java source code into the JavaEAST model and specifies code smells using OCL, together with the detection method.

3.1 JavaEAST (Java Extended Abstract Syntax Tree)

The grammar information of source code is represented as a tree structure when Java source code is transformed into JavaAST, which also produces an XML context for each Java source file. But it is a problem to detect derived smells, which require analyzing relationships between classes, because JavaAST contains only grammatical information. Consequently, we propose the extended AST model of figure 1, which includes binding information about fields.

Meta-model for binding information (declaration)
Meta-model for binding information (reference)

Fig. 1. JavaEAST meta-model to analyze relations between classes

3.2 Code Smell Specification using OCL

We can specify the definition of Long Parameter List, a primitive smell, in OCL as follows.

self.parameters->notEmpty() and self.parameters->size() > maximum

A derived smell, unlike a primitive smell, is obtained by analyzing relations between classes. Therefore, we extract this kind of information with the proposed JavaEAST model. For example, to detect Refused Bequest, a derived smell, we must find the name of a class, its declared properties and methods, and the classes extracted from inheritance relations. The information extractable from the relationships between classes is represented in OCL as in table 1.

Table 1. OCL Code for Analyzing Relationships between Classes.

OCL statement:
context Project
def: getSuperClasses() : Set(TypeDeclaration) =
  self.compilationUnits.types->collect(oclAsType(TypeDeclaration))
    ->select(subClassTypes->notEmpty())
    ->select(bodyDeclarations->select(oclIsTypeOf(FieldDeclaration))->notEmpty())->asSet()
Meaning: getting the super classes

OCL statement:
def: getUsedClassOfSimpleName(fKey : String) : Set(String) =
  SimpleName.allInstances()->select(expressionBinding->notEmpty())
    ->select(expressionBinding.key = fKey).expressionBinding.usedClass->asSet()
Meaning: getting the names of the classes that use the variable with the given variable key value
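For comparison, the same long-parameter-list check can be expressed outside an OCL toolchain. The following is a minimal Python sketch over Python's own ast module; the threshold MAXIMUM is an assumed value, since the paper leaves "maximum" as a parameter.

import ast

MAXIMUM = 4  # assumed threshold for a "long" parameter list

def long_parameter_lists(source: str):
    """Report functions whose parameter count exceeds MAXIMUM, mirroring the
    OCL constraint self.parameters->size() > maximum over a Python AST."""
    smells = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            n_params = len(node.args.args) + len(node.args.kwonlyargs)
            if n_params > MAXIMUM:
                smells.append((node.name, node.lineno, n_params))
    return smells

print(long_parameter_lists("def f(a, b, c, d, e, f): pass"))
# -> [('f', 1, 6)]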

4 Applied Example

Refused Bequest, a derived smell, occurs when a derived class does not use fields or methods from its super class. In other words, the inheriting subclass refuses to inherit particular fields or methods from the super class. The relationship between classes and fields should be formed as in figure 2 to detect the Refused Bequest code smell.

TD: Type Declaration
FD: Field Declaration
BD: Body Declaration
t : a class that has subclass types and has an FD as a child
f : a field in t that is not used in xts
ts : the set of child classes of t (by superClassType) that use f
xts: the set of child classes of t (by superClassType) that do not use f

Fig 2. Specification to detect the Refused Bequest code smell

It finds a class t and a field f which satisfy the following conditions, as represented by the context in figure 2.
A. Finding a super class t which has child classes and a field
B. Finding an f which satisfies the following conditions in super class t
a) The field should not be used in t
b) The field should not be used in xts
c) The field should not be used by using t
d) The field should not be used by using xts
e) It should not be overridden in ts

Conditions A and B can be specified in OCL as follows.

context Project
def : getRefusedBequestDetection() : Sequence(Tuple(superClass : String, field : String, subclass : String)) =
  self.getSuperClasses()->collect(sType |
    sType.getFieldDeclarationKey()->collect(sFieldKey |
      if self.getUsedClassOfSimpleName(sFieldKey)->includes(sType.getTypeKey) then null
      else if self.getTypeOfQualifiedName(sFieldKey)->includes(sType.getTypeKey) then null
      else if self.getUsedClassOfSimpleName(sFieldKey)->includesAll(self.getSubClassTypeKey(sType)) then null
      else (self.getUsedClassOfSimpleName(sFieldKey)->intersection(self.getSubClassTypeKey(sType))
            ->union(self.getTypeOfQualifiedName(sFieldKey)) - self.getSubClassTypes(sType)
            ->iterate(acc : TypeDeclaration; result : Set(String) = Set{} |
                if acc.getFieldNames()->includes(self.getVariableName(sFieldKey))
                then acc.getTypeKey->union(result) else result endif))
           ->collect(target | Tuple{superClass = sType.getClassName(),
                                    field = self.getVariableName(sFieldKey),
                                    subclass = self.getClassNameByKey(target)})
      endif endif endif))->excluding(null)->asSet()->asSequence()

If we apply the above OCL code to the source code represented by the class diagram in figure 3 and execute it using the OCL component, we get the result of figure 4.
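To make the intent of this query easier to follow, here is a rough Python analogue over a toy data model. The dictionaries stand in for the JavaEAST binding information, and the "partial use" condition is a simplification of conditions A and B above, not a faithful translation of the OCL.

def refused_bequest(subclasses, declared_fields, field_users):
    """Yield (superclass, field, refusing_subclass) tuples when a field
    declared in the superclass is used by some subclasses but refused
    (never used) by others, as in the example of Figs. 3 and 4."""
    for super_cls, subs in subclasses.items():
        for field in declared_fields.get(super_cls, []):
            users = field_users.get(field, set()) & set(subs)
            if users and users != set(subs):          # partial use only
                for sub in subs:
                    if sub not in users:
                        yield (super_cls, field, sub)

subclasses = {"DBConnection": ["AccessConnection", "MySQLConnection", "OracleConnection"]}
declared_fields = {"DBConnection": ["filename"]}
field_users = {"filename": {"AccessConnection"}}
print(list(refused_bequest(subclasses, declared_fields, field_users)))
# -> [('DBConnection', 'filename', 'MySQLConnection'),
#     ('DBConnection', 'filename', 'OracleConnection')]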

Fig 3. Only the class AccessConnection uses filename among the DBConnection subclasses

We assume that, among the classes inheriting DBConnection shown in figure 3a, only AccessConnection uses the filename field of DBConnection. In this case, Refused Bequest occurs: the classes other than AccessConnection refuse to inherit filename, which is declared in the super class. Figure 3b shows the XMI document which is transformed from the figure 3a class diagram and expressed on the basis of the JavaEAST model proposed in this paper. Figure 4 shows the result of Refused Bequest smell detection when the OCL is applied to the figure 3b model. The red box in figure 4 shows the detection result: AccessConnection is the only subclass that uses the field, while the other subclasses refuse to inherit the filename field from the super class DBConnection.

Fig. 4. Output tuples resulting from detecting the Refused Bequest code smell.

5 Conclusion

A few studies have been introduced on how to find certain code smells, but these studies have their own weaknesses: they only detect limited kinds of code smells and lack expressiveness for the detected code smells. In this paper, we precisely specified code smells using OCL and studied how to detect them automatically by running the OCL component. Especially for specifying and detecting derived smells, we proposed the JavaEAST model and introduced a way to detect them. This approach not only specifies bad smells in code but is also practical: if a new code smell is added or the definition of an existing code smell must be modified in the future, defining a new OCL expression is enough to handle it. Moreover, the specification of a derived smell is reusable, because it is composed from already defined OCL definitions that extract information from the classes involved. This is an important preceding study for developing more flexible reverse engineering tools. In the future, we need to specify and define OCL for further kinds of code smells. Building on that, studies on developing automated refactoring tools can proceed using the detected and defined code smells.

References
1. Fowler, M.: Refactoring: Improving the Design of Existing Programs. Addison-Wesley (1999)
2. Fowler, M.: Refactoring: Improving the Design of Existing Code. Addison-Wesley (1999)
3. Slinger, S.: Code Smell Detection in Eclipse. Delft University of Technology (2005)
4. Kataoka, Y., Ernst, M.D., Griswold, W.G., Notkin, D.: Automated Support for Program Refactoring using Invariants. In: Proc. Int. Conf. on Software Maintenance, IEEE Computer Society Press (2001)
5. Ducasse, S., Rieger, M., Demeyer, S.: A language independent approach for detecting duplicated code. In: Yang, H., White, L. (eds.) Proc. Int'l Conf. on Software Maintenance, IEEE Computer Society Press (1999)
6. Simon, F., Steinbrückner, F., Lewerentz, C.: Metrics Based Refactoring. In: Proc. 5th European Conference on Software Maintenance and Reengineering, IEEE Computer Society Press (2001)
7. Murphy-Hill, E., Black, A.P.: An Interactive Ambient Visualization for Code Smells. In: SOFTVIS '10: Proceedings of the 5th International Symposium on Software Visualization, ACM (2010)
8. Object Management Group: Object Constraint Language Specification, Version 2.0
9. MoDisco: MoDisco Tool - Java Abstract Syntax Discovery Tool
10. Object Management Group: MOF 2.0/XMI Mapping Specification, V2.1.1
11. Eclipse: OCL for EMF

Temporal Distance based Particle Domain Selection Method for Large Scale Streaming System

Jun Pyo Lee
Software R&D Lab., LIG Nex1, 702, Sampyeong-dong, Bundang-gu, Seongnam-city, Gyeonggi-do, Korea

Abstract. A network-based video proxy server can store videos in order to minimize initial latency and network traffic significantly. However, due to the limited storage space of a video proxy server, an appropriate video selection method is needed to store the videos which are frequently requested by clients. Thus, we propose a temporal distance based particle domain selection method using concurrent request patterns in a large scale streaming system. We exploit the short-term temporal locality of two consecutive requests for an identical video. If a video is requested by a user, it is temporarily stored during a predefined interval and then delivered to the user. Due to the limited storage area of the video proxy server, it is often required to replace an old video which has not been serviced for a long time with a newly requested one. This replacement causes service delay and an increase in network traffic. To circumvent this problem, we also propose an efficient deletion scheme for the video proxy server.

Keywords: multimedia streaming, video proxy server, delivery technique, QoS management, video-on-demand service.

1 Introduction

A video proxy server is essentially a middle computer system that sits between the client and the content server, which is located at a remote location [1-3]. By storing video data, a video proxy server close to the clients can be used to assist video delivery and alleviate the load of content servers. The video proxy server can partially satisfy the need for rapid multimedia data delivery by providing multiple clients with a shared storing location. The requested videos are always delivered from the content server through the video proxy server to the clients; thus the video proxy server is able to intercept and store these videos to decrease the amount of video data that has to be delivered by the content server. In this context, if a requested video exists in the storage area of the video proxy server, clients get the stored video, whose delivery time is typically reduced. The storing and deletion technique of a video proxy server is one of the key solutions to improve the performance of multimedia service systems in a large scale network environment. However, since the video proxy server does not have infinite capacity for keeping all the continuous video data, the challenge for the video proxy server is to determine which videos

should be stored in or removed from the storage area of the video proxy server. Due to the large sizes of streaming media objects, existing algorithms do not provide optimal performance in a large scale network environment. In a video proxy server, most media objects should be stored partially to use the storage space efficiently. This leads us to a segment-based approach to proxy storing of large media objects. The motivation of media segmentation is that we can quickly discard a big chunk of a stored media object that was once hot but has turned cold. In this way, the storage manager of the video proxy server can quickly adjust to the changing reference patterns of partially stored objects [4-5]. However, to the best of our knowledge, no prior work has fully utilized the effectiveness of the media segmentation approach for large media objects while also considering the time-variable request patterns of clients. Thus, we propose the temporal distance based particle domain selection method, a time-based approach, to minimize the initial latency, central server load, and network traffic significantly.

2 Temporal Distance based Particle Domain Selection

In this section, we present the temporal distance based particle domain selection method for a large scale streaming system. We exploit the short-term temporal locality of two consecutive requests for an identical video.

Fig. 1. The scenario of the temporal distance based particle domain selection method.

For this purpose, videos are divided into segments of equal length, and the video proxy server loads a requested video segment delivered from the content server into a temporary storage area and forwards it to the user directly. A requested video segment not residing in the video proxy server is first loaded into the temporary storage area. If the same video segment is then requested again by a different user while the previous request is still consuming the same video, we utilize the time distance between the earlier request and the latest request: within this time distance, the video proxy server can preserve the segment delivered for the latest request in the temporary storage area. In addition, if the new request starts to utilize the video in the temporary storage area, the video proxy server can store the segments utilized by the new request in order to exploit the finite-capacity storage area efficiently. We expect that such a video will be requested many times within a short time. Fig. 1 depicts the detailed scenario of our temporal distance based particle domain selection method.

In Fig. 1, the prefix size and expectation time of video i are determined by the average initial latency within a predefined delay time, \mu_i(k), calculated by equation (1). In this paper, the optimal value of the predefined interval is derived by simulations.

\mu_i(k) = \frac{1}{n} \sum_{i=1}^{n} \alpha_i(k)   (1)

where n denotes the total number of requests observed for segment k and \alpha_i(k) represents the time difference between the most recent request time and its previous request time.

Since the storage space of the video proxy server does not have infinite capacity for keeping all the continuous video data, the video proxy server should have an appropriate deletion algorithm to make room for newly stored data when there is not enough free space to hold the new data. The challenge for the deletion algorithm is to determine which video segments should be stored in or removed from the storage area of the video proxy server. The crucial aspect of the deletion algorithm lies in selecting the victim, since deletion causes service delay and an increase in network traffic. To circumvent this problem, we propose an efficient deletion scheme for the video proxy server. In this paper, a video is divided into two groups, a preservation segment group and a deletion segment group, using equation (2):

\lambda_i(k) = Cnt_i(k) - Cnt_i(k+1),   k = 1, 2, ..., n,   (2)

where \lambda_i(k) denotes the difference between the request numbers of segments k and k+1, and Cnt_i(k) and Cnt_i(k+1) represent the request numbers of segments k and k+1, respectively. We select the key segment which marks the maximum request difference value, and we remove the last segment when the storage capacity is running out of space. Fig. 2 depicts the preservation segment group and the deletion segment group.
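A minimal Python sketch of this grouping and eviction rule follows. The per-segment request counts and the function names are illustrative; only the logic of equation (2), splitting at the largest request-count drop and evicting from the tail of the deletion group, is taken from the text.

def split_groups(counts):
    """counts[k] = Cnt_i(k+1); return (preservation, deletion) index lists."""
    lam = [counts[k] - counts[k + 1] for k in range(len(counts) - 1)]   # equation (2)
    key = max(range(len(lam)), key=lam.__getitem__)                     # key segment index
    return list(range(key + 1)), list(range(key + 1, len(counts)))

def evict_one(counts, cached_segments):
    """Remove the last cached segment of the deletion group."""
    _, deletion = split_groups(counts)
    for k in reversed(deletion):
        if k in cached_segments:
            cached_segments.remove(k)
            return k
    return None

cache = {0, 1, 2, 3, 4, 5}
print(evict_one([90, 85, 80, 20, 10, 5], cache))  # -> 5 (tail of the deletion group)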

Fig. 2. Preservation and deletion segment groups.

In addition, if a segment located in the preservation segment group of video i is not requested within the predefined delay time \mu_i(k), the segment is moved to the deletion segment group automatically.

3 Performance Evaluations

In general, the hit rate has been used as the performance measurement of storage schemes for traditional data such as text and images, but it is not proper for continuous media data. Thus, we use the hit rate of blocks, where a block is defined as a segment, which can reflect the proxy server management method. In addition, we also use the number of block deletions in order to verify the efficiency of the storage management method more precisely. We compare our method with the well-known algorithms PLFU (Partial Least Frequently Used) [6], Distance-based [7], and Reallocation Affinity [8] through simulations. In order to verify the effectiveness of the proposed method, we conduct the simulation under fixed-length segmentation. We consider a set of videos approximately 654 MB long and assume that the request rate is 1,200 per minute. The detailed simulation parameters are shown in Table 1.

Table 1. Simulation Parameters.

Parameter | Value
Simulation time | 72 hours
Number of videos | 1,200
Video size | approx. 654 MB
Bit rate | 1,024 Kbps
Block play time | 5 sec
Request rate | 1,200 per minute
Storage size | 50, 100, 150, 200, 250, 300, 350 GB

In our simulation, to verify the effectiveness of our method, we conduct simulations under a random access pattern with video proxy storage sizes of 50, 100, 150, 200, 250, 300, and 350 GB, respectively. Fig. 3 shows the hit rate of our method compared with that of the other well-known algorithms. The simulation results show that the proposed algorithm performs better than the other algorithms, PLFU, Distance-based, and Reallocation Affinity, in terms of block hit rate.

Fig. 3. Comparison of block hit rate under various video proxy server storage sizes (X-axis: proxy storage size in GB; Y-axis: block hit rate; curves: PLFU, Distance-based, Reallocation Affinity, and the proposed algorithm).

Fig. 4 shows the efficiency of our temporal distance based particle domain selection method: the number of deletions with our method is significantly smaller than with the other algorithms. This improvement may be due to the fact that segmented videos with a low request ratio can be deleted quickly from the storage area. Therefore, the deletion overhead of our method is not significant compared to the other algorithms.

Fig. 4. Comparison of the number of block deletions (in 1,000 blocks) under various video proxy server storage sizes (curves: PLFU, Distance-based, Reallocation Affinity, and the proposed algorithm).

4 Conclusions

The storing and deletion technique of a video proxy server is one of the key solutions to improve the performance of multimedia applications in a large scale network environment. By storing frequently accessed video data in the storage area, client-perceived latency, central server load, and network traffic can be reduced significantly. However, since existing storing techniques are designed for traditional data such as text and images, they are not suitable for continuous media data. In this paper, we presented a temporal distance based particle domain selection method using concurrent request patterns in a large scale streaming system, with which we can save storage space without degrading performance in the video proxy server. Through simulation, we evaluated the performance of our video proxy server management method and compared it with the other well-known algorithms, PLFU, Distance-based, and Reallocation Affinity. We demonstrated that the introduction of the concept of partial storing and deletion leads to better performance.

References
1. Ching-Lung Chang, Xan-Hua Hsieh, and Wei-Ming Chen: The design of video streaming proxy in high-speed train. In: International Conference on Information Networking, Taiwan (2012)
2. Tingyao Wu, Koen De Schepper, Werner Van Leekwijck, and Danny De Vleeschauwer: Reuse time based caching policy for video streaming. In: IEEE International Conference on Consumer Communications and Networking, IEEE Press, Belgium (2012)
3. Jie Dai, Fangming Liu, Bo Li, Baochun Li, and Jiangchuan Liu: Collaborative Caching in Wireless Video Streaming Through Resource Auctions. In: IEEE Journal on Selected Areas in Communications (2012)
4. Yuan-Tse Yu and Sheau-Ru Tong: An Adaptive Suffix-Window Caching Scheme for CM Proxy Server. In: International Conference on Network-Based Information Systems, Taiwan (2010)
5. Wei Tu, Eckehard Steinbach, Muhammad Muhammad, and Xiaoling Li: Proxy Caching for Video-on-Demand Using Flexible Starting Point Selection. In: IEEE Transactions on Multimedia (2009)
6. Kuan-Sheng Hsueh and Sheng-De Wang: A Packet-based Caching Proxy with Loss Recovery for Video Streaming. In: International Symposium on Dependable Computing, Taiwan (2002)
7. Songqing Chen, Bo Shen, S. Wee, and Xiaodong Zhang: Segment-based streaming media proxy: modeling and optimization. In: IEEE Transactions on Multimedia (2006)
8. Christian Spielvogel and Laszlo Boszormenyi: Quality-of-Service based video replication. In: International Workshop on Semantic Media Adaptation and Personalization (2007)

Establishment of Fire Control Management System in Building Information Modeling Environment

Yan-Chyuan Shiau 1, Chong-Teng Chang 2
Department of Construction Management, Chung Hua University, 707, Wu-Fu Rd., Sec. 2, Hsinchu, 300 Taiwan
1 ycshiau@ms22.hinet.net, 2 m @chu.edu.tw

Abstract. Fire control is a significant topic for building construction, as it directly affects the safety of residents. Fire control has been integrated with ICT to accurately monitor related fire information. This study constructed the sensor and monitor locations in a Building Information Modeling (BIM) based 3D model. When a sensor is activated, the system can instantly show the floor and the room plan of the fire point in the 3D model and connect to the monitors assigned to watch the suspected fire area to determine the authenticity of the fire alarm. Through the video display, the correctness of fire alarms can be carefully judged to prevent the panic and disturbance brought about by false alarms.

Keywords: Fire Control Management System, Building Information Modeling, Fire Control Equipment

1 Introduction

Since Taiwan is an island country, many tall buildings are being built in major cities to increase living space. Fire has not only led to progress and civilization for our society, but has also caused disaster, suffering, and calamity [1]. A fire in one of these tall buildings would cause tremendous loss of human life and damage to property. The traditional method for people to access archived fire control and disaster relief data files is both inefficient and time-consuming [2]. The development of information technologies has been so rapid that integrating fire control and computer information has become a trend. With the high demand for fire control in spacious buildings, computer vision is playing an increasingly important role [3]. Integrating fire facilities with different functions into one single platform can help users accurately judge a fire disaster as well as improve rescue efficiency and accuracy with current information and communication network technologies. Previous fire control software was typically designed in a 2D environment. If BIM can be integrated with fire control systems, more detailed information can be provided, so rescue and data access time can be significantly reduced to improve disaster control results. A network construction model of an underground transportation hub safety monitoring center was proposed by Wang according to the disadvantages of the present management mode in underground

transportation hubs [4].

1.1 Fire Control System

First responders to a major incident include many different agencies [5]. Kuligowski presented a discussion of the features of protected elevator systems that can provide safe and reliable operation both for fire service access and for occupant egress during fires [6]. The monitoring scope of a building includes access control, carbon dioxide, electricity, air conditioning, and fire control. The fire control system is separated from the other monitoring systems in the building and forms a unique system. When fire sensors are activated, alarms and sprinklers are started concurrently. The administration then proceeds to inform the fire department, broadcast to the public for evacuation, and perform other activities. Usually a fire control system consists of a huge amount of equipment connected as a sensing network linked with the server. This sensing network is also connected to other fire control facilities such as the fire alarm, rolling fire doors, the sprinkler system, the smoke exhaust system, and so on. With the development of information technology, fire safety assessment of a whole structure or region based on computer simulation has become a hot topic [7]. A monitoring system mainly achieves one-stop monitoring and management of archives for fire, security, warehouse temperature and humidity, and so on [8]. Zuo presented the design solution and framework of software which implemented the integrated management and data sharing of building automation, gating, fire safety, monitor control, and material management [9].

1.2 Building Information Modeling

Proponents claim that the adoption of BIM will lead to greater efficiencies through increased collaboration [10]. BIM has been adopted by a growing number of countries, and there are many cases presenting results after the implementation of BIM in construction projects. Access to accurate building information is often limited and inefficient due to the lack of preservation of building documentation and the inability to communicate with building systems. Currently, the academic and industrial sectors around the world have proactively implemented BIM and its research. BIM can be applied at several stages of the whole life cycle of a building, including planning, design, construction, and operation. In the life cycle of a building, the design and construction stages may take about 2 to 5 years, but operation and maintenance of the building can take 30 or even 50 years. If BIM extends its application to the maintenance and management of the building, its value becomes more prominent. The setup of the fire prevention monitoring system in this research is categorized as an application in the operation stage of a building, so if BIM is implemented in the design stage, it will be meaningful for subsequent operating functions. Completed buildings can also apply BIM during the operation stage for the purposes of repair, maintenance, and management.

2. Establishment of Building Model

2.1 Establishment of Sample Building

This study uses the College of Architecture and Planning at Chung Hua University to create a BIM model according to the actual dimensions of the building. The main function of BIM is to create and use common internal access to project-related information over the life cycle of a building. In this integrated digital environment, the information entered by one party can be fetched by others for subsequent use. This helps to improve project quality, save time, reduce costs, and prevent errors. The drawings created in the BIM environment can be linked and fetched through ODBC to extract related building information and construct the management database. The management software can be used to reduce the inconvenience of maintenance, avoid errors, and achieve efficient fire control administration. BIM, the ER Model, ASP.net, a database, and the Windows environment are used to develop the Fire Control Management System in this study.

2.2 Creation of Database

In this research, ER/Studio has been used for the creation of the E-R Model, in conjunction with the BIM development tool Revit Architecture, to import various parameter values and unit data tables into the database system using ODBC. BIM is essentially a joint database: all 3D models created using BIM are constructed from the data in the corresponding attribute tables, which are interrelated. This explains why changes to data are propagated to other related data in real time, letting users easily extract the required data. There are two types of database conversion presented in this research: one involves the use of the ER Model for the conversion of the logical module to the physical module before data is transferred through ODBC to MS SQL, and the other is from Revit. The data conversion processes include data table conversion and conversion to a relational database.

3 Functional Analysis of the System

The system is planned and categorized into four major categories: Building Fire Control Monitoring, Fire Control Equipment, Space Operation, and Common Data Operation.

3.1 Building Fire Control Monitoring

This module is the core module of the system; it includes the 3D BIM, alarm location data, user contact data, the floor monitor location diagram, etc., and is

shown in Fig. 1. When the system receives a signal from a sensor, the space number, area, and assigned monitors are listed on the display. When an entry is selected, the related contact user data can be checked to display the detailed information of the user. The Open link can be selected to display the monitor video of that space to recognize the actual situation there (as shown on the small display in Fig. 1). If the fire signal is correct, the people in that area are informed to evacuate and other related fire-accident notification activities are performed immediately, including informing the rescue group.

Fig. 1. Building fire control display and monitor video display

3.2 Fire Control Equipment

The fire control equipment data module mainly provides maintenance and query functions for fire control equipment; it includes Equipment Data Query and Maintenance Record Query.

A. Equipment Data Query
This page is separated into two main portions: defining the sorting criteria and displaying the query results. The sorting criteria can screen the queried data by different conditions such as floor, area, room, equipment, and equipment type. The pull-down menus designed in this research follow the mentioned design structure to automatically list all the details of the lower-layer data that belong to the upper-layer selection. For example, as shown in Fig. 2, after selecting 3F of Building A, all the areas belonging to 3F are automatically summarized and listed in the menu for selection. After selecting Area A3F0, all the room numbers belonging to that area are automatically listed, and so forth for the other selections. In addition, each selection comes with a checkbox. If a selection is checked, its selected value is included in the sorting expression, whereas unchecked selections are not taken into consideration. If multiple items are selected, they are combined with an AND relation, and the data matching all sorting conditions are listed as shown in Fig. 3.

B. Maintenance Record Query
This page provides maintenance and query of the equipment maintenance records, including numbering, date, equipment supplier, equipment name, and room number.
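A rough Python sketch of how the checked pull-down selections could be combined into one AND-filtered query follows. The table and column names (equipment, floor, area, ...) are invented for illustration and are not the paper's actual schema; the "?" placeholders are the parameter style commonly used over ODBC connections.

def build_equipment_query(selections):
    """selections: {column: value} for every checked pull-down menu."""
    base = "SELECT * FROM equipment"
    if not selections:
        return base, []
    clauses = " AND ".join(f"{col} = ?" for col in selections)   # AND relation
    return f"{base} WHERE {clauses}", list(selections.values())

sql, params = build_equipment_query({"floor": "3F", "area": "A3F0"})
print(sql)     # SELECT * FROM equipment WHERE floor = ? AND area = ?
print(params)  # ['3F', 'A3F0']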

Fig. 2. Correlated pull-down menu display
Fig. 3. Equipment data query display

3.3 Space Operation

This module includes the Space Data Maintenance and User Data Maintenance pages. The Space Data Maintenance page maintains the data of each space, including numbering, name, floor, area, etc. (as shown in Fig. 4). The User Data Maintenance page provides maintenance and query of related user data, including name, college and department, extension number, cell phone number, etc. (as shown in Fig. 5).

Fig. 4. Space data query display
Fig. 5. User data query view

3.4 Common Data Operation

This module includes the Staff Data Maintenance, System Log Data, and Firefighting Department Data pages. The Staff Data Maintenance page provides maintenance of staff data. The System Log Data page provides maintenance and query of event-handling records. The Firefighting Department Data page provides maintenance and query of the related data of the rescue units administrated by the firefighting department and their contact information.

4. Conclusions

This study uses the BIM environment to create a building model and uses MS Visual Studio to develop the Fire Control Management System. This article draws the following conclusions:
A. Integrating building equipment with ICT to perform automated control of the facilities in buildings provides a convenient, comfortable, and safe living environment.
B. Building component data in BIM can be imported into the database to avoid the data duplication caused by human error.
C. This study built a BIM for a building and established the Fire Control Management System to integrate fire control equipment with building spaces. When sensors are activated, the monitor video display can be instantly inspected. Through the video display, the correctness of fire alarms can be carefully judged to prevent the panic and disturbance brought about by false alarms.

References
1. Gan, F., Huang, J., Zhao, W., Xu, Z., Zhang, Y.: Design and application of building fire safety monitor system. In: Progress in Safety Science and Technology Volume 4: Proceedings of the 2004 International Symposium on Safety Science and Technology (2004)
2. Shiau, Y.C., Tsai, J.Y., Cheng, S.J.: Research of Fire Control in Buildings and the Development of RFID Application Systems. In: International Symposium on Automation and Robotics in Construction (2007)
3. Li, J., Wang, L., Gao, X., Wang, Z., Zhao, Q.: Monitoring system of multiple fire fighting based on computer vision. In: Proceedings of SPIE - The International Society for Optical Engineering (2010)
4. Wang, J., Liu, S., Chen, Z., Wang, X.: Network analysis of underground transportation hub safety monitoring system. In: Proceedings - International Conference on Computational Aspects of Social Networks (2010)
5. Messner, R.A., Hludik, F., Vidacic, D., Melnyk, P.: An integrated command control and communications center for first responders. In: Proceedings of SPIE - The International Society for Optical Engineering, vol. 5778 (2008)
6. Kuligowski, E., Bukowski, R.: Design of occupant egress systems for tall buildings. Elevator World, vol. 53 (2005)
7. Shi, J., Li, Y., Chen, H.: Application of Computer Integration Technology for Fire Safety Analysis. Tsinghua Science and Technology, vol. 13 (2008)
8. Yang, F.: The application of software maintainability design in the intelligent warehouse archives system. Communications in Computer and Information Science, vol. 234 (2011)
9. Zuo, M., Yuan, X.H., Yin, W.F.: Integrated intelligent building management system based on Browser/Server mode. Nanjing Li Gong Daxue Xuebao/Journal of Nanjing University of Science and Technology, vol. 29 (2005)
10. Dossick, C.S., Neff, G.: Organizational divisions in BIM-enabled commercial construction. Journal of Construction Engineering and Management, vol. 136 (2010)

Development of Computer-Aided Medication Education for Drug Abuse Prevention

Seong-Ran Lee
Department of Medical Information, Kongju National University, 182 Gongju daehak-ro, Kongju, Chungnam, South Korea

Abstract. This paper focuses on the development of computer-aided medication education for drug abuse prevention. The subjects consisted of 302 patients in a general hospital located in the Metropolitan area, studied from September 1, 2011 to December 31, 2011. The present research showed that drug abuse can be reduced by the computer-aided medication education. This paper reports a satisfaction rate of 82.2% in the evaluation after patient education. In conclusion, this paper found significant positive effects on drug abuse prevention, and its implications can be used as basic data for developing further systematic materials on computer-aided medication education.

Keywords: Development, Computer-aided health education, Prevention, Drug abuse

1 Introduction

For the last twenty years, drug abuse has been increasing constantly in Korea. The overall proportion of drug abuse experience among middle school students was over 20 percent: 20.1% for pain-killers, 1.5% for stimulants, 1.5% for narcotics, and 1.6% for cough mixtures [1]. Drug abuse not only causes psycho-physiological decline but also problems of self-work and ego development [2],[3]; moreover, the danger of suicide increases as well. Therefore, drug abuse needs social concern and comprehensive countermeasures to prevent it. In foreign advanced programs, both the parents and their children become the subjects of the programs, and these programs offer not only various parent-training content to the parents but also home study to both of them [4],[5]. Korea does not yet have any national program of this kind. In order to solve this urgent problem, practical plans should be sought. Until now, there have been few studies in Korea dealing with computer-aided medication education for drug abuse prevention. Thus, this paper develops computer-aided medication education for drug abuse prevention and ultimately analyzes the education effect through its application. This will serve as basic data for researchers and indicate the direction of computer-aided medication education in the future.

2 Materials and Methods

2.1 Materials

This survey was conducted with 302 patients who visited the psychiatry department of a general hospital located in the Metropolitan area from September 1, 2011 to December 31, 2011. The computer-aided medication education for drug abuse prevention was performed four times over four months by trained researchers, using video, CD-ROM, teaching, case studies, discussion, and other methods. The education effect was then estimated by the reduction of drug abuse after education as compared to before education. In this work, the reduced value of drug abuse after the computer-aided medication education was plotted as a function of the time elapsed after education, by gender, at 30, 60, 90, and 120 days. After the education, the researcher administered an evaluation of the computer-aided medication education to each participant.

2.2 Methods

Basic information of the study subjects was measured by percentage and number. The paired t-test was done to compare the reduced value of drug abuse before and after the computer-aided medication education, and the means and standard deviations were obtained; a small worked example of this paired comparison is given after Table 1. The chi-square test analyzed the differences in satisfaction between the two genders after the computer-aided medication education.

3 Results

3.1 Basic Information of Study Subjects

Table 1 presents the basic information of the study subjects. There were more male subjects (51.0%) than female subjects (49.0%). In the examination of religion, subjects without a religion were the most numerous at 40.1%, followed by Christians at 24.5% and Buddhists at 15.9%. From the investigation of drug information sources, it was revealed that most subjects (75.2%) learned about drugs through contact with mass media.

Table 1. Basic Information of Study Subjects

Variables | N(%)
Gender: Male 154 (51.0); Female 148 (49.0)
Age: <=29 31 (10.3); 30-39 52 (17.2); 40-49 86 (28.5); 50-59 94 (31.1); >=60 39 (12.9)
Marital status: Single 92 (30.5); Married 210 (69.5)
Religion: No religion 121 (40.1); Christianity 74 (24.5); Buddhism 48 (15.9); Catholic 43 (14.2); Others 16 (5.3)
Drug information: Mass media 227 (75.2); Book 6 (2.0); Friends/relatives 41 (13.6); Others 28 (9.3)
Education: Under middle school 59 (19.5); High school 145 (48.0); College or above 98 (32.5)
Total 302 (100.0)
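As a rough illustration of the paired t-test described in Section 2.2, the following Python sketch compares made-up before and after scores; only the procedure, not the data, mirrors the paper's analysis.

from scipy import stats
import numpy as np

before = np.array([1.4, 1.2, 1.5, 1.3, 1.6, 1.1, 1.4, 1.3])  # illustrative scores
after = np.array([0.5, 0.6, 0.4, 0.7, 0.5, 0.6, 0.4, 0.5])
t, p = stats.ttest_rel(before, after)   # paired (related-samples) t-test
print(f"t = {t:.2f}, p = {p:.4f}")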

3.2 Information on Drug Abuse of Study Subjects

Table 2 presents information on the drug abuse of the study subjects. 68.5% of the subjects were current cigarette smokers, 75.8% of the subjects drank alcohol, 87.4% of the subjects had taken an inhalant such as glue or gas, and 92.7% of the respondents had used hallucinogenic drugs. The motives for drug abuse were highest for curiosity at 33.0%, followed by troubles at 28.4% and stress relief at 22.3%.

Table 2. Information on drug abuse of study subjects

Variables | N(%)
Cigarette smoking: Smoking 207 (68.5); Non-smoking 95 (31.5)
Alcohol drinking: Yes 229 (75.8); No 73 (24.2)
Inhaler: Yes 264 (87.4); No 38 (12.6)
Hallucinogenic agent: Yes 280 (92.7); No 22 (7.3)
Hypnotic: Yes 293 (97.0); No 9 (3.0)
Motives of smoking: Urging by friends 62 (30.0); Troubles 7 (3.4); Stress relief 34 (16.4); Curiosity 89 (43.0); Others 15 (7.2)
Motives of drug abuse: Urging by friends 24 (9.1); Troubles 75 (28.4); Stress relief 59 (22.3); Curiosity 87 (33.0); Others 19 (7.2)
Total 302 (100.0)

3.3 Comparison of the Drug Use Before and After Computer-Aided Education

Table 3 compares drug use before and after the computer-aided medication education. The results verified the significance of the medication education on the subjects' inhalant use after education as compared with before education (t = -27.94, p = .004). The attitudes of the subjects who used drugs changed markedly after the medication education.

Table 3. Comparison of the Drug Use Before and After Computer-Aided Education

Variables | Before education (Mean±S.D.) | After education (Mean±S.D.) | t | p
Inhaler | 1.34± | | |
Hallucinogen | 0.98± | | |
Hypnotic | 0.71± | | |
Cigarette smoking | 1.50± | | |
Alcohol drinking | 1.69± | | |

3.4 Comparison of the Drug Use by Gender

Table 4 compares drug use by gender. The mean score of females, 2.27 points, was higher than that of males, 1.87 points, for hypnotics, a significant difference (t = -2.71, p = 0.03). However, males (1.49±1.35) used drugs more than females (1.43±1.16) in the case of hallucinogens.

Table 4. Comparison of the Drug Use by Gender

Variables | Male (Mean±S.D.) | Female (Mean±S.D.) | t | p
Inhaler | 1.62± | | |
Hallucinogen | 1.49± | | |
Hypnotic | 1.87± | | |
Cigarette smoking | 1.29± | | |
Alcohol drinking | 1.50± | | |

3.5 Durability of Education Effect After Computer-aided Medication Education

Fig. 1 compares the durability of the education effect as a function of the time elapsed after the computer-aided medication education for the two genders. It was found that the education effect was higher in males than in females 30 days after the computer-aided medication education. However, the education effect was lower in males than in females 90 days after the education.

Slope = ΔY / ΔX, where ΔX is the time interval and ΔY is the variation of the education effect.
Ratio = Ya / Yb, where Yb is the rate of drug abuse prevention before education and Ya is the rate of drug abuse prevention after the computer-aided medication education.

Fig. 1. Durability of Education Effect After Computer-aided Medication Education

3.6 Evaluation of the Satisfaction After Computer-Aided Education by Gender

Table 5 presents the evaluation of satisfaction after the computer-aided medication education by gender. 38.3% of males answered "very sufficient" for the time assigned for education, while 43.2% of females answered "sufficient". On the other hand, for methods

for drug abuse prevention, 42.9% of males most often answered "an appropriate education for subjects", while 53.4% of females most often chose "the emphasis on health importance". There was a significant difference between the two groups (X² = 9.38, p < .05).

Table 5. Evaluation of the Satisfaction After Computer-Aided Education by Gender

Variables | Male N(%) | Female N(%) | Total N(%) | X²
Appropriateness of teaching method: | | | | 7.62
  Very high | 58 (37.7) | 64 (43.2) | 122 (40.4)
  High | 60 (39.0) | 47 (31.8) | 107 (35.4)
  Fair | 25 (16.2) | 21 (14.2) | 46 (15.2)
  Low | 7 (4.5) | 9 (6.1) | 16 (5.3)
  Very low | 4 (2.6) | 7 (4.7) | 11 (3.6)
Time assigned for education:
  Very sufficient | 59 (38.3) | 48 (32.4) | 107 (35.4)
  Sufficient | 37 (24.0) | 64 (43.2) | 101 (33.4)
  Fair | 42 (27.3) | 29 (19.6) | 71 (23.5)
  Insufficient | 11 (7.1) | 6 (4.1) | 17 (5.6)
  Very insufficient | 5 (3.2) | 1 (0.7) | 6 (2.0)
Understanding of education contents:
  Very high | 71 (46.1) | 86 (58.1) | 157 (52.0)
  High | 48 (31.2) | 39 (26.4) | 87 (28.8)
  Fair | 25 (16.2) | 17 (11.5) | 42 (13.9)
  Low | 7 (4.5) | 4 (2.7) | 11 (3.6)
  Very low | 3 (1.9) | 2 (1.4) | 5 (1.7)
Methods for drug abuse prevention: | | | | 9.38*
  Emphasis on health importance | 49 (31.9) | 79 (53.4) | 128 (42.4)
  Appropriate education for subjects | 66 (42.9) | 35 (23.6) | 101 (33.4)
  Adoption of evaluation system | 18 (11.7) | 20 (13.5) | 38 (12.6)
  Others | 21 (13.6) | 14 (9.5) | 35 (11.6)
Total | 154 (100.0) | 148 (100.0) | 302 (100.0)
* p < .05

4 Discussion

This paper aimed to evaluate the effects of education on the drug abuse of patients. This experimental research was conducted to find out the actual status of drug abuse by the patients and then to draw up plans for prevention and recuperation from the addicted condition, in order to improve quality of life. As a result of this paper, the most common motive for drug abuse was curiosity. The significance of computer-aided medication education on the subjects' inhalant use showed after education as compared with before education. This finding is consistent with the results of earlier research [6],[7]. Therefore, systematic medication education needs to be performed. That is, there is a need for a separate program to be implemented for the groups characterized by lower levels of health knowledge and health promotion behavior.

The present research estimated that the education effect was higher in males than in females 30 days after the education. However, the education effect was lower in males than in females 90 days after the education. Thus, periodic re-education should be performed more often for males than for females. The present research showed that drug abuse can be reduced by the education, which is similar to data reported in previous studies [8],[9]. However, it should be noted that the education effect does not last long. In order to maintain the education effect well, it is very important to determine an adequate education period and perform various programs in consideration of patient circumstances. Objective measurement of the changes in the behaviors of the patients would be more valuable than mere abstract testimonies that only respond to the questions provided to the subjects. The present work elucidated through statistical analysis how effectively synthetic and systematic education contributes to improving quality of life through drug abuse prevention. Future work should focus on studying the education effect by patient classification through more prolonged research based on a larger database.

5 Conclusion

In conclusion, this paper identified positive effects of medication education for drug abuse prevention. Computer-aided medication education can be used as an effective method to improve medication knowledge and to reduce medication misuse and abuse.

References
1. Lee, H.C.: Knowledge and the Prehospital Care of Emergency Medical Technicians. Chonnam National University (2010)
2. Dezelsky, T.I., Toohey, J.V., Kush, R.: A Ten-Year Analysis of Non-Medical Drug Use Behavior at Five American Universities. The Journal of School Health, vol. 51 (2004)
3. Hellman, R., Chinmpfhauser, F.T., Kunz, M.K.: Self-Initiated Smoking Cessation in College Students. J. Am. College Health (2001)
4. Toohey, J.V., Dezelsky, T.L.: Six-year Analysis of Patterns in Non-Medical Drug Use Behavior. The Journal of School Health, vol. 48 (2008)
5. Pirastu, S., Arciti, C.: Evaluation of An Antismoking Educational Programme Among Adolescents in Italy. HYGIE, vol. 11, p. 28 (2009)
6. Tricker, R., Davis, L.G.: Implementing Drug Education in School: An Analysis of the Costs and Teacher Perceptions. Journal of School Health, vol. 58 (2006)
7. Rubinson, L., Alles, W.F.: Health Education: Foundations for the Health. Times Mirror/Mosby College Publishing, St. Louis (2000)
8. Simonds, J.F., Kashani, J.: Drug Abuse and Criminal Behavior in Delinquent Boys Admitted to A Training School. Am. J. Psychiatry, vol. 136 (2009)
9. Toohey, J.V., Dezelsky, T.I.: Six-year Analysis of Patterns in Non-Medical Drug Use Behavior. The Journal of School Health, vol. 48 (2003)

Concurrent Computation for Genetic Algorithms

Kittisak Kerdprasop and Nittaya Kerdprasop
Data Engineering Research Unit, School of Computer Engineering, Suranaree University of Technology, 111 University Avenue, Nakhon Ratchasima 30000, Thailand {kerdpras,

Abstract. Genetic algorithms are a kind of metaheuristic method in that they apply evolutionary techniques (best-fit selection, crossover, and mutation) to find an optimal solution by iteratively attempting to improve the current candidate solutions with respect to the fitness criteria. The use of heuristic-based search through the application of reproduction, recombination, and mutation mechanisms keeps genetic algorithms from exploring the entire search space, and thus they converge to the best solution quickly. We study the mechanisms of genetic algorithms and suggest that they can perform the search procedure more quickly with the concurrent programming paradigm. In this paper, we present an implementation of concurrent genetic algorithms in Erlang, a powerful functional language that provides message-passing features for concurrent processing. Source code for a simple mathematical problem solved with genetic algorithms is provided in the paper, together with the running results. The experimental study confirms the computational time efficiency of our concurrent genetic algorithms compared to a sequential coding style.

Keywords: Concurrency, Genetic Algorithms, Erlang Language, Functional Programming, Message Passing.

1 Introduction

Evolution in nature has inspired several computational models, including genetic algorithms, genetic programming, and evolutionary programming, to solve search and optimization problems. Genetic algorithms have been successfully applied to solve planning and parameter tuning problems in manufacturing, the computational sciences, mathematics, business, and many other fields. The success of genetic algorithms is due to their simple procedure of evolving successive generations of individuals, quickly converging the search strategy to the best solution, which has been encoded as an individual in the population. We investigate the robust search technique of genetic algorithms and propose that the algorithms can be improved via concurrency. In this paper, we introduce a simple model of concurrent genetic algorithms and show their effectiveness in terms of a manageable size of program source code and computational time. Our implementation is based on the concept of functional programming, which provides a declarative style of coding and also a set of features for handling concurrent processing.

2 Preliminaries and Related Work

Genetic algorithms are search and optimization methods inspired by the natural selection process that causes biological evolution [7]. At the initial stage, genetic algorithms model a population of individuals by encoding each individual as a string of alphabets called a chromosome. Some of these individuals are possible solutions to a problem. To find a good solution quickly, the algorithms emulate the strategy of nature, that is, survival of the fittest. Individuals that are more fit, as measured by the fitness function, are more likely to be selected for reproduction and recombination to create the new chromosomes representing the next generation. Reproduction and recombination are normally achieved through a probabilistic selection mechanism together with the crossover and mutation operators. As a consequence of their simple yet effective search procedure, genetic algorithms have been successfully applied to different kinds of work, ranging from optimization of large building structures [1], telecommunication routing protocol design [6], and concurrent engineering for manufacturing design [2], to data structure design [11] and deadlock detection [3].

Parallel computation for genetic algorithms has been proposed for at least two decades to speed up the computation. Cantu-Paz [5] proposed Markov chain models for the parallelization of genetic algorithms. Sehitoglu and Ucoluk [9] proposed parallelization at the fine-grained level of chromosome bits. Lim and colleagues [8] parallelized genetic algorithms using grid computing. The recent work of Tagawa [10] presented a study of concurrent differential evolution on a multi-core architecture. The work presented in this paper proposes a simpler scheme toward high performance computing using a message passing mechanism, instead of the more sophisticated techniques that have appeared in the literature. The work of Bienz et al. [4] is close to ours, but their process interaction scheme is more tightly coupled than our scheme.

3 Functional Programming to the Implementation of GAs

The implementation of the genetic algorithms uses a simple mathematical problem: find the maximum squared number of an integer from the search space of positive integers ranging from 1 to 16,777,127. The correct solution is 281,472,426,579,600. The main module of our program is the function go(), which takes three parameters: the population size, the probability of mutation, and the probability of crossover. The program source code in Erlang is given as follows:

go(PS,PM,PC) ->
    p([max_is, max()]),
    Popu = init(PS, space()),
    evol(PS, PM, PC, Popu, maxloop(), false).

max() -> round(math:pow(2,bit())-1).
bit() -> 24.                 % 2**24 instances including 0
maxloop() -> 150.
correct() ->                 % threshold constant near 1; its value is not shown here
space() -> lists:seq(1, round(math:pow(2,bit())-1)).

init(PS,L) ->
    random:seed(erlang:now()),
    Pop = randw(L,PS),
    lists:map(fun encode/1, Pop).

randw(_,0) -> [];            % random population with replacement
randw(L,N) -> [lists:nth(random:uniform(length(L)), L) | randw(L,N-1)].

evol(PS,PM,PC,Popu,0,_)    -> p([in_each_evol, hd(Popu)]), hd(Popu);
evol(PS,PM,PC,Popu,_,true) -> p([in_each_evol2, hd(Popu)]), hd(Popu);
evol(PS,PM,PC,Popu,Max,false) ->

    PopuNew = xover(PM,PC,Popu) ++ Popu,
    Lout = sel(PS,PopuNew),
    [{Tmp,_,_}|_] = Lout,
    Percent = Tmp/max(),
    p([after_evol_loop, maxloop()-Max+1, Max, Tmp/max()]),
    OverThresh = Percent > correct(),
    evol(PS,PM,PC,Lout,Max-1,OverThresh).

sel(PS,Popu) ->                          % select good parents
    Lsort = lists:sort(fun({_,_,X},{_,_,Y}) -> X > Y end, Popu),
    {L1,_} = lists:split(PS,Lsort),
    L1.                                  % select best rank

xover(PM,PC,[]) -> [];
xover(PM,PC,[X1,X2|T]) -> xv(X1,X2,maybe(PC),PM) ++ xover(PM,PC,T).

xv(X1,X2,false,PM) -> [X1,X2];           % no crossover
xv(X1,X2,true,PM)  -> cross(X1,X2,PM).   % crossover

cross({_,X1,_},{_,X2,_},PM) ->
    Rand = random:uniform(length(X1))-1,
    {L1,L11} = lists:split(Rand,X1),
    {L2,L22} = lists:split(Rand,X2),
    Xnew1 = mutstring(L1++L22,PM),
    Xnew2 = mutstring(L2++L11,PM),       % mutate
    V1 = decode(Xnew1,bit()), V2 = decode(Xnew2,bit()),
    [{V1,Xnew1,fitness(V1)},{V2,Xnew2,fitness(V2)}].

mutstring([],PM) -> [];
mutstring([H|T],PM) ->
    Prob = maybe(PM),
    if Prob -> [(H+1) rem 2 | mutstring(T,PM)];   % mutate
       true -> [H | mutstring(T,PM)]              % no mutate
    end.

maybe(Prob) -> random:uniform() < Prob.

encode(N) -> {N, bitof(N,bit()), fitness(N)}.

decode([],_) -> 0;
decode([H|T],B) -> round(H*math:pow(2,B-1)) + decode(T,B-1).

bitof(_,0) -> [];
bitof(N,B) -> bitof(N div 2, B-1) ++ [N rem 2].

fitness(A) -> A*A.

p(L) -> lists:foreach(fun(H) -> io:format("~p ",[H]) end, L), io:format("~n").

4 Concurrent Genetic Algorithms Implemented with Erlang

In the design of the concurrent computation (Figure 1), we try to keep the message communication as simple as possible. The main process simply creates the child processes and waits for the first best result to arrive. As soon as the main process receives the first solution, it kills the other processes that are still active. This problem has only one best solution, so we accept the first one. The implementation of this concurrent scheme is shown in Figure 2, whereas its running result is illustrated in Figure 3.

Fig. 1. A model of message passing in concurrent genetic algorithms.
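The first-result-wins pattern just described is not specific to Erlang. A rough Python analogue using the standard multiprocessing module is sketched below; the run_ga function is a hypothetical stand-in for the go/3 search above, not part of the original program.

# Sketch of the "accept the first solution, kill the rest" pattern.
import multiprocessing as mp
import random, time

def run_ga(pop_size, p_mut, p_xover):
    # Hypothetical stand-in for the go/3 search; returns a "best individual".
    time.sleep(random.uniform(0.1, 0.5))
    return (pop_size, p_mut, p_xover)

def worker(params, queue):
    queue.put(run_ga(*params))            # report the result to the main process

def first_result(param_sets):
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(p, queue)) for p in param_sets]
    for proc in procs:
        proc.start()
    result = queue.get()                  # block until the first solution arrives
    for proc in procs:
        proc.terminate()                  # kill the processes still running
    return result

if __name__ == "__main__":
    print(first_result([(16, 0.05, 0.9), (32, 0.05, 0.9)]))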

-module(ga).          % concurrent processes
-compile(export_all).

main([P1,M1,C1],[P2,M2,C2]) ->
    Pid2 = spawn(?MODULE, process, [P1,M1,C1]),
    Pid3 = spawn(?MODULE, process, [P2,M2,C2]),
    % parameters: PopSize, ProbMutation, ProbCrossover
    Pid2 ! {self()},
    Pid3 ! {self()},
    p([all_pid,Pid2,Pid3]),
    receive
        {Pid, Msg} ->
            io:format("P ~w Value=~p~n",[Pid,Msg]),
            exit(Pid2,kill), exit(Pid3,kill)
    end.

process(PS,PM,PC) ->
    R = go(PS,PM,PC),
    receive
        {From} -> From ! {self(), R}
    end.

Fig. 2. Concurrent genetic algorithms with two active processes in Erlang.

Fig. 3. Running result of concurrent genetic algorithms with two processes. The first process (process-id = <0.40.0>) has population size = 16, probability of mutation = 0.05, and probability of crossover = 0.9. The second process (process-id = <0.41.0>) has population size = 32; the other two parameters are the same as for the first process.

5 Performance Evaluation

We designed a series of experiments to compare the performance of sequential genetic algorithms against the concurrent implementation. The problem domain is the same as that demonstrated in Sections 3 and 4. The number of processes in the concurrent implementation was varied over 2, 4, 8, 16, 32, 64, and 128. When the number of processes is increased to 256, memory capacity is not sufficient for the Erlang system to reach the completion stage. If we decrease the problem size, however, the Erlang system can spawn many hundreds of processes. To record the running time of the genetic algorithms, we use the following commands:

f(), T1 = erlang:now(),
ga:main([8,0.05,0.8],[40,0.01,0.5]),
T2 = erlang:now(),
timer:now_diff(T2,T1)/1.0e6.

The f() function clears the shell's variable bindings. The function now() is the clock function available in the Erlang shell. We start the concurrent processes by calling the function main(); in the example above, concurrent genetic algorithms with two processes are invoked. Subtracting the start time from the stop time yields the running time, and we convert the time unit from microseconds to seconds.
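For comparison, the same wall-clock measurement can be written in Python. The snippet below is only an illustrative analogue of the Erlang shell commands above; run_concurrent_ga is a hypothetical stand-in for ga:main/2.

import time

def run_concurrent_ga(p1, p2):
    # Hypothetical stand-in for ga:main/2; replace with the real search.
    time.sleep(0.1)

t1 = time.perf_counter()
run_concurrent_ga([8, 0.05, 0.8], [40, 0.01, 0.5])
t2 = time.perf_counter()
print(f"running time: {t2 - t1:.3f} s")   # perf_counter already reports seconds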

The concurrent genetic algorithm code can easily be adjusted to spawn more than two processes. The running times for 2 to 128 processes are summarized in Table 1.

Table 1. Running time (in seconds) of sequential and concurrent genetic algorithms. Columns: number of processes; running time of the concurrent GA; running time of the sequential GA; time reduction; time saving (%).

Fig. 4. Computational time comparison of sequential versus concurrent genetic algorithms.

It can be seen from the experimental results (Figure 4) that concurrent genetic algorithms with 16 processes give the best computational performance. When the number of spawned processes rises above one hundred, concurrency yields poorer performance than serial computation. This is mainly because every time the main process spawns a child process, there is an overhead cost of message passing. For this specific simple problem, we should not run more than 16 concurrent processes. The optimal number of processes is, however, problem-dependent and varies according to the problem domain. Empirical study is essential for the best parameter setting.

6 Conclusion

Genetic algorithms have gained interest because of their robust search characteristics. The algorithms mimic the evolutionary process of nature such that, at each generation of the search process, individuals with higher fitness values are considered good candidate solutions. The algorithms compensate for the unexplored parts of the search space through the crossover and mutation mechanisms. Genetic algorithms can therefore find the best solution quickly. We are interested in tuning the search performance of genetic algorithms with the programming technique known as concurrency. In this paper, we illustrated concurrent genetic algorithms implemented in Erlang. The functional style of Erlang keeps the code size minimal, whereas its concurrency facility supports the rapid implementation of concurrent processes. The source code provided in the paper is a simple form of the concurrent model. We plan to extend our research to a more effective model. Other kinds of genetic algorithms, such as adaptive algorithms and neuro-genetic algorithms, are also in the main line of our future research direction.

Acknowledgments. This work has been supported by grants from the National Research Council of Thailand (NRCT) and Suranaree University of Technology via the funding of the Data Engineering Research Unit.

References

1. Adeli, H., Cheng, N.-T.: Concurrent genetic algorithms for optimization of large structures. J. of Aerospace Engineering 7, 3 (1994)
2. Al-Ansary, M.D., Deiab, I.M.: Concurrent optimization of design and machining tolerances using the genetic algorithms method. Int. J. of Machine Tools and Manufacture 37, 12 (1997)
3. Alba, E., Chicano, F., Ferreira, M., Gomez-Pulido, J.: Finding deadlocks in large concurrent java programs using genetic algorithms. In: 10th Int. Conf. on Genetic and Evolutionary Computation (2008)
4. Bienz, A., Fokle, K., Keller, Z., Zulkoski, E., Thede, S.: A generalized parallel genetic algorithm in Erlang. In: Midstates Conf. on Undergraduate Research in Computer Science and Mathematics (2011)
5. Cantu-Paz, E.: Markov chain models of parallel genetic algorithms. IEEE Transactions on Evolutionary Computation 4, 3 (2000)
6. Genco, A., Lopes, S., Lo Re, G., Tartamella, M.: Routing optimization by concurrent genetic algorithms. In: 4th Int. Conf. on Applications of High-Performance Computing in Engineering
7. Holland, J.H.: Adaptation in Natural and Artificial Systems. Univ. of Michigan Press (2004)
8. Lim, D., Ong, Y.-S., Jin, Y., Sendhoff, B., Lee, B.-S.: Efficient hierarchical parallel genetic algorithms using grid computing. Future Generation Computer Systems 23 (2007)
9. Sehitoglu, O.T., Ucoluk, G.: Gene level concurrency in genetic algorithms. In: Int. Symp. on Computer and Information Sciences (2003)
10. Tagawa, K.: A statistical study of concurrent differential evolution on multi-core CPUs. In: The Italian Workshop on Artificial Life and Evolutionary Computation (2012)
11. Zamanifer, K., Koorangi, M.: Designing optimal binary search tree using parallel genetic algorithms. Int. J. of Computer Science and Network Security 7, 1 (2007)

A Practical Context Awareness Information System for VANET based on IEEE 1609

Tzu-Kai Cheng 1, Jenq-Shiou Leu 1, Ing-Xiang Chen 2, Jean-Lien C. Wu, Fellow, IEEE 3, Zhe-Yi Zhu 1
1 Department of Electronics Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan {d ,jsleu,b }@mail.ntust.edu.tw
2 Ericsson Taiwan Ltd., Taipei, Taiwan ing-xiang.chen@ericsson.com
3 Department of Computer and Communication Engineering, St. John's University, Taipei, Taiwan jcw@mail.sju.edu.tw

Abstract. With the popularization of consumer electronic products and the promotion of digital mobile devices, telematics services have become more important and practicable now that the global positioning system (GPS) is commonly used in daily life, especially when context awareness technologies are adopted. In this paper, we present a novel approach to support an open architecture for telematics services based on international standards. The proposed approach (Context Awareness Application Server) can optimize context awareness mode features to provide personal tracking services to children and elders for safety reasons. The simulations and experiments in this study have verified the effectiveness of the proposed approach.

Keywords: Context Awareness; IEEE 1609; VANET; Telematics Platform; Particle Filter

1 Introduction

A vehicular ad hoc network (VANET) is a critical technology for automotive electronics applications. VANET is an emerging technology that is growing rapidly and presents a very active field of research, development, and standardization. Throughout the world, there are many national and international projects in government, industry, and academia devoted to VANET. These projects include consortia like the Vehicle Safety Consortium (US), the Car-2-Car Communication Consortium (Europe), and the Advanced Safety Vehicle Program (Japan), standardization efforts like IEEE 802.11p (WAVE), and field trials like the large-scale Vehicle Infrastructure

Integration Program (VII) [1] in the US. The VII field trials mainly focus on telematics platforms related to the IEEE 1609 protocol. In order to address the challenges of developing a context awareness telematics platform, this paper presents our service-oriented approach, which utilizes a data mining algorithm to convey and manage the transportation mode in the proposed system. The rest of the paper is organized as follows. A brief introduction to context awareness over telematics systems is given in Section 2. In Section 3, the proposed CAAS system model is described; we cover the basic components of the generic architecture of CAAS within a telematics system, including real-time monitoring and personal mobility analysis. Simulation results are shown in Section 4 to present the accuracy performance of the proposed scheme. Finally, the paper is concluded in Section 5.

2 Background

The emerging telematics is considered to be a key feature in mobile computing networks, such as vehicular networks or cellular networks. Cars in the near future will likely be equipped with many embedded computing platforms capable of running general-purpose applications. Our critical research topic is the development and introduction of an application-specific telematics platform for mobile communications, while considering scalability and interface issues. In a vehicular network, an OBU (On Board Unit) may attach to one RSE (Road Side Equipment) and then to another RSE or to a base station of another wireless network while moving. When integrated into the transportation system infrastructure (through RSEs) and into vehicles themselves (through OBUs), these technologies help monitor and manage traffic flow, reduce congestion, provide alternate routes to travelers, and save lives.

Tran et al. [2] presented a service-based approach to support the structural and behavioral adaptation of automotive telematics. In particular, the structure enables the separate development of telematics and the management of context and adaptation. However, much work needs to be done to provide modeling/implementation methods and tools to support developers using such an approach. Choudhury et al. [3] developed a context aware framework to address the diverse communication needs of a modern enterprise. This method determines an optimal request-to-agent routing based on several metrics of effectiveness depending on the communication context. In other studies, Zhang et al. [4][5] presented an approach based on supervised learning to automatically infer users' transportation modes, including driving, walking, taking a bus, and riding a bike, from raw GPS logs. This approach consists of three parts: a change point-based segmentation method, an inference model, and a graph-based postprocessing algorithm.

We present our work in designing and building a telematics platform that can be used to deploy a variety of intelligent transportation applications. The current platform consists of three major components: CAAS, CAP (Context Awareness Platform), and OBU. CAAS is an application service that is based on the CAP and provides the capability of identifying the current context of the user by analyzing dynamic data from clients; this is essential for potential extensions to many mobile services. CAP is a platform which connects multiple users, APs, OBUs, and GPS trackers. It stores the

information (e.g., latitude/longitude/speed) sent from the OBU/GPS tracker into a database and provides functions to process the requests sent from APs. An OBU is embedded in each vehicle and connects to the CAP to transmit up-to-date vehicle status, including vehicle location, OBD II (On Board Diagnostics II) information, and sensed information such as engine load, remaining fuel, and tire pressure.

3 Context Awareness Application System Architecture

3.1 CAAS System Model

In our project, we built a GPS-assisted wireless application to validate the design and implementation of a telematics platform that provides services to end users. Fig. 1 shows the proposed system model with CAAS. As in the function diagram below, the CAAS service logic package classes include the communication module, CAAS handler, notification handler, and utility. The communication module processes incoming XML data from the XML interpreter. The utility package includes the database connection, the XML parser, and request sending through the CAP broker, and is called when needed.

Fig. 1. The Core CAAS Main Functions

The key processes are described as follows. First, the CAAS core module obtains raw context data from the CAP; the CAP is responsible for collecting all the different kinds of raw data into one centralized interface to the upper layer. The static data previously saved in the CAAS database can be acquired from software sources or hardware system logs. The context interpreter is the main function that constantly receives real-time raw data from the sources and translates it into high-level information. The context interpreter includes the data mining algorithms that are responsible for predicting the user's current situation, such as on foot, motorcycle, or bus, from contexts such as GPS coordinates, speed, acceleration, and served units. A routine triggers the transportation status to the OBU when the mode changes. Finally, the CAAS manager has a knowledge base which provides a flexible, scalable, and reliable query interface to context services, which can retrieve

context information using both query and subscription/notification mechanisms. The different kinds of context services can adapt this information into their GUIs accordingly. Each component is introduced briefly as follows:

Real-time monitoring: Every GPS user under the surveillance of CAAS is shown on the map in real time. For security reasons, only the CAAS user can see the GPS users that he or she has registered.

Personal mobility analysis: When a GPS user is shown on the map, CAAS also generates related information to display on the map, such as speed, idle time, etc. In addition, it provides information on this GPS user's mobility status, such as on bus, on foot, or on bike.

Geo-fence protection: GPS users have their own geo-fences, previously defined by CAAS users. Once GPS users enter or exit a certain geo-fence, CAAS sends an alert to the designated contact, also previously provided.

Tracking and behavior report: CAAS provides map tracking and behavior reports. Behavior reports show the distribution of time that each GPS user spends in the different geo-fences and under different mobility statuses. This function extends CAAS's usage beyond safety to personal health and business management.

3.2 The CAAS OBU Device

The CAAS OBU is a new design intended to achieve high performance and to support multiple applications on a vehicle. To achieve the feature that one OBU core can simultaneously support multiple OBU user interfaces, the OBU core is a middleware between the CAP and the OBU clients. The OBU core has to connect to the CAP's authentication handler to perform authentication whenever it is started, and it then initiates connections to the CAP's communication handler and context handler once it is authenticated. The OBU core also provides socket interfaces for OBU clients to acquire necessary information, such as GPS location, meter state, and OBD-II states, and to communicate with the CAP through the OBU core's communication handler. The CAAS user interface communicates with the OBU core through the CAAS logic, which is responsible for maintaining connections, sending probe information, and handling events. The CAAS client also needs to request group members' locations from the CAP's web service after receiving the user ID from the CAAS web administration, in order to show group members' locations on Google Maps. After user login, the CAAS client receives events from the CAP's communication handler and displays the group members' latest status and transportation state. In addition, the CAAS user interface displays the user's up-to-date location when the Google Map browsing button is clicked and the query request is sent to the CAP's web service. The screenshot in Fig. 2(a) shows the usual operational appearance with the following elements: status, location, and alert. In Fig. 2(b), the screenshot on the right shows the user's location displayed on the map surrounding him. Supposing a user would like to learn his own position, the map data surrounding his position is displayed in a browser with his position and traffic status marked on the map. Such a display is triggered when the central magnifier-like icon is clicked. Note that the zoom level of the map is adjusted in accordance with the user's status.

Fig. 2(a). CAAS OBU Interface    Fig. 2(b). CAAS with Google Map
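The rule-based mode inference performed by the context interpreter in Section 3.1 can be pictured with a small sketch. The thresholds and mode names below are illustrative assumptions, not values from the paper; they only show the shape of a speed/acceleration rule set such as the one the context interpreter applies.

# Illustrative rule-based transportation-mode classifier (assumed thresholds).
def infer_mode(speed_kmh, accel_ms2):
    """Map a (speed, acceleration) sample to a coarse transportation mode."""
    if speed_kmh < 1:
        return "stay"
    if speed_kmh < 7:
        return "on foot"
    if speed_kmh < 25 and abs(accel_ms2) < 1.0:
        return "bike"
    return "bus/vehicle"

print(infer_mode(4.2, 0.3))   # -> "on foot"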

4 Experimental Evaluation

We conducted the experiments on the CAAS, the most important application in the proposed telematics system. The simulator simulated the behavior of entities in different traffic statuses by using pre-collected data from different real traffic statuses. We collected a log of 10 days of raw GPS data from a GPS phone. The transportation modes assumed in the experiment are walking, biking, riding a bus, and staying; these modes influence the motion velocity. In this experiment, we first focused on the estimation of location and modes of transportation. Although the rule-based method was shown to adapt well to the environment of the CAAS, for comparison we also included particle filter (PF) [6] model groups alongside the CAAS rule-based method groups. The PF is a variant of Bayes filters for estimating the state of a dynamic system. It represents posterior distributions over the state space with temporal sets, X_t, of n weighted samples:

X_t = \{ \langle s_t^{(i)}, w_t^{(i)} \rangle \mid i = 1, \dots, n \}    (1)

where each s_t^{(i)} is a sample (or state), and the w_t^{(i)} are non-negative numerical factors called importance weights, which sum up to one. We compared the performance of the rule-based method and the PF model under the same conditions. Fig. 3 shows the recall rates of the rule-based method and the PF model for the different kinds of transportation modes. The PF model achieves higher accuracy since it updates the posterior distribution according to the sampling-with-resampling procedure. Once the transportation mode is sampled, the motion velocity is sampled from a mixture-of-Gaussians probability conditioned on the mode.
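The weighted-sample representation of Equation (1) can be made concrete with a minimal particle filter step. The sketch below assumes a one-dimensional velocity state and Gaussian measurement noise; it is illustrative only, not the authors' implementation.

# Minimal particle filter step for a 1-D velocity state (illustrative).
import math, random

def pf_step(particles, weights, measurement, noise_std=1.0):
    # 1) predict: perturb each particle with process noise
    particles = [s + random.gauss(0.0, 0.5) for s in particles]
    # 2) update the importance weights with a Gaussian likelihood
    weights = [w * math.exp(-0.5 * ((measurement - s) / noise_std) ** 2)
               for s, w in zip(particles, weights)]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]        # weights sum to one
    # 3) resample with replacement proportional to the weights
    particles = random.choices(particles, weights=weights, k=len(particles))
    weights = [1.0 / len(particles)] * len(particles)
    return particles, weights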

The two methods are considered in two scenarios. Scenario 1 is shown in Fig. 3(a); the four above-mentioned transportation modes are employed to simulate it. The average recall values of the PF model and the rule-based method are 80.12% and 67.31%, respectively. In the rule-based method, the average recall rate based on speed changes is about 12% lower than that of the PF model. Scenario 2 is basically the same as Scenario 1 except for the transportation mode of walking and is shown in Fig. 3(b). The average recall values of the two methods are 67.62% and 62.71%. The PF model achieves the better performance results. However, the variation between the two methods is not obvious; the average recall value of the rule-based method is almost the same as that of the PF model. According to the simulation results, the PF model is just a little better than the rule-based method, with an improvement of about 5%.

Fig. 3(a). Recall of the rule-based method and the PF model based on speed changes
Fig. 3(b). Recall of the two methods based on the accelerating speed changes

5 Conclusion

In this project, we investigated in-car wireless networks, inter-vehicle networks, and OBU device communication protocols. We have carried out a comprehensive state-of-the-art analysis of the emerging VANET technologies and protocols that hold the potential of supporting a multitude of advanced and innovative applications in the next generation of intelligent vehicles. This work also presents a new distributed traffic information system for computing the transportation mode based on the limitations of raw GPS data. In the future, we will strive to improve the prediction performance of CAAS by designing a more sophisticated algorithm.

References

1. VII Architecture and Functional Requirements, Version 1.1, July
2. Tran, M.H., Colman, A., Han, J.: Service-based Development of Context-aware Automotive Telematics Systems. In: 15th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS), pp. 53-62, March
3. De Choudhury, M., Sundaram, H., John, A., Seligmann, D.D.: Context Aware Routing of Enterprise User Communications. In: Fifth Annual IEEE International Conference on Pervasive Computing and Communications Workshops (PerComW 07), March
4. Zheng, Y., Zhang, L., Ma, Z., Xie, X., Ma, W.-Y.: Recommending Friends and Locations Based on Individual Location History. ACM Transactions on the Web, Vol. 5, No. 1, February
5. Zheng, Y., Xie, X.: Learning Travel Recommendations from User-Generated GPS Traces. ACM Transactions on Intelligent Systems and Technology, Vol. 2, No. 1, January
6. Liao, L., Patterson, D.J., Fox, D., Kautz, H.: Learning and Inferring Transportation Routines. Artificial Intelligence 171(5-6)

Evaluation of Assessment Tools for High-care Student Groups in Vocational High Schools

Chen-Feng Wu 1, Chun-Ta Lin 1, Pei-Min Wang 2 and Pei-Ru Wang 3
1 Department of Information Management, Yu Da University, No. 168, Hsueh-fu Rd, Chaochiao Township, Miaoli County 361, Taiwan
2 Chung Hsing Commercial & Industrial Vocational High School, No. 211, Daying Rd., Zhunan Township, Miaoli County 35051, Taiwan
3 Department of Civil Engineering, National Chiao-Tung University, Hsinchu, Taiwan
1 {cfwu, maxlin}@ydu.edu.tw, 2 cindy613@ms10.hinet.net, 3 lulu3302@ms24.url.com.tw

Abstract. Student dropout problems are a continuous concern in the educational world. According to relevant statistics, dropouts are responsible for a large proportion of the crimes committed by young people; thus, students are a potential threat to public order after dropping out of school. Researchers are diligently developing efficient and effective assessment tools in the name of prevention over treatment. This paper employs group decision making by professional counselors and uses questionnaires to select a series of key criteria for high-care group assessment tools that can be adopted by school counseling units. In addition, the analytic hierarchy process (AHP) is used to establish a model for selecting assessment tools for high-care student groups in vocational high schools. We hope that this model can offer counseling units in vocational high schools in Taiwan an objective and effective method for selecting optimal assessment tools.

Keywords: High-care students, assessment tools, analytic hierarchy process (AHP)

1 Introduction

Dropping out of school and defiant behaviors are closely associated with criminal behavior. A study of education dropouts and juvenile criminals reported that among 218 juvenile delinquents, 65% had dropped out of education; in other words, more than three-fifths of the juvenile offenders were education dropouts [1]. According to domestic and foreign literature, dropping out of school can increase the crime rate [2], increase government expenditure on social welfare, and increase feelings of alienation among dropouts. A statistical analysis of juvenile delinquency conducted in 2001 by the Criminal Research Center of the Ministry of Justice stated that the total number of criminals was 14,727 [3], of which 35.14% were high school dropouts and 52.54% were between 16 and 18 years of age. These data demonstrate that dropping out of school can be a significant concern for personal development and social stability. Using information technology, this study analyzes and compares four assessment

tools, namely, a checklist, a self-report questionnaire for predicting dropout probability, a statistical prediction model, and teacher assessments, to identify the appropriate assessment tools for predicting potential dropouts. The objective of this study is to determine the most suitable assessment tool for identifying which students of higher vocational schools are at risk of dropping out.

2 Literature Review

Multiple factors influence students to leave school. Most previous studies concerning dropouts mention both the conceptual model for dropouts developed by Tinto [5] and that developed subsequently by Miller [6]; these two models are essentially the same. Both models emphasize that the academic and social performance of students with different background characteristics can influence their perceptions at the psychological level.

The analytic hierarchy process (AHP) is a decision-making strategy developed by Saaty; with constant modifications and verifications, the AHP method had matured by 1978. AHP has a wide range of theoretical applications, and the literature on AHP is substantial. The research methods of AHP theory were proposed and explained in [7].

The methods frequently used by domestic and foreign education units to predict student dropouts include checklists, self-report questionnaires for dropout probability, statistical prediction models, and teacher assessments [5]. These four methods use different subjects, approaches, durations, and data presentations; the appropriate sample size for these methods also differs. The four prediction tools are described below.

A. Checklist
Teachers who have direct contact with students select the items that apply to the students to assist in determining how likely the students are to leave education and related phenomena; related research was completed by Ginaras and Careage [8].

B. Self-report questionnaire for dropout probability
The targets of self-report questionnaires for assessing dropout probability are the students themselves. Examples of these scales include the Student At-Risk Identification Scale (SARIS) developed by McKee et al. [8].

C. Statistical prediction model
A statistical prediction model is used to determine the factors associated with dropping out and potential indices [9].

D. Teacher assessments
Teachers identify at-risk students based on their daily contact and interaction with the students. Wells et al. [10] conducted relevant research and discussions using this method.

3 Methodology

Research procedures that support the study theme were formulated as shown in Fig. 1. The targets of this study were first-year students of higher vocational schools. The assessment tools included a checklist, a self-report questionnaire for dropout probability, a statistical prediction model, and teacher assessments; other assessment tools were not included in this study.

3.1 Description of hierarchical factors

We adopted AHP in this study and structured the various evaluation factors in hierarchical order using a top-down inductive method. The four main criteria are summarized and described below.

A. Accuracy
Accurate predictions rely on the screening rate, the teacher-student relationship, and student factors. Because the counseling units of educational institutions are responsible for an excessive number of tasks, counselors hope that the lists they receive actually contain at-risk students who genuinely require counseling. For assessment tools in which the subjects are teachers, teachers' personal subjective evaluations rely on teacher-student relationships and on whether teachers can objectively evaluate students. For assessment tools in which the subjects are students, the evaluation may be invalid because students, especially at-risk students, can refuse to answer or provide false answers because of psychological factors.

B. Convenience
Regarding convenience, the factors considered are ease of operation and workload. Checklists do not require extensive textual descriptions, and the questions primarily concern phenomena that can be easily observed or understood by teachers and that exist in schools. Numerous students are screened using self-report inventories and statistical prediction models, which can be a burden to counseling staff.

C. Speed
Regarding speed, the factors considered are the number of test participants and the operation duration.

D. Practicality
Regarding practicality, the factors considered are data analysis and transitional function. Personal information enables teachers to concretize their vague perceptions of students and to understand the various risk factors faced by students. Counselors and mentors who are not directly involved in the assessment procedure should be able to take over assessed cases relatively rapidly after obtaining detailed assessment results.

3.2 Establishing and applying models

The establishment of the selection method in this study involves six steps [7]. We adopted AHP to determine the weights of the selection criteria and to construct a selection model. The results are described below.

A. Analysis and establishment of the research architecture

(1) Problem description: We first conducted problem analysis for the theme and goal of this study before collecting relevant information to fully understand the objectives. Next, we differentiated the causal relationships among the secondary objectives to facilitate the partitioning of the subsequent hierarchical structure.
(2) Establishment of the hierarchical structure: Several populations were defined by analyzing the study goal. Each population was further divided into several subpopulations; the hierarchical structure is shown in Fig. 2.

B. Calculation of factor weights for each hierarchy level
The weights for evaluating assessment tools for at-risk populations at higher vocational schools were established by calculating the geometric means and summarizing the composite scores determined by the expert groups according to the weights defined by the experts.

C. Calculation of eigenvectors and eigenvalues
The weights of the factors in each hierarchy level were obtained from a pairwise comparison matrix of the primary and secondary criteria using eigenvector equations.

D. Consistency test
Saaty suggested verifying the consistency of a pairwise comparison matrix using the consistency index (CI) and the consistency ratio (CR) [7].

E. Calculation of the relative weights in each hierarchy level
After calculating the weights of the factors in each hierarchy level, the weights of the overall hierarchy were calculated.

F. Calculation of weights for the overall hierarchy and selection of the optimal assessment tool
The most appropriate alternative for the research goal can be determined by calculating the weights of the factors in each hierarchy level.

Figure 1. Research procedure.    Figure 2. Hierarchical structure.
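Steps B through D can be made concrete with a small numerical sketch. The pairwise comparison matrix below is a made-up example, not the experts' actual judgments, and the random index value assumes Saaty's published table for n = 3; the sketch assumes NumPy is available.

# AHP sketch: principal-eigenvector weights and the consistency ratio.
import numpy as np

A = np.array([[1.0, 3.0, 5.0],     # illustrative pairwise comparison matrix
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)                  # principal eigenvalue
weights = np.abs(eigvecs[:, k].real)
weights /= weights.sum()                     # normalized criterion weights

n = A.shape[0]
ci = (eigvals[k].real - n) / (n - 1)         # consistency index
ri = 0.58                                    # Saaty's random index for n = 3
cr = ci / ri                                 # consistency ratio (accept if < 0.1)
print(weights, round(cr, 3))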

4 Results and Discussion

To evaluate the assessment tools selected by counseling units for identifying at-risk populations, we drafted an index set after reviewing the literature and conducting a questionnaire. We also developed the indices required for this study by conducting interviews and mentor meetings, inviting students, mentor teachers, and counseling units to adjust the index set. The weights of the various factors were assessed using AHP to explore the alternatives and strategies that should be used for selecting assessment tools.

A. Weights of the factors in each hierarchy level
First, the results of the expert questionnaires regarding the primary criteria were analyzed. The experts' judgments on the primary criteria of the alternative assessment tools were compared in pairs. The weights of the criteria in each hierarchy level can be calculated from the pairwise comparison matrix of the primary and secondary criteria using eigenvector formulae; the results are shown in Table 1.

Table 1. Weights of eigenvectors for primary and secondary criteria. For each primary criterion (accuracy, convenience, speed, practicality) and its secondary criteria (screening rate, teacher-student relationship, student factors; ease of operation, workload; number of respondents, operation duration; analysis of personal information, transitional functionality), the table lists the criterion weight, the secondary-criterion weight, and the product of the two weights.

Regarding the single factors judged by the experts, the screening rate was the most important factor of accuracy, ease of operation of convenience, operation duration of speed, and analysis of personal information of practicality. As shown in Table 1, among the four criteria, the screening rate under the accuracy criterion was the most important factor, with the largest combined weight. The respondents attached relatively minimal importance to the factor number of participants under the speed criterion, which had the smallest combined weight. This result indicates that the experts consider the most important factor to be the screening rate. The primary purpose of assessment tools is to identify or screen the students at high risk of leaving education; therefore, the screening rate is the most critical index of an assessment tool.

B. Selection of the optimal assessment tool
The composite scores for the four assessment tools were obtained by organizing the weights of the various criteria obtained from the questionnaires completed by counseling personnel. The composite scores of the factors that influence the respondents' choice of assessment tool for identifying at-risk populations are shown in Table 2.

Table 2. Selection of assessment tools for counseling units, listing the combined weight and priority of each decision factor: the checklist, the self-report questionnaire for

dropout probability, the statistical prediction model, and teacher assessments.

In summary, under the various influences, the assessment tools preferred by counseling personnel according to the AHP analysis were, from most to least preferred: checklists, teacher assessments, self-report questionnaires of dropout probability, and statistical prediction models. The checklist scored higher on the primary criteria than the three other assessment tools. This result indicates that counseling personnel value test outcomes when selecting assessment tools. Additionally, they consider whether assessment outcomes can be transferred for use in counseling, which is the practical aim of this study.

5 Conclusions

Education units are concerned with students' school attendance and have proposed multiple measures to reduce the number of dropouts. However, specific actions to evaluate assessment tools for identifying at-risk students at higher vocational schools have not been conducted. Therefore, this study proposed methods for selecting assessment tools that can effectively identify the students who are likely to leave education. We created indices to evaluate assessment tools for identifying at-risk populations at higher vocational schools based on the properties of the assessment tools, and we calculated the weights of the indices from performance evaluation questions using AHP to determine the relative importance of each index.

References

1. Shang, J.C.: Drop out school and juvenile delinquency - an example of Hsin-Chu School. Master's thesis, Graduate Institute of Social Work, National Chengchi University (1995)
2. Zheng, C.C.: Transition schools and counseling for dropout students. Discipline Research, Vol. 38(2) (1999)
3. Center for Criminal Research, Ministry of Justice: 2001 Crime Situation and Analysis (2002)
4. Ministry of Education: Education Statistics (2011)
5. Tinto, V.: Dropout from Higher Education: A Theoretical Synthesis of Recent Research. Review of Educational Research, Vol. 45 (1975)
6. Miller, A.P.: An Analysis of Persistence/Dropout Behavior of Hispanic Students in a Chicago Public High School. Chicago, Illinois: The Annual Meeting of the American Educational Research Association (1991)
7. Saaty, T.L.: The Analytic Hierarchy Process. New York: McGraw-Hill (1980)
8. McKee, J.M., Melvin, K.B., Ditoro, V., McKee, S.P.: SARIS: Student At-Risk Identification Scale. Journal of At-Risk, Vol. 4, No. 2 (1998)
9. Campbell, T.C.: Predicting educational status: A multivariate, multi-perspective, longitudinal analysis of the process of dropping out of secondary school. (AAT ) (1997)
10. Wells, D., Miller, M.J., Clanton, R.C.: School counselors' accuracy in identifying adolescents at risk for dropping out. Adolescence, Vol. 34 (1999)

Generating Test Cases for Cyber Physical Systems from Formal Specifications

Lichen Zhang, Jifeng He and Wensheng Yu
Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, China

Abstract. Formal methods and testing are two important approaches that assist in the development of cyber physical systems. Formal specifications can be used to assist testing, and formal methods and testing are seen as complementary. In this paper, we address the problem of generating test cases for cyber physical systems from formal specifications, and we reduce an infinite set of testing parameters into a finite set.

Keywords: Cyber Physical Systems, Formal Methods, Testing, Dynamic Logic

1 Introduction

Cyber physical systems [1], due to their increased size and complexity relative to traditional embedded systems, present numerous developmental challenges. Their long-term viability requires addressing these challenges through the development of new design, composition, verification, and validation techniques. These present new opportunities for researchers in cyber physical systems. It is natural to advocate the use of formal techniques in this application area in order to cope with these challenges, and indeed a large body of knowledge exists on their use. Formal specifications contain a great deal of information that can be exploited in the testing of an implementation, whether for the generation of test cases, for sequencing the tests, or as an oracle in verifying the tests. In this paper, we address the problem of generating test cases for cyber physical systems from formal specifications, and we reduce an infinite set of testing parameters into a finite set.

2 Generating Test Cases for CPS from Formal Specifications

Much of the process of test execution and monitoring is automated in modern software development practice, but the generation of test cases has remained a labor-intensive manual task [2]. Methods are now becoming available that can automate this process [3][4]. A simple test-generation goal is to find an input that drives the execution of a (deterministic, loop-free) program along a particular path in its control flow graph. By performing symbolic execution along the desired path and

conjoining the predicates that guard its branch points, we can calculate the condition that the desired test input must satisfy. Then, by constraint satisfaction, we can find a specific input that provides the desired test case. This method generalizes to find tests for other structural coverage criteria, for programs with loops, and for reactive systems (i.e., systems that take an input at each step). A major impetus for practical application of this approach was the realization that (for finite state systems) it can be performed by an off-the-shelf model checker: we simply check the property "always not P", where P is a formula that specifies the desired structural criterion, and the counterexample produced by the model checker is then the desired test case. Different kinds of structural or specification-based tests can be generated by choosing a suitable P [5].

For example, consider voice communication systems [6], which can be seen as the nerves of an airport; they are cyber physical systems. Such a system serves as the sole communication system between the pilots, the air-traffic control personnel working at the airport, the ground personnel on the runways, other parties external to the airport, and even other airports. The main voice communication path is established between the operators working at the operator positions (OPs) and line-bound parties as well as radio parties. The radio sets necessary for communication via radio are external to the system and are connected to it via radio interfaces (RIFs). Line-bound voice communication is established via the line interfaces (LIFs), which are highly configurable [12]. The following specification describes the architecture of the VCS 3020S system in VDM++ notation [7]; the specifications of the classes RIF and LIF are skipped here, since they are similar to that of the OP class [8].

class OP
instance variables
  id : OP_ID;
  switch
end OP

class SWITCH
instance variables
  ops  : OP set;   inv ops  == card ops  <= 24;
  lifs : LIF set;  inv lifs == card lifs <= 96;
  rifs : RIF set;  inv rifs == card rifs <= 48
end SWITCH

The instance variable frq_couplings of the class SWITCH stores the coupling groups as a finite mapping from operator positions to sets of frequencies. The SWITCH has to guarantee that a frequency can only be a member of a single coupling group and that no more than 15 frequencies are in one coupling group. This property is expressed by means of a data invariant, denoted by the keyword inv [8].

instance variables
  frq_couplings : OP_ID -m-> FRQ_ID-set;
  inv frq_couplings ==
    (forall s in set rng frq_couplings & card s <= 15) and
    (card rng frq_couplings > 1 => dinter rng frq_couplings = {});
init objectstate ==
  frq_couplings := {|->};

The following two methods are called from the OP if frequency coupling is switched on or off for a frequency [8].

methods
O_FrqCouplingStart(op : OP_ID, frq : FRQ_ID) value B ==
  if frq in set dunion rng frq_couplings or
     (op in set dom frq_couplings and card frq_couplings(op) > 14)
  then return false
  else (if op in set dom frq_couplings
        then frq_couplings := frq_couplings ++ {op |-> {frq} union frq_couplings(op)}
        else frq_couplings := frq_couplings ++ {op |-> {frq}};
        return true)
pre op in set frqs(frq).tx

methods
O_FrqCouplingEnd(op : OP_ID, frq : FRQ_ID) ==
  if card frq_couplings(op) = 1
  then frq_couplings := {op} <-: frq_couplings
  else frq_couplings := frq_couplings ++ {op |-> frq_couplings(op) \ {frq}}
pre op in set dom frq_couplings and frq in set frq_couplings(op)
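To make the intent of the invariant and the two operations concrete, the sketch below restates them as executable Python checks. The representation (a dict from operator position to a set of frequencies) and the function names are our own illustration, not part of the VDM++ model.

# Illustrative Python restatement of the coupling-group invariant and operations.
def couplings_invariant(frq_couplings):
    """Each group holds at most 15 frequencies; no frequency is in two groups."""
    seen = set()
    for group in frq_couplings.values():
        if len(group) > 15 or (seen & group):
            return False
        seen |= group
    return True

def coupling_start(frq_couplings, op, frq):
    already_coupled = any(frq in g for g in frq_couplings.values())
    if already_coupled or len(frq_couplings.get(op, set())) > 14:
        return False
    frq_couplings.setdefault(op, set()).add(frq)
    return True

def coupling_end(frq_couplings, op, frq):
    group = frq_couplings[op]
    if len(group) == 1:
        del frq_couplings[op]          # mirrors {op} <-: frq_couplings
    else:
        group.discard(frq)

c = {}
assert coupling_start(c, 1, 10) and not coupling_start(c, 2, 10)
assert couplings_invariant(c)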

From the VDM++ specification, the formal specification of the test cases for the coupling/radio re-transmission feature is given as follows [8].

methods
TestCoupling() value B ==
 (dcl result : B := true;
  op1!E_FrqChangeCoupling(mk_FRQ_ID(1));
  op1!E_FrqChangeCoupling(mk_FRQ_ID(2));
  result := result and op1!Test_IsFrqCoupled(mk_FRQ_ID(1))
                   and op1!Test_IsFrqCoupled(mk_FRQ_ID(2));
  op1!E_PttPush();
  result := result and rif1!Test_IsPttActiveBy(mk_OP_ID(1))
                   and rif2!Test_IsPttActiveBy(mk_OP_ID(1));
  op1!E_PttRelease();
  op1!E_FrqChangeTx(mk_FRQ_ID(3));
  op1!E_PttPush();
  result := result and rif1!Test_IsPttActiveBy(mk_OP_ID(1))
                   and rif2!Test_IsPttActiveBy(mk_OP_ID(1))
                   and rif3!Test_IsPttActiveBy(mk_OP_ID(1));
  op1!E_PttRelease();
  op1!E_FrqChangeCoupling(mk_FRQ_ID(1));
  op1!E_FrqChangeCoupling(mk_FRQ_ID(2));
  result := result and op1!Test_IsFrqCoupled(mk_FRQ_ID(1))
                   and op1!Test_IsFrqCoupled(mk_FRQ_ID(2));
  op1!E_FrqChangeRx(mk_FRQ_ID(1));
  op1!E_FrqChangeRx(mk_FRQ_ID(2));
  op1!E_FrqChangeRx(mk_FRQ_ID(3));
  return result)

Because the set of testing parameters is infinite for cyber physical systems, it is obvious that we cannot exhaustively test each of the testing parameters. However, it is possible that one testing parameter is representative of many others. A testing parameter is said to be robust if a slight (quantifiable) perturbation of the parameter is guaranteed to result in a test with the same qualitative properties (for example, safety and correctness). Robustness can thus lead to a significant reduction in the set of testing parameters. In fact, ideally, we would like to be able to reduce an infinite set of testing parameters into a finite set and quantify the coverage achieved by the performed tests [13][14]. A key idea for testing cyber physical systems is to decompose the entire test into: (a) a closer investigation of the actual complex dynamics of a single system component; and (b) an integration of the local correctness results into a global system test. Furthermore, both (a) and (b) need to handle parameters, which naturally arise from the degrees of freedom in how a single component can be instantiated in a system environment. A first-order dynamic logic, dL [9][12], supports both of these aspects as fundamental operations on system behaviour. Further, dL can even be used for parameter extraction, i.e., the automatic derivation of constraints for safety parameters. For example, a car has state variables describing its current position (xc), velocity (vc), and acceleration (ac). The continuous dynamics of the car is described by the differential equation system of ideal-world dynamics for longitudinal position changes (xc' = vc, vc' = ac). We assume bounds for the acceleration ac in terms of a maximum acceleration A >= 0 and a minimum positive braking power b > 0. We introduce a constant ε that provides an upper bound for sensor and actuator delay, communication between the traffic center or traffic sign detector and the car controller, and computation in both. The car controller and the traffic center may react and exchange messages as quickly as they want, but they can take no longer than ε. Variable speed limit control is modeled in dL as shown in [10].

The continuous dynamics (11) of the model describe the evolution of the car's position and velocity according to the current acceleration [10]. The model uses a variable t that evolves with constant slope (i.e., a clock) for measuring time within the upper bound ε, and it constrains the evolution of the velocity vc to non-negative values, see (12). The following model covers variable speed limit control in the presence of an incident moving towards a car. Cars in this model follow the same control as in the previous section; they take care to comply with speed limits and potentially satisfy or optimize secondary objectives. Accordingly, the lower bound Safesl of the speed limit remains unchanged. The state variables describe an incident's position (xi) and its velocity of movement (vi) towards cars. The system dynamics are extended with the motion of an incident. The model uses a minimum velocity (vmin), which is often mandatory on freeways and highways, to exclude unreasonable car behavior (e.g., to avoid having a car brake to a complete standstill, wait for the incident to arrive at the car's position, and only then accelerate with maximum acceleration and rush beyond the incident). Variable speed limit control in the presence of static and moving incidents (vsli) is modeled in dL as shown in [10].

Once the dL specification has been created, test cases can be generated according to dL rules. For the function that the user has indicated, the pre-condition and post-condition are examined, and a partition representing them is generated. Subsequently, the partition is used to produce predicates that are passed to an external solver to find out whether they are satisfiable and, if they are, to choose a random sample from each partition. Test cases are generated in the following way: (1) For each symbolic state, create a concrete trace leading to it with the initial state as the starting point. To do this, a strengthened symbolic state is created that consists of all states that will lead to the target state; this is necessary because not all states inside a partition will lead to the target partition. We do this by starting at the target transition and following the trace back to the initial state so that all constraints in the transitions of the trace evaluate to true. This procedure is called back propagation. (2) The strengthened traces created in step 1 are transformed into concrete traces with specific values for the delays. We take an approach that extracts the test cases from the formal specification by four steps of partition analysis: (1) extraction of definitions by collecting all parts (pre-condition, post-condition, invariants); (2) unfolding of all definitions (in the case of recursive definitions, the unfolding is limited to some predefined number); (3) transformation of the definition to DNF to obtain the disjoint sub-domains; (4) further simplification of each sub-domain.
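The satisfiability step above — handing each sub-domain predicate to an external solver and drawing a concrete sample from it — can be sketched with an SMT solver. The example below assumes the z3-solver Python package and uses a made-up sub-domain constraint; it only illustrates the workflow, not the authors' tool.

# Sketch (assumes the z3-solver package): derive a concrete test input from
# one sub-domain of a partition, e.g. the branch condition v > vmin and v <= sl.
from z3 import Real, Solver, sat

v, vmin, sl = Real('v'), Real('vmin'), Real('sl')
subdomain = Solver()
subdomain.add(vmin == 5, sl == 30)        # illustrative fixed parameters
subdomain.add(v > vmin, v <= sl)          # one disjunct of the DNF partition

if subdomain.check() == sat:
    model = subdomain.model()
    print("test input: v =", model[v])    # a witness value for this sub-domain
else:
    print("sub-domain is infeasible; no test case")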

3 Conclusion

In this paper, we addressed the problem of generating test cases for cyber physical systems from formal specifications and reduced an infinite set of testing parameters into a finite set. Further work will be devoted to developing test case generation methods and tools for the verification of the continuous dynamic features of cyber physical systems.

Acknowledgments. This work is supported by the national high technology research and development program of China (No. 2011AA010101), the national basic research program of China (No. 2011CB302904), the national science foundation of China, the doctoral program foundation of institutions of higher education of China, and the science foundation of Guangdong province.

References

1. Wolf, W.: Cyber-physical Systems. Computer 42(3), 88-89
2. Mandrioli, D., Morasca, S., Morzenti, A.: Generating Test Cases for Real-Time Systems from Logic Specifications. ACM Transactions on Computer Systems 13(4)
3. Cardell-Oliver, R.: Conformance Tests for Real-Time Systems with Timed Automata Specifications. Formal Aspects of Computing 12 (2000)
4. Badban, B., Franzle, M., Peleska, J., Teige, T.: Test Automation for Hybrid Systems. In: Proceedings of the Third International Workshop on Software Quality Assurance (SOQUA 2006). ACM, New York (2006)
5. Krichen, M., Tripakis, S.: Conformance Testing for Real-Time Systems. Formal Methods in System Design 34(3)

6. Hörl, J.: Formal Specification of a Voice Communication System used in Air Traffic Control. Master's thesis, Institute for Software Technology (IST), Technical University of Graz, Austria
7. The VDM Tool Group, IFAD: User Manual for the IFAD VDM++ Toolbox. The Institute of Applied Computer Science, Forskerparken 10, 5230 Odense M, Denmark, 1.0 edition. Doc. Id.: IFAD-VDM
8. Hörl, J., Aichernig, B.K.: Requirements validation of a voice communication system used in air traffic control: an industrial application of light-weight formal methods. In: Proceedings of the 4th International Conference on Requirements Engineering, Volume II. Springer
9. Platzer, A.: Differential dynamic logic for hybrid systems. Journal of Automated Reasoning 41(2) (2008)
10. Mitsch, S., Loos, S.M., Platzer, A.: Towards formal verification of freeway traffic control. In: Lu, C. (ed.) ACM/IEEE Third International Conference on Cyber-Physical Systems, Beijing, China, April 2012
11. Platzer, A.: Differential dynamic logic for verifying parametric hybrid systems. LNCS 4548. Springer (2007)
12. Hörl, J., Aichernig, B.K.: Formal Specification of a Voice Communication System Used in Air Traffic Control. In: FM'99 Formal Methods, LNCS 1709. Springer (1999)
13. Julius, A.A., Fainekos, G.E., Anand, M., Lee, I., Pappas, G.J.: Robust test generation and coverage for hybrid systems. In: Hybrid Systems: Computation and Control (HSCC 2007), LNCS. Springer, Berlin (2007)
14. Abbas, H., Fainekos, G., Sankaranarayanan, S., Ivancic, F., Gupta, A.: Probabilistic Temporal Logic Falsification of Cyber-Physical Systems. Accepted for publication in ACM Transactions on Embedded Computing Systems

Reliable Integration of Exact and Approximated Arithmetic with Three-Valued Logic in Python

Reeseo Cha 1, Wonhong Nam 2, and Jin-Young Choi 1
1 Korea University, Seoul, Korea {reeseo,choi}@formal.korea.ac.kr
2 Konkuk University, Seoul, Korea wnam@konkuk.ac.kr

Abstract. The error-ranges of exact rational numbers and intervals can be guaranteed even during arithmetic operations, whereas we cannot rely on the error-ranges of floating-point numbers. In this paper, we propose a novel number system in which exact rational numbers are strictly separated from inexact floating-point numbers and carefully integrated with the inexact numbers. A three-valued logic is also shipped with our number system to appropriately deal with uncertainties due to the inexactness. A prototype implementation of our number system in Python is demonstrated.

1 Introduction

A number of modern programming languages and computer algebra systems provide unlimited integers and rational numbers with symbolic computation for exact arithmetic [1]. Some of them also support various ways to deal with approximated numbers more precisely, such as arbitrary-precision decimal arithmetic [2]. Moreover, interval arithmetic is a dedicated approximation system where error-ranges are strictly guaranteed during arithmetic operations [3]. These systems, however, are not so well integrated with unreliable approximations such as IEEE 754 floating-point numbers. For example, the class method from_float of the Fraction class, a rational number type in Python [4], maps 0.3 not to 3/10 but to 5404319552844595/18014398509481984. Indeed, this is the fractional representation of 0.299999999999999988897769753748434595763683319091796875, which is the erroneous result of approximating the intended 0.3 into an IEEE 754 format [5]. Construction rules like this break the reliability of the entire rational numbers in the fractions module of Python. To confidently rely on number systems, we claim that inexact numbers should be strictly distinguished from exact numbers and intervals, and that the inexactness should invade the world of exactness minimally and appropriately. In particular, conditional branches in a program should not be affected inappropriately by uncertainties due to the inexactness of the approximated numbers.

This research was partially supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency): NIPA 2012 H
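The from_float behaviour quoted above can be reproduced with the standard library alone; the following two lines (our illustration, not the paper's code) contrast it with constructing the fraction from the literal text:

from fractions import Fraction

print(Fraction.from_float(0.3))   # 5404319552844595/18014398509481984
print(Fraction("0.3"))            # 3/10, parsed from the literal text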

In this paper, hence, we propose a novel number system where exact numbers and intervals are explicitly separated from the inexact approximated numbers and carefully integrated with them, along with a three-valued logic system [6] in which we can appropriately deal with uncertainties. First, we define three classes of numbers. Based on these classes, we define a set of coercion rules among them for the arithmetic operations, and then define the equalities and order relations between them using three-valued logic. Finally, we demonstrate a prototype implementation in Python.

2 Constructions of Numeric Datatypes

In our number system, all numbers are categorized into three classes as follows. A number n is:
- an exact number if its location on the number line can be determined as a point and represented exactly. The class of exact numbers is actually a representable subset of the algebraic numbers.
- a proper interval if its exact location on the number line is unknown but its possible range can be strictly bounded as a line segment and represented exactly. Each end of an interval should be an exact number, and can be either open or closed.
- an inexact approximation if its location or bounds cannot be guaranteed or cannot be represented exactly. A number in this class contains only blurred, unreliable information about its location.

2.1 Exact numbers

Exact numbers are constructed using exact(), which expects at most two arguments and returns a reduced form of a rational number. The first argument is the numerator and defaults to 0 of type int if omitted. The second one is the denominator and defaults to 1 of type int if omitted. The arguments of exact(), if any, should have the type int, long, string, or another exact. If any of its arguments is a string, it should have the form digit+ (. digit* (_ digit+)?)?, where digit is [0-9] and the underscore marks the recurring decimals. For example, the string "1.33_428571" means 1.33428571428571... (with 428571 recurring), and exact("1.33_428571") constructs the rational number 467/350, which can also be constructed by exact(467,350).

2.2 Proper intervals

Proper intervals are constructed using interval(), which expects at least two and at most four arguments. The first two mandatory arguments are its minimal and maximal ends, and should be exact numbers or other types which can be automatically converted into exact numbers, such as int, long, and string. The maximal end must be greater than the minimal end. The remaining two arguments are the closedness of the two ends and should have the type bool. These optional arguments default to True if omitted, meaning the interval is closed.
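As an illustration of the recurring-decimal string form of Section 2.1, the intended value can be reconstructed with the standard Fraction type; this sketch is ours and only mimics what exact() is described to do:

from fractions import Fraction

def recurring_to_fraction(text):
    # e.g. '1.33_428571' with 428571 recurring yields 467/350
    if '_' not in text:
        return Fraction(text)
    head, rep = text.split('_')          # '1.33' and '428571'
    fixed = head.split('.')[1]           # fixed fractional digits, '33'
    base = Fraction(head)                # 1.33 exactly, i.e. 133/100
    # the recurring block contributes rep/(10^len(rep) - 1),
    # shifted past the fixed fractional digits
    return base + Fraction(int(rep), 10**len(rep) - 1) / 10**len(fixed)

print(recurring_to_fraction("1.33_428571"))  # 467/350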

We do not deal with degenerate intervals, since they are equivalent to the corresponding exact numbers and are not approximations conceptually. Although we currently do not deal with unbounded, half-bounded, empty, or multiple intervals, these intervals can be included in future developments in order to permit, for example, a reciprocal of [-1,1], which is {(-∞,-1], NaN, [1,∞)}.

2.3 Inexact approximations

Inexactly approximated numbers are constructed using approx(), which expects at most one argument defaulting to 0.0 of type float. All IEEE 754 compatible floating-point numbers fall back to this category. Some exact numbers which cannot be constructed consistently also fall back here. This class resembles the decimal module of Python [4], but is regarded as unreliable.

2.4 Coercions during arithmetic operations

When an arithmetic expression contains two or more different types of numbers, one of them is coerced to another. The basic rule of coercions within our number system reflects the fact that errors invade. From this point of view, exact numbers are more recessive than intervals, which are again more recessive than approximations. If at least one operand in an arithmetic expression is an approximation, then the others fall back to approximations. If at least one operand is an interval and all the remaining operands are exact numbers, then the result falls back to an interval. For example, 3+[2.4,2.6) is not 5.5 but [5.4,5.6). Similarly, [5.4,5.6) + approx(0.8) is not [6.2,6.4) but approx(6.3). When numbers beyond our number system are mixed with at least one number of our number system, they are coerced into numbers of our number system as follows: int and long are coerced into exact numbers, Decimal is coerced into an interval, and the others, such as float, are coerced into inexact approximations.

3 Logical Operations

3.1 Three-valued logic

For intervals and inexact approximations, equalities or order relations cannot be guaranteed. Hence, we propose a three-valued logic system in which we can explicitly declare that something is uncertain. The TVL class consists of three distinct values:

TVL := inevitable | uncertain | impossible

We never define the __bool__() method for the TVL class, to avoid mistakes by programmers, especially confusion over the meaning of the else block of if statements. Instead, to use these TVL values in conditional judgments such as if or while statements, we define three predicates, namely inevitably(), never(), and uncertain(), of the type TVL → bool.
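A minimal sketch of the TVL type and its predicates might look as follows; the enum encoding is our assumption, while the names follow the paper:

from enum import Enum

class TVL(Enum):
    INEVITABLE = 1
    UNCERTAIN = 0
    IMPOSSIBLE = -1
    # no __bool__ is defined on purpose, so callers must go through
    # the explicit predicates below instead of truth-testing a TVL

def inevitably(v):
    return v is TVL.INEVITABLE

def never(v):
    return v is TVL.IMPOSSIBLE

def uncertain(v):
    return v is TVL.UNCERTAIN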

For coercion in logical expressions, if a logical expression contains at least one number of our number system, then the other numbers are coerced in the same way described in the previous section.

3.2 Equalities

We implement the overloaded operators == and != using the two special methods __eq__() and __ne__(). There are six cases for the comparison of two numbers, since equalities are symmetric:
- Two exact numbers are always inevitably equal or never equal, without any uncertainty.
- An exact number e and an interval i cannot be inevitably equal, since we rule out degenerate intervals. Their equality is uncertain if e is equal to one of the closed ends of i or e is in the range of i, and impossible otherwise.
- An exact number e and an approximation a are inevitably different if the exponent of e in IEEE 754 form is different from that of a. In all other cases, their equality is uncertain.
- Two intervals are inevitably equal if they are the same Python instance. If they have no intersection at all, then they are inevitably different. In all other cases, their equality is uncertain.
- An interval i and an approximated number a cannot be inevitably equal. If a is inevitably different from both ends of i, and a is not in the range of i, then they are inevitably different. In all other cases, their equality is uncertain.
- The equality of two approximations is uncertain if their significant digits and exponents are equal; otherwise, they are inevitably different.

3.3 Order relations

For the overloaded order relation <, there are nine cases according to the three classes of its two operands, since this relation is not symmetric. Furthermore, especially for intervals, a ≤ b does not always coincide with (a < b) ∨T (a = b), where ∨T is the disjunction operator of our three-valued logic. So, we should define the overloaded ≤ for those nine cases separately. Fortunately, the overloaded > and ≥ are still the converse relations of < and ≤, respectively.
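For instance, the exact-versus-interval equality case above could be coded as follows (a sketch; the attribute names lo, hi, lo_closed, and hi_closed for intervals are our assumption):

def eq_exact_interval(e, i):
    # degenerate intervals are ruled out, so the result is never INEVITABLE
    on_closed_end = (i.lo_closed and e == i.lo) or (i.hi_closed and e == i.hi)
    strictly_inside = i.lo < e < i.hi
    if on_closed_end or strictly_inside:
        return TVL.UNCERTAIN
    return TVL.IMPOSSIBLE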

We omit the definitions of the orders of two exact numbers, <ee and ≤ee, since they are obvious without any uncertainty. Using these two relations and their converses, along with the equality =ee defined in Section 3.2, we formalize the orders of two intervals, <ii and ≤ii. Let Emin and Emax be functions from interval to exact, each of which maps an interval to its minimal end and maximal end, respectively. Let Cmin and Cmax be predicates from interval to bool, each of which maps an interval to the closedness of its minimal end and maximal end, respectively. Let I : TVL → bool be the predicate inevitably() defined earlier. Then, for any two intervals a and b,

a <ii b = inevitable, if I(Emax(a) <ee Emin(b)) ∨ (I(Emax(a) =ee Emin(b)) ∧ ¬(Cmax(a) ∧ Cmin(b)));
          impossible, if I(Emax(b) ≤ee Emin(a));
          uncertain, otherwise.

Similarly,

a ≤ii b = inevitable, if I(Emax(a) ≤ee Emin(b));
          impossible, if I(Emax(b) <ee Emin(a)) ∨ (I(Emax(b) =ee Emin(a)) ∧ ¬(Cmin(a) ∧ Cmax(b)));
          uncertain, otherwise.

The orders between other combinations of classes, such as <ie, are also omitted in this paper, since they are more obvious than <ii.

4 Prototype Implementation

In Python, the built-in floating-point numbers should be handled with great care, since they do not behave as in elementary mathematics. For example, the summation of ten floating-point 0.1s is not exactly 1. The program below cannot escape from the while loop, since count does not exactly hit 2 but reaches 1.9999999999999998 after ten additions and then skips past 2:

count, offset = 1, 0.1
while True:
    count += offset
    if count == 2:
        break

We have implemented the number classes and the three-valued logic described in Sections 2 and 3 as a module in Python. With this module, we can easily make numbers more reliable. The while loop above can be rewritten using our module as follows. The new program can escape from the loop, since the summation of ten exact 0.1s is exactly 1 and count exactly hits 2 after ten iterations:

from relnum import *
count, offset = 1, exact("0.1")
while True:
    count += offset
    if inevitably(count == 2):
        break

5 Conclusion

We have designed and implemented a novel number system to distinguish and separate any inexactness from exactness. We have also developed a corresponding three-valued logic to guarantee certainties, excluding any uncertainties resulting from the inexact numbers. Our prototype implementation shows that we can avoid serious program errors, especially at conditional branches where incorrect judgments of the equalities or orders of numeric values can occur. As future work, we will develop this system further to include algebraic surd numbers and exponential operations on them. The extension will also include multiple intervals, so that we can deal with reciprocals of intervals which contain zero. Moreover, the next implementation will also be ported to Haskell in order to type-check inappropriate numeric operations at compile time, and will be formalized in Coq with dependent types.

References
1. Gowland, P., Lester, D.R.: A survey of exact arithmetic implementations. In: International Workshop on Computability and Complexity in Analysis (2000)
2. Bailey, D.H.: High-precision floating-point arithmetic in scientific computation. Computing in Science and Engineering 7 (2005)
3. Hickey, T.J., Ju, Q., van Emden, M.H.: Interval arithmetic: From principles to implementation. Journal of the ACM 48(5) (2001)
4. Python Software Foundation: Python v2.7.3 documentation (2012), http://docs.python.org/
5. Hough, D.: Applications of the proposed IEEE 754 standard for floating point arithmetic. Computer 14(3) (1981)
6. Putnam, H.: Three-valued logic. Philosophical Studies 8 (1957)

Experiment study of Android Software Test using Moment Invariants Algorithm

Won Shin 1, Tae-Wan Kim 2, Chun-Hyon Chang 1,*
1 Dept. of Computer Engineering, Konkuk University, Seoul, Korea {wonjjang, chchang}@konkuk.ac.kr
2 Dept. of Electrical Engineering, MyongJi University, Seoul, Korea twkim@mju.ac.kr
* Corresponding author.

Abstract. There are many tools for testing Android software. However, existing tools focus only on functional testing; they cannot analyze or detect distorted or broken images on the screen caused by the characteristics of devices and the Android platform. To resolve this problem, we propose in this paper an experiment model using the moment invariants algorithm, one of the image comparison algorithms. In the experiments, we first compared a normal screenshot with a test screenshot, and then analyzed how much the values changed. As a result, resolution is the most important factor affecting an image, while the platform version is not related to the result. Developers can simply use the proposed moment invariants algorithm to find error candidates in the general case.

Keywords: Moment Invariants, Android Software, Interoperability, Software Test

1 Introduction

Developers must consider various Android platforms and devices during the development phase, because Android does not support interoperability. On the other hand, it is not certain that software has no problems when its environment is changed. Thus developers have to verify whether their software operates differently, or whether screens become corrupted, according to the type of device. To overcome these difficulties, many kinds of supporting tools have been released recently, such as JUnit [7], Robotium [12], and the Android Testing Framework [2]. These tools provide functionalities such as automatic test-case generation, automatic test-case execution, and software log analysis. These functionalities are very helpful for developers; however, they are not enough, because the tools only focus on the functionality of software. For instance, if distorted or broken images exist on the screens of devices, the tools do not report it, but users may be uncomfortable while using the software. Therefore, developers need a methodology which finds anomalies on the screen. To resolve this problem, we survey image comparison algorithms to find differences among screenshots and then analyze which screenshot deviates most from the normal image.

In this paper, we explain our experiments and their results. In the experiments, we use the moment invariants algorithm, a shape-based technique whose features are effective for recognition under changing viewpoint and illumination [5]. The rest of the paper is organized as follows. Section 2 presents the experiment environment. Section 3 explains the experiment results. Finally, Section 4 concludes and describes future work.

2 Experiment Environment

The goal of this experiment is to check whether an image comparison technique can generate meaningful values for testing non-functional properties, especially usability. To achieve the goal, we analyze the relationship between Android software and influencing elements such as platform version, device, and resolution. For the experiment, we selected several devices and several pieces of software. There are five kinds of devices in Table 1. They have different resolutions and are made by different manufacturers. Also, there are four kinds of software in Table 2. Two of them are games and have their own images. The others are made using only default Android UI components such as labels, buttons, and so on.

Table 1. Devices Description
Product Name | Resolution | Manufacturer | Platform Version
Galaxy S | WVGA (800*480) | Samsung | 2.3
Vega | WVGA (800*480) | Pantech | 2.3
Atrix | qHD (960*540) | Motorola | 2.3
Sensation | qHD (960*540) | HTC | 2.3
Galaxy Tab | WSVGA (1024*600) | Samsung | 2.2

Table 2. App Description for Testing
App Name | Description
Android's Fortune | The well-known Fortune for Android: prints a random, hopefully interesting, adage [1].
AnkiDroid | A flashcards application for Android [3].
Crazy Penguin | Game. Fend off invading polar bears by hurling courageous penguins at them with your trusty catapult [4].
Open Sudoku | A simple open source sudoku game [11].

We use OpenCV to generate the meaningful value which represents the difference between two screenshots. OpenCV is a library of programming functions for real-time computer vision [10]. Comparing screenshots requires a basis of comparison, so we choose the resolution WVGA (800*480) and platform version 2.2 as the baseline, because they comprise a large proportion of all devices. The API named matchShapes is used to compute the value; it offers three comparison methods, CV_CONTOURS_MATCH_I1, CV_CONTOURS_MATCH_I2, and CV_CONTOURS_MATCH_I3 [8]. In this paper, we use m1, m2, and m3 as the names of these methods.
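In OpenCV's Python bindings, the comparison can be written as follows; this is our sketch with placeholder file names (the constant names CONTOURS_MATCH_I1/I2/I3 are those of recent OpenCV releases), not the paper's original tooling:

import cv2

ref = cv2.imread("baseline_800x480.png", cv2.IMREAD_GRAYSCALE)
test = cv2.imread("device_screenshot.png", cv2.IMREAD_GRAYSCALE)

for name, method in [("m1", cv2.CONTOURS_MATCH_I1),
                     ("m2", cv2.CONTOURS_MATCH_I2),
                     ("m3", cv2.CONTOURS_MATCH_I3)]:
    # matchShapes compares Hu-moment signatures; 0.0 means identical shapes
    print(name, cv2.matchShapes(ref, test, method, 0.0))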

3 Experiment Result and Evaluation

3.1 Analysis of Experiment Results

There are three experiments for discovering whether developers need to consider resolution, device, and platform version when testing software.

(1) Experiment 1: the possibility of using emulator screenshots for testing. The purpose of this experiment is to discern the difference between a screenshot taken on the emulator and a screenshot taken on a device. If a difference exists, it means that developers need to test their software on both the emulator and the device; otherwise they can use either one.

Fig 1. Result of differences between device and emulator: (a) Android's Fortune, (b) AnkiDroid, (c) Crazy Penguin, (d) Open Sudoku (values of m1, m2, m3 per device: atrix, galaxys, galaxytab, sensation, vega)

Fig 2. Android's Fortune screenshots on diverse devices: (a) galaxys, (b) vega, (c) atrix, (d) sensation, (e) galaxytab

In Fig 1, only (a), Android's Fortune, has high values; the others do not. A characteristic of (a) is that its components are almost all label components, and it contains many letters, as shown in Fig 2.

Each device has its own special font type, so the values for such text-heavy software must be higher than for the others. This result means that the choice between emulator and device is not essential, except for text-based software.

(2) Experiment 2: to verify whether platform version influences the result. The purpose of this experiment is to verify whether the platform version affects the screenshot or not; in other words, whether the same pictures are generated when capturing on different platform versions. In Fig 3, it is difficult to discern differences among (a), (b), (c), and (d). Only the shape of the top bar changes slightly from (b) to (c) and (d). The key pad at the bottom of (d) is optional, so it can be hidden from the screen.

Fig 3. AnkiDroid screenshots on various platform versions: (a) 2.1, (b) 2.2, (c) 2.3, (d) 4.0

Fig 4 shows that versions 2.3 and 4.0 score higher than versions 2.1 and 2.2, but it is only a slight difference. Version 2.1 does not even have a value. As mentioned above, only the Android's Fortune software has a huge value, and it is not related to the platform version.

Fig 4. Results for various platform versions: (a) Android's Fortune, (b) Crazy Penguin, (c) AnkiDroid, (d) Open Sudoku (values of m1, m2, m3)

(3) Experiment 3: to verify whether resolution affects the result.

In Fig 5, the right side of (b) 854*480 contains a distorted image. The right sides of (c) 960*540 and (d) 1024*600 are cut off, and they show less than (a) 800*480. We can therefore expect that 960*540 and 1024*600 will have bigger values than 854*480.

Fig 5. Crazy Penguin screenshots at various resolutions: (a) 800*480, (b) 854*480, (c) 960*540, (d) 1024*600

Fig 6 illustrates that 960*540 and 1024*600 have bigger values than 854*480 for (a) Crazy Penguin. Also, m2 has bigger values than m1 and m3, so care is required when interpreting the results of m2. Moreover, the values of (a) and (b) are bigger than those of (c). Software (c) consists of Android UI components, whereas (a) and (b) use their own images or draw graphics. This means that Android UI components such as buttons, labels, and list-boxes do not affect the result.

Fig 6. Results for various resolutions: (a) Crazy Penguin, (b) AnkiDroid, (c) Open Sudoku (values of m1, m2, m3 at 800x480, 854x480, 960x540, 1024x600)

3.2 Evaluation of Results

Several factors were discovered through the experiments. First of all, the platform version does not change the result, except in the special case of software that contains many letters, because letters produce much change. Secondly, for testing it does not matter whether a screenshot is taken on an emulator instead of a device. Thirdly, if software is made with its own images or graphic methods, it is quite possible that there will be a large distance in the result. Finally, the value obtained from comparing two images is valuable because it describes how many differences exist. It is clear that each piece of software has a different range of values in the experiments; thus comparing values across different software is not meaningful.

However, a result computed from the same software is valuable, because the software may have a problem on the screen if a value exists. In conclusion, developers can use the value obtained from comparing images to decide whether or not to check their software.

4 Conclusion

Android software can produce distorted or broken images on the screens of devices based on the Android platform, because devices have different resolutions and platforms. Due to this problem, developers have to check whether Android software produces distorted or broken images on the screen. To resolve this problem, we verified that it is possible to use the moment invariants algorithm, an image comparison algorithm. We performed several experiments and showed that the platform version rarely influences the result, while resolution affects it the most. Also, it is possible to test using an emulator instead of a device. Developers can find problems simply by using the moment invariants algorithm and reduce the time needed for testing their software. However, it may take much time to prepare the images of all devices and platforms for testing. In the future, we will develop a tool that supports comparing and analyzing images automatically.

Acknowledgments. This research was supported by the R&DB Support Center of the Seoul Development Institute, Korea, under the Seoul R&BD Program (ST100107).

References
1. Android's Fortune
2. Android Testing Framework
3. AnkiDroid
4. Crazy Penguin
5. Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys 40(2), Article 5 (2008)
6. Bradski, G., Kaehler, A.: Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly (2008)
7. JUnit
8. Moment Invariants, structural_analysis_and_shape_descriptors.html
9. Monkey
10. OpenCV
11. Open Sudoku
12. Robotium

Parallel Simulation Testbed for Swarming MAVs

Ge Li
Institute for Automation, National University of Defense Technology, Changsha, Hunan, China

Abstract. The objective of the research presented in this paper is to develop an applicable parallel simulation testbed for studying the behavior of autonomous agents and their collective interaction with relatively high fidelity and speedup. A parallel simulation testbed for swarming Micro Air Vehicles (MAVs) is built on top of a parallel simulation engine developed by us, called KD-PARSE. The framework is described and the object-oriented models are discussed. Issues in Parallel Discrete-Event Simulation, such as speed-up and processor distribution, are discussed and analyzed. Finally, scenarios are executed in the MAV simulation. The results show that parallel simulation cannot guarantee high speed-up, and our solution is provided.

Keywords: Parallel simulation, Testbed, Swarm, MAV

1 Introduction

Swarms of MAVs (Micro Air Vehicles) provide an added level of robustness, fault tolerance, and flexibility over individuals, as the failure of one MAV does not result in the failure of the task, as long as the remaining MAVs can redistribute and share the tasks of the failed MAV. The complex and highly interdependent nature of the behavior of MAVs makes it difficult to simulate. To achieve accuracy and high fidelity, complex models are often used in simulating the behavior of MAVs [3] [4] [6]. As the number of MAVs to be simulated increases, performance suffers drastically from the computation. More often, researchers choose the Parallel Discrete-Event Simulation (PDES) paradigm when dealing with large simulation applications. PDES is concerned with execution on multiprocessor computing platforms containing multiple processors that interact frequently. The fundamental challenge of PDES is to efficiently process events concurrently on multiple processors while preserving the overall causality of the system as it advances in simulated time. This problem is solved using a synchronization mechanism that ensures the simulations on all processors are synchronized at any time when they must interact or exchange data [1]. A parallel simulation engine (KD-PARSE) developed by us provides an excellent platform to distribute this multiple-MAV simulation across clusters.

2 Parallel Simulation Testbed for Swarming MAVs

The KD-PARSE framework is an object-oriented PDES framework. KD-PARSE supports multiple time management approaches to guarantee logically correct time management in a distributed simulation. The supported approaches utilize lookahead and rollback in various schemes to provide many forms of optimistic time management. Disabling the rollback mechanism at run time permits conservative time management to synchronize event processing. KD-PARSE serves as the environment for developing the PDES simulation of MAVs.

2.1 Object-oriented MAV Simulation Framework

The MAV simulation framework should define the state variables of a MAV, the events of MAVs, and the event handlers for each event. We try to separate the simulation framework from the MAV models and make the framework extendable to new models and parameters. Each MAV is an independent individual, which receives information, maneuvers according to some control rules, and publishes information to others. The MAVs interact with each other to perform collective missions. The MAV simulation framework is shown in Figure 1.

Fig. 1. MAV simulation framework (event flow among SimObjs: Initial, Scan, Query, Acknowledge, Avoid, and Update events; each MAV schedules queries and acknowledgments on other MAVs and broadcasts information to them)

The MAV object has five event handlers. In an update event, information from other MAVs is stored in a table. A scan event is scheduled to process this information. In the scan event, a MAV checks the possibility of encountering other MAVs based on the MAV information stored in the table. If another MAV falls into its detection region (either at the present time or at some time in the future), a query event is scheduled on that MAV at that point in time.

In a query event, the MAV verifies the encounter with the MAV by which the query event was scheduled. An acknowledge event carrying the verification result is scheduled back to that MAV, and an avoid event is scheduled to start the avoidance motion if the result is positive. In an acknowledge event, the MAV receives the verification result from the other MAV and schedules an avoid event to itself if the result is positive. In the avoid event, the MAV performs the avoidance movement until it collides with another MAV or no other MAVs remain in its detection region. Based on a predefined set of control rules, the MAV adjusts its thrust and/or bank angle, thus changing its speed and heading angle. Upon the end of the avoidance motion, the MAV publishes its information to others and schedules a scan event to itself.

2.2 MAV model

The MAV models include a motion dynamics model and a behavior dynamics model. MAV behavior dynamics models focus on the collective behavior of swarms of MAVs or other robots. Wu et al. [5] utilized a Genetic Algorithm in an evolved controller to dynamically distribute a group of MAVs appropriately over a surveillance area for maximum coverage. Spears et al. [7] explored the performance of rule-based control strategies for multi-asset surveillance. Lin et al. [2] also studied the behavior control of swarms of Unmanned Combat Air Vehicles; they used Monte Carlo simulation in conjunction with a GA to evolve robust control in the presence of wind-gust disturbances. The MAV models in those works have less fidelity because the battlefield is composed of grid points, and the MAVs move on the grid points only, with a fixed step size.

Dynamic Model. In this research, MAVs are defined to move freely in the x-y plane. A MAV is set to have a constant mass in kilograms, a maximum thrust of 1.0 Newton, and a negative drag force from the static air whose amplitude is related to the MAV's speed. The flat-earth two-dimensional point-mass airplane equations of motion and the kinematics equations are given below, where m is mass, V is (absolute) speed, T is thrust, D is drag, φ is heading angle, ψ is bank angle, and G is the gravitational acceleration:

dV/dt = (T − D) / m    (1)
dφ/dt = G tan ψ / V    (2)
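A minimal fixed-step Euler integration of equations (1) and (2) could look like this (our sketch, not the KD-PARSE code; all parameter values are placeholders):

import math

G = 9.81  # gravitational acceleration, m/s^2

def step(x, y, V, phi, T, D, psi, m, dt):
    # dV/dt = (T - D)/m and dphi/dt = G*tan(psi)/V, per (1) and (2)
    V_new = max(V + (T - D) / m * dt, 1e-6)   # keep the speed positive
    phi_new = phi + G * math.tan(psi) / V * dt
    # planar kinematics: the position follows the heading angle
    return (x + V * math.cos(phi) * dt,
            y + V * math.sin(phi) * dt,
            V_new, phi_new)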

Behavior Model. We suppose every MAV possesses an array of eight optical sensors, each of which covers a 45-degree range, for detecting the neighboring region. Since MAVs have simple and primitive sensors, we assume that these sensors can only perceive the presence of objects (other MAVs, boundaries, etc.) within the detecting sector. A sensor cannot detect the number or distance of objects within its range. Furthermore, we divide the neighborhood into a collision region and a detection region. We define the collision region as a 0.5-meter-radius circle around the MAV. Whenever two MAVs detect each other in the collision region, we label them as dead and then take them off the simulation. The radius of the detection region is chosen to be 20 meters in the simulation. The surroundings are divided into 8 sections: Front, Left, Rear, Right, Front-left, Rear-left, Rear-right, and Front-right. Whenever other MAVs fall into the surroundings, the MAV adjusts its thrust and bank angle, so as to have the right speed and heading angle to avoid crashing. Otherwise, it has zero bank angle and a certain thrust, which keeps its heading angle and speed constant. According to the signals from the detecting sensors, one or more of the following rules can apply at the same time (a code sketch of applying them follows the list):

Rule 1. If there is a MAV in the Front section, it decreases its thrust by 10% and decreases its bank angle by 10 degrees.
Rule 2. If there is a MAV in the Front-right, Right, or Rear-right section, it decreases its bank angle by 5 degrees.
Rule 3. If there is a MAV in the Front-left, Left, or Rear-left section, it increases its bank angle by 5 degrees.
Rule 4. If there is a MAV in the Front-left, Front, or Front-right section, it decreases its thrust by 5%.
Rule 5. If there is a MAV in the Rear-left, Rear, or Rear-right section, it increases its thrust by 5%.
Rule 6. If there is a MAV in the Rear section, it increases its thrust by 5%.
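Since the rules may stack, applying them amounts to accumulating thrust and bank-angle adjustments over the triggered sectors; the following sketch is our reconstruction of Rules 1-6, not the testbed's code:

RULES = [
    ({"Front"}, -0.10, -10),                              # Rule 1
    ({"Front-right", "Right", "Rear-right"}, 0.0, -5),    # Rule 2
    ({"Front-left", "Left", "Rear-left"}, 0.0, +5),       # Rule 3
    ({"Front-left", "Front", "Front-right"}, -0.05, 0),   # Rule 4
    ({"Rear-left", "Rear", "Rear-right"}, +0.05, 0),      # Rule 5
    ({"Rear"}, +0.05, 0),                                 # Rule 6
]

def apply_rules(detected, thrust, bank_deg):
    # detected: the set of sectors whose sensor reports another MAV
    for sectors, thrust_pct, bank_delta in RULES:
        if detected & sectors:
            thrust *= 1.0 + thrust_pct   # percentage thrust change
            bank_deg += bank_delta       # bank-angle change in degrees
    return thrust, bank_deg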

3 Case Studies

3.1 Speedup vs. Number of MAVs

In this case, MAVs are simulated in a 500-meter by 500-meter square field for a duration of 500 seconds. As the number of MAVs in the simulation increases from 5 to 10, 20, 30, 40, and 50, the time cost increases too, and exponentially. This increase in CPU time cost is due to the increasing number of MAVs, as well as the increasing possibility of MAVs encountering others (Figure 2). These runs are repeated on two processors using the Time Warp synchronization approach. The parallel simulation CPU time cost, which also increases exponentially as the number of MAVs increases, is less than the time cost of the simulation running sequentially. The speedup of each run is calculated and plotted in Figure 2. The speedup varies as the number of MAVs increases, and the maximum speedup occurs at a moderate number of MAVs.

Fig. 2. Speedup vs. number of MAVs

3.2 Speedup

Speedup is defined as the serial execution time divided by the parallel execution time. Running a simulation on multiple processors does not always shorten the time cost: when more processors are involved, more effort is needed to keep every processor synchronized, which offsets the expected speedup. Scenario 2 is applied in this test: 24 MAVs flying in a 1200-meter by 600-meter area for a duration of 120 seconds are simulated on 1, 2, 3, 4, 5, and 6 processor(s), respectively. Since the simulation time cost may vary randomly, we repeat each case and use the average CPU time cost to calculate the speedup. To reduce the interaction between MAVs, the field is divided into six 200-meter by 600-meter subareas. The MAVs are labeled from 0 to 23 and can only fly within their respective subareas (Figure 3).

Fig. 3. MAVs distribution in subareas (the 1200 m by 600 m field is split into six 200 m by 600 m strips)

The speedup is plotted in Figure 4. It is obvious that when the 24 MAVs are distributed to 2, 3, and 6 nodes, the speedup almost reaches its best performance. Indeed, we can see from Figure 3 that, in these conditions, MAVs within the same subarea are distributed to the same node, so the interactions are constrained locally within one node. As there are no interactions between nodes, good speedups are achieved. On the other hand, when the MAVs are distributed to 4 and 5 nodes, MAVs within the same subarea are distributed across different nodes; a large number of interactions and rollbacks show up, which reduces the speedup.

Fig. 4. Speedup

4 Conclusions

The parallel simulation testbed is feasible for studying the motion behavior of multiple MAVs. It separates the simulation framework from the MAV models and makes the framework extendable to new models and parameters. The case studies also show that elaborate design of the MAV applications is crucial to good performance.

Acknowledgments. The work is partially supported by a Chinese National Natural Science Foundation grant.

References
1. Li, G., Lin, K., Huang, K.: An Architecture to Facilitate the Development of Parallel and Distributed Simulation Systems. Advances in Modeling & Analysis D 10(1), AMSE, 1-16 (2005)
2. Lin, K., Yu, H., Zhou, L., Xia, Z., Sisti, A., Alexander, S.: Robust Control of a Swarm of UCAVs. In: Proceedings of SPIE, Vol. 4716 (2002)
3. Corner, J.J., Lamont, G.B.: Parallel Simulation of UAV Swarm Scenarios. In: Proceedings of the 2004 Winter Simulation Conference (2004)
4. Russell, M.A., Lamont, G.B., Melendez, K.: On Using SPEEDES as a Platform for a Parallel Swarm Simulation. In: Proceedings of the 2005 Winter Simulation Conference (2005)
5. Wu, A.S., Schultz, A.C., Agah, A.: Evolving control for distributed micro air vehicles. In: Proceedings of the 1999 IEEE International Symposium on Computational Intelligence in Robotics and Automation (1999)
6. Xia, Z., Lin, K.: Distributed Combined Discrete-Continuous Simulation for Multiple MAVs Motion Analysis. In: Proceedings of the 2003 International Symposium on Collaborative Technologies and Systems, Simulation Series 35(1) (2003)
7. Spears, W.M., Zarzhitsky, D., Hettiarachchi, S., Kerr, W.: Strategies for Multi-Agent Surveillance

Formal Specification for Transportation Cyber Physical Systems

Lichen Zhang, Jifeng He and Wensheng Yu
Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, China

Abstract. Transportation cyber physical systems, such as automotive, aviation, and rail systems, involve interactions between software controllers, communication networks, and physical devices. These systems are among the most complex cyber physical systems being designed by humans, and added time and cost constraints make their development a significant technical challenge. Formal specification technologies are now indispensable for quickly developing safe and reliable transportation systems. In this paper, we propose a formal specification approach for transportation cyber physical systems. The proposed framework can, on the one hand, deal with continuous-time systems based on sets of ordinary differential equations and, on the other hand, deal with discrete-event systems without continuous variables or differential equations. We present a combination of the formal methods Timed-CSP, ZimOO, and differential (algebraic) equations or differential logic. Each method can describe certain aspects of a transportation cyber physical system: CSP can describe communication, concurrency, and real-time requirements; ZimOO expresses complex data operations; differential (algebraic) equations model the dynamics and control (DC) parts. A case study of a train control system illustrates the specification process for transportation cyber physical systems.

Keywords: Transportation Cyber Physical Systems, ZimOO, Timed-CSP, Differential Logic

1 Introduction

Transportation cyber physical systems [1], such as automotive, aviation, and rail systems, involve interactions between software controllers, communication networks, and physical devices. These systems are among the most complex cyber physical systems being designed by humans, and added time and cost constraints make their development a significant technical challenge. Formal specification technologies are now indispensable for quickly developing safe and reliable transportation systems. Transportation cyber physical systems consist of three parts: the dynamics and control (DC) part, the communication part, and the computation part.

The DC part is that of a predominantly continuous-time system, which is modeled by means of differential (algebraic) equations, or by means of a set of trajectories. The evolution of a hybrid system in the continuous-time domain is considered as a set of piecewise continuous functions of time. The computation part is that of a predominantly discrete-event system. A well-known model is a (hybrid) automaton, but modeling of discrete-event systems is also based on, among others, Z, VDM, process algebras, Petri nets, and data flow languages. Clearly, cyber physical systems represent a domain where the DC, communication, and computation aspects must be met, and we believe that a formalism that integrates these aspects is a valuable contribution towards the integration of DC, communication, and computation methods, techniques, and tools [2]. In this paper, we provide some ideas for the formal specification of transportation cyber physical systems and one well-known case study to validate the formal specification.

2 Formal Specification for Transportation Cyber Physical Systems

Transportation systems are complex systems, and current formal specification technology does not scale to the sizes of these systems. These systems need to be analyzed at several levels of abstraction. It is unlikely that a single specification technique will suffice at every level. CSP is suitable for showing the order of the occurrence of events but lacks the ability to handle complex abstract data types and operations [3]. ZimOO [4] is based on Object-Z [5], an object-oriented extension of Z [6]. ZimOO is an extended subset of Object-Z allowing descriptions of discrete and continuous features of a system in a common formalism. ZimOO supports three different kinds of classes: discrete (as in Object-Z), continuous, and hybrid classes. Differential dynamic logic (dL) [7] is a logic for specifying and verifying hybrid systems [17][15]. The logic dL can be used to specify correctness properties for hybrid systems given operationally as hybrid programs. The basic idea of dL formulas is to have formulas of the form [α]φ to specify that the hybrid system α always remains within region φ, i.e., all states reachable by following the transitions of hybrid system α satisfy the formula φ. Dually, the dL formula ⟨α⟩φ expresses that the hybrid system α is able to reach region φ, i.e., there is a state reachable by following the transitions of hybrid system α that satisfies the formula φ. Aspect-oriented approaches [8] use a separation-of-concerns strategy, in which a set of simpler models, each built for a specific aspect of the system, are defined and analyzed. Each aspect model can be constructed and evolved relatively independently from other aspect models. Aspect-oriented specification is made by extending the TCOZ [9] and ZimOO notations with aspect notations. The schema for aspect specification has the general form shown in Fig. 1.

Fig. 1. Aspects of Model Structure

3 Case Study: Formal Specification of Train Control Systems

Train control systems contain several components connected by communication channels. One important component is the train controller, whose purposes are to limit the speed of the train, decide when it is time to switch points and secure crossings, and make sure that the train does not enter them too early. The odometer component keeps track of the speed and position of the train. The speed controller supervises the speed and makes sure that it does not exceed the limit set by the train controller; otherwise it automatically slows down the train. When the speed limit is set to zero, the train brakes until it comes to a safe halt. The communication with crossings is done by the radio controller. As said above, the communication medium is radio based. Special care has to be taken, because radio transmissions are inherently unsafe. Safety must still be established under the assumption that no message can be transferred [10][11].

For the specification of train control using formal methods, first the communication channels of the class are declared. Every channel has a type which restricts the values that it can communicate. There are also local channels that are visible only inside the class and that are used by the CSP, ZimOO, and differential dynamic logic (dL) parts for interaction. Second, the CSP part follows; it is given by a system of (recursive) process equations. Third, the Z part is given, which itself consists of the state space, the Init schema, and the communication schemas. For each communication event, a corresponding communication schema specifies in which way the state should be changed when the event occurs. Finally, below a horizontal line, the differential dynamic logic (dL) part is stated. Classes can be combined into larger specifications by CSP operators like parallel composition, hiding, and renaming.

The first aspect is communication. These communications can be naturally modelled with CSP. As an example, we can model the loop supervising the speed in CSP by the following recursive equations:

Radio_com = SuperviseTrain1 ‖ SuperviseTrain2 ‖ ...
SuperviseTrain1 = getspd → getpos → calcmaxspd → setmaxspd → SuperviseTrain1
SuperviseTrain2 = ...

It is assumed that an MA (movement authority) has been granted up to some track position, which we call m, and the train is located at position z, heading with current speed v towards m. We represent the point SB as the safety distance s relative to the end m of the MA (i.e., m − s = SB).

In this situation, differential dynamic logic (dL) can specify the following crucial safety property of the train control system, which we state as a dL formula expressing that a train always remains within its MA [12]:

ψ → [(control; drive)*] z ≤ m

where

control ≡ (?m − z ≤ s; a := −b) ∪ (?m − z ≥ s; a := A)
drive ≡ τ := 0; (z′ = v, v′ = a, τ′ = 1 & v ≥ 0 ∧ τ ≤ ε)

A train controller limits the speed of the train, decides when it is time to switch points and secure crossings, and makes sure that the train does not enter them too early [13]. The main variables and operations in the train movement process are: speed, time, reportinfo, update log, receive command, brake, and supervise. This paper uses ZimOO to specify the state space. Fig. 2 gives the model of the train controller.

Train_com = Report ‖ Supervise
Report = reportinfo → Report
Supervise = superviseinfo → Supervise

ψ → [(control; run)*] position ≤ dangerposition

where

control ≡ (?dangerposition − position ≤ d; a := −b) ∪ (?dangerposition − position ≥ d; a := A)
run ≡ τ := 0; (position′ = v, v′ = a, τ′ = 1 & v ≥ 0 ∧ τ ≤ ε)

Fig. 2. Modeling the train controller by the integration of CSP, ZimOO, and differential dynamic logic (dL)

The movement permissions of trains are neither known beforehand nor fixed statically. They are determined based on the current track situation by a Radio Block Controller (RBC) [14][15][16].
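The control-drive loop of the dL model can be mimicked by a toy discrete simulation; the sketch below is ours, with made-up numeric parameters, and the safe-braking-distance expression follows the ETCS analysis in [12]:

def safe_braking_distance(v, A, b, eps):
    # SB = v^2/(2b) + (A/b + 1)(A/2*eps^2 + eps*v), cf. [12]
    return v * v / (2 * b) + (A / b + 1.0) * (A / 2.0 * eps * eps + eps * v)

def run(m=1000.0, z=0.0, v=25.0, A=1.0, b=2.0, eps=1.0, dt=0.01, horizon=120.0):
    t = 0.0
    while t < horizon:
        s = safe_braking_distance(v, A, b, eps)
        a = -b if m - z <= s else A        # the two guarded choices of 'control'
        tau = 0.0
        while tau < eps:                   # the continuous 'drive' phase
            v = max(v + a * dt, 0.0)       # v' = a, constrained to v >= 0
            z += v * dt                    # z' = v
            tau += dt
        t += eps
        assert z <= m                      # the safety property z <= m
    return z, v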

Trains are only allowed to move within their current movement authority (MA), which can be updated by the RBC using wireless communication. Hence the train controller needs to regulate the movement of a train locally such that it always remains within its MA. The RBC is modeled by CSP [17], ZimOO, and differential dynamic logic (dL) [18] as shown in Fig. 3.

Fig. 3. Specification of the Radio Block Controller (a schema constraining the numbers of trains and speed entries, bounding every speed by maxSpeed, requiring successive trains to be separated by at least the braking distance of the following train, and an operation that sets the speed of a newly declared emergency train to zero while leaving all other speeds unchanged)

4 Conclusion

In this paper, we proposed a formal specification approach for transportation cyber physical systems. The proposed framework can, on the one hand, deal with continuous-time systems based on sets of ordinary differential equations and, on the other hand, deal with discrete-event systems without continuous variables or differential equations. We presented a combination of the formal methods Timed-CSP, ZimOO, and differential (algebraic) equations or differential logic. Each method can describe certain aspects of a transportation cyber physical system: CSP can describe communication, concurrency, and real-time requirements; ZimOO expresses complex data operations; differential (algebraic) equations model the dynamics and control (DC) parts. A case study of a train control system illustrated the specification process for transportation cyber physical systems. Further work is devoted to integrating this formal specification approach with AADL.

Acknowledgments. This work is supported by the national high technology research and development program of China (No. 2011AA010101), the national basic research program of China (No. 2011CB302904), the national science foundation of China, the doctoral program foundation of institutions of higher education of China, and the natural science foundation of Guangdong province under grant No. S.

References
1. Grand Challenges for Transportation Cyber-Physical Systems
2. Lee, E.A., Seshia, S.A.: Introduction to Embedded Systems – A Cyber-Physical Systems Approach. LeeSeshia.org, Berkeley, CA
3. Sherif, A., Cavalcanti, A., He, J., Sampaio, A.: A process algebraic framework for specification and validation of real-time systems. Formal Aspects of Computing 22(2) (2010)
4. Friesen, V.: An Exercise in Hybrid System Specification Using an Extension of Z. citeseerx.ist.psu.edu
5. Smith, G.: The Object-Z Specification Language. Software Verification Research Centre, University of Queensland
6. Spivey, J.M.: The Z Notation: A Reference Manual (2nd Edition). Prentice Hall, UK
7. Platzer, A.: Differential dynamic logic for hybrid systems. Journal of Automated Reasoning 41(2) (2008)
8. Kiczales, G., et al.: Aspect-Oriented Programming. In: Proceedings of the 11th European Conference on Object-Oriented Programming, June 1997
9. Mahony, B.P., Dong, J.S.: Blending Object-Z and Timed CSP: An introduction to TCOZ. In: ICSE'98, April 1998
10. Hoenicke, J.: Combination of Processes, Data, and Time. PhD thesis, University of Oldenburg
11. Hoenicke, J.: Specification of Radio Based Railway Crossings with the Combination of CSP, OZ, and DC
12. Platzer, A.: Logical Analysis of Hybrid Systems: Proving Theorems for Complex Dynamics. Springer (2010)
13. Faber, J., Jacobs, S., Sofronie-Stokkermans, V.: Verifying CSP-OZ-DC Specifications with Complex Data Types and Timing Parameters. In: Integrated Formal Methods 2007
14. Platzer, A.: Differential dynamic logic for verifying parametric hybrid systems. LNCS 4548. Springer (2007)
15. Hoenicke, J., Maier, P.: Model-checking of specifications integrating processes, data and time. In: Fitzgerald, J.S., Hayes, I.J., Tarlecki, A. (eds.) FM 2005, LNCS 3582. Springer (2005)
16. Hoenicke, J., Olderog, E.-R.: CSP-OZ-DC: A combination of specification techniques for processes, data and time. Nordic Journal of Computing 9(4)
17. Davies, J., Schneider, S.: A Brief History of Timed CSP. Theoretical Computer Science 138(1) (1995)
18. Platzer, A.: A complete axiomatization of quantified differential dynamic logic for distributed hybrid systems. Logical Methods in Computer Science, 42 pages. Special issue for selected papers from CSL'10

Linking Data for an Information Support System in Traditional Korean Medicine

Hyunchul Jang 1, Yong-Taek Oh 1, Sang-Kyun Kim 1, Anna Kim 1, Sang-Jun Ye 1, Chul Kim 1, Mi-Young Song 1
1 Korea Institute of Oriental Medicine, 1642 Yuseong-daero, Yuseong-gu, Daejeon, Republic of Korea {hcjang, ydydxor, skkim, ankim2012, tomita, chulnice, smyoung}@kiom.re.kr

Abstract. Medical information in a clinical setting is primarily centered on patients' symptoms and the methods for treating those symptoms. Traditional medicine in East Asian regions, including South Korea, may also employ a process of pattern identification to determine treatment methods. Such a treatment process leads to the collection of patient symptoms, the doctor's diagnostic decisions, and the selection of the method of treatment. Depending on the patient's condition or the doctor's clinical decision, this process may be repeated in full, or parts of the process may be omitted. Using such medical knowledge to build an ontology for each of these processes can facilitate the application of appropriate knowledge at each step, which allows a broad variety of medical information to be leveraged by information systems. Various applications become possible by linking data on classic literature, medicinal materials, formulas, acupuncture, and diseases.

Keywords: Linked data analysis, Ontology, CDSS, Traditional medicine, Traditional Korean medicine

1 Introduction

Throughout history, there have been efforts to observe patients' symptoms and select appropriate methods of treatment to care for patients. These efforts brought us to the present via various processes and methods that are unique to different types of medicine and cultures. The symptoms currently present in a patient form the basis of the diagnostic data, together with the patient's physical information and past medical history, which are designed to accurately describe the overall patient condition. In a similar manner, the treatment method is determined at the end of the doctor's actions, such as the diagnosis of the disease to be treated, decisions regarding treatment medications and their combinations, and the determination of dosages. Various processes are also used in traditional Korean medicine (TKM), depending on the specific care provider, although the universal composition of knowledge in TKM, excluding basic theoretical concepts, consists of medications, acupuncture points, symptoms, and diseases (or patterns) [1].

Corresponding author: Hyunchul Jang, hcjang@kiom.re.kr

Medical literature on TKM has focused on specific diseases, formulas, or medicinal materials, and ontologies of diseases, formulas, medicinal materials, or acupuncture points can be built based on these reports. Nevertheless, it is difficult to declare that a formula for treating a disease, as found in a book that mainly describes diseases in a semantic web environment, is identical to the disease-treating formula obtained from another book that mainly describes formulas; similarly, it is difficult to determine that the two described diseases are actually the same disease.

2 Related Work

Content link detection aims to discover similar content across different inputs and to make such links explicit. For example, when reading a news article, content link detection can discover other articles that could serve as background for the current story. A number of machine-learning-based approaches [2], [3], [4], [5] showed how machine learning can be used to identify significant terms within unstructured text and to enrich it with links to the appropriate other articles. However, automated approaches cannot be applied in the medical domain, even if a reduction method like [6] is provided. In the medical domain, information systems demand a content-based approach like [7]. Furthermore, the contents have to be interpreted semantically.

3 Basic Knowledge

Figure 1 illustrates TKM knowledge, where a node represents a class or instance and a link represents a property. Because a pattern in TKM or traditional Chinese medicine is similar to a disease, patterns were omitted from the diagrams in Figure 1 to increase visibility. Treatment targets are often represented as a combination of diseases or symptoms.

Fig. 1. (a) A summary diagram of the TKM Medicinal Material Ontology; (b) a summary diagram of the TKM Disease Ontology

A disease contains the following information: the symptoms of the disease that can appear in patients, the causes of the disease, the mechanisms of the disease, and the methods of treatment. The medicinal materials, formulas, and acupuncture points (meridians and collaterals), for example, contain information on their effects as treatment methods, as well as information on the treatment target or disease. In addition, methods of treatment are linked to the corresponding effects, whereas treatment targets can be linked to the corresponding diseases, symptoms, diagnoses, or causes of a disease. Therefore, to link these fragmentary data, we must determine whether two given diseases can be declared to be the same disease, or whether two formulas are the same, duplicated, or independent of one another. If they are found to encompass the same concepts at a certain level based on the determined results, then the concepts of the two ontologies must be declared to be the same and linked.

Fig. 2. Linking literature-based knowledge in the traditional medicine domain

Such knowledge is described in books on traditional medicine, and depending on the book, it is described with a focus on diseases, medicinal materials, or formulas. However, the contents of these books, such as textbooks, are organized according to the table of contents and thus cannot actively provide information at an appropriate time, even when information search systems are used. Relevant information is used when the search is conducted at the direction of a Korean medicine doctor, and such information becomes fragmentary information centered on a node. The abovementioned manner of utilizing knowledge requires a continuous information search process, and many difficulties are encountered when integrating and utilizing a series of concepts obtained in this manner, which makes it unreasonable to use this type of knowledge in clinical practice. From the perspective of computer engineering, such knowledge can be separated and built into a medicinal material ontology, a formula ontology, or a disease pattern ontology, as shown in Figure 1, and the sum of knowledge, as shown in Figure 3, can be reproduced by linking the ontologies together.

4 Method

A resource description framework (RDF) [8] was used to build the TKM ontology, and the Jena Ontology application programming interface (API) was used to process the appropriate data. The ontologies of medicinal materials, formulas, or disease patterns described above were linked together by experts in TKM, to prevent any restriction in accessing the linked data or ontologies from the system's perspective.

restriction in accessing the linked data or ontologies from the system's perspective. In the real world, each element of the expert knowledge will be published by the experts in each field and could be represented as fused knowledge through mutual connections. Figure 3 shows the process of linking concepts, such as medicinal materials, formulas, acupuncture points, and diseases, from each of the ontologies and represents the linking of the three ontologies described above into a graph.

Fig. 3. A graph of the TKM Ontology

The representative formulas and diseases in TKM were linked under the following criteria.

Linking formulas: Formulas are distinguished by their drug components and the quantities of the drugs used. Two formulas are therefore determined to be the same, and can be linked, if the components and amounts of the formula used in the disease ontology are inferred to be identical to the drug components and amounts recorded in the formula ontology.

Linking diseases: Depending on the methods of treatment or the specific book, the contents describing the diseases may be comprehensive, a disease may be a subclass of another disease, or the method of treatment may have been described according to the causes of a disease or the displayed symptoms rather than the names of the diseases. For these reasons, it is difficult to link two concepts as identical only because they have the same disease names. Of course, linking two concepts at an appropriate level is not impossible, because doctors can determine the more appropriate method of treatment among several that are presented, depending on the displayed symptoms, even if the methods were described in the same book. Accordingly, clinical studies, such as the objectification of diagnosis for all diseases, may eventually be necessary, whereas at this stage it appears appropriate to link concepts at an appropriate level and to use a simple support system, based on this understanding of concept linking, at the information presentation level.

In the stage of determining the methods of treatment, the formulas linked to the entered symptoms are selected, or the formulas linked to the decided diagnosis results are selected. As shown in Figure 3, information regarding the given formulas is initially given for the disease. After the treatment method is selected by the doctor, the effects linked to the treatment method are searched, and additional formulas with corresponding effects can be found.
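To make the formula-linking criterion concrete, here is a minimal sketch in Python using rdflib (the paper itself uses the Jena API; the namespace, property names, file names, and the two formula URIs below are hypothetical illustrations, not the project's actual vocabulary):

from rdflib import Graph, Namespace
from rdflib.namespace import OWL

TKM = Namespace("http://example.org/tkm/")  # hypothetical namespace

def components(g, formula):
    # Collect the (medicinal material, amount) pairs of a formula node.
    pairs = set()
    for ingredient in g.objects(formula, TKM.hasIngredient):
        pairs.add((g.value(ingredient, TKM.material),
                   g.value(ingredient, TKM.amount)))
    return pairs

def link_if_identical(g, formula_a, formula_b):
    # "Linking formulas" criterion: declare the two resources the same
    # (owl:sameAs) when their drug components and amounts coincide.
    if components(g, formula_a) == components(g, formula_b):
        g.add((formula_a, OWL.sameAs, formula_b))

g = Graph()
g.parse("tkm_disease.rdf")   # disease ontology (hypothetical file)
g.parse("tkm_formula.rdf")   # formula ontology (hypothetical file)
link_if_identical(g, TKM.formulaFromDiseaseBook, TKM.formulaFromFormulaBook)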

Moreover, if the major indications of the searched formulas are analyzed and a match is found between the information on the linked disease and the disease obtained as a diagnosis result, then the selection of the corresponding formulas can be presented as an appropriate choice. The formulas linked to the collected symptoms can thus be selected from among the possible selections of otherwise arbitrary formulas.

5 Results

We searched for formulas in the TKM disease ontology by a symptom, and then searched again after linking effects to methods of treatment. The result shows that more information can be obtained by linking data, and not merely because an additional ontology was used. By linking the medicinal materials used for diseases to formulas, more formulas can be retrieved. The primary property statistics for finding formulas to treat symptoms in the disease ontology are shown in Table 1 and Table 2.

Table 1. Number of formulas for treating cough.
  Number of formulas having an effect on cough: 22 (a)
  Number of diseases having cough: ...
  Number of formulas treating diseases having cough: ...
  Number overlapped with (a): ...

Table 2. Number of methods of treatment for diseases having cough.
  Number of methods for diseases having cough: 56
  Number of effects EXACTLY corresponding to these treatment methods: ...
  Number of formulas having these effects: ...
  Number overlapped with above: 136

From the TKM disease ontology, 109 referable formulas from diseases having cough were found, and 58 methods of treatment were found. The outcome depends on how effects corresponding to these methods are found; here, 12 effects were linked by name. From the TKM formula ontology, 43 formulas having these effects were found, of which 8 formulas overlapped. After linking, therefore, 35 additional referable formulas could be presented.

6 Conclusion

Knowledge regarding TKM is largely based on traditional medical literature, and such knowledge actually exists independently of TKM. This information exists in the minds of Korean medicine doctors through the incorporation and interpretation of such knowledge, and the interpretation and application of TKM theory is determined by

Korean medicine practitioners. However, knowledge based on traditional medical literature, as well as clinical knowledge, must be accumulated and shared by linking these types of knowledge to achieve the standardization and objectification of traditional medicine. If the knowledge discussed in Section 4 is divided and expressed in different spaces in which the data storage or access methods vary, and the management of the knowledge is conducted independently, it will be virtually impossible to integrate and manage such knowledge; therefore, it will be extremely difficult to merge and use elements of knowledge that are not linked. However, if each piece of knowledge is shared or linked using a unique uniform resource identifier (URI) through RDF/OWL [9], this knowledge can be readily accessed on the Web, and each set of data can be shared rather than becoming subordinate to a specific system [10]. Furthermore, knowledge that is made public or shared based on a URI can be reviewed and refined by many individuals and could be realized as user-agreed knowledge.

Acknowledgments. This research was conducted with funding from the Korea Institute of Oriental Medicine (K12090).

References
1. Jang, H., Kim, J., Kim, S.-K., Kim, C., Bae, S.-H., Kim, A., Eum, D.-M., Song, M.-Y.: Ontology for Medicinal Materials Based on Traditional Korean Medicine. Bioinformatics 26(18) (2010)
2. Milne, D., Witten, I.H.: Learning to Link with Wikipedia. In: Proc. of the 17th International Conference on Information and Knowledge Management (2008)
3. Kaptein, R., Serdyukov, P., Kamps, J.: Linking Wikipedia to the Web. In: Proc. of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2010)
4. Nomoto, T.: Two-Tier Similarity Model for Story Link Detection. In: Proc. of the 19th International Conference on Information and Knowledge Management (2010)
5. West, R., Precup, D., Pineau, J.: Automatically Suggesting Topics for Augmenting Text Documents. In: Proc. of the 19th International Conference on Information and Knowledge Management (2010)
6. West, R., Precup, D., Pineau, J.: Completing Wikipedia's Hyperlink Structure through Dimensionality Reduction. In: Proc. of the 18th International Conference on Information and Knowledge Management (2009)
7. Yi, X., Allan, J.: A Content-based Approach for Discovering Missing Anchor Text for Web Search. In: Proc. of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2010)
8. Klyne, G., Carroll, J.: Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Recommendation (2004)
9. McGuinness, D., van Harmelen, F.: OWL Web Ontology Language Overview. W3C Recommendation (2004)
10. Berners-Lee, T.: Design Issues: Linked Data (2006)

A Proof-of-Concept of D³ Record Mining using Domain-Dependent Data

Yeong Su Lee¹, Michaela Geierhos¹, Sa-Kwang Song², and Hanmin Jung²
¹ Center for Information and Language Processing, University of Munich, Germany
{yeong micha}@cis.uni-muenchen.de
² Korea Institute of Science and Technology Information (KISTI), Daejeon, Korea
{esmallj jhm}@kisti.re.kr

Abstract. Our purpose is to perform data record extraction from online event calendars exploiting sublanguage and domain characteristics. We therefore use so-called domain-dependent data (D³) completely based on language-specific key expressions and HTML patterns to recognize every single event given on the investigated web page. One of the most remarkable advantages of our method is that it does not require any additional classification steps based on machine learning algorithms or keyword extraction methods; it is a so-called one-step mining technique. Moreover, another important criterion is that our system is robust to DOM and layout modifications made by web designers. Thus, preliminary experimental results are provided to demonstrate the proof-of-concept of such an approach tested on websites in the German opera domain. Furthermore, we could show that our proposed technique outperforms other data record mining applications run on event sites.

1 Introduction

There are numerous web sites providing large databases containing information such as yellow page listings or event calendars. Current approaches to data record mining [1, 2] exploit the structured character of HTML documents. For this purpose, two or more similar web pages have to be compared in order to extract the corresponding data records. These systems often expect preclassified web pages as input [2]. Based on the fact that data records are dynamically generated from a back-end database, some applications like MDR [3] try to reconstruct the given web page benefiting from the regularities of HTML structure. Here, the main focus is to determine iterations of HTML tag sequences by using the DOM tree representation of a web page. Zheng et al. [4] try to extract records represented by so-called broom structures. In this approach a set of training pages is converted to DOM trees by an HTML parser. Then, semantic labels of a specific extraction schema are manually assigned to certain DOM nodes to indicate their semantic functions. Based on these labels, a broom-extraction (...) algorithm can be applied on each DOM-tree [5, p. 163]. Although they achieve high values in precision and recall, the training pages have to be manually annotated because most wrapper-induction systems expect such user-generated input formats.

Due to the loose strictness of HTML, other approaches [6-8] try to exploit the visual information provided by a web browser by using rendering techniques for data record mining. They apply these methods to the displayed query results in order to determine the record boundaries. Even flat and nested data records can be extracted by Visual Structure based Analysis of web Pages [8] based on some heuristics [7]. Although rendering methods achieve good results, they have one big drawback: they require a web browser, which correctly displays the investigated web page, to determine some typical visual cues of a data region (e.g. size, background color, icons, font colors).

Regardless of the technique used, all methods have one point in common: current approaches to data record mining disregard any language-specific information dependent on the application domain. We observed that existing data record mining techniques are not satisfactory for specialized search purposes limited to restricted domains across websites. The success of event calendar search highly depends on language-specific trigger words indicating at least some date. We therefore propose a novel method for data record mining on demand. Browsing restricted domains allows us to define some key words, e.g. weekdays, nested in the data records of event sites. By means of a limited vocabulary, we are able to analyze the document's HTML structure and locate the corresponding data record boundaries. Our technique is quite robust against variability of the DOM, upgradeable, and keeps data up-to-date.

The paper is structured as follows. In Section 2, we present our data mining technique. Section 3 evaluates the proposed method. In Section 4, we summarize our work and finally highlight future research directions.

2 The proposed technique

After retrieving a large website, each web page has to be classified into pages with event calendars or without, depending on its key expressions³. If the page contains at least two or more key expressions, then the search for event calendar records (ECRs)⁴ will start. Otherwise, the page will be skipped. Thus, the classification of pages containing ECRs can be performed without the help of machine learning algorithms or keyword extraction methods.

³ An instance of a feature set that classifies a data record and can be described by regular expressions or string variants is called a key expression. Please note that the selection of a key expression highly depends on the record type and on the domain, respectively. For example, within a shopping record, the key expression may be some price information, and within a computer description, it may be the CPU type. We therefore speak of domain-dependent data described by its corresponding key expressions based on a limited vocabulary.
⁴ An event calendar record (ECR) is primarily a data record which provides information on event details like event title, event location, event date, etc.
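A minimal sketch of this page filter, assuming German weekday names serve as the key expressions for the opera domain (the paper's actual key-expression inventory is hand-coded and not fully listed):

import re

# Hypothetical key expressions for the German opera domain: weekday
# names acting as domain-dependent triggers of event calendar records.
KEY_EXPRESSION = re.compile(
    r"\b(Montag|Dienstag|Mittwoch|Donnerstag|Freitag|Samstag|Sonntag)\b")

def has_event_calendar(page_text):
    # A page is processed only if at least two key expressions occur;
    # otherwise it is skipped, with no ML classifier involved.
    return len(KEY_EXPRESSION.findall(page_text)) >= 2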

We now present the two steps of our approach:

1. First, we create the DOM tree of a selected web page in order to exploit its HTML structure (cf. Section 2.1).
2. Secondly, we assume that there is only one smallest maximum data region for the ECRs [3, 8-10] and that it corresponds to only one HTML tag region. The smallest maximum data region can be determined by a top-down traversal of the tree using key expressions. It is expected that there must be two or more key expressions in one HTML tag region of the tree. Otherwise, this tag region will be cut off from the DOM tree.

2.1 Exploiting the structure of ECRs within the DOM tree

Each website has its own distinct method of presenting information. Therefore, the high variability observed in HTML structure should be taken into account. However, the number of possible tag combinations which can be considered for event calendar records (ECRs) is very limited.

Fig. 1. Different types of event calendar records: (a) one record under one tag region, (b) sample core tree structure

The following types of ECRs according to their tree structure have been registered and were classified as follows:
(a) One single record under one node (cf. tr in Fig. 1(a))
(b) Each record consists of a set of children nodes with the same parent node.

In Figure 1(a), we act on the assumption that all data records are siblings among each other, with their own parent nodes all containing the same value (e.g. tr). If this tag region contains some key expression (e.g. weekday) and other event-related information, it will be selected as an ECR. In Figure 1(b), each ECR consists of a set of children nodes belonging to the same parent node. We can thereby distinguish between three structure types of data records (HTML tag regions) depending on the co-occurrence of tag attributes and values.

1. repetition of an HTML tag, e.g. div, with non-recurring attributes within the data records, including their text values,
2. repetition of an HTML tag, e.g. div, with non-recurring attributes within the data records and incomplete attribute-value pairs (Fig. 2),
3. missing both tag attribute and value.

Among these, case (1) is really rare and (3) can sometimes happen, but in practice, case (2) occurs quite frequently. In Figure 1(b), we showed that the key expression is inherited by only one HTML tag node (html → body → the 5th div → the 4th div), and all records are the children of this one single node (cf. div class="content" in Figure 2). When we zoom in and look at the record structure in detail, each record is composed of six div tags and their corresponding attributes: kalendariumtag, kalendariumdatum, kalendariumuhrzeit, kalendariummitte, kalendariumpreise and kalendariumlinie. As shown in Figure 2, the text values (ε) are missing for the first two mentioned HTML tag attributes in the one record, but are filled with #PCDATA A1 and #PCDATA B1 in the preceding record. We therefore resolve such coreferences by linking the text values of the same attributes in successive records.

Fig. 2. Tag iteration of div with attributes and missing text values within an ECR

But one problem still remains to be solved: how to decide where a record starts and where it ends. The boundary between records can be determined by comparing the bordering tag attributes. Based on the assumption that the key expression, e.g. weekday, is placed in first position (cf. #PCDATA A1 in Figure 2), we have two possibilities: we can go forward or backward to recognize the record boundaries. If we move forward, the same tag attribute will recur after six steps. That way, we learn that one record consists of six tag attributes. However, we do not know yet where the record begins. In order to solve this problem, we go back until we find a tag attribute totally different from the six common attributes in Figure 2. Now we can initialize the starting points for all records, embracing six tag attributes each.
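The forward/backward boundary search can be sketched as follows (a simplified version operating on the flat sequence of child-tag attribute values; function and variable names are illustrative, and the real system works on the DOM tree):

def record_boundaries(attrs, key_index):
    # attrs: attribute values (e.g. class names) of the children of the
    # smallest maximum data region, in document order.
    # key_index: position of a tag carrying the key expression.
    # Forward: the key attribute recurs after one record length.
    length = next(i for i in range(1, len(attrs) - key_index)
                  if attrs[key_index + i] == attrs[key_index])
    # Backward: walk back until an attribute foreign to the record's
    # attribute set appears; that is where the first record starts.
    record_set = set(attrs[key_index:key_index + length])
    start = key_index
    while start > 0 and attrs[start - 1] in record_set:
        start -= 1
    return start, length

attrs = ["header"] + ["kalendariumtag", "kalendariumdatum",
                      "kalendariumuhrzeit", "kalendariummitte",
                      "kalendariumpreise", "kalendariumlinie"] * 2
print(record_boundaries(attrs, attrs.index("kalendariumtag")))  # (1, 6)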

2.2 Computing the tag similarity

Assuming that some tag attributes vary for the same type of event information because of layout issues, we compute their tag similarity sim by using Dice's coefficient:

$$sim = \frac{2|X \cap Y|}{|X| + |Y|}$$

In the particular case considered here, we have two tag attributes for date and weekday which differ in background color, but both specify the day the event takes place. Thus, we consider them as one single piece of information.

td class="verd12" width="56" align="center" valign="top" bgcolor="#ffffff"
td class="verd12" width="56" align="center" valign="top" bgcolor="#cc0000"

Despite their attribute differences we can assign both tags to the same type of information by calculating their tag similarity. We therefore separate the corresponding attribute-value pairs from the HTML tag: for this example, we extract {class, width, align, valign, bgcolor} as the attribute set and {verd12, 56, center, top, #ffffff} as the value set of the first tag. Moreover, we can observe that for the second tag, there is only one difference, concerning the value of bgcolor, which is #cc0000. Then, we apply the function sim, where X is the set of attributes and values of the first tag and Y that of the second tag. In our example, |X| is 10, as is |Y|, because the attribute set and the value set contain five elements each. Consequently, |X ∩ Y| equals 9.

$$sim = \frac{2 \times 9}{10 + 10} = \frac{18}{20} = 0.9$$

In order to compare two tags, their position within one event record has to be the same; otherwise we cannot compute their tag similarity. Furthermore, the similarity threshold can be adjusted with the help of some heuristics.
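The computation above can be reproduced with a few lines (a sketch; extracting the attribute-value pairs from raw HTML with an actual parser is omitted):

def tag_similarity(attrs1, attrs2):
    # Dice's coefficient over the union of attribute names and values,
    # as in the td example above.
    x = set(attrs1) | set(attrs1.values())
    y = set(attrs2) | set(attrs2.values())
    return 2 * len(x & y) / (len(x) + len(y))

td1 = {"class": "verd12", "width": "56", "align": "center",
       "valign": "top", "bgcolor": "#ffffff"}
td2 = {"class": "verd12", "width": "56", "align": "center",
       "valign": "top", "bgcolor": "#cc0000"}
print(tag_similarity(td1, td2))  # 2*9 / (10+10) = 0.9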
3 Experimental evaluation

To evaluate the quality of the proposed record mining technique on arbitrary websites, we concentrate our case study on websites of German opera and theater houses. Our test set consists of 20 event calendar pages randomly retrieved from websites of opera houses (e.g. oper-frankfurt.de, bayerische.staatsoper.de, staatsoper-berlin.org, semperoper.de). We achieved a recall of 93.81% on the test data.
4 Upgradable features and future work

One presumption is that our key expression driven record mining technique expects a valid HTML page for the DOM tree construction. If the tree cannot be built up, we will use some open source tools for correction purposes. We must admit that our approach is not able to reconstruct the DOM tree of web pages with no closing HTML tags. Thus, our method disregards totally nested record structures in order to protect itself from analyzing ECRs nested in other ECRs ad infinitum.

So far we use hand-coded key expressions for the opera domain, but we are working on learning such rules automatically. By measuring the similarity of content strings or tag regions, we will figure out the best candidates for domain-specific key expressions. Of course, they are language-dependent and we have to expand them to other languages apart from German. Moreover, until now we have concentrated on a very special domain: event calendars of opera houses. It could be interesting to adapt this technique to other domains dealing with different event information (e.g. sports, exhibitions, fairs).

References
1. Arasu, A., Garcia-Molina, H.: Extracting Structured Data from Web Pages. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA (2003)
2. Crescenzi, V., Mecca, G., Merialdo, P.: RoadRunner: Towards Automatic Data Extraction from Large Web Sites. In: Proceedings of the 27th VLDB Conference, Rome, Italy (2001)
3. Liu, B., Grossman, R., Zhai, Y.: Mining Data Records in Web Pages. In: KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington D.C., USA (2003)
4. Zheng, S., Song, R., Wen, J.R., Giles, C.L.: Efficient Record-Level Wrapper Induction. In: CIKM '09, Hong Kong, China (2009)
5. Ashok Kumar, R., Rama Devi, Y.: Efficient Approaches for Record Level Web Information Extraction Systems. International Journal of Advanced Engineering & Application 2(1) (January 2011)
6. Cai, D., Yu, S., Wen, J.R., Ma, W.Y.: Extracting Content Structure for Web Pages Based on Visual Representation. In: Web Technologies and Applications: 5th Asia-Pacific Web Conference, APWeb 2003, Xi'an, China (2003)
7. Algur, S.P., Hiremath, P.S.: Visual Clue Based Extraction of Web Data from Flat and Nested Data Records. In: International Conference on Management of Data (COMAD 2006), Delhi, India (2006)
8. Hiremath, P.S., Benchalli, S.S., Algur, S.P., Udapudi, R.V.: Mining Data Regions from Web Pages. In: International Conference on Management of Data (COMAD 2005), Hyderabad, India (2005)
9. Liu, W., Meng, X., Meng, W.: Vision-based Web Data Records Extraction. In: Ninth International Workshop on the Web and Databases (WebDB 2006), Chicago, USA (2006)
10. Zhai, Y., Liu, B.: Web Data Extraction Based on Partial Tree Alignment. In: WWW 2005, Chiba, Japan (2005)

Web Taxonomy Fusion using Topic Maps-driven Ontological Concepts and Relationships

Ing-Xiang Chen¹ and Cheng-Zen Yang²
¹ Ericsson Taiwan Ltd., 11F, No.1 Yuandong Road, Banqiao Dist., New Taipei City, Taiwan, 220, R.O.C. ing-xiang.chen@ericsson.com
² Department of Computer Science and Engineering, Yuan Ze University, 135 Yuan-Tung Road, Chungli, Taiwan, 320, R.O.C. czyang@syslab.cse.yzu.edu.tw

Abstract. Since most Web taxonomies and catalogs are organized in conceptual hierarchies, taxonomy fusion can be viewed as a specialized case of hierarchical ontology coalition in real-world applications. Hence, different kinds of semantic information can be further extracted to facilitate Web taxonomy fusion, such as intra-ontological concepts and inter-ontological relationships. This paper proposes approaches to effectively improve the accuracy of Web taxonomy fusion by using a taxonomy fusion model based on the ontological concepts and relationships of Topic Maps. Specifically, a novel fusion model based on inter-ontological mapping as well as intra-topic concepts is presented that outperforms a Naïve Bayes (NB) classifier and a Support Vector Machine (SVM) by 20% to 30% in F1 measure over real-world Web taxonomies.

Keywords: Web taxonomy fusion, Topic Maps, Hierarchical ontology, Intra-ontological information, Inter-ontological information

1 Introduction

As more and more semi-structured digital contents are organized into taxonomy-based ontologies (i.e., hierarchical ontologies), information fusion on large-scale ontologies becomes an important issue in digital content management. Mergers and acquisitions among enterprises are practical examples in which the large amount of taxonomical semi-structured data of one enterprise is integrated into the categorized ontologies of another enterprise. For realizing the fusion work, the ultimate goal is to develop an integrated view with a single ontology or a small set of ontologies to which all partakers will conform, as noted in [1]. However, a general fusion work on semi-structured ontologies faces several severe challenges. First, ontologies to be integrated may be created independently in reality, and thus exist in different formats or structures. Second, the semantic or conceptual diversity existing in different ontologies further complicates the ontology fusion problem. Since the assertion of Topic Maps is to outline real-world things into topics with names, and the ontological knowledge can be clearly described based

on the TAO spirit of Topic Maps without tagging, Topic Maps can convey the knowledge of resources through a view of virtual maps, in which the resource subjects and the relationships between them are distinctly depicted [2]. Topic Maps are thus suitable to facilitate semantic-level ontology integration and to describe the Semantic Web.

In this paper, we propose a taxonomy fusion model based on the ontological concepts of Topic Maps, in which semantic concepts, hierarchical relations, and physical instances within the ontological knowledge are utilized to facilitate taxonomy integration. In addition, inter-ontological mapping relationships are studied to compensate for the loss of semantics in the hierarchical structure and to enhance the semantic-level integration. An information fusion mechanism based on the Topic Maps-driven (TM-driven) ontological concepts is thus illustrated to facilitate Web information fusion in the scenario of taxonomy integration.

2 Related Work

In previous studies on Web taxonomy fusion, different kinds of implicit information embedded in the source taxonomy have been explored to help information fusion. These implicit source features can be mainly categorized into four types. First, the co-occurrence relationships of source objects are studied to enhance a Naïve Bayes classifier, based on the concept that if two documents are in the same source category, they are more likely to be in the same destination category [3]. Second, latent source-destination mappings are explored to improve the integration performance with an effective learning scheme in [4]. Third, a cluster shrinkage (CS) approach, in which the feature weights of all objects in a document category are shrunk toward the category centroid, is proposed [5]. Thereby, the cluster-binding relationships among all documents of a category are strengthened. Fourth, the parent-children information embedded in hierarchical taxonomies is intentionally extracted [6]. Based on the hierarchical characteristics, these approaches are extended to improve the integration performance. Even so, the semantic information embedded in the source taxonomy has not been discussed in past studies. The semantic conceptual relationships existing in Web taxonomies have been particularly ignored. This observation thus motivates us to study the effectiveness of intra/inter-ontological information for taxonomy fusion.

3 Topic Maps-driven Integration

We propose an integration model based on TM concepts to merge semantic resources from diverse sources. As described in the basic definition of TAO, semantic resources from different sources can be primitively transformed into TM concepts using topics, associations, and occurrences. In the following, both syntactic and semantic mappings are addressed to fulfill a more comprehensive ontology matching.

3.1 Matching Semantic Resources

For syntactic mappings between TM concepts and general ontological knowledge, ontologies are directly mapped to TM based on the definition of TAO [2], by individually mapping concepts, relations, and instances to topics, associations, and occurrences. For semantic mappings between different concept domains, the mappings between the most similar semantic concepts need to be determined. Here, we define a knowledge ontology as a triple O = (C, R, I), where C, R, and I represent concepts, relations, and instances, respectively. In our model, a similarity function $\delta: O_s \times O_d \rightarrow \mathbb{R}$ is used to express the similarity between two ontologies. Since each concept domain $C \in O$ consists of its basic constituents, namely instances, the centroid of these instances can be used to represent the concept domain. We thus estimate $\delta(C_{si}, C_{dj})$ by calculating the centroid similarity between $C_{si}$ and $C_{dj}$ and counting the correct and incorrect mapping numbers of I between each concept domain $C_{si} \in O_s$ and its corresponding concept domain $C_{dj} \in O_d$. Here, $\delta(C_{si}, C_{dj})$ is determined by the similarity between the centroids and the number of correctly mapped instances. Ontology matching between $O_s$ and $O_d$ can therefore be determined.

3.2 Integration Process

In the TM-driven ontology integration procedure, intra-TM and inter-TM information is employed to extract ontological knowledge and find the concept mappings.

Integration with Intra-TM Concepts. To establish the internal semantic concepts, a weighting scheme is designed to control the impact of the semantic concepts of each hierarchical level. Equation 1 calculates the enhanced feature weight of each instance I, where $L_k$ is the relevant concept feature weight assigned as $1/2^k$ for a k-th level depth, $L_x$ denotes the hierarchical weight of the concept feature x, $f_{x,I}$ is the original weight of feature x, $f'_{x,I}$ represents the enhanced feature weight, and $\lambda$ is used to control the magnitude of the relation. The weight of each hierarchical concept is exponentially decreased as $1/2^k$, since higher-level concept domains shall have weaker semantic relationships to the concept domain where the instance is located. The weight $f_{x,I}$ is assigned as $TF_x / \sum_n TF_n$, where $TF_x$ is the term frequency of x, and n ranges over the stemmed terms in each instance. The feature weight $L_k$ of each concept domain is exponentially decreased and accumulated based on the increased levels.

$$f'_{x,I} = \lambda \frac{L_x}{\sum_{k=0}^{n} L_k} + (1-\lambda)\, f_{x,I} \qquad (1)$$
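As a sketch of Equation 1 (function and parameter names are illustrative; λ=0.1 is the best-performing setting reported in Section 4):

def enhanced_weight(tf_x, tf_total, depth_x, n, lam=0.1):
    # f'_{x,I} of Equation 1: blend of the hierarchical concept weight
    # L_x = 1/2^depth and the original TF-based weight f_{x,I}.
    l_x = 1.0 / 2 ** depth_x
    l_sum = sum(1.0 / 2 ** k for k in range(n + 1))  # sum of L_k, k = 0..n
    f_xi = tf_x / tf_total                           # TF_x / sum of TF_n
    return lam * l_x / l_sum + (1 - lam) * f_xi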

Integration with Inter-TM Concepts. In the TM-driven model, inter-Topic Maps (inter-TM) information is further extracted to enhance the semantic-level ontology integration. The main idea is to augment the feature space of each source instance with the mapped destination concepts to help the corresponding integration process. In the augmentation process, our model first calculates the similarity between the source concepts and the destination concepts using a cosine similarity function to find the potential augmentation mappings. Each semantic concept is represented by its centroid, which is calculated by averaging the feature vectors associated with the instances in the concept domain. To obtain the centroid of the set of instances in each concept, the instances are transformed into feature vectors, and the feature weights are averaged by the number of instances in the concept domain, namely $cen = \frac{1}{|C|}\sum_{I \in C} I$, where $|C|$ represents the number of instances I in the concept domain C. For each source concept ($C_{si}$) and each destination concept ($C_{dj}$), we obtain $cen_{si}$ and $cen_{dj}$ to represent $C_{si}$ and $C_{dj}$, respectively. Each destination concept $C_{dj}$ will have a mapping relationship to the source concept $C_{si}$ that has the highest similarity to $C_{dj}$. In order to obtain the common conceptual information of $C_{si}$ and $C_{dj}$, the instances of $C_{si}$ are shrunk toward $cen_{si,dj}$ using the cluster shrinkage algorithm in [5]. The idea of common concept shrinkage (CCS) is illustrated in Figure 1, and its algorithm is depicted in Figure 2.

Fig. 1. The idea of common concept shrinkage (CCS): instances of a source concept are shrunk toward the centroid shared with the mapped destination concept.

for each pair of mapped concepts $C_{si}$ and $C_{dj}$ {
    compute its centroid: $cen_{si,dj} = \frac{1}{|C_{si}|}\sum_{I_{si} \in C_{si}} I_{si}$;
    for each instance $I_{si} \in C_{si}$ {
        replace it with $I'_{si} = \alpha \cdot cen_{si,dj} + (1-\alpha)\, I_{si}$, where $0 \le \alpha \le 1$;
    }
}

Fig. 2. The CCS algorithm for each pair of mapped concepts.
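A compact sketch of the CCS step using NumPy vectors (names are illustrative; α=0.25 follows the best-performing setting reported in Section 4):

import numpy as np

def ccs(source_instances, alpha=0.25):
    # Shrink every instance vector of a mapped source concept toward
    # the centroid, as in the CCS algorithm of Fig. 2.
    cen = np.mean(source_instances, axis=0)  # centroid cen_{si,dj}
    return [alpha * cen + (1 - alpha) * inst for inst in source_instances]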

Table 1. The experimental categories and the numbers of documents.

Category   Y Class   Y Test   G-Y     G Class
Autos      ...       ...      ...     ...
Movies     27        1,311    5,...   ...
Outdoors   ...       ...      ...     ...
Photo      ...       ...      ...     ...
Software   ...       ...      ...     ...
Total      116       2,791    14,...  ...

Fig. 3. Performance comparison of different models in Y→G integration.

4 Applications on Web Taxonomy Fusion

In the current Web environment, many information resources exist in hierarchical relationships or tree structures, such as Web taxonomies and enterprise catalogs. When the nature of a Web taxonomy is further considered, a Web taxonomy is essentially constituted of concepts (categories), relations (hierarchical structures), and instances (documents). Therefore, the proposed TM-driven integration model can be intuitively transformed into a hierarchical taxonomy integration scheme for Web taxonomy fusion.

In the experiments, five directories from Yahoo! (Y) and Google (G) were extracted to form two experimental taxonomies. Table 1 shows these directories and the numbers of extracted documents after ignoring the documents that could not be retrieved. The documents appearing in only one category were used as the training data ("G-Y"), and the common documents (Y ∩ G) were used as the testing data ("Y Test"). In the experiments, we measured the integration performance with the F1 measure.

We have conducted experiments on integrating Yahoo! to Google (Y→G) using the TM-driven model. The experimental results show that using SVM with both intra-TM and inter-TM conceptual semantic information (TM-SVM) can significantly and consistently improve the integration performance of SVM in most categories in Y→G fusion. The experimental results show that TM-SVM achieves the most improved integration performance using the TM-driven model with λ=0.1 and α=0.25. To

demonstrate the performance of the TM-driven model, TM-SVM is further compared with the enhanced Naïve Bayes (ENB) model [3] over the five categories. According to the performance comparison in Figure 3, TM-SVM performs best on average and outperforms ENB in four out of the five categories. In photo, TM-SVM shows a significant boost because inter-TM can find correct mappings between subcategories, such as photographers and techniques and style, with more true positive documents, and can reduce the number of false positives as well. In a further analysis of ENB, ENB performs best only in software because over 80% of the documents belonging to software are distributed in three main subcategories, namely desktop customization, operating systems, and internet, in both the source and destination taxonomies. In such a case, the performance improvement of the TM-driven model is hence not as significant as that of ENB.

5 Conclusion

Although different methodologies have been studied for Web information fusion, integrating and merging semantic resources is still a major challenge and research issue for Web information fusion because of the large-scale, dynamic, heterogeneous, and hyperlinked nature of the Web. In this paper, an integration mechanism based on the TM-driven framework is proposed for Web information fusion. Augmented with both the intra-category and inter-category features, the resources in a source category can be more precisely integrated into the correct destination category to advance Web information fusion. The experimental results show that inter-TM-SVM can achieve the best F1 score in most cases in Y→G taxonomy fusion. The results also show that the ontological concepts and relationships extracted by the TM-driven integration model can help F1 enhancements in a significant portion of all cases.

References
1. Alasoud, A., Haarslev, V., Shiri, N.: A Hybrid Approach for Ontology Integration. In: Proceedings of the 31st VLDB Conference, Trondheim, Norway (2005)
2. Garshol, L.M., Moore, G.: Topic Maps Data Model (2008)
3. Agrawal, R., Srikant, R.: On Integrating Catalogs. In: Proceedings of the 10th International Conference on World Wide Web (WWW 2001), Hong Kong (2001)
4. Sarawagi, S., Chakrabarti, S., Godbole, S.: Cross-training: Learning Probabilistic Mappings Between Topics. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2003)
5. Zhang, D., Lee, W.S.: Web Taxonomy Integration Using Support Vector Machines. In: Proceedings of the 13th International Conference on World Wide Web (WWW 2004) (May 2004)
6. Wu, C.W., Tsai, T.H., Lee, C.W., Hsu, W.L.: Web Taxonomy Integration with Hierarchical Shrinkage Algorithm and Fine-grained Relations. Expert Systems with Applications 35(4) (2008)

A Memory Management Scheme for Hybrid Memory Architecture in Mission Critical Computers

Soohyun Yang and Yeonseung Ryu
Department of Computer Engineering, Myongji University, Yongin, Gyeonggi-do, Korea

Abstract. Recently, as energy dissipation in mission critical computers has become an important issue, non-volatile memory-based hybrid main memory organizations have emerged as a solution. In this paper, we study a new memory management scheme which considers DRAM/PRAM hybrid main memory and flash memory based storage. The goal of the proposed scheme is to minimize the number of write operations on PRAM and the number of erase operations on flash memory. In order to evaluate the proposed scheme, we compare it with legacy memory management schemes through trace-driven simulation.

Keywords: Buffer Cache, Buffer Replacement, Non-volatile Memory, Flash Memory, DRAM/PRAM Hybrid Main Memory

1 Introduction

For several decades, DRAM has been used as the main memory of computer systems. However, recent studies have shown that DRAM-based main memory spends a significant portion of the total system power and the total system cost with the increasing size of the memory system. Fortunately, various non-volatile memories such as PRAM (Phase change RAM), FRAM (Ferroelectric RAM) and MRAM (Magnetic RAM) have been developed as next-generation memory technologies. Among these non-volatile memories, PRAM is rapidly becoming a promising candidate for large-scale main memory because of its high density and low power consumption. In order to tackle the energy dissipation in DRAM-based main memory, some recent studies introduced PRAM-based main memory organization [1] and DRAM/PRAM hybrid main memory organization [2, 3].

A buffer cache mechanism is usually employed in a modern operating system (OS) to enhance the performance that is limited by slow secondary storage. When the OS receives a read request from an application, the file system in the OS copies the data from storage to the buffer cache in the main memory and serves the next read operations from the faster main memory. Similarly, when the OS services a write request, it stores the data to the buffer cache and later flushes several data items together to the storage.

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology ( ).

In this paper, we propose a new buffer cache management scheme called PA-CBF (PRAM-aware Allocation Clean Block First Replacement) for next-generation mission critical computers with DRAM/PRAM hybrid main memory and flash memory storage devices. The proposed PA-CBF scheme tries to minimize both the number of write operations on PRAM and the number of erase operations on flash memory. When PA-CBF allocates a buffer to accommodate requested data, it allocates DRAM or PRAM according to the type of request. If the request type is read, PA-CBF tries to allocate the buffer from PRAM. Otherwise, it tries to allocate the buffer from DRAM. During the buffer replacement procedure, PA-CBF considers the merge operations performed in the FTL in order to minimize the number of erase operations. Trace-driven simulation results show that PA-CBF outperforms the legacy buffer cache schemes.

2 Background and Previous Works

2.1 Flash Memory and Buffer Cache Schemes

A NAND flash memory is organized in terms of blocks, where each block has a fixed number of pages. A block is the smallest unit of the erase operation, while reads and writes are handled by pages [4]. Flash memory cannot be written over existing data unless erased in advance. The number of times an erasure unit can be erased is limited. The erase operation can only be performed on a full block and is slow, which usually decreases system performance. If flash memory is used as a storage device, the OS usually employs a software module called the flash translation layer (FTL) between the file system and the flash memory device [5, 6]. An FTL receives read and write requests from the file system and maps a logical address to a physical address in the flash memory.

A log block scheme, called block associative sector translation (BAST), was proposed in [5]. In the BAST scheme, flash memory blocks are divided into data blocks and log blocks. Data blocks represent the ordinary storage space, and log blocks are used for storing updates. When an update request arrives, the FTL writes the new data temporarily in the log block, thereby invalidating the corresponding data in the data block. Whenever the log block becomes full or the free log blocks are exhausted, garbage collection is performed in order to reclaim the log block and the corresponding data block. During the garbage collection, the valid data from the log block and the corresponding data block must be copied into an empty data block. This is called a merge operation. After the merge operation, two erase operations need to be performed in order to empty the log block and the old data block. When the data block is updated sequentially starting from the first page to the last page, the FTL can apply a simple switch merge, which requires only one erase operation and no copy operations. That is, the FTL erases the data block filled with invalid pages and switches the log block into a data block.

As flash memory based storage devices have become widely used, buffer cache schemes have also been studied to consider the erase-before-write characteristic of flash memory. The early page-level buffer management schemes did not consider the existence of the FTL [7]. They only focused on reducing the number of write operations to flash memory because write operations accompany erase operations. Recently, some block-level buffer management algorithms consider the address translation algorithm used in the FTL to reduce the overhead of garbage collection [8, 9].
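To illustrate the merge costs that motivate the block-level schemes discussed next, here is a simplified sketch of the reclaim decision in a BAST-style log block FTL (data structures are reduced to a list of written page offsets; a real FTL tracks per-block page mappings):

def reclaim_cost(log_pages, pages_per_block=64):
    # log_pages: logical page offsets written to a full log block, in order.
    if log_pages == list(range(pages_per_block)):
        # Switch merge: the log block simply becomes the new data block;
        # only the old data block (all pages invalid) must be erased.
        return 1
    # Full merge: valid pages of the log block and the data block are
    # copied into a free block, then both old blocks are erased.
    return 2

print(reclaim_cost(list(range(64))))  # 1 (switch merge)
print(reclaim_cost([5, 3, 5, 0]))     # 2 (full merge)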

In [7], a page-level scheme called CFLRU (Clean First LRU) was proposed. CFLRU maintains the page list in LRU order and divides the page list into two regions, namely the working region and the clean-first region. In order to reduce the write cost, CFLRU first evicts clean pages in the clean-first region in LRU order, and if there are no clean pages in the clean-first region, it evicts dirty pages in their LRU order. CFLRU can reduce the number of write and erase operations by delaying the flush of dirty pages in the buffer cache.

In [8, 9], block-level replacement schemes called FAB (Flash Aware Buffer management) and BPLRU (Block Padding LRU) were proposed, which consider the block merge cost in log block FTL schemes. When a page in the buffer cache is referenced, all pages in the same block are moved to the MRU position. When the buffer cache is full, the FAB scheme searches from the LRU position for the victim block which has the largest number of pages in the buffer cache. Then, all the pages of the selected block are passed to the FTL to be flushed into the flash memory. The BPLRU scheme also evicts all the pages of a victim block like FAB, but it simply selects the victim block at the LRU position. In addition, it writes a whole block into a log block by the in-place scheme using the page padding technique. Therefore, all log blocks can be merged by the switch merge, which results in a decreased number of erase operations.

2.2 PRAM

A PRAM cell uses a special material, called a phase change material, to represent a bit. PRAM density is expected to be much greater than that of DRAM (about four times). Further, because the phase of the material does not change after power-off, PRAM has negligible leakage energy regardless of the size of the memory. Though PRAM has attractive features, the write access latency of PRAM is not comparable to that of DRAM. Also, PRAM has a wear-out problem caused by limited write endurance. Since write operations on PRAM significantly affect the performance of the system, they should be carefully handled.

3 Proposed Scheme

Fig. 1 illustrates the system configuration considered in this paper, in which the main memory consists of DRAM and PRAM, and the secondary storage is based on flash memory. The proposed PA-CBF (PRAM-aware Allocation Clean Block First Replacement) scheme is a block-level scheme which maintains an LRU list based on the flash memory block, as shown in Fig. 2. The LRU list is composed of block headers. Each block header manages the buffers of member pages which are loaded from flash memory. When a page p of block b in the flash memory is first referenced, PA-CBF allocates a new buffer and stores page p in the allocated buffer. If the block header for block b does not exist, PA-CBF allocates a new block header and attaches the buffer of page p to the header of block b. Further, b is placed at the MRU position of the LRU list. Whenever a page in the buffer cache is referenced, all pages in the same block are moved to the MRU position. PA-CBF defines a window as the n blocks from the LRU position in the list; the size of the window is defined as n (see the window region marked in Fig. 2).

Fig. 1. System configuration.

When PA-CBF allocates a new buffer, it allocates it from DRAM or PRAM according to the type of request (i.e., read or write). If the request type is read, PA-CBF tries to allocate the buffer from PRAM. Otherwise, it tries to allocate the buffer from DRAM. When there is no free space in the desired memory, PA-CBF allocates the buffer from the other type of memory. For example, in Fig. 2, page p12 of block B3 is stored in a PRAM buffer because it was requested by a read operation.

Fig. 2. Buffer cache structure of PA-CBF.

If all free buffers are used up, PA-CBF must perform the buffer replacement procedure. During the buffer replacement procedure, PA-CBF looks for a clean block in the window region (see Fig. 2). A block is clean if its member buffers are all clean. If PA-CBF cannot find a clean block, it selects the victim block at the LRU position. If the victim block contains dirty pages, then PA-CBF applies the page padding technique when it flushes the victim block to the flash memory, in order to account for the block merge overhead. If the victim block is clean, PA-CBF can free the pages of the victim block without write operations to the flash memory.
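The allocation and replacement policy can be summarized in the following sketch (a simplified model of the scheme described above; class and attribute names are illustrative):

def allocate(request_is_read, pram_has_free, dram_has_free):
    # Reads prefer PRAM (reading does not wear PRAM out), writes prefer
    # DRAM (avoids PRAM's slow, endurance-limited writes); fall back to
    # the other memory when the preferred one is exhausted.
    if request_is_read:
        return "PRAM" if pram_has_free else "DRAM"
    return "DRAM" if dram_has_free else "PRAM"

def select_victim(lru_list, window_size):
    # lru_list: block headers ordered from LRU (index 0) to MRU.
    # Prefer a clean block inside the window; otherwise take the LRU
    # block, which is flushed with page padding if it holds dirty pages.
    for block in lru_list[:window_size]:
        if all(not page.dirty for page in block.pages):
            return block  # clean victim: freed without any flash writes
    return lru_list[0]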

Fig. 3. Performance comparison varying the buffer cache size: (a) hit ratio, (b) erase count on flash, (c) write count on PRAM.

4 Simulation Results

In order to evaluate the proposed scheme, we compared PA-CBF with two of the existing schemes using simulation: LRU and CFLRU. We assume that the hybrid main memory consists of DRAM and PRAM, which are divided by a memory address: the section with the low memory addresses is DRAM, and the high section is allocated to PRAM. In the case of the previous schemes, we assume that they allocate buffers from DRAM and PRAM alternately. We also assume that the medium of the storage device is flash memory. The flash memory model used in the simulation was the Samsung 16GB NAND flash memory [4]. The page size is 4 KB and the number of pages in a block is 64. We implemented the BAST scheme as the FTL scheme of the flash memory because it is a representative and basic log block scheme. In the BAST scheme, 100 log blocks were used. For the workload of our experiments, we extracted disk I/O traces from a Microsoft Windows XP-based notebook PC running several applications, such as document editors, web browsers, a media player and games. The read/write ratio of the workload is 67%/33%.

We measured the hit ratio, the required number of write operations on PRAM and the required number of erase operations on flash memory while varying the buffer cache size from 4 to 20 MB. As the hit ratio of CFLRU is affected by the window size, in this experiment we set it to 30% of the maximum capacity of the buffer cache. Fig. 3 shows the experiment results. According to Fig. 3(a), the cache hit ratio of PA-CBF is slightly less than that of the other two schemes. This is because PA-CBF is a block-based buffer cache scheme. Since the block-based schemes replace all pages of the

victim block, it manifests a lower cache hit ratio, but it can reduce the erase overhead, as shown in Fig. 3(b). According to Fig. 3(b), the erase count of PA-CBF is less than that of the other two schemes. The reason is that PA-CBF performs block padding like BPLRU to reduce the merge overhead when it evicts a victim block. Further, if the victim block contains no dirty pages, PA-CBF frees the victim block without write operations to the flash memory. Fig. 3(c) shows that PA-CBF is much better than the other two schemes in terms of the write count on PRAM. This is because PA-CBF allocates DRAM buffers for write requests.

5 Conclusion

In this paper, we proposed a new buffer cache management scheme called PA-CBF for computers with DRAM/PRAM hybrid main memory and flash storage devices. The proposed PA-CBF scheme minimizes both the number of write operations on PRAM and the number of erase operations on flash memory. We showed through trace-driven simulation that PA-CBF outperforms legacy buffer cache schemes like LRU and CFLRU.

References
1. M. K. Qureshi, V. Srinivasan, and J. A. Rivers, "Scalable High Performance Main Memory System Using Phase-Change Memory Technology," in Proc. of International Symposium on Computer Architecture
2. G. Dhiman, R. Ayoub, and T. Rosing, "PDRAM: A Hybrid PRAM and DRAM Main Memory System," in Proc. of Design Automation Conference
3. H. Park, S. Yoo, and S. Lee, "Power Management of Hybrid DRAM/PRAM-based Main Memory," in Proc. of Design Automation Conference
4. Samsung Electronics, K9XXG08UXM, 1G x 8 Bit / 2G x 8 Bit NAND Flash Memory
5. J. Kim, J. Kim, S. Noh, S. Min, and Y. Cho, "A space-efficient flash translation layer for compactflash systems," IEEE Transactions on Consumer Electronics, vol. 48, no. 2
6. Y. Ryu, "SAT: switchable address translation for flash memory storages," in Proc. of IEEE Computer Software and Applications Conference (COMPSAC)
7. S. Park, D. Jung, J. Kang, J. Kim, and J. Lee, "CFLRU: a replacement algorithm for flash memory," in Proc. of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems
8. H. Jo, J. Kang, S. Park, and J. Lee, "FAB: Flash aware buffer management policy for portable media players," IEEE Transactions on Consumer Electronics, vol. 48, no. 2
9. H. Kim and S. Ahn, "BPLRU: A buffer management scheme for improving random writes in flash storage," in Proc. of the 6th USENIX Conference on File and Storage Technologies (FAST)

NC-based Interoperability Synergy Assessment Model

Hyun-sik Son* and Tae-gong Lee**
* NCW, Ajou Univ.
** Prof. at Graduate School of Information and Communication, Ajou Univ.

Abstract. The future battle space environment is changing from PCO (platform centric operation) to NCO (network centric operation). NCO takes place not only in the physical, information, and cognitive domains, but also in the social domain. Interoperability in NCO makes the entire command hierarchy, ranging from commanders to warriors, have the same situational awareness, which creates synergy from the physical to the social domain for the forces and enhances mission effectiveness. The NCEI model was developed to measure the synergy from the physical to the social domain which enhances mission effectiveness through information sharing. The NCEI model consists of layers, functions, attributes and metrics. The assessment criteria for the NCEI model were developed to compare organic interoperability with synergy interoperability.

Keywords: Interoperability, Synergy

1 Introduction

Ubiquitous information technology has made systems intelligent and connectable to each other, and has consequently caused the paradigm of military operations to change from platform centric operation to network centric operation [1]. Interoperability in PCO, which includes the point-to-point concept, is based on the physical and information domains, in which equipment, devices and applications are connected with each other to share information. In contrast, interoperability in NCO, which includes the concept of plug and play, is based on the cognitive and social domains, in which the entire command hierarchy, ranging from commanders to warriors, takes the same perspective; this helps create synergy from the physical domain to the social domain to increase mission effectiveness. The interoperability maturity model is also changing from a platform centric maturity model, such as the LISI model, with which interoperability between systems is assessed, to a net centric maturity model, such as the SCOPE model, with which interoperability between organizations and tasks is assessed [5]. The interoperability domains for NCO consist of the physical, information, cognitive, and social domains [2]. However, existing interoperability maturity models have so far focused on the point-to-point, platform centric concept, which covers only the physical and information domains; an NC-based maturity model does not yet exist. The objective of this paper is to provide an NC-based interoperability synergy assessment model which measures the synergy from the physical to the social domain which enhances mission effectiveness through information sharing.

This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-(H )).

2 NCEI Model

2.1 Overview

As shown in Fig. 1, the NCEI Model was developed to measure interoperability in the physical, information, cognitive, and social domains. In particular, it concentrates on the assessment of the synergy created by interoperability from the physical domain to the social domain. The NCEI model consists of a network layer, organic information layer, individual information layer, awareness layer, understanding layer, decision layer and action layer [4] [5]. Also, each layer consists of functions, quality attributes and metrics [3]. Quality attributes are the characteristics used to measure interoperability. A metric must include a measurement formula and a unit for measuring its quality attribute. There is a 1-to-N relationship between model components and quality attributes. Quality attributes and metrics have a 1-to-1 relationship.

Fig. 1. NCEI Model

2.2 Interoperability measurement method

Assess organic/synergy NCEI. The assessment of the NCEI Model consists of three parts: first, the capability of the entities participating in the scenario (organic interoperability) is expressed at each NCEI model layer; second, the interaction between the entities (interaction interoperability) is expressed at each layer; and last, the capability of each entity after reflecting the synergy created by interaction (synergy interoperability) is expressed at each layer.

The criteria of NCEI Model assessment

Assessment criteria of organic platform capability. First, the interoperability capability of an entity participating in the scenario prior to synergy occurrence is defined as organic interoperability. An assessment grade ranges from 0 to 1; the closer the grade is to 1, the better the capability.

The assessment criteria of the communication layer (Com_organic) are as follows: following the Air-to-Air case of the US Air Force, a data link capability of 0.74 and a voice capability of 0.16 were applied [6].

For the assessment criteria of the information collection layer (C_organic), two attributes, accuracy and updating, were selected. Accuracy was calculated as the number of collected information items divided by the number of actual information items. According to the updating cycle of the collected information, updating is set to 1.0 from 0 to 10 seconds, 0.7 from 10 to 60 seconds, 0.5 from 60 seconds to 3 minutes, and 0.3 for more than 3 minutes.

The assessment criteria of the information offering layer (I_organic) are presented in Table 1.

Table 1. Assessment criteria of information offering layer capability.
  Type:       Fusion COP generation | Self COP creation | Digital | Text & simple digital | Voice
  Capability: ...                   | ...               | ...     | ...                   | ...

The assessment criteria of the decision layer (D_organic) are presented in Table 2.

Table 2. Assessment criteria of decision layer capability.
  Accuracy:       Video ... | Symbol ... | Text ... | Voice 0.3
  Updating cycle: 0 to 10 seconds: 1.0 | to 60 seconds: 0.7 | to 180 seconds: 0.5 | more than 3 minutes: 0.3
  Metric:         decision layer capability = (Accuracy + Updating)/2

Lastly, the assessment criteria of the action layer (A_organic) are as follows: in the case of a shooter system, the capability is given by (1), and in the case of a non-shooter system, it is 0.

$$A_{organic} = (C_{organic} \cdot D_{organic} + I_{organic})/2 \qquad (1)$$

Assessment criteria of synergy platform capability. In the communication layer, the synergy NCEI Model assessment criterion is simply $Com_{organic}$. In the information collection layer, the assessment criterion is

$$C_{synergy} = C_{organic} + \sum_{i=1}^{n} Com_i \cdot C_i$$

where $C_i$ is an input value coming from the interaction with other entities. In the information offering layer, the synergy assessment criterion is

$$I_{synergy} = I_{organic} + \sum_{i=1}^{n} \frac{Com_i \cdot I_i}{i+1}$$

In the decision layer, the synergy assessment criterion is

$$D_{synergy} = D_{organic} + \sum_{i=1}^{n} \frac{Com_i \cdot D_i}{i+1}$$

In the action layer, the synergy assessment criterion is

$$A_{synergy} = (C_{synergy} \cdot D_{synergy} + I_{synergy})/2$$
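Reading the criteria above as formulas, the following sketch computes a layer's synergy capability from an entity's organic value and the inputs received from interacting entities (names are illustrative; the (i+1) attenuation applies to the information offering and decision layers):

def synergy(organic, inputs, attenuate=True):
    # inputs: (Com_i, X_i) pairs received from interacting entities,
    # where Com_i is the sender's communication capability and X_i its
    # capability on this layer. The information offering and decision
    # layers divide the i-th contribution by (i+1); the information
    # collection layer adds it undivided.
    total = organic
    for i, (com, x) in enumerate(inputs, start=1):
        total += com * x / (i + 1) if attenuate else com * x
    return total

# ARH's synergy information offering capability (worked example in
# Section 2.3): I_organic = 0.4, one voice input from HQ (Com = 0.16, I = 1.0).
print(synergy(0.4, [(0.16, 1.0)]))  # 0.4 + (0.16 * 1.0) / 2 = 0.48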

Assessment criteria of synergy platform capability. In the communication layer, the synergy assessment criterion equals Com_organic. In the information collection layer, the criterion is

  $C_{synergy} = C_{organic} + \sum_{i=1}^{n} (Com_i \cdot C_i)$,

where index i runs over the inputs coming from the interactions of other entities. In the information offering layer, the criterion is

  $I_{synergy} = I_{organic} + \sum_{i=1}^{n} (Com_i \cdot I_i) / (i+1)$.

In the decision layer, the criterion is

  $D_{synergy} = D_{organic} + \sum_{i=1}^{n} (Com_i \cdot D_i) / (i+1)$.

In the action layer, the criterion is

  $A_{synergy} = (C_{synergy} \cdot D_{synergy} + I_{synergy}) / 2$.

Instantiate entity. Once the entities, their interoperability characters, and the states of those characters have been identified, a specific system can be modeled, or instantiated, by (2) as a sequence of states of system characters (Bullock, 2006; Amanowicz & Gajewski, 1996):

  $\sigma = X(E) = \{\{a, b, c, d\}, \{e, f, g, h\}, \{i, j, k, l\}\} = \begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \end{bmatrix}$, where $a, b, \ldots, l \in \{0, 1\}$.    (2)

Measure interoperability. Methods to measure interoperability have been well studied (Thomas C. Ford, 2008). Given a pair of systems instantiated as $\sigma', \sigma'' \in [0, c_{max}]^n$, then $I = Sim_{real} = w \cdot MMS$, written out completely in (3), is an interoperability function which gives a weighted, normalized measure of the similarity of systems instantiated with real-valued character states, where the weight $w = \bar{b}$ is the average character state value of the pair of system instantiations, MMS is the Modified Minkowski Similarity function, n is the number of characters used to instantiate $\sigma'$ and $\sigma''$, $c_{max}$ is the maximum character state value, and r is the Minkowski parameter (usually set to r = 2):

  $I = Sim_{real} = \bar{b} \left( 1 - \sqrt[r]{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{|\sigma'(i) - \sigma''(i)|}{c_{max}} \right)^{r}} \right)$, where $\bar{b} = \frac{\sum_{i=1}^{n} \sigma'(i) + \sum_{i=1}^{n} \sigma''(i)}{2 n c_{max}}$.    (3)
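A minimal Python sketch of the similarity function in (3); the variable names mirror the symbols above, and the default r = 2 follows the text.

```python
def sim_real(s1: list[float], s2: list[float], c_max: float, r: int = 2) -> float:
    """Weighted Modified Minkowski Similarity of two instantiations, Eq. (3)."""
    n = len(s1)
    # b: average normalized character state value of the pair (the weight).
    b = (sum(s1) + sum(s2)) / (2 * n * c_max)
    # Normalized Minkowski distance between the two character-state vectors.
    dist = (sum(abs(x - y) ** r for x, y in zip(s1, s2)) / n) ** (1 / r) / c_max
    return b * (1 - dist)

# Two identical instantiations give similarity b * (1 - 0) = b:
print(sim_real([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], c_max=1.0))  # 1.0
```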

2.3 Application

Development scenario. The participants in this scenario are UAV, Orion, F-111, Arty, ARH, SF, and HQ. Among them, UAV, Orion, F-111, and HQ send and receive information over their data link; the rest are assumed to use voice. Moreover, only UAV and Orion can detect information on the enemy threat. As shown in Table 3, each entity's collected information items relating to 9 factors were assumed on the basis of an enemy ground target.

Table 3. Collected information

  Type               UAV  ORION  F-111  ARH  HQ  Arty  SF
  IP                 O    O      O      O    X   X     X
  Heading            O    O      O      O    X   X     X
  Distance           O    O      O      O    X   O     O
  TGT Elevation      O    O      O      O    X   O     O
  TGT Description    O    X      X      O    X   X     O
  TGT Location       O    O      O      O    X   O     O
  Type Marks         O    X      X      O    X   X     O
  Friendly Location  X    O      X      X    X   X     X
  Egress Direction   O    O      O      O    X   X     X

In the scenario, each entity's decision updating cycle was assumed as shown in Table 4.

Table 4. Decision updating cycle

  Type                     UAV   ORION  F-111  ARH     HQ         Arty    SF
  Decision updating cycle  None  2 min  1 min  30 sec  Real-time  10 min  30 min

Assessment of NCEI Model

NCEI Model based organic platform capability. For organic interoperability in the NCEI Model, each layer's organic interoperability assessment criteria are applied; the application results are presented in Table 5.

NCEI Model based synergy platform capability. For synergy interoperability in the NCEI Model, each layer's synergy interoperability assessment criteria are applied; the application results are also presented in Table 5. For example, the synergy NCEI Model application result for ARH is obtained as follows. In the information offering layer, the synergy assessment criterion is $I_{synergy} = I_{organic} + \sum_{i=1}^{n} (Com_i \cdot I_i) / (i+1)$; here I_organic is 0.4, i = 1, and the information offering layer capability is received from HQ. Substituting each value into the formula, ARH's synergy information offering layer capability is 0.4 + (0.16 * 1.0)/2 = 0.48.

Table 5. NCEI Model assessment (organic/synergy)

         Com        C       I          D      A
  UAV    0.74/      /       /0.4       0/0    0/0
  Orion  0.74/      /1      0.4/       /      /1
  ARH    0.16/      /1      0.4/0.48   /      /0.71
  F-111  /          /1      0.4/0.74   0.7/1  0.51/0.87
  HQ     0.74/0.74  0/0.43  1/1        1/1    0/0
  Arty   0.16/      /       /          /      /0.241
  SF     0.16/      /       /          /      /
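As a quick check of the worked example, the snippet below reproduces ARH's synergy information offering capability under the stated inputs (one input from HQ with Com = 0.16 and I = 1.0); the function name is our own.

```python
def i_synergy(i_organic: float, inputs: list[tuple[float, float]]) -> float:
    """I_synergy = I_organic + sum_i (Com_i * I_i) / (i + 1), i starting at 1."""
    return i_organic + sum(com * cap / (i + 1)
                           for i, (com, cap) in enumerate(inputs, start=1))

# ARH: I_organic = 0.4, one input from HQ over voice (Com = 0.16, I = 1.0).
assert abs(i_synergy(0.4, [(0.16, 1.0)]) - 0.48) < 1e-9
```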

Interoperability measurement

The result of the interoperability measurement obtained by applying formula (3) is shown in Table 6. The interoperability of the synergy NCEI Model is higher than that of the organic NCEI Model.

Table 6. Interoperability measurement of the NCEI Model assessment (organic/synergy)

         UAV     Orion  ARH  F-111  HQ  Arty  SF
  UAV    0.417/  /      /    /      /   /     /0.26
  Orion  0.509/  /      /    /      /   /     /0.225
  ARH    0.362/  /      /    /      /   /     /0.236
  F-111  /       /      /    /      /   /     /0.236
  HQ     0.353/  /      /    /      /   /     /0.172
  Arty   0.199/  /      /    /      /   /     /0.276
  SF     0.2/    /      /    /      /   /     /

3 Conclusion

This study investigated related research, including interoperability definitions and interoperability models, and thereby developed the NCEI Model. The NCEI Model was developed to assess interoperability in the physical, information, cognitive, and social domains, focusing especially on synergy from the physical domain to the social domain. In addition, each layer consists of functions, attributes, and metrics. The developed NCEI Model was applied to a scenario, and the capabilities before and after the occurrence of synergy were compared to verify its validity. The proposed model can be utilized for effectiveness analysis and the establishment of military power (NCW operation concepts and acquisition). In the future, it is necessary to develop additional quality attributes, with metrics, for each layer.

4 References

1. Hyun-sik Son, Tae-gong Lee, Sang-gun Park, Noh-hyuk Park, A Study on the Agile EA Quality Value Chain Framework for Agile Enterprise, JITA, Vol. 8, No. 1.
2. Tae-gong Lee, Theories and Application of NCW, Hongreung Publication Company.
3. Nam-gyu Lim, A Methodology for the NCO Effectiveness Analysis Model Development, Ph.D. Dissertation, Ajou University.
4. Hyun-sik Son, Tae-gong Lee, A Study on the Net Centric Entity Interoperability Layer, The Journal of Korea Information and Communication Society, Vol. 37B, No. 4.
5. Hyun-sik Son, Tae-gong Lee, A Study on the NC-based Interoperability Synergy Assessment Model, The 16th ROK-US Defense Analysis Seminar.
6. Daniel Gonzales et al., Network-Centric Operations Case Study (Air-to-Air Combat with and without Link 16), RAND.

Performance Evaluation of Recursive Network Architecture for Fault-tolerance

1 Minho Shin, 1 Raheel Ahmed Memon, 1 Yeonseung Ryu*, 2 Jongmyung Rhee and 3 Dongho Lee

1 Department of Computer Engineering, Myongji University, Korea
2 Department of Information and Communication Engineering, Myongji University, Korea
3 Agency for Defense Development, Korea
{ysryu, mhshin}@mju.ac.kr

Abstract. Network fault tolerance is one of the most important capabilities required by mission-critical systems such as the naval Combat System Data Network (CSDN). In this paper, we present performance evaluation results for a fault-tolerant network scheme called Recursive Scalable Autonomous Fault-tolerant Ethernet (RSAFE). The primary goal of the RSAFE scheme is to provide network scalability and autonomous fault detection and recovery within a given time. We show that the proposed recursive architecture can support a large number of nodes while guaranteeing the fail-over time requirement.

Keywords: Fault-tolerant Ethernet, Large-scale network, Fail-over time, Mission-critical systems

1 Introduction

In network-based mission-critical systems such as unmanned vehicles, military weapons, and aviation equipment control systems, Ethernet connectivity must provide fault tolerance to all constituents for the smooth operation of such systems. There have been many software-based fault-tolerant approaches that adopt a heartbeat mechanism for failure detection [1-6]. In our previous work, we presented a fault-tolerant Ethernet scheme called Recursive Scalable Autonomous Fault-tolerant Ethernet (RSAFE) [7]. In RSAFE, a large network is divided into subnets (each containing a limited number of nodes); a limited number of subnets are grouped together to form groups, and groups are recursively combined to form levels until only one group remains at the highest level. RSAFE uses a heartbeat mechanism for fault detection. The heartbeat mechanism may cause large bandwidth consumption as the size of the network grows. In our proposed approach, however, by dividing the large network into small subnets and limiting the subnet size, the heartbeat mechanism can be implemented efficiently.

* Corresponding author. This work was supported by the Defense Acquisition Program Administration and the Agency for Defense Development under contract UD070019AD.

In this work, we perform theoretical evaluations to analyze the number of nodes that can be supported in the proposed RSAFE while providing fail-over functionality within a given time. According to our evaluation, to maintain the failover latency below 1 second, RSAFE can support up to 2400 nodes for a subnet size of 16, 4700 nodes for a subnet size of 32, and nodes for a subnet size of 64. Therefore, the proposed recursive architecture can support a large number of nodes while guaranteeing the fail-over time requirement.

In Section 2, we describe related work. In Section 3, we present our RSAFE scheme. We then present our analysis in Section 4. Finally, we summarize the work in Section 5.

2 Related Works

In [1], the Scalable Autonomous Fault-tolerant Ethernet (SAFE) scheme was proposed. SAFE divides a large network into several subnets. In order to bound fault recovery times, SAFE limits the number of nodes in a subnet and configures the subnets as a star network. Because the SAFE scheme is based on a star topology, if the two switches in the center subnet are destroyed at the same time, the entire network's operation could be interrupted. In [6], a recursive network scheme called DCell was proposed for interconnecting the exponentially increasing number of servers in data centers. A high-level DCell is constructed from many low-level DCells, while DCells at the same level are fully connected to each other. Since each server in DCell networks is connected to different levels of DCells via multiple links, it becomes a rather costly solution.

3 RSAFE

We proposed the RSAFE (Recursive SAFE) scheme in [7]. RSAFE is a low-cost fault-tolerant architecture for mission-critical networks. The architecture of RSAFE is shown in Fig. 1. RSAFE comprises three basic building blocks: Subnet, Group, and Level. Together these components form a recursive structure. The building algorithm for an RSAFE network is described in detail in our previous work.

Fig. 1 (a) shows that a subnet is constructed from a limited number of nodes and two switches. There are four possible paths for communication between two nodes. Within a subnet, one data path is defined as the primary path for communication between two nodes; the remaining paths stay on standby and are activated if the primary data path encounters a fault.

Fig. 1 (b) shows that a limited number of subnets together form a group, where each subnet has dual connectivity with the other subnets of the group. The connectivity is formed as follows (see the sketch below): the second switch of Subnet 0 is connected to the first switch of Subnet 1, the second switch of Subnet 1 to the first switch of Subnet 2, and so on; for the last subnet, the second switch of Subnet n is connected to the first switch of Subnet 0.

Fig. 1 (c) shows that a limited number of groups together form a level. Once the subnets and groups are built, the next step is to build the levels: Level0 is a combination of a limited number of groups, and Level1 onward is a combination of lower levels. Fig. 1 shows only Level0,0; if levels Level0,0 through Level0,m are available, then Level1 is built by combining all the levels of Level0.
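A minimal sketch of the group wiring rule just described, assuming subnets indexed 0..n-1 with switch 0 and switch 1 as each subnet's first and second switch; the pair encoding is our own illustration.

```python
def group_ring_links(num_subnets: int) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return ring links ((subnet, switch), (subnet, switch)) for one group:
    the second switch of Subnet i connects to the first switch of
    Subnet (i + 1) mod n, which closes the ring at the last subnet."""
    return [((i, 1), ((i + 1) % num_subnets, 0)) for i in range(num_subnets)]

# A group of four subnets yields four ring links:
# [((0, 1), (1, 0)), ((1, 1), (2, 0)), ((2, 1), (3, 0)), ((3, 1), (0, 0))]
print(group_ring_links(4))
```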

Fig. 1. Three basic building blocks of Recursive Scalable Autonomous Fault-tolerant Ethernet: a) Subnet, b) Group, and c) Level

Similarly, all the required levels can be created recursively; in this structure, all higher levels are constructed recursively from lower levels. For connectivity between levels or groups, subnets are selected randomly from each level/group and connected to each other.

RSAFE detects network faults on a per-subnet basis using a heartbeat mechanism. A heartbeat message (HBM) is an Ethernet frame sent and received between the nodes of each subnet. An HBM can reach only the nodes within its subnet and cannot reach nodes outside the subnet because it is layer-2 data. Each node is responsible for detecting faults in its own subnet and recovering from them. We adopt our previously proposed solution in order to exchange information outside the subnets [1, 3]. RSAFE maintains master nodes in each subnet for inter-subnet fault recovery. The Primary Master Node (PMN) is an active master node and the Secondary Master Node (SMN) is a standby master node; if the PMN fails, the SMN takes over fault recovery. PMNs communicate with each other using IP packets to exchange subnet status only when a fault occurs. When a fault occurs in a subnet, the master node can detect it via the heartbeat mechanism and notify the other master nodes to recover from the fault on the inter-subnet communication path.
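The following sketch illustrates the per-subnet heartbeat detection rule, assuming (as the evaluation in Section 4 does) that a peer is declared failed after two consecutive missed heartbeat intervals; the class and method names are our own.

```python
import time

class HeartbeatMonitor:
    """Tracks when each subnet peer was last heard; a peer silent for more
    than two heartbeat intervals is declared failed."""

    def __init__(self, hbm_interval: float):
        self.hbm_interval = hbm_interval
        self.last_heard: dict[str, float] = {}

    def on_heartbeat(self, node_id: str) -> None:
        self.last_heard[node_id] = time.monotonic()

    def failed_peers(self) -> list[str]:
        now = time.monotonic()
        return [node for node, t in self.last_heard.items()
                if now - t > 2 * self.hbm_interval]
```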

When a master node receives a notification from other master nodes, it in turn sends a notification to the nodes in its subnet to recover from the fault.

4 Evaluation

4.1 Failover latency within a subnet

Eq. (1) shows the upper bound of the failover time T_sub within a subnet, where T_HBM is the heartbeat interval:

  $T_{sub} \leq 2\,T_{HBM}$    (1)

If we increase the number of nodes in a subnet, the heartbeat interval should also be increased. This, however, also increases the fault recovery time, because a node detects a fault only after it has missed two consecutive heartbeats.

4.2 Failover latency across the network

In this section, we compute the upper bound of the failover time across the whole network. Let N denote the number of nodes in the network and d the number of groups in each level. For brevity of analysis, we assume that N = d^k for some positive integer k, which is called the depth of the network. Note that RSAFE is recursively structured: each level consists of d groups connected as a ring, and each group forms a one-step-lower level. Each subnet is level-0 and the whole network is level-k. Without loss of generality, we assume that d is an even number. We assume that each link has the same latency T_L.

Fig. 2. Failover message path in Level k. The PMN P_s in a faulty subnet sends a message to another PMN P_d, which belongs to the group G_m, m hops away from P_s along the ring.

Also note that two connected groups of the same level are connected via their gateway switches, chosen at random. When a fault occurs in a subnet, the PMN recognizes the fault event within T_sub seconds. Then, the PMN sends a message to each other PMN. Consider the highest level of the network (Fig. 2). As shown in the figure, the message travels from the source group (G_1) to the destination group (G_m) through the m-1 intermediate groups (G_2, G_3, ..., G_{m-1}) along the group ring. The path from the source PMN (P_s) to the destination PMN (P_d) consists of the first hop from the source PMN to the subnet switch (P_s -> s_s), the path from the switch to the gateway switch (s_s -> g_12), multiple hops between groups along the ring (g_12 -> g_21, g_22 -> g_31, ..., g_{m-1,2} -> g_{m1}), the paths from in-gateway to out-gateway in each intermediate group (g_21 -> g_22, g_31 -> g_32, ...), the path from the in-gateway of the destination group to the subnet switch of the destination PMN (g_{m1} -> s_d), and the hop from the subnet switch to the destination PMN (s_d -> P_d). Finally, the destination PMN sends a notification to each node of its subnet in one hop. Therefore, we can bound the failover latency by

  $T \leq T_{sub} + T_L + T_{grp}(k)\,T_L + T_L$,

where T_grp(k) is the largest path length between switches within a group of level-k, i.e., the whole network. Now we focus on computing T_grp(k). Due to the recursive structure, we get

  $T_{grp}(i) = R + (R+1)\,T_{grp}(i-1)$ for $i \geq 1$,

where R is the maximum hop count along the ring between two groups. When i = 0, T_grp(0) = 1 (one hop between two switches). By unfolding the recursive formula, we get

  $T_{grp}(k) = 2(R+1)^k - 1$.

Since R = d/2 (d is the ring size, so d/2 is the path length between the farthest groups) and k = log_d(N/m), where N is the number of nodes and m is the subnet size, by Eq. (1) we have an upper bound of the failover latency:

  $T \leq 2\,T_{HBM} + T_L + 2\,T_L \left( \frac{d}{2} + 1 \right)^{\log_d(N/m)}$

Fig. 3 shows T_grp(k), the largest number of hops from the source switch to the destination switch, as the number of nodes in the network increases. Considering a 1 millisecond link delay, T_grp(k) = 100 is equivalent to the failover latency of 1 second. The three lines, from bottom to top, represent T_grp(k) when the subnet size is 16, 32, and 64. To maintain the failover latency below 1 second, RSAFE can support up to 2400 nodes for a subnet size of 16, 4700 nodes for a subnet size of 32, and nodes for a subnet size of 64.
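A small Python sketch of the bound just derived: it evaluates T_grp(k) = 2(d/2 + 1)^k - 1 with depth k = log_d(N/m) and searches for the largest N whose worst-case hop count stays within a budget. The group count d is a free parameter of the sketch, since the configuration used for Fig. 3 is not stated here.

```python
import math

def t_grp(n_nodes: int, subnet_size: int, d: int) -> float:
    """Largest switch-to-switch path length: 2 * (d/2 + 1)**k - 1,
    with depth k = log_d(N / m)."""
    k = math.log(n_nodes / subnet_size, d)
    return 2 * (d / 2 + 1) ** k - 1

def max_nodes(subnet_size: int, d: int, hop_budget: float = 100) -> int:
    """Largest N whose worst-case hop count stays within the budget
    (100 hops corresponds to the text's 1-second latency criterion)."""
    n = subnet_size
    while t_grp(n + 1, subnet_size, d) <= hop_budget:
        n += 1
    return n
```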

Fig. 3. Maximum hop count between two switches as the number of nodes increases. Multiplied by the link delay of 1 millisecond, it represents the failover latency.

5 Conclusion

Network fault tolerance is one of the most important capabilities required by mission-critical systems such as the naval Combat System Data Network (CSDN). In this paper, we presented theoretical evaluation results for our recursive fault-tolerant network scheme called RSAFE. In RSAFE, a large network is constructed recursively from subnets, groups, and levels. We showed that the proposed recursive architecture can support a large number of nodes while guaranteeing the fail-over time requirement.

References

[1] K. Y. Kim, Y. S. Ryu, J. M. Rhee, and D. H. Lee, "SAFE: Scalable Autonomous Fault-tolerant Ethernet," Proc. of the 11th International Conference on Advanced Communication Technology (ICACT).
[2] H. A. Pham, J. M. Rhee, S. M. Kim, and D. H. Lee, "A Novel Approach for Fault Tolerant Ethernet Implementation," Proc. of the 4th International Conference on Networked Computing and Advanced Information Management (NCM 2008), Vol. 1.
[3] H. A. Pham, J. M. Rhee, Y. S. Ryu, and D. H. Lee, "Performance Analysis for a Fault-Tolerant Ethernet Implementation based on Heartbeat Mechanism," Proc. of the 41st Annual IEEE/IFIP International Conference on Dependable Systems and Networks.
[4] S. Song, J. Huang, P. Kappler, R. Freimark, J. Gustin, and T. Kozlik, "Fault-Tolerant Ethernet for IP-Based Process Control Networks," Proc. of the 25th Annual IEEE International Conference on Local Computer Networks (LCN 00).
[5] H. A. Pham, J. M. Rhee, Y. S. Ryu, and D. H. Lee, "An Adaptive and Reliable Data-Path Determination for Fault-Tolerant Ethernet Using Heartbeat Mechanism," Proc. of the 5th International Conference on Computer Sciences and Convergence Information Technology (ICCIT 2010).
[6] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu, "DCell: a Scalable and Fault-tolerant Network Structure for Data Centers," ACM SIGCOMM.
[7] R. A. Memon, Y. S. Ryu, M. H. Shin, J. M. Rhee, and D. H. Lee, "Building Scalable Fault Tolerant Network," Proc. of the 14th International Conference on Advanced Communication Technology (ICACT).

Implementation of Buffer Cache Simulator for Hybrid Main Memory and Flash Memory Storages

Soohyun Yang and Yeonseung Ryu
Department of Computer Engineering, Myongji University
Yongin, Gyeonggi-do, Korea

Abstract. A buffer cache mechanism is usually employed in modern operating systems to enhance the performance that is limited by slow secondary storage. In this paper, we present the implementation of a trace-driven simulator for buffer cache schemes that consider DRAM/PRAM hybrid main memory and flash memory based storage. The goal of the simulator is to analyze legacy buffer cache schemes by measuring the number of write operations on PRAM and the number of erase operations on flash memory.

Keywords: Simulator, Buffer Cache, Buffer Replacement, Non-volatile Memory, Flash Memory, DRAM/PRAM Hybrid Main Memory

1 Introduction

Most modern operating systems (OS) employ a buffer cache mechanism to enhance the I/O performance that is limited by slow secondary storage. When the OS receives a read request from an application, the file system copies the data from storage to the buffer cache in main memory and serves subsequent read operations from the faster main memory. Similarly, when the OS services a write request, it stores the data in the buffer cache and later flushes several data items together to storage. For the past decades, buffer cache schemes have been implemented for DRAM-based main memory and hard disk based secondary storage.

Recently, NAND flash memory is becoming an important secondary storage medium for mobile computers because of its superiority in terms of fast access speed, low power consumption, shock resistance, high reliability, small size, and light weight. Further, recent studies have shown that DRAM-based main memory consumes a significant portion of the total system power and the total system cost with the increasing size of the memory system [1]. Fortunately, various non-volatile memories such as PRAM (Phase change RAM), FRAM (Ferroelectric RAM) and MRAM (Magnetic RAM) have been developed as next-generation memory technologies. Among these non-volatile memories, PRAM is rapidly becoming a promising candidate for large-scale main memory because of its high density and low power consumption [2].

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology ( ).

In order to tackle the energy dissipation in DRAM-based main memory, some recent studies introduced PRAM-based main memory organization [3] and DRAM/PRAM hybrid main memory organization [4, 5].

In this paper, we present an implementation study of a buffer cache simulator that considers DRAM/PRAM hybrid main memory and flash memory storage devices. The goal of our simulator is to measure the number of write operations on PRAM and the number of erase operations on flash memory for several buffer cache schemes. Using our simulator, we can develop efficient buffer cache schemes that enhance performance compared to existing schemes.

The rest of this paper is organized as follows. Section 2 gives an overview of non-volatile memory, the flash translation layer, and previous buffer cache schemes. Section 3 explains the design of the simulator. Finally, Section 4 concludes the paper.

2 Background

2.1 Flash Memory and PRAM

A NAND flash memory is organized in terms of blocks, where each block consists of a fixed number of pages. A block is the smallest unit of the erase operation, while reads and writes are handled by pages [6]. Flash memory cannot be written over existing data unless erased in advance. The number of times an erasure unit can be erased is limited. The erase operation can only be performed on a full block and is slow, which usually degrades system performance.

A PRAM cell uses a special material, called a phase change material, to represent a bit. Table 1 shows a comparison of DRAM and PRAM. PRAM density is expected to be much greater than that of DRAM (about four times). Further, because the phase of the material does not change after power-off, PRAM has negligible leakage energy regardless of the size of the memory. Though PRAM has attractive features, the write access latency of PRAM is not comparable to that of DRAM. Also, PRAM has a wear-out problem caused by limited write endurance. Since write operations on PRAM significantly affect system performance, they should be handled carefully.

Table 1. Comparison of DRAM and PRAM

  Attributes      DRAM                PRAM
  Non-volatility  No                  Yes
  Cost/TB         Highest (~4x PRAM)  Low
  Read Latency    50 ns               ns
  Write Latency   ns                  ~1 us
  Read Energy     ~0.1 nJ/b           ~0.1 nJ/b
  Write Energy    ~0.1 nJ/b           ~0.5 nJ/b
  Idle Power      ~1.3 W/GB           ~0.05 W
  Endurance                           10^8 for write
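To make the block/page organization concrete, here is a minimal Python model of the erase-before-write constraint, the kind of bookkeeping a simulator needs to count erase operations; the class design is illustrative, not the simulator's actual code.

```python
class FlashBlock:
    """One NAND block: a page can be written once after an erase, and the
    block must be erased as a whole before any page is rewritten."""

    def __init__(self, pages_per_block: int = 64):
        self.written = [False] * pages_per_block
        self.erase_count = 0  # erases are slow and limited in number

    def write_page(self, page: int) -> None:
        if self.written[page]:
            raise ValueError("page must be erased before rewrite")
        self.written[page] = True

    def erase(self) -> None:
        self.written = [False] * len(self.written)
        self.erase_count += 1
```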

2.2 Flash Translation Layer

If flash memory is used as a storage device, the OS usually employs a software module called the flash translation layer (FTL) between the file system and the flash memory device [7-11]. An FTL receives read and write requests from the file system and maps a logical address to a physical address in the flash memory. The address mapping schemes used in FTLs are classified into three groups depending on their granularity: page-level, block-level, and hybrid-level. In the page-level scheme, a logical page number from the file system can be mapped to any physical page number in the flash memory. In the block-level scheme, the logical page address is divided into a logical block number and a page offset. The logical block number is used to find a physical block that includes the requested page, and the page offset is used to locate the page within the corresponding block. Because of the disadvantages of the page-level and block-level schemes, hybrid-level schemes have been widely used in industry as a compromise between page-level and block-level mapping.

Most hybrid-level schemes use a log block mechanism for storing updates [8, 10, 11]. They divide the flash memory blocks into data blocks and log blocks. Data blocks represent the ordinary storage space, and log blocks are used for storing updates. The hybrid-level schemes maintain a block mapping table for the data blocks and a page mapping table for the log blocks. A major problem of the log block scheme is that it requires merge operations to reclaim the log blocks; this problem is explained further below.

A log block scheme called block associative sector translation (BAST) was proposed in [8]. In the BAST scheme, flash memory blocks are divided into data blocks and log blocks. Data blocks represent the ordinary storage space and log blocks are used for storing updates. When an update request arrives, the FTL writes the new data temporarily to the log block, thereby invalidating the corresponding data in the data block. Whenever the log block becomes full or the free log blocks are exhausted, garbage collection is performed in order to reclaim the log block and the corresponding data block. During garbage collection, the valid data from the log block and the corresponding data block must be copied into an empty data block. This is called a merge operation. After the merge operation, two erase operations need to be performed in order to empty the log block and the old data block. When the data block is updated sequentially from the first page to the last page, the FTL can apply a simple switch merge, which requires only one erase operation and no copy operations. That is, the FTL erases the data block filled with invalid pages and switches the log block into a data block.

2.3 Buffer Cache Schemes for Flash Memory Storages

Legacy flash-aware buffer management schemes can be classified into two categories: page-level [12-15] and block-level schemes [16, 17]. In [12], a page-level scheme called CFLRU (Clean First LRU) was proposed. CFLRU maintains the page list in LRU order and divides the page list into two regions, namely the working region and the clean-first region. In order to reduce the write cost, CFLRU first evicts clean pages from the clean-first region in LRU order; if there are no clean pages in the clean-first region, it evicts dirty pages in LRU order.
CFLRU can reduce the number of write and erase operations by delaying the flush of dirty pages in the buffer cache.
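A minimal sketch of CFLRU's victim selection as just described, assuming the buffer is kept as an LRU-ordered mapping whose last `window` entries form the clean-first region; the names are our own.

```python
from collections import OrderedDict

def cflru_victim(pages: "OrderedDict[int, bool]", window: int) -> int:
    """Pick a CFLRU victim. `pages` maps page id -> dirty flag, ordered from
    most- to least-recently used; the last `window` entries form the
    clean-first region. Prefer the LRU clean page there; otherwise fall
    back to the LRU page overall (a dirty one)."""
    entries = list(pages.items())
    for page_id, dirty in reversed(entries[-window:]):  # LRU end first
        if not dirty:
            return page_id
    return entries[-1][0]  # no clean page in the region: evict the LRU page
```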

In [16, 17], block-level replacement schemes called FAB (Flash Aware Buffer management) and BPLRU (Block Padding LRU) were proposed, which consider the block merge cost in log block FTL schemes. When a page in the buffer cache is referenced, all pages in the same block are moved to the MRU position. When the buffer cache is full, the FAB scheme searches from the LRU position for a victim block that has the largest number of pages in the buffer cache. Then, all the pages of the selected block are passed to the FTL to be flushed into the flash memory. The BPLRU scheme also evicts all the pages of a victim block, like FAB, but it simply selects the victim block at the LRU position. In addition, it writes a whole block into a log block in place, using the page padding technique. Therefore, all log blocks can be merged by the switch merge, which decreases the number of erase operations.

3 Buffer Cache Simulator

Fig. 1 illustrates the system configuration considered in our simulator, in which main memory consists of DRAM and PRAM, and secondary storage is based on flash memory. We assume that the hybrid main memory consists of DRAM and PRAM, divided by a memory address: the low memory addresses belong to DRAM and the high section is allocated to PRAM. We also assume that the storage medium is flash memory. The flash memory model used in the simulation was the Samsung 16GB NAND flash memory [6]. The page size is 4 KB and the number of pages in a block is 64. We implement the BAST scheme as the FTL scheme of the flash memory because it is a representative and basic log block scheme. In the BAST scheme, 100 log blocks were used.

Fig. 1. System configuration for our simulator.

Our buffer cache simulator is a trace-driven simulator that uses disk I/O traces as input. We extracted disk I/O traces from a Microsoft Windows XP-based notebook PC running several applications, such as document editors, web browsers, media players, and games. The read/write ratio of this workload is 67%/33%. We also made several synthesized disk I/O traces. Table 2 shows the characteristics of example synthesized traces used in our simulator. A read/write ratio of 80%/20% in Table 2 means that 80% of the operations in the trace are reads and 20% are writes. The locality in Table 2, e.g., 80%/20%, means that 80% of all references fall within 20% of the storage device.
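The following self-contained sketch shows the overall shape of such a trace-driven run, using a plain LRU cache for brevity; the trace format (op, page) and all names are illustrative, and a full run would pass the flushed pages through the FTL to count erase operations.

```python
from collections import OrderedDict

def run_simulation(trace, capacity: int):
    """Replay (op, page) records through an LRU buffer cache, counting hits
    and the dirty evictions that turn into flash writes."""
    cache: OrderedDict[int, bool] = OrderedDict()  # page -> dirty flag
    hits = flash_writes = 0
    for op, page in trace:                # op is "R" or "W"
        if page in cache:
            hits += 1
            cache.move_to_end(page)       # move to the MRU position
        else:
            if len(cache) >= capacity:
                _, dirty = cache.popitem(last=False)  # evict the LRU page
                if dirty:
                    flash_writes += 1     # dirty victims are flushed to flash
            cache[page] = False
        if op == "W":
            cache[page] = True
    return hits / len(trace), flash_writes

# e.g. run_simulation([("R", 1), ("W", 1), ("R", 2)], capacity=1) -> (1/3, 1)
```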

We can also use I/O traces from an OLTP application running at a financial institution [18], made available by the Storage Performance Council (SPC).

Table 2. Example of synthesized traces used in our simulator

  Type  Total References  Read/Write Ratio  Locality
  T1    ,000              80%/20%           60%/40%
  T2    ,000              80%/20%           80%/20%

In order to simulate representative buffer cache schemes, we implemented page-level buffer cache schemes such as LRU and CFLRU, and block-level schemes such as BPLRU and FAB. When we allocate a buffer to store data requested by the file system, we allocate buffers from DRAM and PRAM alternately. Fig. 2 illustrates an example of the block-based buffer cache structure implemented in the simulator. Our simulator measures the hit ratio, the required number of write operations on PRAM, and the required number of erase operations on flash memory while varying the buffer cache size.

Fig. 2. Example of buffer cache structure in our simulator.

4 Conclusion

We have developed a simulator for several existing buffer cache schemes. In our buffer cache simulator, main memory consists of DRAM and PRAM, and secondary storage is flash memory. Our trace-driven simulator uses various kinds of disk I/O traces and measures performance metrics such as the hit ratio, the required number of write operations on PRAM, and the required number of erase operations on flash memory. By analyzing the behavior of legacy buffer schemes and their measured performance, we are going to design and implement novel buffer cache schemes that improve performance.

References

1. L. A. Barroso and U. Holzle, The Case for Energy-proportional Computing. Computer, Vol. 40, No. 12.

2. X. Dong, N. Jouppi, and Y. Xie, PCRAMsim: System-level Performance, Energy, and Area Modeling for Phase-change RAM, in Proc. of International Conference on Computer Aided Design.
3. M. K. Qureshi, V. Srinivasan, and J. A. Rivers, "Scalable High Performance Main Memory System Using Phase-Change Memory Technology," in Proc. of International Symposium on Computer Architecture.
4. G. Dhiman, R. Ayoub, and T. Rosing, "PDRAM: A Hybrid PRAM and DRAM Main Memory System," in Proc. of Design Automation Conference.
5. H. Park, S. Yoo, and S. Lee, Power Management of Hybrid DRAM/PRAM-based Main Memory, in Proc. of Design Automation Conference.
6. Samsung Electronics, K9XXG08UXM, 1G x 8 Bit / 2G x 8 Bit NAND Flash Memory.
7. E. Gal and S. Toledo, Algorithms and data structures for flash memories, ACM Computing Surveys, Vol. 37, No. 2.
8. J. Kim, J. Kim, S. Noh, S. Min, and Y. Cho, A space-efficient flash translation layer for CompactFlash systems, IEEE Transactions on Consumer Electronics, Vol. 48, No. 2.
9. A. Gupta, Y. Kim, and B. Urgaonkar, DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings, in Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems.
10. Y. Ryu, SAT: switchable address translation for flash memory storages, in Proc. of IEEE Computer Software and Applications Conference (COMPSAC).
11. Y. Ryu, A Flash Translation Layer for NAND Flash-based Multimedia Storage Devices, IEEE Transactions on Multimedia, Vol. 13.
12. S. Park, D. Jung, J. Kang, J. Kim, and J. Lee, CFLRU: a replacement algorithm for flash memory, in Proc. of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems.
13. Y. Yoo, H. Lee, Y. Ryu, and H. Bahn, Page replacement algorithms for NAND flash memory storages, in Proc. of International Conference on Computational Science and its Applications.
14. Z. Li, P. Jin, X. Su, K. Cui, and L. Yue, CCF-LRU: A new buffer replacement algorithm for flash memory, IEEE Transactions on Consumer Electronics, Vol. 55, No. 3.
15. X. Tang and X. Meng, ACR: An Adaptive Cost-aware Buffer Replacement Algorithm for Flash Storage Devices, in Proc. of International Conference on Mobile Data Management.
16. H. Jo, J. Kang, S. Park, and J. Lee, FAB: Flash aware buffer management policy for portable media players, IEEE Transactions on Consumer Electronics, Vol. 48, No. 2.
17. H. Kim and S. Ahn, BPLRU: A buffer management scheme for improving random writes in flash storage, in Proc. of the 6th USENIX Conference on File and Storage Technologies (FAST).
18. OLTP Trace from UMass Trace Repository.

Critical Health Monitoring with Unreliable Mobile Devices

Minho Shin
Myongji University

Abstract. As the nation's healthcare information infrastructure continues to evolve, new technologies promise to provide readily accessible health information that can help people address personal and community health concerns. Concerns about privacy and information quality, however, may impede the development and deployment of these technologies for remote health monitoring. In this research we plan to design a framework for secure remote health-monitoring systems. Specifically, we want to (i) build a realistic risk model for sensor-data quality, by interacting with health professionals, (ii) develop protocols and mechanisms for data protection and quality assurance, and (iii) propose a new health-monitoring architecture that is secure despite the weaknesses of common personal devices.

1 Introduction and Relevance

The nation has an urgent need to build a national healthcare information infrastructure (NHII) that provides health information to all who need to make sound decisions about health [1]. Readily accessible and reliable health information would greatly improve everyone's ability to address personal and community health concerns. Health emergencies also require prompt and authoritative information about the situation to be readily available to those involved. Fortunately, present information technology brings us the hope that significant improvements in the public's health and well-being are not only possible but close at hand. In this research we propose to design a framework for secure remote health-monitoring systems, cutting across two core research areas of the I3P Cyber Security Research and Development Agenda: Trust Among Distributed Autonomous Parties and Wireless Security, and addressing information-security challenges in one of the nation's critical infrastructures: healthcare.

Wearable and implantable medical sensors and portable computing devices present many opportunities for providing timely health information to health providers, public health professionals, and consumers [2]. By supplying real-time health information, or extensive measurements collected continuously, a sensor-based health-monitoring system complements the current healthcare information infrastructure, which is based on relatively static, sparsely collected information in the patient's medical records. A remote health-monitoring system may help to reduce the cost of healthcare [3] and to simultaneously improve the quality of

healthcare; patients may spend less time in the hospital and yet have more detailed health data, measured by wearable sensors as they go about their daily activities; caregivers can more quickly react to the medical emergencies of elders; trainers can analyze a trainee's fitness level; and consumers can maintain their own health and wellness.

Privacy and information quality, however, are two major concerns in the development and deployment of remote health-monitoring systems [4, 5]. To be viable, any such system must provide usable devices that respect patient privacy while also retaining the data quality required for the medical purpose it serves. There are many opportunities for the data to become lost, damaged, forged, or exposed: patients may fail to apply sensors correctly, leading to medically incorrect readings; the patient's device may be misplaced, stolen, or compromised, causing the medical data stored in the device to be divulged [6]; the sensor data may travel across multiple devices and networks before it is presented to the medical team. The problem is especially challenging, given the difficulty of hardening low-cost sensors and the personal devices that collect, process, and forward the medical data, and given that all such devices will communicate over wireless networks.

In our research, we will address these issues by designing a framework for secure remote health-monitoring systems. Given the time available (one year), we will focus most on the data-quality issues. Specifically, we want to (i) build a realistic risk model for sensor-data quality, by interacting with health professionals, (ii) develop protocols and mechanisms for data protection and quality assurance, and (iii) propose a new health-monitoring architecture that is secure despite the weaknesses of common personal devices. For evaluation, we will implement a proof of concept for secure health monitoring.

2 Challenges and our Approach

2.1 Risk Analysis

To design a secure health-monitoring system, we first need to understand what determines the quality of the medical sensor data and how we can quantify the degree of the data quality. Specifically, we want to identify factors that affect the data quality and then analyze to what extent they influence the data quality. Others have described overall security challenges in health-monitoring systems [4], and initial ideas for protecting health-data integrity [7], but an in-depth and realistic analysis of the problem is lacking in the literature. As a preliminary analysis, we recently identified eleven factors that can affect the quality of medical sensor data [5] (see the next section for details). To ensure or evaluate the data quality of a health-monitoring system, one should take these factors into account. Without knowledge of physiology and practical concerns, however, it is difficult to quantify to what extent each factor will contribute to the data quality. In our research, we plan to exploit our collaborations to hold conversations with health professionals, refining the above list of risk factors and developing our data-quality risk model so it can answer the following questions:

- What are the high-priority concerns for achieving high-quality medical data?
- How much does each factor contribute to the data-quality problem?
- How can we evaluate and possibly improve the data quality?

2.2 Quality control

To design a quality-control framework, we first analyzed the health-monitoring system as a sequence of processes, assigned related factors to each process, and then identified possible methods for the quality control of individual factors. Medical sensing begins with sensing the physiology of the patient (Sense process). Each sensor generates sensor data at a certain rate and transmits them to the device through a wireless connection (Transfer process). The monitoring device collects data from the sensors, processes them as needed (Collect process), and then forwards them to the provider (Transfer process). Upon receiving the data from the device, the provider's server evaluates the validity of the data (Verify process) and then presents the data to the provider. When it presents the data, the server also presents the level of data quality to the provider (Assess process). In the following, we discuss our analysis in more detail. (For brevity, we skip the factors that are self-explanatory.)

Accuracy: the accuracy of a sensor depends on its design and manufacturer (i.e., the sensor profile), the time since the latest calibration, and the age of the sensor. The data quality depends on the accuracy expressed by the expected error bound.

Granularity: the quality of sensor data also depends on the level of detail that a sensor can provide.

Application: the data quality also depends on correct application of the sensor to the body; if the sensor is not correctly applied, it generates incorrect sensor data. If the patient is responsible for the application, the quality of sensor application depends on the patient's ability and diligence. The patient's ability depends on education, age, and prior experience. When a sensor is incorrectly applied, the data is likely to deviate from the range of values that are considered reasonable for a physiological value. We call this reasonableness of the medical data soundness. The soundness of data includes physiological soundness and contextual soundness; we explain these in more detail below, where we explain the verification process.

Synchronization: it is often medically necessary to collect multiple sensor readings of different modalities, and a health professional can derive a medical condition from their combination. For the combination to be useful, the sensor readings should be temporally synchronized. If sensors cannot timestamp each data item, the device should do so, but it should also make sure that the sensor data is sampled at that moment (i.e., not replayed by an adversary). The data quality depends on the granularity of the synchronization.

Information loss by aggregation: communication is costly. To reduce the amount of information to be sent, the device can aggregate sensor readings before

sending (e.g., reporting the average per minute). However, every aggregation loses some information, and the quality of the data depends on the amount of information lost by the aggregation.

Most factors related to the sense, collect, and transfer processes are syntactic (except sensor application); they depend little on the semantics of the medical data. For example, one can protect message integrity without knowing the meaning of the data contained in the message. However, medical data has rich semantics that can determine what data is sound as medical data. The verification process exploits the semantics of the medical sensor data to verify whether the data is appropriate, useful, or acceptable for the purpose of health monitoring.

Patient authentication: patient authentication verifies whether the sensors are monitoring the right person. Biometric data (e.g., a fingerprint) is simple and accurate, but its permanence can raise a privacy issue. We can also compare the data with the patient's past data or medical profile (e.g., a disease or weakness) to verify the patient's identity. The data quality depends on the likelihood that we are monitoring the right person.

Physiological soundness: physiological data cannot take arbitrary values. One can check whether the value falls in a reasonable range (range check), whether it is coherent with the known probability distribution (probability distribution), whether its temporal change exhibits reasonable behavior (auto-correlation), or whether sensor values of different modalities accord with the known correlations between them.¹

Contextual soundness: like physiological soundness, we can verify the data quality by comparing the medical data with some context data such as body movement, location, or temperature. For example, the acceptable values for heart rate or blood pressure are different when the patient is running or sleeping.

When quality verification fails, the quality of incoming data becomes uncertain. Even if all the verifications succeed, there are many opportunities for data to become incorrect. To deal with the uncertainty, the providers need to know how much they can trust the data and what is causing the problem. The assessment process takes all the factors into account, judges the current level of the data quality, and presents that judgment to the provider.

Prior work on data integrity in health-monitoring systems focused on detecting packet loss [8], reducing false positives using sensor correlation [9], or categorizing the data quality into four discrete states based on observed error and lack of data [10]. Giani et al. [7] proposed a broad range of methods for data validation, but only basic concepts were proposed. Compared to prior work, our approach attempts to provide a generic framework for the quality control of a health-monitoring system.

¹ Such an anomaly can also signify a medical problem of the patient, and the verification methods can also apply to the problem of anomaly detection. However, such emergency detection is outside the scope of this work.
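As an illustration of the range and auto-correlation checks mentioned above, here is a minimal sketch; the thresholds are hypothetical, and a deployed verifier would learn them per patient or per modality.

```python
def physiologically_sound(values: list[float], lo: float, hi: float,
                          max_jump: float) -> bool:
    """Range check plus a crude temporal-coherence (auto-correlation) check:
    every reading must lie in [lo, hi], and consecutive readings must not
    jump by more than max_jump."""
    in_range = all(lo <= v <= hi for v in values)
    coherent = all(abs(b - a) <= max_jump for a, b in zip(values, values[1:]))
    return in_range and coherent

# Hypothetical resting heart-rate stream (beats per minute):
print(physiologically_sound([62, 64, 63, 61], lo=40, hi=180, max_jump=20))   # True
print(physiologically_sound([62, 160, 63, 61], lo=40, hi=180, max_jump=20))  # False
```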

Dempster-Shafer theory (DS-Theory) has many uses; for example, it was recently used for evaluating the performance of intrusion detection systems (IDS) [11]. While that work simply combined the partial judgments provided by existing IDS schemes, our work will actually define belief functions for each factor and also explore other possibilities for combining partial results, seeking methods that fit health-monitoring applications better.

2.3 Architecture

So that patients need not carry a dedicated monitoring device, we want to leverage the mobile device they already carry: their cellphone. Mobile phones are increasingly powerful, effectively personal computing devices with substantial computation, storage, and networking capabilities. Furthermore, they are increasingly able to sense location (GPS), motion (accelerometer), light, proximity, temperature, sound (microphone), and video (camera). The use of existing devices has advantages in deployment cost and usability [12]. On the other hand, turning a personal device into a health-monitoring device also has challenges. First, personal devices are diverse in software platform and security mechanism. The developer must adapt to the wide variety of features (and varying degrees of security) on mobile platforms such as Windows Mobile, Mac OS X, and Symbian. Although some future platforms may have strong security support such as a TPM [13, 14], a TPM may not allow the patient to install monitoring software without going through a complicated platform-certification process.

To address these challenges and yet still leverage the patient's mobile phone as a platform, we plan to design a novel architecture that decouples the monitoring component from the personal device. Suppose the health provider distributes small health-monitoring units (HMU) to patients and asks them to keep the unit plugged into the device through a common interface such as an SD card, mini-USB, or SIM card.² The HMU can store secret keys and compute some cryptographic functions (as a SIM card can do in today's GSM phones). The unit can authenticate sensors (authenticator) and verify the authenticity of sensor data forwarded by the monitoring software (auditor). When needed, it aggregates sensor data before sending it to the provider (fusor). The HMU adds message authentication codes to messages sent to the provider; without the HMU, the device cannot prove the authenticity of the sensor data to the provider. The HMU makes health monitoring portable from device to device, easy to manage, and hard to compromise; there are many opportunities for adversaries to access the device through software attacks [6], while it requires a hardware attack to compromise the HMU [15].

² Although not all current phones have expansion slots, and GSM phones only have one SIM-card interface, we imagine next-generation mobile phones that have a standard expansion slot of similar form factor and capability to these examples.

References

1. Detmer, D.E.: Building the national health information infrastructure for personal health, health care services, public health, and research. BMC Med. Inform. Decis. Mak. 3 (January 2003)
2. Jurik, A.D., Weaver, A.C.: Remote medical monitoring. Computer 41(4) (2008)
3. Dimmick, S.L., Burgiss, S.G., Robbins, S., Black, D., Jarnagin, B., Anders, M.: Outcomes of an integrated telehealth network demonstration project. Telemedicine Journal and e-Health 9(1) (March 2003)
4. Stanford, V.: Pervasive health care applications face tough security challenges. IEEE Pervasive Computing 1(2) (2002)
5. Sriram, J., Shin, M., Kotz, D., Rajan, A., Sastry, M., Yarvis, M.: Challenges in data quality assurance in pervasive health monitoring systems. In Gawrock, D., Reimer, H., Sadeghi, A.R., Vishik, C., eds.: Future of Trust in Computing. Lecture Notes in Computer Science (July 2009)
6. Ghosh, A.K., Swaminatha, T.M.: Software security and privacy risks in mobile e-commerce. Communications of the ACM 44(2) (2001)
7. Giani, A., Roosta, T., Sastry, S.: Integrity checker for wireless sensor networks in health care applications. In: Proceedings of the Second International Conference on Pervasive Computing Technologies for Healthcare (2008)
8. O'Donoghue, J., Herbert, J., Fensli, R., Dineen, S.: Sensor validation within a pervasive medical environment. In: Proceedings of the IEEE Conference on Sensors (Oct. 2006)
9. Chen, C.M., Agrawal, H., Cochinwala, M., Rosenbluth, D.: Stream query processing for healthcare bio-sensor applications. In: Proceedings of the 20th International Conference on Data Engineering (April 2004)
10. Peter, C., Ebert, E., Beikirch, H.: A wearable multi-sensor system for mobile acquisition of emotion-related physiological data. In Tao, J., Tan, T., Picard, R.W., eds.: ACII. Volume 3784 of Lecture Notes in Computer Science, Springer (2005)
11. Thomas, C., Balakrishnan, N.: Mathematical analysis of sensor fusion for intrusion detection systems. In: The First International Conference on Communication Systems and Networks (January 2009)
12. Mann, W., Helal, S.: Smart phones for the elders: Boosting the intelligence of smart homes. In: Proceedings of the AAAI Workshop Automation as Caregiver: The Role of Intelligent Technology in Elder Care, AAAI Press (2002)
13. Mobile Phone Work Group, Trusted Computing Group
14. TCG Mobile Trusted Module Specification, Revision
15. Clavier, C.: Side channel analysis for reverse engineering (SCARE), an improved attack against a secret A3/A8 GSM algorithm. Cryptology ePrint Archive, Report 2004/049 (2004)

Confidence Metric in Critical Systems

Minho Shin
Myongji University

1 Introduction

The advent of portable computing devices and miniature sensing devices presents many new opportunities for personal healthcare. Formerly, most medical sensing devices were used in a hospital setting under the care of trained medical and technical personnel; soon, many devices will be worn throughout a patient's daily life or installed at home and in assisted-living settings. These devices will collect health-related data for many purposes, by patients with chronic medical conditions (such as blood-sugar sensors for diabetics), people seeking to change behavior (e.g., losing weight or quitting smoking), or athletes wishing to monitor their condition and performance. The resulting data may be used directly by the person, or shared with others: with a physician for treatment, with an insurance company for coverage, with the adult children of elderly parents, or by a coach.

Such systems have huge potential benefit to the quality of healthcare and quality of life for many people, but there are many opportunities for the sensor data to be tampered with or otherwise inaccurate. Outside the hospital setting, in particular, the sensors may be applied by the patient or family members; the data may be gathered through a personal mobile device (such as a mobile phone), over a personal network (such as a wireless network at home), or over the public Internet. Therefore, the accuracy and availability of sensor data are difficult to ensure. To evaluate the trustworthiness of medical data gathered in this manner, we need a holistic system view that inspects contributions to risk and error in medical data as it flows from the patient to the caregiver.

To be useful, systems must assure that high-quality information reaches the data user or, at least, the system must be able to express some degree of confidence in the data being presented. Knowing the degree of confidence in data can be beneficial. First, caregivers can make more accurate decisions based on an understanding of confidence in the data. A quantifiable metric for confidence also allows for quality control of the system. A well-designed system can detect an anomaly in data, whether accidental or malicious, and alert the caregivers for further verification.

1.1 Problem Statement

The project aims to answer the following question: how can we assess confidence in sensor data in the context of pervasive health monitoring, and how can we

present the confidence to users? To answer this question, we propose the following research hypothesis.

1.2 Research Hypothesis and Objectives

In this section, we propose our research hypothesis statement, list research questions, and identify the research objective for each research question.

Hypothesis. Confidence may be coarsely quantified and derived from a combination of several factors.

Our hypothesis is that we can express confidence in sensor data with a quantifiable metric. To test our hypothesis, we raise the following research questions.

Q1. What factors can contribute to the confidence in sensor data?
Q2. How can we quantitatively measure the overall confidence in sensor data?
Q3. How can we effectively present the confidence metric to users?

2 Confidence Metric

In pervasive health monitoring, each patient carries sensors and a collection device, such as a personal mobile phone. The collection device gathers sensor readings from the sensors and reports them to the central server through available network connections. We call a set of successive reports coming from the same collection device a sensing stream, understood as a snapshot of the sequence of data rather than the whole set of accumulated reports. The confidence in a sensing stream is the belief that the reports, especially recent and incoming ones, contain correct values for the patient's physiology. The confidence metric is a quantitative assessment of the degree of confidence in a sensing stream. The confidence metric should meet the following requirements:

- The metric provides the best assessment of the system's confidence in data, given knowledge of the system configuration and architecture, training data, and the history of sensor readings.
- The metric is concise, desirably a single numeric value, but takes into account all the underlying factors that affect the data quality.
- The metric is time-dependent and represents the temporal change in data quality over time.

To measure the confidence in data, the system needs to know the ground truth. In pervasive health monitoring, however, the system often has no access to the ground truth. Therefore, the system estimates the ground truth with the supporting evidence at hand. Such evidence includes the known facts about the system, the equipment, and the patient; a large database of physiological data for a specific patient or a set of patients; and the (recent) history of the sensing stream.

Pervasive health monitoring is dynamic. For example, the sensors can be applied differently every day by the patient. The data can travel through a private corporate network today but through a public Wi-Fi network tomorrow. Even the accuracy of the sensor may decay over time. It is essential for the confidence metric to capture the temporal dynamics of the situation, so that changes can be reflected in the metric.

Many factors contribute to confidence in data. Since a mishap in any factor can degrade the data quality, the confidence metric should depend on each factor. Furthermore, the metric should reflect the relationships among factors. For example, if two factors are independent of each other, a change in one factor should not change the contribution of the other, independent factor.

Due to the vast differences between sensing streams, in terms of sensor set, architecture, and the unique physiology of a patient, we do not intend to provide an absolute metric that can precisely compare the confidence metrics of different sensing streams. Instead, the metric itself provides only a coarse assessment of the confidence in data, while the relative changes over time can be a good judgment about the dynamics of the data quality.

2.1 Definition

The confidence metric of a sensing stream is an evidence-based quantification of the truthfulness of the recent and incoming sensor readings in the sensing stream. A sensing stream is truthful if the physiological data represents the ground truth, i.e., the physiological data we would expect when monitoring the patient in the laboratory with ideal sensors. Since we cannot know the ground truth, we estimate the probability distribution of the ground truth and then evaluate the truthfulness of the sensing stream against this probabilistic ground truth. We can estimate the probabilistic ground truth based on the evidence at hand. Such evidence includes known facts (e.g., the system configuration, the system architecture, or the patient's medical record), the patient's physiological model (e.g., obtained from a training phase), or annotated data (e.g., cryptographic footprints or certificates).

Even with the ground truth, evaluating truthfulness is not trivial because of the uncertainty involved in sensor data. First, a physiological value itself follows a probability distribution. Second, the accuracy of a sensor is probabilistic. Third, threats to data quality, whether intentional or accidental, can be unpredictable. Therefore, we model the confidence metric as a probability that represents our best guess about the truthfulness of the data, given the evidence at hand.

Definition 1 (Confidence Level). We define the confidence metric at time t as the probability that the sensing stream is truthful, conditioned on the recent observation, or

  $CL_t = P[\,T_t = 1 \mid O_t = d_t\,]$    (1)

where $T_t$ is an indicator random variable for the event that the sensing stream is truthful at time t, $O_t$ is an observation variable, and $d_t$ denotes a set of recent sensor readings.

2.2 Factorization

Our approach is divide-and-conquer. We decompose the problem into eleven mutually independent factors, analyze each factor to derive a sub-confidence metric, and combine them to quantify the overall confidence level. In this section, we discuss the decomposition of confidence into factors.

Note that the sensing stream is truthful only when it is truthful with respect to all the factors. Formally, the event {T_t = 1} is equivalent to the event {T_t^S1 = 1 and T_t^S2 = 1 and ... and T_t^A3 = 1}. Because of the independence between factors, we get

    CL_t = CL_t^S1 * CL_t^S2 * ... * CL_t^A3    (2)

Therefore, assessing the overall confidence metric boils down to assessing the sub-confidence metric of each factor. In the following, we discuss how to derive the sub-confidence metrics.

2.3 Factor analysis models

In the following, we discuss four possible models for deriving sub-confidence metrics.

Black-and-white confidence model. Some factors may have a bipolar contribution to data quality: either completely keeping the data close to the ground truth or completely falsifying it. To apply this model, the system should provide a black-and-white verification method, say Verify(m), which outputs 0 (when verification fails) or 1 (when verification succeeds) for the sensor data m.

Knowledge-based confidence model. For some factors, deriving confidence can be merely a matter of referencing a knowledge database that provides the recommended confidence level of a given factor. For example, the confidence of a specific sensor model can be looked up from a knowledge database. Building such a database will require close analysis, lab experiments, or a reputation system.

Functional confidence model. A factor can affect data quality in a fine-grained mathematical way, so that the data quality can be modeled by a function of measurable inputs such as the observation or known facts. Let us denote the data at time t by d_t and a set of known facts by F. A confidence function CF_X(d_t, F) is a continuous function that outputs the sub-confidence level of factor X.

Bayesian model. When uncertainty plays a major role in the confidence, we can use a probabilistic approach. In particular, we can use the well-known Bayes' theorem to compute the sub-confidence level of factor X:

    P_t[T_t^X = 1 | O_t = d_t] = c * P_t[O_t = d_t | T_t^X = 1] * P_t[T_t^X = 1]    (3)

where the first probability on the right-hand side is called the likelihood and the second the prior probability. The constant c normalizes the probability on the left-hand side. The likelihood P_t[O_t = d_t | T_t^X = 1] represents the probability of observing a specific datum when the sensing stream is truthful. To compute the likelihood value, we need to know the probability distribution of the ground truth. We can estimate the ground truth based on analysis or heuristics, or we can learn it from training data.
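As an illustration, the following sketch in Python computes a Bayesian sub-confidence level in the spirit of equation (3) and combines independent sub-confidences per equation (2). The Gaussian physiological model, the broad "faulty" distribution, the prior, and all names are illustrative assumptions, not the paper's method.

    import math

    def gaussian_pdf(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def bayesian_sub_confidence(reading, mu, sigma, prior_truthful=0.9):
        # Likelihood of the observation if the stream is truthful: the reading
        # follows a (learned) physiological distribution N(mu, sigma).
        like_truthful = gaussian_pdf(reading, mu, sigma)
        # Likelihood if not truthful: assumed broad, uninformative distribution.
        like_faulty = gaussian_pdf(reading, mu, 10 * sigma)
        # Equation (3): posterior = c * likelihood * prior, with c normalizing
        # over the two hypotheses (truthful vs. not truthful).
        num = like_truthful * prior_truthful
        den = num + like_faulty * (1 - prior_truthful)
        return num / den

    def overall_confidence(sub_confidences):
        # Equation (2): with mutually independent factors, the overall
        # confidence level is the product of the sub-confidence levels.
        cl = 1.0
        for c in sub_confidences:
            cl *= c
        return cl

    # Example: a heart-rate reading of 72 bpm under a model N(70, 5), combined
    # with two other (hypothetical) factor scores.
    cl = overall_confidence([bayesian_sub_confidence(72, 70, 5), 0.98, 0.95])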

2.4 Presentation of Confidence Metric

As a preliminary discussion of the presentation issue, we set forth the diagnostic use of the confidence metric as a starting point. The goal of diagnostic presentation is to help users explore the underlying reasons for poor confidence. To that end, the system may provide sub-confidence metrics or further analytic information when needed. It is essential to conduct a user study to discover other presentation issues that caregivers are concerned about.

(Cross-time presentation) To inform the user of the temporal change of the confidence level, and possibly its temporal trend, we consider graphical representation.
(Smooth presentation) To minimize confusion due to high variance in the confidence level, we apply smoothing techniques such as a moving average on the sensor data or on the confidence factors, as sketched below.
(Diagnostic presentation) To help users take further action against poor confidence levels, the system may provide individual confidence factors or further analytic information when needed.
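A minimal sketch of the smoothing idea in Python (the window length is an illustrative choice): a simple moving average over the confidence-level series reduces round-to-round variance before display.

    def moving_average(values, window=5):
        """Smooth a series of confidence levels with a simple moving average."""
        smoothed = []
        for i in range(len(values)):
            lo = max(0, i - window + 1)
            chunk = values[lo:i + 1]   # use whatever is available at the start
            smoothed.append(sum(chunk) / len(chunk))
        return smoothed

    # Example: a noisy confidence-level series flattened for display.
    display_series = moving_average([0.9, 0.2, 0.88, 0.91, 0.3, 0.87], window=3)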

3 Gap Analysis

Several recent studies have investigated the potential of using physiological signals such as the electrocardiogram (ECG) and photoplethysmograph (PPG) for biometric identification and verification [1,2,3,5,7,8]. The findings from these studies indicate that the reliability of the mechanism degrades under varied sensing conditions, such as the activity or stress levels of the sensor wearer, which cause intra-subject variance. Hence these approaches are unsuitable for the desired ongoing verification of patient identity. A related study by Jea et al. [6] employs multiple sensor modalities (heart rate, blood pressure, and weight) for biometric identification of users. However, we believe that the reliability of such an approach can be further improved by using accelerometry and galvanic skin response as additional modalities to rationalize trends in ECG sensor data under varied conditions.

A closely related study is the HUMABIO project [4] for unobtrusive multimodal biometric authentication. The project aims to use dynamic physiological user profiles to unobtrusively validate user identity within a secure area after initial authentication. The physiological sensor modalities investigated in pilot studies of the project include ECG and EEG. Findings indicate that features extracted from the heartbeat shape have high discriminative power. However, the HUMABIO approach is unreliable during intense activity and other varied conditions due to its assumption of controlled conditions within the secure area. We will investigate multiple modalities of sensor data taken together for a more robust approach.

4 Conclusion

In this paper, we present our approach to measuring confidence in sensing data collected through remote monitoring. In future work, we plan to conduct a factor analysis and then derive a confidence metric that takes into account those factors and their influences on the confidence level.

References

1. F. Agrafioti and D. Hatzinakos. Fusion of ECG sources for human identification. In Proceedings of the 3rd International Symposium on Communications, Control and Signal Processing (ISCCSP 2008), March 2008.
2. L. Biel, O. Pettersson, L. Philipson, and P. Wide. ECG analysis: a new approach in human identification. In Proceedings of the 16th IEEE Instrumentation and Measurement Technology Conference, vol. 1.
3. Y.Y. Gu, Y. Zhang, and Y.T. Zhang. A novel biometric approach in human verification by photoplethysmographic signals. In Proceedings of the 4th Annual IEEE Conference on Information Technology Applications in Biomedicine, UK, pages 13-14, April 2003.
4. Ioannis G. Damousis, Dimitrios Tzovaras, and Evangelos Bekiaris. Unobtrusive multimodal biometric authentication: The HUMABIO project concept. EURASIP Journal on Advances in Signal Processing, 2008.
5. Steven A. Israel, John M. Irvine, Andrew Cheng, Mark D. Wiederhold, and Brenda K. Wiederhold. ECG to identify individuals. Pattern Recognition, 38, January 2005.
6. David Jea, Jason Liu, Thomas Schmid, and Mani B. Srivastava. Hassle free fitness monitoring. In Proceedings of the 2nd International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments (HealthNet), June 2008.
7. K.N. Plataniotis, D. Hatzinakos, and J.K.M. Lee. ECG biometric recognition without fiducial detection. In Biometrics Symposium: Special Session on Research at the Biometric Consortium Conference, pages 1-6, August 2006.
8. T.W. Shen, W.J. Tompkins, and Y.H. Hu. One-lead ECG for identity verification. In Proceedings of the 24th Annual Conference on Engineering in Medicine and Biology and the Annual Fall Meeting of the Biomedical Engineering Society (EMBS/BMES Conference), vol. 1, pages 62-63, 2002.

Survey on Simulation Framework for Intelligent Transportation Systems

Hyungsoo Kim 1, Beomseok Nam 2, Minho Shin 3
1 Korea Institute of Construction Technology
2 Ulsan National Institute of Science and Technology
3 Myongji University

Abstract. The ongoing efforts to apply advanced technologies to help solve transportation problems include the growing trend of integrating mobile wireless communications into transportation systems. In particular, vehicular ad hoc networks (VANETs) allow vehicles to constitute a decentralized traffic information system on roadways and to share their own information. In this paper, we provide a comprehensive survey of various simulation methods for VANET systems.

1. Introduction

To date, numerous VANET simulators have been proposed in the literature, and the quality of VANET simulation has grown over time. Yet there remain limitations in existing methods, and the state of the art is far from fulfilling the needs of the VANET research community. In this section, we briefly review prior work on VANET simulation and discuss its unique properties as well as the differences from our framework.

Based on architecture, we categorize existing VANET simulators into four types. (i) A mobility-trace-based VANET simulator uses an existing network simulator with a vehicular mobility trace fed into the simulator as a static input. (ii) A mobility-implemented network simulator extends an existing network simulator by implementing a vehicular mobility model within the framework. (iii) A mobility-network integrated simulator combines two existing simulators, one for mobility and the other for the network, by implementing an inter-simulator interface. (iv) A tightly-coupled VANET simulator is a standalone framework with the two components (mobility and network) tightly coupled, usually developed from scratch.

2. Traffic Only Simulator

In order to simulate various vehicle movements such as car following, lane changing, shock waves, and queuing, microscopic transportation simulators have been employed in many studies. SUMO is an open-source microscopic transportation simulator designed for large road networks [9], and MOVE is an extension of SUMO with GUI support [10].

The limitation of SUMO and MOVE is that their features do not include lane-change or obstacle mobility models [13]. VanetMobiSim is a widely used mobility trace generator that supports both macro-mobility and micro-mobility features [7]. FreeSim [5] is another customizable macroscopic and microscopic transportation simulator, licensed under the GNU license. These transportation-oriented tools simulate various traffic circumstances but not the network communications. CORSIM [4] and Paramics [17] are well-known traffic simulation tools that implement the aforementioned mobility models.

3. Network Only Simulator

GloMoSim is a network simulator that currently supports purely wireless network protocols [12]. It was designed following the OSI seven-layer model. QualNet, a commercial version of GloMoSim [19], is very powerful in the sense that it supports a large set of physical- and link-layer models and a parallel architecture that makes it extremely scalable (up to tens of thousands of nodes). OPNET [16] is a commercial network simulator that has a large number of network elements available and enables various network configuration functionalities. Ns-2, an open-source program, is the most widely used network simulator; it simulates link-layer and limited physical-layer characteristics [15]. However, ns-2 does not scale well (up to a few thousand nodes), and its quality of physical-layer simulation is insufficient for the highly dynamic nature of VANETs. SWANS [6] is a network simulator similar to ns-2 or GloMoSim, but it supports larger networks.

4. Mobility-trace-based network-centric simulator

Blum et al. [1] used CORSIM [4], and CARLINK [3] used VanetMobiSim, to generate a mobility trace that is fed into ns-2. This method, however, cannot simulate how inter-vehicle communication can affect driving behavior; for example, it cannot show how vehicles change their paths or lanes to avoid congestion based on traffic information collected by the VANET. In the following, we call a simulator with such network-to-transportation feedback a bidirectional-feedback simulator; otherwise, we call it unidirectional. Without bidirectional-feedback support, a simulator is only appropriate for infotainment applications.

5. Mobility-implemented network simulator

Instead of using a trace for vehicle movement, ASH [8] and Wischhof et al. [23] implemented the mobility model directly inside the network simulator. Wischhof et al. [23] implemented a mobility model based on a cellular automaton model, but bidirectional feedback was not implemented. ASH (Application-aware SWANS with Highway mobility) takes the same approach. ASH implements the IDM (Intelligent Driver Model) for car following and the MOBIL lane-changing model inside the SWANS network simulator. ASH supports bidirectional feedback.
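To make the unidirectional/bidirectional distinction concrete, here is a minimal sketch in Python (all class and method names, the toy 1-D mobility, and the congestion-warning rule are illustrative, not taken from any of the surveyed tools) of a coupled simulation loop in which the network module feeds decisions back into the mobility module.

    import random

    class MobilityModel:
        """Toy 1-D mobility: vehicles move along a line at individual speeds."""
        def __init__(self, n=5):
            self.pos = {v: 0.0 for v in range(n)}
            self.speed = {v: random.uniform(10, 20) for v in range(n)}

        def step(self, dt):
            for v in self.pos:
                self.pos[v] += self.speed[v] * dt
            return dict(self.pos)

        def apply_slowdown(self, vehicle_id):
            # Network-to-transportation feedback: react to a congestion warning.
            self.speed[vehicle_id] *= 0.5

    class NetworkModel:
        """Toy V2V layer: vehicles within radio range exchange warnings."""
        def step(self, positions, radio_range=50.0):
            warned = set()
            for a, pa in positions.items():
                for b, pb in positions.items():
                    if a != b and abs(pa - pb) < radio_range:
                        warned.add(a)   # a heard a beacon warning of congestion
            return warned

    def run_bidirectional(mobility, network, dt=0.1, steps=100):
        for _ in range(steps):
            positions = mobility.step(dt)        # transportation -> network
            for v in network.step(positions):    # network -> transportation:
                mobility.apply_slowdown(v)       # absent in unidirectional tools

    run_bidirectional(MobilityModel(), NetworkModel())

In a unidirectional tool the inner feedback call simply does not exist: the mobility trace drives the network simulation, but nothing flows back.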

6. Mobility-network integrated simulator

Wu [24] combined QualNet with CORSIM by implementing an interface called CQCL. The authors focused on optimizing the communication overhead between the simulators, while supporting only unidirectional feedback. TraNS [18] integrates the transportation simulator SUMO [9] with ns-2 via an interface called TraCI. TraNS translates mobility commands of ns-2 into primitive driving directions, which are then sent to SUMO. TraNS provides bidirectional feedback between ns-2 and SUMO. SWANS++ [22] is a tightly integrated simulator that adds the mobility model STRAW (STreet RAndom Waypoint) [21] to SWANS. STRAW is a random waypoint model over streets and supports no lane changing. SWANS++ supports only unidirectional feedback. Veins (Vehicles in Network Simulation) [20] is another tightly coupled simulator that integrates the transportation simulator SUMO with the network simulator OMNeT++ through a TCP connection.

7. Monolithic VANET simulator

NCTUns [14] is a standalone VANET simulator that integrates transportation simulation capabilities with network simulation capabilities. The network protocol of NCTUns is integrated with the Linux kernel protocol stacks, making any Linux network application compatible with NCTUns. GrooveNet [11] is a hybrid VANET simulator that allows communication between simulated vehicles and real vehicles on the road. GrooveNet's modular architecture incorporates mobility, trip, and message broadcast models over a variety of link- and physical-layer communication models. GrooveNet and NCTUns are becoming widely accepted as VANET simulation tools, but they still need further improvement and extension to meet the various needs of transportation research [13].

REFERENCES

[1] Blum, J., Eskandarian, A., and Hoffman, L. (2004). Challenges of intervehicle ad hoc networks. IEEE Transactions on ITS, 5(4).
[2] Bononi, L., Felice, M. D., D'Angelo, G., Bracuto, M., and Donatiello, L. (2008). MoVES: A framework for parallel and distributed simulation of wireless vehicular ad hoc networks. Computer Networks, Vol. 52, No. 1, Elsevier.
[3] CARLINK::UMA (2008). Simulation and evaluation of realistic MEU ad-hoc communications in CARLINK scenarios by using VanetMobiSim/Ns-2. Technical report, D 1.3.9, Spain.
[4] CORSIM: Microscopic Traffic Simulation Model.
[5] FreeSim. Available at:
[6] Barr, R., Haas, Z.J., and Renesse, R. JiST: Embedding Simulation Time into a Virtual Machine. In Proc. EuroSim Congress on Modelling and Simulation.

[7] Haerri, J., Fiore, M., Filali, F., and Bonnet, C. VanetMobiSim: generating realistic mobility patterns for VANETs. Institut Eurécom and Politecnico di Torino. Available at:
[8] Ibrahim, K. and Weigle, M.C. ASH: Application-aware SWANS with Highway mobility. In Proceedings of the IEEE INFOCOM Workshop on MObile Networking for Vehicular Environments (MOVE), Apr.
[9] Krajzewicz, D. and Rossel, C. Simulation of Urban Mobility (SUMO). German Aerospace Centre. Available at:
[10] Lan, K.-C. and Chou, C.-M. Realistic mobility models for Vehicular Ad hoc Network (VANET) simulations. In 8th International Conference on ITS Telecommunications, 2008.
[11] Mangharam, R., Weller, D.S., Rajkumar, R., Mudalige, P., and Bai, F. GrooveNet: A Hybrid Simulator for Vehicle-to-Vehicle Networks. In Second International Workshop on Vehicle-to-Vehicle Communications (V2VCOM), San Jose, USA, July 2006.
[12] Martin, J. GloMoSim: Global mobile information systems simulation library. UCLA Parallel Computing Laboratory. Available at:
[13] Martinez, F., Toh, C.K., Cano, J.-C., Calafate, C.T., and Manzoni, P. A survey and comparative study of simulators for vehicular ad hoc networks (VANETs). Wireless Communications and Mobile Computing, DOI: 10.1002/wcm.859.
[14] NCTUns 5.0. Available at:
[15] Network Simulator - ns-2.
[16] OPNET Technologies, Inc.
[17] Paramics Homepage. Quadstone Paramics, UK.
[18] Piorkowski, M., Raya, M., Lugo, A.L., Papadimitratos, P., Grossglauser, M., and Hubaux, J.-P. TraNS (Traffic and Network Simulation Environment). Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland. Available at:
[19] QualNet Homepage. Scalable Network Technologies, California.
[20] Sommer, C., German, R., and Dressler, F. Bidirectionally Coupled Network and Road Traffic Simulation for Improved IVC Analysis. IEEE Transactions on Mobile Computing, vol. 10(1), pp. 3-15, January.
[21] STRAW - STreet RAndom Waypoint - vehicular mobility model for network simulations (e.g., car networks). Available at:
[22] SWANS++ Homepage. Aqua Lab.
[23] Wischhof, L., Ebner, A., and Rohling, H. (2005). Information dissemination in self-organizing intervehicle networks. IEEE Transactions on ITS, 6(1).
[24] Wu, H. (2005). Analysis and design of vehicular networks. Ph.D. dissertation, Georgia Institute of Technology, Atlanta, US.

Current State of Capability-based System of Systems Engineering in Korea Ministry of National Defense

Jae-Hong Ahn 1, Yeunseung Ryu 2 and Doo-Kwon Baik 3
1 Agency for Defense Development, Korea koreaseman@daum.net
2 Department of Computer Engineering, Myungji University, Korea ysryu@mju.ac.kr
3 Dept. of Computer Science & Engineering, Korea University, Korea baikdk@korea.ac.kr (corresponding author)

Abstract. This paper surveys the concepts of Capability Based Assessment (CBA), System of Systems (SoS), and Enterprise Architecture (EA). We also present an EA-based weapon SoS capability assessment approach in a conceptual way. Then we suggest some technical issues to further develop EA-based SoS capability assessment methods.

Keywords: Capability Based Assessment; System of Systems; Enterprise Architecture

1 Introduction

Capability Based Assessment (CBA) is an analysis process to validate the requirements of joint warfighting. The CBA assesses the operational risks associated with capability gaps. The CBA is used to provide decision makers with information about capability gaps between future objectives and current capabilities, materiel and/or non-materiel alternatives to resolve the gaps, and the probability of succeeding in a given operational mission.

System of Systems (SoS) is an emerging research field. Although the concept of SoS has been around for some time, it has not been completely fixed. Recently, the US DoD defined the concept of SoS as an aggregation of independent systems to achieve some objectives, and published the Systems Engineering Guide for SoS to address SoS engineering considerations. An aggregation of weapon systems interoperating to perform a military operations mission is a typical SoS. The capabilities of a weapon SoS should be assessed for the military operations and for the requirements generated from capability gaps. Performance is one element of a weapon SoS's capabilities. Weapon systems also have architectures that include capability data such as performance, exchangeable data, and external interface profiles. Therefore, EA-based weapon SoS capability assessment can make the CBA more quantitative by comparing the required system performance of the future SoS against legacy system architecture data.

This paper suggests an EA-based weapon SoS capability assessment approach in a conceptual way and introduces some technical issues for future EA-based SoS assessment methods.

2 Related Works

2.1 Capability Based Assessment

The US Joint Capabilities Integration and Development System (JCIDS) is a process to validate the requirements of joint warfighting. The primary objective of the JCIDS process is to ensure that the capabilities required by the joint warfighter are identified, together with their associated operational performance criteria, in order to successfully execute the assigned missions [1]. Capability Based Assessment (CBA) is the first of four steps; the other three are Approval of the Initial Capabilities Document (ICD) and Courses of Action, Approval of the Capability Development Document (CDD), and Approval of the Capability Production Document (CPD). The CBA identifies: the capabilities and operational performance criteria required to successfully execute missions; the shortfalls in existing weapon systems to deliver those capabilities and the associated operational risks; and the possible non-materiel approaches for mitigating or eliminating the shortfalls, and, when appropriate, recommends pursuing a materiel solution. The current US DoD thrust is to use the CBA both to identify gaps and to help advise which particular gaps require action, and not to attempt to dictate detailed solutions; otherwise, there would be no way to give recommendations on what to do [2]. The CBA analytical process contains the Study Definition Phase, the Needs Assessment Phase, the Solutions Recommendations Phase, and the Opportunity-Based CBA. In these phases, a capability analyst can answer several key questions:

- Define the mission
- Identify the capabilities required
- Determine the attributes/standards of the capabilities
- Identify gaps
- Assess the operational risk associated with the gaps
- Prioritize the gaps; identify and assess potential non-materiel solutions
- Provide recommendations for addressing the gaps

The Korea Joint Chiefs of Staff also published a CBA guidebook that considers jointness in requirements planning for top-down, future-capability-based requirement generation. They have defined the Joint Capability Area and Joint Task List as the standard taxonomy, and they have applied M&S, experiments, and Multiple Logistic Regression (MLR) to analyze capabilities. However, the main approach is qualitative, decided by Subject Matter Experts (SMEs). If quantitative measures such as weapon system performance were added to the present methods, a more scientific capability analysis could be achieved.

2.2 System of Systems

A System of Systems is a set or arrangement of systems that results when independent and useful systems are integrated into a larger system that delivers unique capabilities [3].

Fig. 1. System of Systems

According to [4], there are four types of SoS.

Virtual: Virtual SoS lack both a central management authority and centrally agreed-upon purposes. Large-scale behavior emerges, and may be desirable, but the super system must rely upon relatively invisible mechanisms to maintain it.
Collaborative: Collaborative SoS are distinct from directed systems in that the central management organization does not have coercive power to run the system. The constituent systems must, more or less, voluntarily collaborate to fulfill the agreed-upon central purpose. The Internet is a collaborative system.
Acknowledged: Acknowledged SoS have recognized objectives, a designated manager, and resources for the SoS; however, the constituent systems retain their independent ownership, objectives, funding, and development and sustainment approaches. Changes in the systems are based on collaboration between the SoS and the system.
Directed: Directed SoS are those in which the integrated system-of-systems is built and managed to fulfill specific purposes. It is centrally managed during long-term operation to continue to fulfill those purposes, and any new ones the system owners may wish to address.

Another SoS reference is an advanced transportation system. There is an enduring global need for the efficient transportation of people and goods across land. Worldwide trends such as increasing populations, urbanization, economic uncertainty, climate change, and susceptibility to loss from man-made and natural events are placing stress on existing land-based transportation systems and capabilities. At the same time, the world is moving and functioning faster through the use of the Internet, cell phones, computers, and the telecommunications networks that enable the transmission of these digital signals. All of this has placed new challenges, opportunities, and complexity on our capacity to transport ourselves and sustain economic development. Land transportation in the 21st century must address such challenges, take advantage of the opportunities, and reduce or contain the complexity. Achieving these results will require the application of a system of systems approach that unifies existing, diverse transportation modes and systems into a functioning whole, optimizes their operations, and enables future capability growth to respond to national, regional, and local needs [5].

Standards are important, defining elements of the interfaces for an SoS. Standards must also be managed over the life of the system to ensure they continue to be

enablers for system interoperability and performance. Standards that are locally imposed will be the easiest to manage over time, whereas more broadly used standards will require existing or new standards-governing bodies to add, modify, or delete standards to address evolving requirements and technologies [5]. This reference also shows the application of the SoS approach to the US Smart Grid case. The above two SoS references emphasize that an SoS architecture includes data items such as constituent systems, system functions, system relationships, data flow and communication, and technical standards.

2.3 Enterprise Architecture

Architecture is the structure of components, their interrelationships, and the principles and guidelines governing their design and evolution over time. Enterprise Architecture (EA) is a strategic information asset base, which defines the mission, the information necessary to perform the mission, the technologies necessary to perform the mission, and the transitional processes for implementing new technologies in response to changing mission needs [6]. The DoDAF Meta-model provides the information needed to collect, organize, and store data in an easily understandable way, and the presentation descriptions of the various types of views provide guidance on how to develop graphical representations of that data that will be useful in defining acquisition requirements under the DoD Instruction 5000 series [3]. EA is used to improve business effectiveness and efficiency, to control planning and investment [8], and to improve business performance and productivity [9]. EA encompasses system performance, interface, and technical interface data, which can be used in assessing the capabilities of a weapon SoS performing military operations, the most important business in the military.

In the Korea MND, there is an Enterprise Architecture Management Process, as shown in Fig. 2. The MND has built EA for information systems interoperability using MND-AF ver 1.2, which is similar to DoDAF ver 1.0. The MND EA consists of a high-level information system EA and individual information system architectures. The MND EA is stored and managed in the Architecture Repository, which is designed as specified in the MND-AF Meta Data Model, and the stored architecture data is reused during the development of other information systems.

Fig. 2. Current status of MND EA management process

The MND architecture management system in Fig. 2 has some limitations for CBA. MND-AF ver 1.2 does not encompass the capability concept, while it is included in the latest DoDAF and the UK MoDAF. It is therefore difficult for architects to describe their architectures with capability concepts. In addition, the authority of the Architecture Repository should have a tool to validate the quality of a submitted architecture when registering it in the repository system; it is difficult to check the consistency of the architectural products' relationships by manual means alone. Also, the stored architecture data is scarcely used in decision-making processes because of a lack of relevant policies and technical support.

3 EA based SoS Capability Assessment Approach and Related Technical Issues

Some weapon systems are composed as an SoS to achieve mission objectives. The required capabilities of such an SoS need to be assessed by CBA to decide whether it can achieve the mission or not. EA can present weapon system performance, interface, and technical standard profile data for assessing SoS performance capabilities. This chapter suggests an approach to assess weapon SoS capability based on EA, and lists some technical issues in using it for CBA.

Fig. 3. EA based CBA approach

Fig. 3 shows the EA-based CBA approach at the conceptual model level. The To-Be SoS Architecture Requirements are specified with the tasks, their processes, their performance criteria, and the information exchange requirements among them. The SoS Architecture Tool can extract, from the legacy systems in the EA, the constituent systems appropriate to the SoS task performance requirements and assess them. With this approach, which takes the SoS as the object of CBA, the CBA authority can more easily describe the To-Be SoS architecture using EA and can quantitatively understand the gap between the future capability and the current weapon systems. For the purpose of such EA-based CBA, some issues are:

- Defining standard taxonomies for architecting
  - Need to standardize the terminologies used in architecting: for example, Capability, System, Task, Performance, Data, System function, Technical standard, etc.

  - If the standards are too difficult for architects to comply with, or are felt to restrict freedom of expression, an ontology may be considered as an alternative. An ontology is a technology that enables a computer to understand the meaning of terminology even when systems use different terms with the same meaning.
- Developing models for quantitatively measurable capability assessment
  - Need to develop assessment models for the quantitative elements of capabilities: for example, an assessment model using weapon system range, or whether data is exchanged between systems, etc.
- Managing syntactically and semantically qualified EA data
  - Need to validate whether an architecture description includes all the data required for capability assessment
  - Need to assess whether an architecture description complies with the Standard Taxonomy and the Architecture Meta Data, syntactically and semantically

4 Conclusion

In this paper, we presented the concepts of Capability Based Assessment (CBA), System of Systems (SoS), and Enterprise Architecture (EA). We also presented an EA-based weapon SoS capability assessment approach in a conceptual way. Then we suggested some technical issues in order to further develop EA-based SoS capability assessment methods. We expect to be able to improve quantitative CBA technology and SoS engineering, and to expand EA data usage to a variety of areas.

References

1. CJCSI 3170.01G, Joint Capabilities Integration and Development System. URL:
2. Capabilities-Based Assessment (CBA) User's Guide, version 3. URL:
3. DoD Architecture Framework Version 2.02. URL:
4. Systems Engineering Guide for System of Systems, ver 1.0. URL:
5. James M. Parker. Applying a System of Systems Approach for Improved Transportation. S.A.P.I.EN.S, online since 09 September. URL:
6. A Practical Guide to Federal Enterprise Architecture. URL:
7. Global Information Grid (GIG) Architecture Federation Strategy, Version 1.2. URL:
8. FEA Practice Guidance, Federal Enterprise Architecture Program Management Office, OMB, 2007.
9. Robert D. Austin, Warren Ritchie, Greggory Garrett. Volkswagen of America: Managing IT Priorities. Harvard Business Review, October 5, 2005.

Distributed and Self-adaptive Cluster-head Selection Algorithm for Hierarchical Wireless Sensor Networks

Sai Ji 1,2,*, Liping Huang 1, Chang Tan 1, Jin Wang 1
1 Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, 219# Ningliu Road, Nanjing, China
2 The Aeronautic Key Laboratory for Smart Materials and Structures, Nanjing University of Aeronautics and Astronautics, 29# Yu Dao Street, Nanjing, China
jisai@nuist.edu.cn, {hlpwhy, passerby.tan}@gmail.com, wangjin@oslab.khu.ac.kr

Abstract. In hierarchical wireless sensor networks (WSNs), selecting the cluster head (CH) is an important issue for increasing the network's energy efficiency, scalability, and lifetime. For the sake of balancing the energy expenditure of sensor nodes and improving routing performance, we propose a distributed and self-adaptive cluster-head selection algorithm. Based on the hierarchical agglomerative clustering (HAC) method, the algorithm uses qualitative connectivity data as input and tailors simple numerical methods to generate a cluster tree. From this clustering sequence, the CH and the backup CH can be quickly selected without extra message exchanges. Simulation results demonstrate that the method is effective and self-adaptive, can enhance network self-control capability and resource efficiency, and prolongs the whole network lifetime.

Keywords: Wireless sensor networks; cluster head selection; hierarchical agglomerative clustering; backup cluster head

1 Introduction

In the last few years there has been a growing interest in small, low-power hardware platforms that integrate sensing, data processing, and wireless communication capabilities. These devices are called sensor nodes and are grouped to form a Wireless Sensor Network (WSN). WSNs have applications in environmental monitoring, infrastructure management, transportation, and many other areas [1, 2].

The hierarchical network architecture of a WSN shows its advantages in sharing limited wireless channel bandwidth, balancing node energy consumption, enhancing manageability, and so on [3]. Hierarchical routing protocols can be classified into two categories: random-selected-CH protocols and well-selected-CH protocols. The representative random-selected-CH protocols are LEACH [4] and HEED [5]. LEACH-C [6] and AHP [7] are well-known well-selected-CH protocols.

The random-selected-CH protocols have two main disadvantages. Firstly, a randomly picked CH may have a higher communication overhead. Secondly, the periodic CH rotation or election needs extra energy to rebuild clusters. To avoid

the problem of random CH selection, the well-selected-CH approach considers three factors: energy, mobility, and better cluster quality. However, these protocols usually have a more complex scheme and higher overhead to optimize the CH selection and cluster formation.

In this paper, we propose a distributed HAC (DHAC) routing algorithm for wireless sensor networks. One of the most commonly accepted methods, UPGMA, is used to make the clustering decision in this paper. Qualitative one-hop connectivity information is adopted as input data, which can be easily obtained through message transmission with low or no extra communication cost. Simulations have validated its effectiveness.

2 Distributed hierarchical agglomerative clustering

2.1 DHAC Introduction and Notation Definitions

Hierarchical agglomerative clustering (HAC) [8] is a conceptually and mathematically simple clustering approach which uses four clustering methods: CLINK, SLINK, UPGMA, and WPGMA. Recently, most research has focused on clustering technique analysis and comparison. All of these methods comprise three common key steps: obtain the data set, build the similarity matrix, and execute the clustering algorithm.

Based on the concept of HAC, we propose a DHAC method for distributed environments by improving the HAC algorithms. The main idea behind DHAC is that a node only needs one-hop neighbor knowledge to build clusters. To apply the DHAC algorithm in WSNs, we present a bottom-up clustering approach in six simple steps. Firstly, the qualitative connectivity data is obtained as the input data set for DHAC. Secondly, the similarity matrix is built. Thirdly, similar nodes are grouped together by executing the distributed clustering algorithm. The last three steps are cutting the cluster tree with the threshold, merging the smaller clusters, and electing the CHs. The process of all steps is illustrated in the following sections. Figure 2 and Figure 3 illustrate the pseudo code of the DHAC implementation for WSNs. Table 1 summarizes the notations we will use in our discussion.

Table 1. Summary of notations

Symbol            Definition
Simi_Matrix       Similarity Matrix
Node_Id           Node Id
Ch_Id             Cluster Head (CH) Id
Min_Coeff         The minimum coefficient in the Similarity Matrix
Min_Coeff_Id      The cluster (CH_Min) Id corresponding to Min_Coeff
T                 The threshold of Min_Coeff
C_size            The number of cluster members in a cluster
Min_Cluster_Size  The threshold of minimum cluster size

2.2 Input Data Set

In this paper, DHAC can use simple qualitative connectivity information of a network or quantitative data obtained through received signal strength or GPS. The quantitative data could be the location of each node, the node's residual energy, or other features. Either qualitative or quantitative data is a property of the sensor node, and nodes with similar properties can be clustered together. For simplicity and without loss of generality, we use the qualitative connectivity information as the input data set for DHAC.

1. procedure obtain_local_input_data ()
2.   Send HELLO, Node_Id to 1-hop neighbors;
3.   if (isHelloReceived==FALSE)
4.     Keep listening to neighbors;
5.   else
7.     Build local data with (sender's Node_Id, 1)
8.   endif
9. end procedure
10. procedure simi_matrix ()
11.   Ch_Id=Node_Id;
12.   send AskLocalData and its local data to directly connected neighbors;
13.   if (isAskLocalDataReceived==FALSE)
14.     Keep listening to neighbors;
15.   else
16.     Obtain sender's data;
17.     Establish Simi_Matrix via Dice coefficient;
18.   endif
19. end procedure

Fig. 1. A simple 8-node network.
Fig. 2. Pseudo code of distributed HAC (a).

2.3 Build the Similarity Matrix

To set up the local similarity matrix (Figure 2, lines 11-16), each node elects itself as a CH (cluster head) and sends an AskLocalData message to its directly connected neighbors. Then the node keeps listening until it has received the senders' local qualitative connectivity data. After the qualitative input data is obtained, the similarity matrix can be built (Figure 2, line 17). There are three typical methods [9] to calculate the similarity coefficient for qualitative data: the Dice (Sorenson's) coefficient, the Jaccard coefficient, and the Simple Matching coefficient. The Dice coefficient between nodes {a} and {b} can be formulated as

    S_{a,b} = 2C / (N_a + N_b)

where C is the number of positive matches between nodes {a} and {b} in the input data, and N_a is the total number of 1 values filled in node {a}'s local table for directly connected neighbors. The calculation principle of N_b is similar to that of N_a.
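A minimal sketch of this step in Python (the paper's pseudocode is message-based; this sketch computes the coefficients centrally for illustration, and the example topology is hypothetical): each node's local table is its set of one-hop neighbors, and the Dice coefficient S_{a,b} = 2C / (N_a + N_b) is computed from the positive matches.

    def dice_coefficient(neighbors_a, neighbors_b):
        """Dice similarity between two nodes' one-hop connectivity tables."""
        c = len(neighbors_a & neighbors_b)   # positive matches
        return 2.0 * c / (len(neighbors_a) + len(neighbors_b))

    def build_similarity_matrix(adjacency):
        """adjacency maps node id -> set of directly connected neighbor ids."""
        nodes = sorted(adjacency)
        return {
            (a, b): dice_coefficient(adjacency[a], adjacency[b])
            for a in nodes for b in nodes if a < b
        }

    # Example: a tiny 4-node network (hypothetical topology).
    adjacency = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    simi_matrix = build_similarity_matrix(adjacency)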

2.4 Executing the Distributed Clustering Algorithm

After building the similarity matrix, each node takes itself as the cluster head (CH) and obtains its own local resemblance matrix, from which its minimum coefficient (Min_Coeff) can easily be found. As in Table 1, we name the ID of a CH Ch_Id, and define the cluster (CH_Min) Id corresponding to Min_Coeff as Min_Coeff_Id. If Ch_Id is smaller than Min_Coeff_Id, then the CH sends an AsktoMerge message to its CH_Min to merge the two clusters together; otherwise the CH does nothing and just waits. In Figure 3, lines 3-11 show the process of sending messages.

1. procedure execute_dhac ()
2.   do {
     /*---- Distributed Sending Message ----*/
3.   if (Ch_Id==Node_Id)
4.     Find minimum coefficient in Simi_Matrix to assign to variable Min_Coeff;
5.     Set Node_Id with Min_Coeff to CH_Min;
6.     if (Ch_Id<Min_Coeff_Id)
7.       Send AsktoMerge message to CH_Min;
8.     else
9.       CH keeps waiting;
10.    endif
11.  endif
     /*---- Distributed Receiving Message ----*/
12.  if (isAsktoMergeReceived==TRUE)
13.    if (Min_Coeff_Id==sender's Ch_Id)
14.      Send back CONFIRM message;
15.      Ch_Id=sender's Ch_Id;
16.    else
17.      Send back DENY message;
18.    endif
19.  elseif (isDenyReceived==TRUE)
20.    CH stops sending AsktoMerge message until Simi_Matrix refreshed;
21.  elseif (isConfirmReceived==TRUE)
22.    do merge_clusters();
23.  elseif (isRefreshReceived==TRUE) Update the Simi_Matrix;
24.  endif }
25.  while (Min_Coeff<T)
     /*--- Control the minimum cluster size ---*/
26.  calculate the cluster's member number to C_size;
27.  if (C_size<Min_Cluster_Size)
28.    do merge_clusters();
29.  endif
30. end procedure
31. procedure merge_clusters()
32.   merge two clusters;
33.   blend local information of two clusters;
34.   update Simi_Matrix by using UPGMA method;
35.   broadcast REFRESH message;
36. end procedure

Fig. 3. Pseudo code of distributed HAC (b).

In the distributed clustering algorithm, the other role of a node is receiving messages (Figure 3, lines 12-18). When a cluster head (CH) receives an AsktoMerge message, it compares the sender's cluster head id with its Min_Coeff_Id. If they are the same, the CH sends back a message to the source node to confirm the merging condition and elects the source to be the new CH; otherwise, the CH sends back a DENY message. When a cluster receives the CONFIRM message from another cluster, it merges the two clusters into a new cluster (Figure 3, lines 31-36). At the same time, the local similarity matrices and neighbor lists of the two clusters are combined, and their similarity matrices are updated through the chosen HAC algorithm. CLINK, SLINK, UPGMA, and WPGMA are the four main types of HAC methods. Among them, the un-weighted pair-group method (UPGMA) is the most commonly adopted clustering method. It defines the similarity measure between two clusters as the arithmetic average of the resemblance coefficients among all pairs of entities in the two clusters.
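As an illustration of the UPGMA update used when two clusters merge, here is a minimal sketch in Python (centralized rather than message-based, with illustrative names and coefficients): the similarity between the merged cluster and any other cluster is the size-weighted average of the old coefficients, which equals the arithmetic average over all member pairs.

    def upgma_merge(sim, sizes, a, b):
        """Merge clusters a and b; update similarity to every other cluster
        as the size-weighted average of the members' pairwise coefficients."""
        merged = a + "+" + b
        sizes[merged] = sizes[a] + sizes[b]
        for c in [k for k in sizes if k not in (a, b, merged)]:
            s_ac = sim.pop(tuple(sorted((a, c))))
            s_bc = sim.pop(tuple(sorted((b, c))))
            sim[tuple(sorted((merged, c)))] = (
                sizes[a] * s_ac + sizes[b] * s_bc) / (sizes[a] + sizes[b])
        sim.pop(tuple(sorted((a, b))))
        del sizes[a], sizes[b]
        return merged

    # Example with three singleton clusters (hypothetical coefficients).
    sizes = {"1": 1, "2": 1, "3": 1}
    sim = {("1", "2"): 0.8, ("1", "3"): 0.5, ("2", "3"): 0.6}
    new = upgma_merge(sim, sizes, "1", "2")   # sim[("1+2", "3")] == 0.55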

After a new cluster is formed, the CH broadcasts a REFRESH message to notify its neighbors to update their similarity matrices. Clusters update their own similarity matrix after receiving this REFRESH message, which contains the new cluster information and the merged neighbor list. Once a CH receives a DENY message from its CH_Min, the CH stops sending the AsktoMerge message until its similarity matrix has been updated. Since the qualitative connectivity data are very simple, a few of the Min_Coeff values in the initial local similarity matrix are usually very small; a do-while loop is therefore used to ensure the clustering process is executed at least once. This process repeats until the while condition (line 25) fails.

2.5 The Last Three Steps of DHAC

The last three steps are cutting the cluster tree with the threshold, merging the smaller clusters, and electing the CHs. After generating a cluster tree, a pre-configured threshold T (Figure 3, lines 2-25) is used in the do-while loop to control the upper-bound size of clusters. The predefined threshold can be a transmission radius, a number of clusters, or a cluster density. If the cluster size is less than a pre-defined threshold, Min_Cluster_Size, the cluster is merged with its closest cluster (Figure 3, lines 26-29). To select the appropriate CHs in the clustering tree or clustering sequence, DHAC simply chooses the nodes that satisfy two conditions: (1) the node is one of the two nodes that were merged into the cluster at the first step; (2) the node has the lower ID. The node with the higher ID becomes the backup CH.

3 Simulation and performance evaluation

In this section, we evaluate the performance of the DHAC algorithm implemented in the ns-2 simulator. To increase comparability, most parameters are similar to [10]. Each node is equipped with an omni-directional antenna. The computer simulation is carried out in a sensor network where 400 sensor nodes are deployed randomly in a rectangular region. The sink node is located at the center of the network. Next, we present the performance comparison between the proposed DHAC and LEACH protocols.

We use two metrics to analyze and compare our simulation results for clustering and energy saving: network lifetime and cluster energy dissipation. Here we use the node death rate versus the number of clustering rounds to represent network lifetime. As Figures 4 and 5 show, the performance of DHAC is much better than that of LEACH. In Figure 4, the clustering dissipated energy of DHAC is about 4 times less than that of LEACH, which indicates that DHAC achieves much higher reliability and efficiency in energy consumption at the 300-round mark. In Figure 5, LEACH has the shortest network lifetime: at 300 clustering rounds only 50% of its nodes are alive. Compared to LEACH, DHAC prolongs the network lifetime by 25%.

Fig. 4. Clustering dissipated energy at 300 rounds timing.
Fig. 5. Node death rate versus the rounds of clustering.

4 Conclusions

In this paper, we have proposed a distributed approach, DHAC, to classify sensor nodes into appropriate groups instead of simply gathering nodes around some randomly selected CHs. We demonstrated the application and evaluation of one method, UPGMA, with qualitative data. Simulation results demonstrate that this method is effective and self-adaptive, can enhance network self-control capability and resource efficiency, and prolongs the whole network lifetime. In our future work, we will evaluate the cluster quality with the other HAC methods: SLINK, CLINK, and WPGMA.

Acknowledgements. This work was supported by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 11KJB) and the PAPD.

References

[1] T. Nagayama, M. Ushita, Y. Fujino. Suspension Bridge Vibration Measurement Using Multihop Wireless Sensor Networks. Procedia Engineering, 2011, 14.
[2] He Shibo, Chen Jiming, Sun Youxian. Coverage and Connectivity in Duty-Cycled Wireless Sensor Networks for Event Monitoring. IEEE Transactions on Parallel and Distributed Systems, 2012, 23(3).
[3] J.N. Al-Karaki, A.E. Kamal. Routing techniques in wireless sensor networks: a survey. IEEE Wireless Communications, 11(6), 2004.
[4] W. Heinzelman, A. Chandrakasan, H. Balakrishnan. Energy-efficient communication protocol for wireless microsensor networks. In Proceedings of the 33rd Hawaii International Conference on System Sciences (HICSS '00), January 2000.
[5] O. Younis and S. Fahmy. Distributed Clustering in Ad-hoc Sensor Networks: A Hybrid, Energy-Efficient Approach. In Proc. of IEEE INFOCOM, March 2004.
[6] W.B. Heinzelman, A. Chandrakasan, H. Balakrishnan. An application-specific protocol architecture for wireless microsensor networks. IEEE Transactions on Wireless Communications, 1(4), 2002.

[7] Y. Yin, J. Shi, Y. Li, P. Zhang. Cluster head selection using analytical hierarchy process for wireless sensor networks. In Proceedings of the IEEE 17th International Symposium PIMRC, Helsinki, Finland, 2006, pp. 1-5.
[8] M.R. Anderberg. Cluster Analysis for Applications. Academic Press Inc., New York, 1973.
[9] O. Younis and S. Fahmy. Distributed Clustering in Ad-hoc Sensor Networks: A Hybrid, Energy-Efficient Approach. In Proceedings of ACM MobiCom 2003, San Diego, California, USA, Sep. 14-19, 2003.

An architecture description method for Acknowledged System of Systems based on Federated Architecture

Jae-Hong Ahn 1, Yeunseung Ryu 2 and Doo-Kwon Baik 3
1 Agency for Defense Development, Korea koreaseman@daum.net
2 Department of Computer Engineering, Myungji University, Korea ysryu@mju.ac.kr
3 Dept. of Computer Science & Engineering, Korea University, Korea baikdk@korea.ac.kr (corresponding author)

Abstract. Recently, the System of Systems (SoS) approach has emerged as a solution for achieving a system-wide goal in a large organization by dynamically building a large system out of existing constituent systems. In this paper, we present the process of acknowledged SoS architecture description and the essential metadata of an acknowledged SoS architecture, with assessment characteristics for performance and interoperability among SoS constituent systems.

Keywords: System of Systems; Enterprise Architecture; Interoperability

1 Introduction

As the tasks of large organizations in various domains, such as government, transportation, and the military, become more complex, there have been many solutions that gather several existing systems with interoperability to accomplish their objectives, rather than relying on one single system. For example, a Missile Defense (MD) system in the military consists of several systems, such as sensors, C4I (Command, Control, Communications, Computers, and Intelligence) systems, and shooters, to achieve a common objective: destroying attacking missiles at the right time and in the right place. To do so, it is necessary for the military to efficiently design the system architecture using information assets so that all constituent systems (CS) work in an integrated and collaborative way. The US Department of Defense Architecture Framework (DoDAF) states that the Joint Capabilities Integration and Development System (JCIDS) defines a collaborative process that utilizes joint concepts and integrated architectural descriptions to identify prioritized capability gaps [1]. However, it is difficult to find research references on the usage of such architectures.

Recently, the System of Systems (SoS) approach has emerged as a solution for achieving a system-wide goal in a large organization by dynamically building a large system out of existing constituent systems. According to the US DoD, an SoS is defined as a set or arrangement of systems that results when independent and useful systems are integrated into a larger system that delivers unique capabilities [2, 3]. There are several types of SoS architectures [3]. Among them, acknowledged SoS have recognized objectives, a designated manager, and resources for the SoS. However, the constituent systems retain their independent ownership, objectives, funding, and

development and sustainment approaches. In an SoS, it is important to identify the critical set of systems that affect the SoS capability objectives and to understand their interrelationships. However, there have been few studies on how to efficiently build an acknowledged SoS.

The goal of this work is to study how to provide the SoS capability analyst, acquisition planner, and SoS architect with an architecture description approach that concretely and systematically identifies capability gaps for the acknowledged SoS based on Federated Architecture (FA). In this paper, we present the process of acknowledged SoS architecture description and the essential metadata of the acknowledged SoS architecture, with assessment characteristics for performance and interoperability among SoS constituent systems.

2 Related Works

A System of Systems is a set or arrangement of systems that results when independent and useful systems are integrated into a larger system that delivers unique capabilities [2]. According to [3], acknowledged SoS have recognized objectives, a designated manager, and resources for the SoS; however, the constituent systems retain their independent ownership, objectives, funding, and development and sustainment approaches. Changes in the systems are based on collaboration between the SoS and the system. With the evolution of the understanding of operational capabilities in the US DoD, there is increasing attention focused on the challenges of engineering independently useful systems to work together to meet user needs. As the DoD increases its focus on capabilities without changing its system-oriented organization, the number of acknowledged SoS is increasing. User capabilities call for sets of systems working together toward the capability objectives. In many cases, the DoD is choosing to leverage existing systems to support these capabilities [3].

The "acquisition" of systems of systems is somewhat a misnomer, since most of the functionality in an SoS is already available in fielded systems (which have already been acquired) or in systems that are themselves in acquisition. The SoS manager and systems engineer work with the owners of the constituent systems to evolve these systems to meet the capability needs of the SoS. The current DoD acquisition system is designed for the creation or upgrade of individual systems, and the major acquisition milestones and processes are not well matched to the cyclic nature of SoS evolution. In a number of cases, when the investment needed for the SoS is large, an acquisition program has been formed to address these SoS needs, but typically the acquisition program focuses on the new components or major system upgrades needed for the SoS rather than on the SoS as a composite enterprise [4].

An Enterprise Architecture (EA) is a high-level architecture or meta-architecture that comprises an organization's information technology systems (hardware and software), their relationships, and the related processes, functions, groups, and people. From a functional perspective, an EA explains how all the IT elements work together as a whole, along with the groups and the people of the organization [5]. Federated Architecture (FA) is a pattern that describes an approach to enterprise architecture that allows interoperability and information sharing between

semi-autonomous, decentrally organized lines of business, information technology systems, and applications. It provides an approach for aligning, locating, and linking disparate architectures and architecture information via information exchange standards to deliver a seamless outward appearance to users. The US DoD federated GIG (Global Information Grid) Architecture will be based on the semantic alignment of tier-level architecture elements with elements of federation high-level taxonomies. Semantic alignment refers to the relationship specified between the meanings of taxonomy elements. The semantic relationships specified between activities will typically include "is equivalent to", "is part of", or "is similar to" [6].

3 SoS Architecture Description Approach

In this section, we propose a process and a metadata model for the acknowledged SoS architecture description approach based on FA. The proposed method concretely and systematically identifies the most appropriate SoS CSs and the capability gaps for the acknowledged SoS given the currently existing legacy CSs. Legacy system architecture data is stored in the Federated Architecture Repository, and this data is aligned with federation high-level taxonomies. The approach can specify SoS mission objective requirements and find the proper CSs to satisfy these requirements. The goal of this work is to provide an architecture description approach to SoS capability analysts, acquisition planners, and SoS architects.

Fig. 1. Process of SoS architecture description

The proposed description process of the acknowledged SoS architecture is performed in five phases. In the first phase, it specifies the SoS requirements, which consist of the process of SoS Tasks, their required Performance, and the Information exchange requirements among the tasks. The requirement specification is stored in the SoS Architecture Repository, as shown in Fig. 1. In the second phase, the proposed method extracts the CS candidates for the SoS requirements from the FA repository by semantically comparing the SoS Task requirements with the legacy system Tasks. In the third phase, SoS architects compose a scenario with the extracted CS candidates appropriate to the SoS Task requirements. The fourth phase is the SoS scenario assessment phase, which includes CS performance assessment against the SoS Task Performance requirements, Information exchange interoperability assessment, and Communication Link interoperability assessment. As a result of the SoS scenario assessment, feedback is allowed to the first or third phase to re-specify the SoS requirements or to compose another SoS scenario with other CS

candidates. This iteration allows architects to arrive at the most appropriate architecture description. It means that architects can obtain more suitable CSs that satisfy the performance and interoperability requirements of the SoS mission objectives using the legacy systems. The resulting SoS architecture descriptions could also include CSs with insufficient performance and/or non-interoperable information or communication links, because only legacy systems are used. In the last, fifth phase, a capability analyst or acquisition decision-maker can make use of the SoS description assessment results in order to ameliorate the current CSs.

The proposed method defines the essential metadata that should be stored in the SoS Architecture Repository and the Federated Architecture Repository (see Fig. 1).

Fig. 2. Essential Meta data Model

The SoS Architecture Repository contains the SoS requirement specification data, the CS candidates data (extracted from the legacy systems in the FA repository), the SoS scenario (a sequence of selected CS Candidates to support the required SoS Task process), and the assessment results of the SoS scenario. The assessment result records whether the scenario satisfies the Performance requirements and the Information and Communication Link interoperability requirements or not. The Federated Architecture Repository stores metadata for the legacy system architectures, such as CS Task, CS Performance, CS Information, and CS Communication Link. This metadata is used to investigate whether the SoS Task requirements are satisfied or not. Fig. 2 shows the acknowledged SoS metadata model, and Table 1 explains the meaning of the essential metadata.

Table 1. Explanation of the essential meta data

Meta data             Description
SoS                   Summary and description of the SoS, e.g., mission objectives, time point, context, author, etc.
SoS Task              Activity or action to complete SoS objectives. It should comply with the high-level reference taxonomy.
SoS Task Process      The execution sequence of SoS Tasks.
SoS Task Performance  Measure and desired value of performance needed to complete a SoS Task.
SoS Task Information  A description of the information and its relevant attributes exchanged between SoS Tasks.
CS Candidate          A set of the legacy systems able to support SoS Tasks. An SoS is composed of a set of Constituent Systems.
Composition Scenario  A set of ordered pairs {a SoS Task, a CS selected from the CS Candidates} to support a SoS Task.

  Performance SAT
    performance          Whether the performance of the selected CS satisfies the required SoS Task Performance or not (SAT or USAT); SAT = Satisfaction, USAT = Unsatisfaction.
  Information SAT        Information satisfaction:
    send                 Whether the sending CS has the SoS Task Information or not (SAT or USAT).
    receive              Whether the receiving CS has the SoS Task Information or not (SAT or USAT).
    msg-form             Whether the message format corresponds between the sending CS and the receiving CS or not (SAT or USAT).
  CommLink SAT           Whether the bandwidth and protocol correspond between the sending CS and receiving CS communication links:
    bandwidth            Whether the bandwidth corresponds between the sending CS and the receiving CS or not (SAT or USAT).
    protocol             Whether the protocol corresponds between the sending CS and the receiving CS or not (SAT or USAT).
  CS                     A legacy system stored in the FA repository; used to investigate whether the SoS Task requirements can be satisfied.
  CS Task                Activity or action to complete the CS objectives. It should comply with the high-level reference taxonomy.
  CS Performance         Measure and desired value of the performance needed to complete a CS Task.
  CS Information         A description of the information, and its relevant attributes, exchanged between CS Tasks.
  CS CommLink            Communication medium to exchange CS Information between CSs.

4 Experiment Results for Missile Defense SoS Architecture Assessment

Table 2. Result data of the MD SoS assessment

                                  |          Scenario A            |            Scenario B
  Characteristic                  | U-2   RC-135  AN/TPY-2 Mistral | DPS Satellite RC-135 AN/SPY-1D(V) PAC-3
  Detect image Performance
    36000 km altitude             | SAT   n/a     n/a      n/a     | SAT           n/a    n/a          n/a
    10 m focal length             | SAT   n/a     n/a      n/a     | SAT           n/a    n/a          n/a
  Detect image-Track Information
    image                         | SAT   n/a     SAT      n/a     | SAT           n/a    SAT          n/a
    msg-form-corres-sat           | SAT   n/a     SAT      n/a     | SAT           n/a    SAT          n/a
  Detect image-Track CommLink
    bandwidth-sat                 | USAT  n/a     USAT     n/a     | SAT           n/a    SAT          n/a
    protocol-sat                  | SAT   n/a     SAT      n/a     | SAT           n/a    SAT          n/a
  Detect signal Performance
    6500 km combat range          | n/a   SAT     n/a      n/a     | n/a           SAT    n/a          n/a
  Detect signal-Track Information
    signal                        | n/a   SAT     SAT      n/a     | n/a           SAT    SAT          n/a
    msg-form-corres-sat           | n/a   SAT     SAT      n/a     | n/a           SAT    SAT          n/a
  Detect signal-Track CommLink
    bandwidth-sat                 | n/a   SAT     SAT      n/a     | n/a           SAT    SAT          n/a
    protocol-sat                  | n/a   USAT    USAT     n/a     | n/a           SAT    SAT          n/a
  Track Performance
    2000 km track range           | n/a   n/a     USAT     n/a     | n/a           n/a    SAT          n/a
  Track-Kill Information
    target priority               | n/a   n/a     SAT      USAT    | n/a           n/a    SAT          SAT
    msg-form-corres-sat           | n/a   n/a     USAT     USAT    | n/a           n/a    SAT          SAT
  Track-Kill CommLink
    bandwidth-sat                 | n/a   n/a     USAT     USAT    | n/a           n/a    SAT          SAT
    protocol-sat                  | n/a   n/a     USAT     USAT    | n/a           n/a    SAT          SAT
  Kill Performance
    100 km intercept range        | n/a   n/a     n/a      USAT    | n/a           n/a    n/a          USAT
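To make the assessment step concrete, verdicts like those in Table 2 can be read as attribute-by-attribute comparisons between a SoS Task requirement and a CS capability. The following minimal C sketch illustrates this; it is not code from the paper: the struct and function names are illustrative, and the capability values are hypothetical numbers chosen only to reproduce the Track-row verdicts of Table 2.

    #include <stdio.h>

    typedef enum { USAT = 0, SAT = 1 } sat_t;

    struct sos_task_req {            /* SoS Task Performance requirement */
        const char *task;
        const char *measure;
        double      desired;         /* desired value of the measure */
    };

    struct cs_capability {           /* CS Performance of a legacy system */
        const char *cs;
        double      provided;
    };

    /* Performance SAT check: SAT if the CS meets the required value. */
    static sat_t assess_performance(const struct sos_task_req *req,
                                    const struct cs_capability *cap)
    {
        return cap->provided >= req->desired ? SAT : USAT;
    }

    int main(void)
    {
        struct sos_task_req  track = { "Track", "track range (km)", 2000.0 };
        struct cs_capability tpy2  = { "AN/TPY-2",     1500.0 };  /* hypothetical */
        struct cs_capability spy1  = { "AN/SPY-1D(V)", 2500.0 };  /* hypothetical */

        printf("%s: %s\n", tpy2.cs,
               assess_performance(&track, &tpy2) == SAT ? "SAT" : "USAT");
        printf("%s: %s\n", spy1.cs,
               assess_performance(&track, &spy1) == SAT ? "SAT" : "USAT");
        return 0;
    }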

In this experiment, the Missile Defense SoS consists of four SoS Tasks: detect the launched missile image, detect the signal, track the missiles, and kill the missiles. The SoS Task Performance and Information requirements are also specified for these Tasks. Table 2 shows the assessment results of scenarios A and B, obtained by applying the assessment algorithm to the architecture data of several weapon systems. From the results, SoS architects can determine which scenario is the better SoS architecture. Further, architects can identify the gap between the existing SoS and the future SoS objectives. Capability analysts and acquisition decision-makers can then plan to resolve the exact problem points in the capabilities of the current legacy systems for the future SoS.

5 Conclusion

In this paper, we presented an acknowledged SoS architecture description approach that enables capability analysts and acquisition planners to assess performance and interoperability characteristics against SoS requirements based on an FA repository. Our experiment applied the approach to an MD SoS and considered only a few assessment attributes, namely CS Performance and Information and Communication Link interoperability. However, the proposed approach can be applied to other types of acknowledged SoS architecture description, and it can easily be extended to consider other attributes such as CS Services and equipment function-level performance.

References

1. DoD Architecture Framework, Version 2.02.
2. CJCSI 3170.01G, Joint Capabilities Integration and Development System.
3. Systems Engineering Guide for Systems of Systems, Version 1.0.
4. Industry Recommendations for DoD Acquisition of Information Services and SOA Systems, SOA Acquisition Working Group, July 7, 2008.
5. A Practical Guide to Federal Enterprise Architecture.
6. Global Information Grid (GIG) Architecture Federation Strategy, Version 1.2.

Scenario Generation for Model Checking Operating Systems

Nahida Sultana Chowdhury and Yunja Choi

Dept. of Computer Science and Engineering, Kyungpook National University, South Korea

Abstract. This paper suggests an automated scenario generation technique based on a property-based static analysis of the function-call relationships in program source code. We present the scenario generation process and show application results on the Trampoline operating system using CBMC as a back-end model checker.

Keywords: Trampoline OS, CBMC, Verification, Scenario Generation

1 Introduction

Model checking enables engineers to apply verification directly to program source code, removing the model construction process that was formerly required. A straightforward approach that models the interaction behavior as infinite, non-deterministic choices among the system APIs is too costly, since model checking is based on an exhaustive search of the whole system state space. This paper presents an approach that automatically generates environment models using the structural data dependency information extracted from the source code. Our experimental results show the efficiency of the approach, using the Trampoline operating system as a case example.

OSEK/VDX [1] is an international standard for real-time operating systems used in the field of automotive embedded software. Trampoline is an open-source operating system written in C and based on OSEK/VDX. In embedded systems, safety properties can be specified as assert conditions, and CBMC [2] uses bounded model checking techniques to verify the assertions. If a violated property exists, CBMC returns a counterexample with tracing information, which provides useful input for safety analysis. CBMC requires an environment to be specified for the verification, and the application scenario and the validity of the environment model have a big impact on the efficiency of model checking. When model checking is applied under an arbitrary environment, it exhaustively exercises all possible call sequences, and the verification result will contain spurious counterexamples that are impossible in any real execution. We note that the valid call sequences can be identified from the Trampoline source code. Therefore, we suggest a method to automatically generate valid application scenarios for the system.
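To see why an arbitrary environment is problematic, consider the naive environment model below, written as a CBMC harness that nondeterministically picks OSEK-style service calls at every step. This sketch is ours, not the paper's: the empty stubs and the step bound are assumptions, and only the service names (ActivateTask, TerminateTask, GetResource, ReleaseResource) come from the OSEK/VDX standard. Such a harness admits illegal sequences, e.g. TerminateTask before any task has been activated, which is exactly the source of spurious counterexamples.

    unsigned int nondet_uint(void);    /* CBMC treats nondet_* as free input */

    /* Empty stubs standing in for the OS services under verification. */
    void ActivateTask(int id)     { (void)id;  }
    void TerminateTask(void)      {            }
    void GetResource(int res)     { (void)res; }
    void ReleaseResource(int res) { (void)res; }

    int main(void)
    {
        for (int step = 0; step < 3; ++step) {   /* small unwinding bound */
            switch (nondet_uint() % 4) {         /* any API, in any order */
            case 0:  ActivateTask(1);    break;
            case 1:  TerminateTask();    break;
            case 2:  GetResource(0);     break;
            default: ReleaseResource(0); break;
            }
        }
        return 0;
    }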

2 Property-based Scenario Generation Approach

In our approach, scenarios are generated through an analysis of the called-by graphs and call graphs of the functions in the program source code. Property-based scenario generation consists of three parts: (1) extraction of the relevant Root-Level-Functions, (2) pruning the call sequences from the identified Root-Level-Functions to the End-Level-Functions, and (3) non-deterministic choice among the pruned call sequences after applying the constraints imposed by the OSEK/VDX standard. In the first step, starting from the variables participating in the target verification property, we extract the End-Level-Functions, which directly modify or use those variables. The End-Level-Functions are then traced backward along the function-call dependencies up to the Root-Level-Functions, which are the API services provided by the operating system. Since the identified Root-Level-Functions may also include function calls that do not lead to the End-Level-Functions, the second part eliminates the irrelevant paths from the Root-Level-Functions. Finally, the third part generates an environment model that exercises all possible call sequences from the identified Root-Level-Functions to the End-Level-Functions, admitting only legal scenarios that obey the requirements specified in the international standard OSEK/VDX. The approach is implemented on top of the Understand analysis tool [3]. The tool structure in Figure 1 shows how the modules of our experiment link with each other.

Figure 1: Scenario generation and verification tool chain

In our approach, a calling sequence of End-Level-Functions is called a scenario. The Root-Level-Functions are chosen non-deterministically, and the End-Level-Function sequence is then generated from them sequentially. To make a scenario valid, we also consider the constraints between two root functions.
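The backward trace from End-Level-Functions to Root-Level-Functions amounts to a walk over the called-by graph. The sketch below is a toy illustration of that step, not the tool's code: the graph is hard-coded, and the function names merely echo the Trampoline/OSEK naming style.

    #include <stdio.h>
    #include <string.h>

    /* Toy called-by graph: one (callee, caller) pair per edge. */
    struct edge { const char *callee; const char *caller; };

    static const struct edge g[] = {
        { "tpl_put_new_proc", "tpl_schedule"  },
        { "tpl_schedule",     "ActivateTask"  },
        { "tpl_schedule",     "TerminateTask" },
    };
    enum { NEDGES = sizeof g / sizeof g[0] };

    /* Walk upward from an end-level function; a function with no
     * caller edge is root-level, i.e. an API service of the OS. */
    static void find_roots(const char *fn)
    {
        int has_caller = 0;
        for (int i = 0; i < NEDGES; ++i) {
            if (strcmp(g[i].callee, fn) == 0) {
                has_caller = 1;
                find_roots(g[i].caller);
            }
        }
        if (!has_caller)
            printf("root-level function: %s\n", fn);
    }

    int main(void)
    {
        /* an end-level function that modifies a property variable */
        find_roots("tpl_put_new_proc");
        return 0;
    }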

3 Experiments

From the Trampoline OS we chose six safety properties for verification, expressed as the assert conditions shown in Table 1. Six variables are extracted from the properties: tpl_h_prio, tpl_fifo_rw, tpl_ready_list, tpl_kern, prio, and tpl_locking_depth. For all of the assertions in the Trampoline OS, we conducted the scenario verification over the end-level-function sequences.

Table 1. List of safety properties in Trampoline OS

  Assert Conditions
  1. assert((tpl_h_prio >= 0) && (tpl_h_prio < 3));
  2. assert(tpl_h_prio != -1);
  3. assert(tpl_fifo_rw[tpl_h_prio].size > 0);
  4. assert(tpl_fifo_rw[prio].size < tpl_ready_list[prio].size);
  5. assert(tpl_kern.running != NULL);
  6. assert(tpl_kern.running->state == RUNNING);
  7. assert((prio >= 0) && (prio < 3));
  8. assert(tpl_locking_depth >= 0);

Table 2 reports the verification run-time properties for the different assert conditions of the Trampoline OS and for the corresponding scenario lengths (numbers of root-level functions).

Table 2. Run-time data of the verification. For each assert condition (on tpl_kern, tpl_h_prio, tpl_fifo_rw, and tpl_ready_list), the table lists the target variable, the length of the scenario (number of root-level functions), the runtime in seconds, the number of generated VCCs, and the size of the program expression (number of assignments).

4 Conclusion

The key facts are: (a) scenarios are generated by analyzing the end-level function-call sequences (call graph and called-by graph); (b) only valid scenarios are generated; (c) scenario generation takes into account the constraints imposed by the international standard OSEK/VDX; and (d) most importantly, valid scenarios can be generated automatically without deep knowledge of the source code.

Acknowledgement. This work was supported by the IT R&D program of MKE/KEIT [Self-Organized Software-platform (SOS) for welfare devices].

References

1. OSEK/VDX Operating System Specification.
2. CBMC Installation.
3. Understand: Source Code Analysis and Metrics.

Detecting Accesses of First Data Races in Parallel Programs with Random Synchronization

Hee-Dong Park (1) and Yong-Kee Jun (2)

(1) Joongbu University, Kumsan, Korea, hdpark@joongbu.ac.kr
(2) Gyeongsang National University, Jinju, Korea, jun@gnu.ac.kr

Abstract. Detecting data races is important in debugging shared-memory parallel programs, because the races can result in unintended non-deterministic executions of the programs. Unfortunately, previous race detection techniques cannot guarantee to detect at least one access involved in the first races to occur in parallel programs with random synchronization. This paper presents a program monitoring algorithm that collects the current key accesses of the local thread blocks that are involved in races with the latest key accesses in other concurrent thread blocks.

Keywords: race detection, parallel program debugging, first race, random synchronization, program monitoring

1 Introduction

One of the inherently fundamental problems encountered when debugging a parallel program is resolving the data race conditions in the program. A data race occurs in a parallel or multi-threaded program when two threads access the same memory location without proper synchronization constraints between the accesses, such that at least one of the accesses is a write [1]. Incorrect synchronization leads to incorrect ordering between accesses to shared memory. A data race can result in unpredictable results or in different paths of events in different executions on the same input.

Previous race detection techniques [4-7] cannot locate the candidate accesses that contain at least one access involved in the first races of parallel programs with random synchronization. Such parallel programs may exhibit a different sequence of events when executed repeatedly; that is, the access sequence in one execution cannot be guaranteed to match that of another, due to small timing variations in the execution of synchronization events, which results in non-deterministic event orderings for debugging. It has been proved that detecting races in program executions whose synchronization is powerful enough to support mutual exclusion is NP-hard [2]. Thus detecting actual races is of practical use in a particular execution of such parallel programs.
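As a concrete instance of this definition (an illustration of ours, not an example from the paper), the following pthreads fragment contains a data race: the main thread reads and the created thread writes the same variable, with no synchronization between the accesses.

    #include <pthread.h>
    #include <stdio.h>

    int shared = 0;                      /* shared memory location */

    static void *writer(void *arg)
    {
        (void)arg;
        shared = 42;                     /* write access, unsynchronized */
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, writer, NULL);
        printf("read %d\n", shared);     /* concurrent read: races with the write */
        pthread_join(t, NULL);
        return 0;
    }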

The races that occur first are races between two accesses that are not causally preceded by any other accesses also involved in races. The first races are important in debugging because the removal of such races may make other races disappear. It is even possible that all races reported by other algorithms would disappear once the first races are removed. The main result of our on-the-fly algorithm is to collect filtered candidate accesses, among which at least one access is included in the first races of a parallel program execution.

2 Candidate Access Filtering for Race Detection

The concurrency relation among threads in an execution of a shared-memory parallel program can be represented by a directed acyclic graph called a POEG (Partial Order Execution Graph) [3], which captures the happens-before relation [1]. An access a_i happened before another access a_j, denoted a_i → a_j, if there is a path from a_i to a_j in the graph; the two accesses are concurrent with each other if there exists no path between them. There can be many races in an execution of a parallel program, and a first race to occur, or simply a first race, is either an unaffected race or a tangled race. In a parallel program, a synchronization block, or simply a block, is an access sequence between inter-thread coordination events (such as POST, WAIT, fork, and join). We define a write (read) access a_i as a key access if there does not exist any other write (read or write) access a_j within the same block such that a_j → a_i. The Block Access-history for a shared variable X and thread identifier T, denoted BA(X, T), is the set of key accesses in a block of thread T. A BA(X, T) contains at most two accesses (a read and/or a write) and is cleared at each synchronization event. A read (write) access a_i in the corresponding access history (AH) is a read (write) candidate if a_i is involved in a race and there exists no other access a_h such that a_h → a_i and a_h is involved in a race. A BA(X, T) is maintained as a local variable of the corresponding thread T, which frees it from mutual exclusion with the shared memory of other threads or processors.

The key accesses are not always involved in races, and maintaining all key accesses to detect first races is inefficient; we therefore filter the key accesses to obtain candidate accesses, and only the candidate accesses can be involved in the first races. The Candidate Set for a shared variable X, denoted CS(X), is the set of candidates involved in races on X: CS(X, R) is the set of read candidates and CS(X, W) the set of write candidates. To obtain a subset of CS(X), every key access from BA(X, T) is checked for logical concurrency with the accesses in AH(X) during an execution of the program. The algorithm is as follows:

1. Collect key accesses: Check whether the current access from BA(X, T) in thread T is a key access. If it is not, return.
2. Update AH(X): For every access a_i in AH(X) with a_i → current, if the race bit of a_i is true, return; otherwise delete a_i from AH(X). Then add the current access to the corresponding set of AH(X).

3. Determine CS(X): For every access a_i in AH(X), if a_i is involved in a race with the current access, set the race bits of both accesses; otherwise return. Any current access concurrent with the accesses in AH(X) is added to the corresponding CS(X).
4. Halt: Halt the current thread if the current access is a write.

In steps 1 and 2, we monitor all memory operations executed during a particular execution, filter the accesses to obtain the key accesses in each block access history, and update the access history so that it contains only mutually concurrent accesses. In step 3, we inspect the access history to obtain a subset of the candidates in which at least one access is included in the first races. Accesses that are not key accesses are discarded before race condition determination, which improves the time and space efficiency.

3 Conclusion

In this paper, we presented an algorithm that collects filtered accesses in a particular execution of a parallel program with random synchronization, by extracting key accesses and collecting candidate sets in which at least one access is involved in the first races. Detecting the first races in one monitored execution of a parallel program with synchronization can require a huge amount of space, so our technique of collecting filtered accesses for race detection is more efficient and practical for debugging a large class of shared-memory parallel programs.

References

1. Lamport, L.: Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, 21(7), pp. 558-565, July 1978.
2. Netzer, R.H.B., Miller, B.P.: On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions. Int'l Conf. on Parallel Processing, pp. II-93-II-97 (1990).
3. Dinning, A., Schonberg, E.: An Empirical Comparison of Monitoring Algorithms for Access Anomaly Detection. 2nd Symp. on Principles and Practice of Parallel Programming, ACM, pp. 1-10, March 1990.
4. Jun, Y., McDowell, C.E.: On-the-fly Detection of the First Races in Programs with Nested Parallelism. 2nd Int'l Conf. on Parallel and Distributed Processing Techniques and Applications, CSREA, August 1996.
5. Park, H., Jun, Y.: Detecting the First Races in Parallel Programs with Ordered Synchronization. 6th Int'l Conf. on Parallel and Distributed Systems (ICPADS), IEEE, Tainan, Taiwan, December 1998.
6. Kim, J., Jun, Y.: Scalable On-the-fly Detection of the First Races in Parallel Programs. Proc. of the 12th Int'l Conf. on Supercomputing, ACM, July 1998.
7. Ha, K., Jun, Y., Yoo, K.: Efficient On-the-fly Detection of First Races in Nested Parallel Programs. Workshop on State-of-the-Art in Scientific Computing (PARA), Copenhagen, Denmark, June.

Visualizing Data Races in Concurrent Signal Handlers

Lin Gan, Guy Martin Tchamgoue, and Yong-Kee Jun

Department of Informatics, Gyeongsang National University, Jinju, South Korea

Abstract. Asynchronous signal handling introduces fine-grained concurrency into sequential programs, making them prone to data races. Unfortunately, existing tools for detecting data races in sequential programs that use concurrent signal handlers fail to provide effective means for understanding the dynamic behavior of the signal handlers involved in data races. Thus, this paper presents a visualization tool that uses vertically parallel arrows to capture the logical concurrency between a sequential program and its concurrent signal handlers, materializes synchronization patterns with horizontal arrows, and uses colored squares to represent accesses to shared variables, in order to provide a partial ordering of the events that occurred at runtime.

Key words: data races, sequential programs, concurrent signal handlers, visualization

1 Introduction

Data races [2] represent one of the most notorious classes of concurrency bugs in shared-memory parallel programs. Data races occur when two threads access a shared memory location without proper synchronization and at least one of the accesses is a write. Sequential programs are prone to data races due to asynchronous signals that introduce fine-grained concurrency into such programs, making them difficult to test and debug thoroughly.

Many tools [3-6] have recently been proposed to automatically detect data races in sequential programs that use concurrent signal handlers. Ronsse et al. [3] adapted an existing on-the-fly race detector for multithreaded programs to fit sequential programs. Tahara et al. [4] presented an approach for race detection in sequential programs that use signals.

This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2012-H) supervised by the NIPA (National IT Industry Promotion Agency). Corresponding author: Yong-Kee Jun, who is now the director of the GNU Embedded Software Center for Avionics (GESCA) at Gyeongsang National University, a national IT Research Center (ITRC) of the Republic of Korea.

The technique is based on the /proc file system (for Solaris 10) or the debug registers (for IA-32 Linux) and uses watchpoints to monitor accesses to shared variables. Tchamgoue et al. [5, 6] proposed an efficient on-the-fly data race detection technique for sequential programs with concurrent signal handlers, using a lightweight labeling scheme that generates constant-size concurrency information for the sequential program and for every instance of the concurrent signal handlers.

Despite the capabilities of all these tools, their outputs still consist of overly concise reports or very long program traces [1]. Hence, understanding the runtime behavior of a sequential program together with its concurrent signal handlers and the reported data races remains difficult.

2 Visualization Model

This paper presents a visualization tool that uses vertically parallel arrows to capture the logical concurrency between the sequential program and its concurrent signal handlers, materializes synchronization patterns with horizontal arrows, and uses colored squares to represent accesses to shared variables, providing a partial ordering of the events that occurred at runtime. Our visualization model is therefore based on an easy-to-access partial order execution graph that gives programmers information on the reported data races and the runtime behavior of a program at a glance.

To understand the inner behavior of a sequential program with concurrent signal handlers and the reported data races, a powerful visualization tool should be able to abstract even complex events for easy visualization. To visualize data races in sequential programs, we must be able to efficiently represent accesses to shared variables and other concurrency patterns such as signal registration, signal blocking, and signal unblocking. As shown in Fig. 1, we use the letters R and W in small square shapes to represent read and write accesses to shared variables, respectively. The instance number of each invocation is shown on top of the access. A bidirectional arrow is used for all signal registration operations. Similarly, unidirectional arrows are used for blocking and unblocking operations on signals. To differentiate whether the main program or a concurrent signal handler initiates an operation, a dot marker is used. Each arrow is topped with a code identifying the operation (i.e., Rg for registration, Bk for blocking, and Ub for unblocking), followed by the signal name (e.g., Rg:SIGALRM in Fig. 1 for the registration of the SIGALRM signal).

Vertically parallel arrows represent the logical concurrency between the main program and its concurrent signal handlers, as presented in the example of Fig. 2. We abstract all invocations of the same signal handler into one vertical arrow. Thus, the signal handler for SIGALRM, invoked two times in Fig. 2, is materialized by a single line. The maximum number of vertical arrows in a visualization instance is therefore always equal to the number of signals in the system plus one, i.e., 65 for a UNIX-like system that maintains only 64 signals.

Fig. 1. Visualization Patterns

Each access to a shared variable is distinguished by the invocation number of the signal handler to which it belongs. This number is always one for the main program, as it is invoked only once.

Fig. 2. Example of Program Visualization

In Fig. 2 it is easy to see that the main program registered two signal handlers, for SIGALRM and SIGINT; the bidirectional arrow of the registration operation can be traversed in either direction. Right after the registrations, SIGALRM was invoked for the first time and performed a write access to a shared variable. After this, SIGINT was invoked; it blocked SIGALRM, performed a write operation on the shared variable, and finally unblocked the SIGALRM it had previously blocked. Similarly, the main program blocked SIGINT, performed a read access on the shared variable, and then unblocked SIGINT. After these operations, Fig. 2 shows that the main program performed a write access on the same shared variable. Note that only accesses to a selected shared variable are kept visible.

Following the example of Fig. 2, it is clear that the underlying program contains two data races, involving the two accesses of the main program and the write access from the second invocation of the SIGALRM signal handler. This is because there is no path from the write access of the second invocation of SIGALRM to the accesses of the main program. Between any of the other accesses, however, we can always find a path from one to the other, meaning that they are ordered by the happens-before relation. This simple but powerful visualization model therefore captures the partial ordering of the runtime events in a sequential program that uses concurrent signal handlers.
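For reference, a program of the kind Fig. 2 visualizes could look like the following minimal sketch. The sketch is ours, not taken from the paper: the variable name and handler bodies are assumptions, and only the registration, blocking/unblocking, and access pattern follow the figure.

    #include <signal.h>

    volatile sig_atomic_t shared = 0;          /* the selected shared variable */

    static void alrm_handler(int sig)          /* SIGALRM handler */
    {
        (void)sig;
        shared = 1;                            /* W access */
    }

    static void int_handler(int sig)           /* SIGINT handler */
    {
        sigset_t alrm;
        (void)sig;
        sigemptyset(&alrm);
        sigaddset(&alrm, SIGALRM);
        sigprocmask(SIG_BLOCK, &alrm, NULL);   /* Bk:SIGALRM */
        shared = 2;                            /* W access */
        sigprocmask(SIG_UNBLOCK, &alrm, NULL); /* Ub:SIGALRM */
    }

    int main(void)
    {
        sigset_t intr;
        int local;

        signal(SIGALRM, alrm_handler);         /* Rg:SIGALRM */
        signal(SIGINT,  int_handler);          /* Rg:SIGINT  */

        sigemptyset(&intr);
        sigaddset(&intr, SIGINT);
        sigprocmask(SIG_BLOCK, &intr, NULL);   /* Bk:SIGINT */
        local = shared;                        /* R access  */
        sigprocmask(SIG_UNBLOCK, &intr, NULL); /* Ub:SIGINT */

        shared = local + 1;                    /* W access: unordered with a
                                                  late SIGALRM invocation */
        return 0;
    }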

3 Conclusion

Asynchronous signal handling introduces fine-grained concurrency into sequential programs, making them prone to data races and difficult to test and debug effectively. Data races may lead programs into non-deterministic executions with unpredictable results. In this paper, we presented a simple but powerful visualization tool that captures the dynamic runtime behavior of a sequential program together with its concurrent signal handlers in a partial order execution graph. By visualizing the data races detected at runtime, the tool gives programmers a thorough understanding of the program.

The proposed visualization tool, however, does not yet represent re-entrant signal handlers, that is, signal handlers that can preempt themselves. We intend to extend the model to represent re-entrant signal handlers and also to support interrupt-based programs.

References

1. Artho, C., Havelund, K., Honiden, S.: Visualization of Concurrent Program Executions. In: 31st Annual International Computer Software and Applications Conference (COMPSAC '07), IEEE (July 2007).
2. Banerjee, U., Bliss, B., Ma, Z., Petersen, P.: A Theory of Data Race Detection. In: 4th Workshop on Parallel and Distributed Programming: Testing, Analysis and Debugging (PADTAD '06), pp. 69-78, ACM (2006).
3. Ronsse, M., Maebe, J., De Bosschere, K.: Detecting Data Races in Sequential Programs with DIOTA. In: Euro-Par 2004, LNCS, vol. 3149, pp. 82-89, Springer, Heidelberg (2004).
4. Tahara, T., Gondow, K., Ohsuga, S.: Dracula: Detector of Data Races in Signal Handlers. In: 15th IEEE Asia-Pacific Software Engineering Conference (APSEC '08), pp. 17-24, IEEE (2008).
5. Tchamgoue, G.M., Ha, O.-K., Kim, K.-H., Jun, Y.-K.: Lightweight Labeling Scheme for On-the-fly Race Detection of Signal Handlers. In: Ubiquitous Computing and Multimedia Applications (UCMA '11), Springer (2011).
6. Tchamgoue, G.M., Kim, K.-H., Jun, Y.-K.: Efficient Detection of Data Races in Concurrent Signal Handlers. Information: An International Interdisciplinary Journal, 15(3) (March 2012).

An Automatic Parallelization Scheme for Simulink-based Real-Time Multicore Systems

Minji Cha, Seong Kyun Kim, and Kyong Hoon Kim

Department of Informatics, Gyeongsang National University, Jinju, South Korea

Abstract. Matlab/Simulink provides developers with model-based development environments for various applications. The Real-Time Workshop in the Simulink toolkits automatically generates C/C++ programs, which enables users to build real-time systems easily. However, the generated program code is only for a single process, which makes it difficult to build high-performance real-time systems. In this paper, we propose a scheme for parallelizing Simulink blocks in order to build multi-threaded real-time applications on multicore systems. The proposed scheme extracts the dependency graph of the Simulink blocks and estimates their execution times on the target platform. Based on the dependency graph with the estimated execution times, multi-threaded real-time applications are automatically generated on RTAI Linux systems.

Keywords: Optimization, Parallelization, Simulink, Real-Time

1 Introduction

Matlab/Simulink [1] provides users with a graphical user interface (GUI) for block-model-based design. One of its key functions is the automatic code generation of the Real-Time Workshop toolkit, which generates C/C++ programs for various target platforms. Thus, users can easily develop real-time programs and generate code for their platforms. Although the automatic code generation function of Simulink reduces cost, the generated program is only in the form of a single process. This makes it difficult to utilize the high performance of multicore systems.

Our earlier work [4] provided user-defined Simulink blocks in order to generate parallel code for real-time multicore systems. However, that work has the drawback that users must specify the parallel parts manually using the user-defined blocks. In this paper, we therefore enhance the framework by adding a new automatic parallelization scheme which extracts the dependency graph of the Simulink blocks and produces multi-threaded code.

This research was supported in part by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2012-H) supervised by the NIPA (National IT Industry Promotion Agency).

2 The Proposed Framework

In this section, we present how to run a Simulink application on real-time multicore systems with automatically parallelized threads. Fig. 1 shows the proposed framework. The following explains each step of the framework.

Step 1. Programming a Simulink application: First, a user develops a Simulink application based on block diagrams. The blocks to be parallelized should be specified as subsystems and functions. The program is automatically translated to C code using the Real-Time Workshop tool. Among the generated source files, the RTW code and the main C code are used to build the dependency graph.

Step 2. Creating the dependency graph: The RTW code includes all of the block information, such as position, state, dependency, and so on. From the RTW code we obtain the dependency graph of the main blocks that were specified as subsystems and functions. However, this graph does not include execution time information, so we need to estimate the execution time of each block.

Step 3. Estimating the execution time of each block: The dependency graph needs the execution times of the main blocks in order to parallelize subtasks appropriately. We therefore insert profiling code around the block code. When the modified program is executed on the target platform, we obtain the measured execution time of each block and record it in an execution time table.

Fig. 1. Steps of the proposed framework: auto-code generation in Matlab/Simulink, dependency graph creation from the RTW code (blocks A-G), execution time measurement, task allocation to the CPUs, and execution on the target.
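As a rough illustration of the task-allocation step, the sketch below greedily assigns blocks to the least-loaded CPU in longest-time-first order. The block names and execution times are taken from the example in Fig. 1, but the heuristic itself is a simplification of ours that ignores the dependency edges the actual framework honors.

    #include <stdio.h>

    #define NBLOCKS 7
    #define NCPUS   2

    /* Block names and measured execution times from the Fig. 1 example. */
    static const char names[NBLOCKS] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G' };
    static const int  cost[NBLOCKS]  = {  15,   5,   3,   4,   7,   2,  10 };

    int main(void)
    {
        /* Block indices pre-sorted by execution time, longest first. */
        const int order[NBLOCKS] = { 0, 6, 4, 1, 3, 2, 5 };
        int load[NCPUS] = { 0 };

        for (int i = 0; i < NBLOCKS; ++i) {
            int b = order[i], cpu = 0;
            for (int c = 1; c < NCPUS; ++c)     /* pick the least-loaded CPU */
                if (load[c] < load[cpu])
                    cpu = c;
            load[cpu] += cost[b];
            printf("block %c (%2d) -> CPU%d\n", names[b], cost[b], cpu + 1);
        }
        printf("loads: CPU1=%d, CPU2=%d\n", load[0], load[1]);
        return 0;
    }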

Adaptive User Interface Modeling Design for Webbased Terminal Middleware

Adaptive User Interface Modeling Design for Webbased Terminal Middleware Adaptive User Interface Modeling Design for Webbased Terminal Middleware Sunghan Kim and Seungyun Lee Standard Research Center, ETRI, Daejeon, Korea {sh-kim, syl}@etri.re.kr Abstract. This paper shows

More information

Construction of DBMS for the Warehouse of Textile Enterprise Based on RFID Technology

Construction of DBMS for the Warehouse of Textile Enterprise Based on RFID Technology Construction of DBMS for the Warehouse of Textile Enterprise Based on RFID Technology Ruru Pan, Jihong Liu, Weidong Gao, Hongbo Wang, Jianli Liu School of Textile and Clothing, Jiangnan Univerisity, Lihu

More information

Study on Improving Energy Saving for Old Buildings with Daily Energy Conserving Index

Study on Improving Energy Saving for Old Buildings with Daily Energy Conserving Index Study on Improving Energy Saving for Old Buildings with Daily Energy Conserving Index Yan-Chyuan Shiau 1, Chi-Hong Chen 2 1,2 Department of Construction Management, Chung Hua University, 707, Wu-Fu Rd.,

More information

A RFID Data-Cleaning Algorithm Based on Communication Information among RFID Readers

A RFID Data-Cleaning Algorithm Based on Communication Information among RFID Readers , pp.155-164 http://dx.doi.org/10.14257/ijunesst.2015.8.1.14 A RFID Data-Cleaning Algorithm Based on Communication Information among RFID Readers Yunhua Gu, Bao Gao, Jin Wang, Mingshu Yin and Junyong Zhang

More information

Design of Remote data acquisition system based on Internet of Things

Design of Remote data acquisition system based on Internet of Things , pp.32-36 http://dx.doi.org/10.14257/astl.214.79.07 Design of Remote data acquisition system based on Internet of Things NIU Ling Zhou Kou Normal University, Zhoukou 466001,China; Niuling@zknu.edu.cn

More information

Establishment of Fire Control Management System in Building Information Modeling Environment

Establishment of Fire Control Management System in Building Information Modeling Environment Establishment of Fire Control Management System in Building Information Modeling Environment Yan-Chyuan Shiau 1, Chong-Teng Chang 2 Department of Construction Management, Chung Hua University, 707, Wu-Fu

More information

Study on Architecture and Implementation of Port Logistics Information Service Platform Based on Cloud Computing 1

Study on Architecture and Implementation of Port Logistics Information Service Platform Based on Cloud Computing 1 , pp. 331-342 http://dx.doi.org/10.14257/ijfgcn.2015.8.2.27 Study on Architecture and Implementation of Port Logistics Information Service Platform Based on Cloud Computing 1 Changming Li, Jie Shen and

More information

Energy Monitoring and Management Technology based on IEEE 802.15. 4g Smart Utility Networks and Mobile Devices

Energy Monitoring and Management Technology based on IEEE 802.15. 4g Smart Utility Networks and Mobile Devices Monitoring and Management Technology based on IEEE 802.15. 4g Smart Utility Networks and Mobile Devices Hyunjeong Lee, Wan-Ki Park, Il-Woo Lee IT Research Section IT Convergence Technology Research Laboratory,

More information

Smart Integrated Multiple Tracking System Development for IOT based Target-oriented Logistics Location and Resource Service

Smart Integrated Multiple Tracking System Development for IOT based Target-oriented Logistics Location and Resource Service , pp. 195-204 http://dx.doi.org/10.14257/ijsh.2015.9.5.19 Smart Integrated Multiple Tracking System Development for IOT based Target-oriented Logistics Location and Resource Service Ju-Su Kim, Hak-Jun

More information

Big Data Storage Architecture Design in Cloud Computing

Big Data Storage Architecture Design in Cloud Computing Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,

More information

A Proposed Integration of Hierarchical Mobile IP based Networks in SCADA Systems

A Proposed Integration of Hierarchical Mobile IP based Networks in SCADA Systems , pp. 49-56 http://dx.doi.org/10.14257/ijsh.2013.7.5.05 A Proposed Integration of Hierarchical Mobile IP based Networks in SCADA Systems Minkyu Choi 1 and Ronnie D. Caytiles 2 1 Security Engineering Research

More information

Research on Anomaly Detection of RFID Supply Chain Data Based on EPC

Research on Anomaly Detection of RFID Supply Chain Data Based on EPC , pp.62-66 http://dx.doi.org/10.14257/astl.2014.50.10 Research on Anomaly Detection of RFID Supply Chain Data Based on EPC Chunfeng Wang 1 Shuiming Zhong 2 Jin Wang 2 1 Modern Education Technology Center,

More information

Development of Integrated Management System based on Mobile and Cloud Service for Preventing Various Hazards

Development of Integrated Management System based on Mobile and Cloud Service for Preventing Various Hazards , pp. 143-150 http://dx.doi.org/10.14257/ijseia.2015.9.7.15 Development of Integrated Management System based on Mobile and Cloud Service for Preventing Various Hazards Ryu HyunKi 1, Yeo ChangSub 1, Jeonghyun

More information

Open Access Research and Design for Mobile Terminal-Based on Smart Home System

Open Access Research and Design for Mobile Terminal-Based on Smart Home System Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2015, 7, 479-484 479 Open Access Research and Design for Mobile Terminal-Based on Smart Home System

More information

A Noble Integrated Management System based on Mobile and Cloud service for preventing various hazards

A Noble Integrated Management System based on Mobile and Cloud service for preventing various hazards , pp.166-171 http://dx.doi.org/10.14257/astl.205.98.42 A Noble Integrated Management System based on Mobile and Cloud service for preventing various hazards Yeo ChangSub 1, Ryu HyunKi 1 and Lee HaengSuk

More information

Wireless Communications for SCADA Systems Utilizing Mobile Nodes

Wireless Communications for SCADA Systems Utilizing Mobile Nodes , pp. 1-8 http://dx.doi.org/10.14257/ijsh.2013.7.5.01 Wireless Communications for SCADA Systems Utilizing Mobile Nodes Minkyu Choi Security Engineering Research Support Center, Daejon, Republic of Korea

More information

Remote Monitoring and Controlling System Based on ZigBee Networks

Remote Monitoring and Controlling System Based on ZigBee Networks Remote Monitoring and Controlling System Based on ZigBee Networks Soyoung Hwang and Donghui Yu* Department of Multimedia Engineering, Catholic University of Pusan, South Korea {soyoung, dhyu}@cup.ac.kr

More information

UPS battery remote monitoring system in cloud computing

UPS battery remote monitoring system in cloud computing , pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology

More information

Research and realization of Resource Cloud Encapsulation in Cloud Manufacturing

Research and realization of Resource Cloud Encapsulation in Cloud Manufacturing www.ijcsi.org 579 Research and realization of Resource Cloud Encapsulation in Cloud Manufacturing Zhang Ming 1, Hu Chunyang 2 1 Department of Teaching and Practicing, Guilin University of Electronic Technology

More information

Research on the UHF RFID Channel Coding Technology based on Simulink

Research on the UHF RFID Channel Coding Technology based on Simulink Vol. 6, No. 7, 015 Research on the UHF RFID Channel Coding Technology based on Simulink Changzhi Wang Shanghai 0160, China Zhicai Shi* Shanghai 0160, China Dai Jian Shanghai 0160, China Li Meng Shanghai

More information

Design of Media measurement and monitoring system based on Internet of Things

Design of Media measurement and monitoring system based on Internet of Things Design of Media measurement and monitoring system based on Internet of Things Hyunjoong Kang 1, Marie Kim 1, MyungNam Bae 1, Hyo-Chan Bang 1, 1 Electronics and Telecommunications Research Institute, 138

More information

The Research and Application of College Student Attendance System based on RFID Technology

The Research and Application of College Student Attendance System based on RFID Technology The Research and Application of College Student Attendance System based on RFID Technology Zhang Yuru, Chen Delong and Tan Liping School of Computer and Information Engineering, Harbin University of Commerce,

More information

Design and Analysis of Mobile Learning Management System based on Web App

Design and Analysis of Mobile Learning Management System based on Web App , pp. 417-428 http://dx.doi.org/10.14257/ijmue.2015.10.1.38 Design and Analysis of Mobile Learning Management System based on Web App Shinwon Lee Department of Computer System Engineering, Jungwon University,

More information

Development of a Kind of Mine Staff Management System

Development of a Kind of Mine Staff Management System Advanced Engineering Forum Online: 2011-12-22 ISSN: 2234-991X, Vols. 2-3, pp 779-784 doi:10.4028/www.scientific.net/aef.2-3.779 2012 Trans Tech Publications, Switzerland Development of a Kind of Mine Staff

More information

Sensor network infrastructure for intelligent building monitoring and management system

Sensor network infrastructure for intelligent building monitoring and management system Sensor network infrastructure for intelligent building monitoring and management system 1 R.VENKATESH, 2 K.RADHA, 3 M.GANTHIMATHI 1.B.E-CSE, Muthayammal Engineering College, Rasipuram. 2. Assistant Professor

More information

How To Test A Robot Platform And Its Components

How To Test A Robot Platform And Its Components An Automated Test Method for Robot Platform and Its Components Jae-Hee Lim 1, Suk-Hoon Song 1, Jung-Rye Son 1, Tae-Yong Kuc 2, Hong-Seong Park 3, Hong-Seok Kim 4 1,2 School of Information and Communication,

More information

Design of Data Archive in Virtual Test Architecture

Design of Data Archive in Virtual Test Architecture Journal of Information Hiding and Multimedia Signal Processing 2014 ISSN 2073-4212 Ubiquitous International Volume 5, Number 1, January 2014 Design of Data Archive in Virtual Test Architecture Lian-Lei

More information

A Cloud Based Solution with IT Convergence for Eliminating Manufacturing Wastes

A Cloud Based Solution with IT Convergence for Eliminating Manufacturing Wastes A Cloud Based Solution with IT Convergence for Eliminating Manufacturing Wastes Ravi Anand', Subramaniam Ganesan', and Vijayan Sugumaran 2 ' 3 1 Department of Electrical and Computer Engineering, Oakland

More information

Guest Room Controls & Monitoring System. Integrated Solution for Hotels Southern Countries. www.lonix.com

Guest Room Controls & Monitoring System. Integrated Solution for Hotels Southern Countries. www.lonix.com Guest Room Controls & Monitoring System Integrated Solution for Hotels Southern Countries www.lonix.com GUEST ROOM CONTROLS & MONITORING SYSTEM INDEX 1 GENERAL... 3 1.1 SYSTEM INTEGRATION... 3 1.2 USER

More information

Mobile Storage and Search Engine of Information Oriented to Food Cloud

Mobile Storage and Search Engine of Information Oriented to Food Cloud Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:

More information

Study of Logistics Warehouse Management System Based on RFID

Study of Logistics Warehouse Management System Based on RFID Advanced Materials Research Vol. 748 (2013) pp 1281-1284 Online available since 2013/Aug/30 at www.scientific.net (2013) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/amr.748.1281

More information

Designing and Embodiment of Software that Creates Middle Ware for Resource Management in Embedded System

Designing and Embodiment of Software that Creates Middle Ware for Resource Management in Embedded System , pp.97-108 http://dx.doi.org/10.14257/ijseia.2014.8.6.08 Designing and Embodiment of Software that Creates Middle Ware for Resource Management in Embedded System Suk Hwan Moon and Cheol sick Lee Department

More information

Development of Integrated Management System based on Mobile and Cloud service for preventing various dangerous situations

Development of Integrated Management System based on Mobile and Cloud service for preventing various dangerous situations Development of Integrated Management System based on Mobile and Cloud service for preventing various dangerous situations Ryu HyunKi, Moon ChangSoo, Yeo ChangSub, and Lee HaengSuk Abstract In this paper,

More information

The design and implementation of the environment monitoring system of smart home based on EnOcean technology

The design and implementation of the environment monitoring system of smart home based on EnOcean technology International Conference on Manufacturing Science and Engineering (ICMSE 2015) The design and implementation of the environment monitoring system of smart home based on EnOcean technology Peng Dong1, a,

More information

RFID based Bill Generation and Payment through Mobile

RFID based Bill Generation and Payment through Mobile RFID based Bill Generation and Payment through Mobile 1 Swati R.Zope, 2 Prof. Maruti Limkar 1 EXTC Department, Mumbai University Terna college of Engineering,India Abstract Emerging electronic commerce

More information

A Study on Integrated Security Service Control Solution Development about CRETA Security

A Study on Integrated Security Service Control Solution Development about CRETA Security A Study on Integrated Security Service Control Solution Development about CRETA Security Yongwon (Conrad) Cho 1, Jinwon (Frank) Choi 2 1 Director Research Engineer, Virtual Builders Co., Ltd. 2 CEO & Co-Founder,

More information

A Study of Low Cost Meteorological Monitoring System Based on Wireless Sensor Networks

A Study of Low Cost Meteorological Monitoring System Based on Wireless Sensor Networks , pp.100-104 http://dx.doi.org/10.14257/astl.2014.45.19 A Study of Low Cost Meteorological Monitoring System Based on Wireless Sensor Networks Li Ma 1,2,3, Jingzhou Yan 1,2,Kuo Liao 3,4, Shuangshuang Yan

More information

Development of a Service Robot System for a Remote Child Monitoring Platform

Development of a Service Robot System for a Remote Child Monitoring Platform , pp.153-162 http://dx.doi.org/10.14257/ijsh.2014.8.5.14 Development of a Service Robot System for a Remote Child Monitoring Platform Taewoo Han 1 and Yong-Ho Seo 2, * 1 Department of Game and Multimedia,

More information

A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing

A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing Liang-Teh Lee, Kang-Yuan Liu, Hui-Yang Huang and Chia-Ying Tseng Department of Computer Science and Engineering,

More information

Automated Security System using ZigBee

Automated Security System using ZigBee IJIRST International Journal for Innovative Research in Science & Technology Volume 2 Issue 01 June 2015 ISSN (online): 2349-6010 Automated Security System using ZigBee Sneha Susan Abraham Saveetha School

More information

AN INFORMATION AGENT SYSTEM FOR CLOUD COMPUTING BASED LOCATION TRACKING

AN INFORMATION AGENT SYSTEM FOR CLOUD COMPUTING BASED LOCATION TRACKING I J I T E ISSN: 2229-7367 3(1-2), 2012, pp. 63-68 AN INFORMATION AGENT SYSTEM FOR CLOUD COMPUTING BASED LOCATION TRACKING ANWAR BASHA H. 1, SAMUEL GEOFFREY LAWRENCE 2 AND INDUMATHI, K. 3 1 Department of

More information

FOUNDATION Fieldbus High Speed Ethernet Control System

FOUNDATION Fieldbus High Speed Ethernet Control System FOUNDATION Fieldbus High Speed Ethernet Control System Sean J. Vincent Fieldbus Inc. Austin, TX, USA KEYWORDS Fieldbus, High Speed Ethernet, H1, ABSTRACT FOUNDATION fieldbus is described in part by the

More information

An Intelligent Middleware Platform and Framework for RFID Reverse Logistics

An Intelligent Middleware Platform and Framework for RFID Reverse Logistics International Journal of Future Generation Communication and Networking 75 An Intelligent Middleware Platform and Framework for RFID Reverse Logistics Jihyun Yoo, and Yongjin Park Department of Electronics

More information

Cisco ROSA Video Service Manager (VSM) Version 05.03

Cisco ROSA Video Service Manager (VSM) Version 05.03 Data Sheet Cisco ROSA Video Service Manager (VSM) Version 05.03 The Cisco ROSA Video Service Management (VSM) system provides service providers with a complete, powerful solution for the management of

More information

The Digital Signage System Supporting Multi-Resources Schedule on an Elevator

The Digital Signage System Supporting Multi-Resources Schedule on an Elevator , pp. 219-228 http://dx.doi.org/10.14257/ijsh.2015.9.8.23 The Digital Signage System Supporting Multi-Resources Schedule on an Elevator Woon-Yong Kim and SoonGohn Kim (Corresponding Author) Department

More information

DESIGN FOR MONITORING AND OPTIMIZATION OF POWER DEMAND FOR WIRELESSLY COMMUNICATING ELECTRIC LOADS

DESIGN FOR MONITORING AND OPTIMIZATION OF POWER DEMAND FOR WIRELESSLY COMMUNICATING ELECTRIC LOADS DESIGN FOR MONITORING AND OPTIMIZATION OF POWER DEMAND FOR WIRELESSLY COMMUNICATING ELECTRIC LOADS Patil Snehal S.S. 1, Mr. Shivdas S.S. 2 1 M. E. Electronics (II), Department of Electronics Engineering;

More information

PLCs and SCADA Systems

PLCs and SCADA Systems Hands-On Programmable Logic Controllers and Supervisory Control / Data Acquisition Course Description This extensive course covers the essentials of SCADA and PLC systems, which are often used in close

More information

Security Threats on National Defense ICT based on IoT

Security Threats on National Defense ICT based on IoT , pp.94-98 http://dx.doi.org/10.14257/astl.205.97.16 Security Threats on National Defense ICT based on IoT Jin-Seok Yang 1, Ho-Jae Lee 1, Min-Woo Park 1 and Jung-ho Eom 2 1 Department of Computer Engineering,

More information

A Network Simulation Experiment of WAN Based on OPNET

A Network Simulation Experiment of WAN Based on OPNET A Network Simulation Experiment of WAN Based on OPNET 1 Yao Lin, 2 Zhang Bo, 3 Liu Puyu 1, Modern Education Technology Center, Liaoning Medical University, Jinzhou, Liaoning, China,yaolin111@sina.com *2

More information

An ECG Monitoring and Alarming System Based On Android Smart Phone

An ECG Monitoring and Alarming System Based On Android Smart Phone Communications and Network, 2013, 5, 584-589 http://dx.doi.org/10.4236/cn.2013.53b2105 Published Online September 2013 (http://www.scirp.org/journal/cn) An ECG Monitoring and Alarming System Based On Android

More information

High rate and Switched WiFi. WiFi 802.11 QoS, Security 2G. WiFi 802.11a/b/g. PAN LAN Cellular MAN

High rate and Switched WiFi. WiFi 802.11 QoS, Security 2G. WiFi 802.11a/b/g. PAN LAN Cellular MAN Security Issues and Quality of Service in Real Time Wireless PLC/SCADA Process Control Systems Dr. Halit Eren & Dincer Hatipoglu Curtin University of Technology (Perth Australia) 2/27/2008 1 PRESENTATION

More information

RESEARCH ON THE APPLICATION OF WORKFLOW MANAGEMENT SYSTEM IN COLLABORATIVE PLATFORM FOR ENGLISH TEACHING

RESEARCH ON THE APPLICATION OF WORKFLOW MANAGEMENT SYSTEM IN COLLABORATIVE PLATFORM FOR ENGLISH TEACHING Computer Modelling and New Technologies, 2013, vol. 17, no. 3, 93 98 Transport and Telecommunication Institute, Lomonosov 1, LV-1019, Riga, Latvia RESEARCH ON THE APPLICATION OF WORKFLOW MANAGEMENT SYSTEM

More information

Five Essential Components for Highly Reliable Data Centers

Five Essential Components for Highly Reliable Data Centers GE Intelligent Platforms Five Essential Components for Highly Reliable Data Centers Ensuring continuous operations with an integrated, holistic technology strategy that provides high availability, increased

More information

A Resilient Device Monitoring System in Collaboration Environments

A Resilient Device Monitoring System in Collaboration Environments , pp.103-114 http://dx.doi.org/10.14257/ijsh.2014.8.5.10 A Resilient Device Monitoring System in Collaboration Environments KeeHyun Park 1 and JongHwi Lee 1 Department of Computer Engineering, Keimyung

More information

FOXBORO. I/A Series SOFTWARE Product Specifications. I/A Series Intelligent SCADA SCADA Platform PSS 21S-2M1 B3 OVERVIEW

FOXBORO. I/A Series SOFTWARE Product Specifications. I/A Series Intelligent SCADA SCADA Platform PSS 21S-2M1 B3 OVERVIEW I/A Series SOFTWARE Product Specifications Logo I/A Series Intelligent SCADA SCADA Platform PSS 21S-2M1 B3 The I/A Series Intelligent SCADA Platform takes the traditional SCADA Master Station to a new

More information

The Design and Implementation of the Integrated Model of the Advertisement and Remote Control System for an Elevator

The Design and Implementation of the Integrated Model of the Advertisement and Remote Control System for an Elevator Vol.8, No.3 (2014), pp.107-118 http://dx.doi.org/10.14257/ijsh.2014.8.3.10 The Design and Implementation of the Integrated Model of the Advertisement and Remote Control System for an Elevator Woon-Yong

More information

Scheme to Secure Communication of SCADA Master Station and Remote HMI s through Smart Phones

Scheme to Secure Communication of SCADA Master Station and Remote HMI s through Smart Phones 보안공학연구논문지 (Journal of Security Engineering), 제 8권 제 3호 2011년 6월 Scheme to Secure Communication of SCADA Master Station and Remote HMI s through Smart Phones Rosslin John Robles 1) and Tai-hoon Kim 2) Abstract

More information

SafeMobile, Inc. 3601 E. Algonquin Road, Rolling Meadows, IL 60008 Tel: (847) 818-1649 Fax: (847) 818-9190 E-mail: safemobile@safemobile.

SafeMobile, Inc. 3601 E. Algonquin Road, Rolling Meadows, IL 60008 Tel: (847) 818-1649 Fax: (847) 818-9190 E-mail: safemobile@safemobile. SafeDispatch Mobile SafeMobile, Inc. 3601 E. Algonquin Road, Rolling Meadows, IL 60008 Tel: (847) 818-1649 Fax: (847) 818-9190 E-mail: safemobile@safemobile.com Website: www.safemobile.com Communication

More information

Design for Management Information System Based on Internet of Things

Design for Management Information System Based on Internet of Things Design for Management Information System Based on Internet of Things * School of Computer Science, Sichuan University of Science & Engineering, Zigong Sichuan 643000, PR China, 413789256@qq.com Abstract

More information

Method of Fault Detection in Cloud Computing Systems

Method of Fault Detection in Cloud Computing Systems , pp.205-212 http://dx.doi.org/10.14257/ijgdc.2014.7.3.21 Method of Fault Detection in Cloud Computing Systems Ying Jiang, Jie Huang, Jiaman Ding and Yingli Liu Yunnan Key Lab of Computer Technology Application,

More information

Design and Implementation of Automatic Attendance Check System Using BLE Beacon

Design and Implementation of Automatic Attendance Check System Using BLE Beacon , pp.177-186 http://dx.doi.org/10.14257/ijmue.2015.10.10.19 Design and Implementation of Automatic Attendance Check System Using BLE Beacon Mi-Young Bae and Dae-Jea Cho * Dept. Of Multimedia Engineering,

More information

Intelligent Network Management System. Comprehensive Network Visibility and Management for Wireless and Fixed Networks

Intelligent Network Management System. Comprehensive Network Visibility and Management for Wireless and Fixed Networks Intelligent System Comprehensive Visibility and for Wireless and Fixed s Performance Analysis Made Easy With the increase in wireless network sizes and network complexities and the inter-operation of diverse

More information

CONCEPTUAL MODEL OF MULTI-AGENT BUSINESS COLLABORATION BASED ON CLOUD WORKFLOW

CONCEPTUAL MODEL OF MULTI-AGENT BUSINESS COLLABORATION BASED ON CLOUD WORKFLOW CONCEPTUAL MODEL OF MULTI-AGENT BUSINESS COLLABORATION BASED ON CLOUD WORKFLOW 1 XINQIN GAO, 2 MINGSHUN YANG, 3 YONG LIU, 4 XIAOLI HOU School of Mechanical and Precision Instrument Engineering, Xi'an University

More information

Context Model Based on Ontology in Mobile Cloud Computing

Context Model Based on Ontology in Mobile Cloud Computing Context Model Based on Ontology in Mobile Cloud Computing Changbok Jang, Euiin Choi * Dept. Of Computer Engineering, Hannam University, Daejeon, Korea chbjang@dblab.hannam.ac.kr, eichoi@hnu.kr Abstract.

Digital Modernization of Oilfields Digital Oilfield to Intelligent Oilfield. Karamay Hongyou Software Co., Ltd.

Karamay Hongyou Software Co., Ltd. Professional technologies and information technology enhance the development of the oil industry.

Device-based Secure Data Management Scheme in a Smart Home

Device-based Secure Data Management Scheme in a Smart Home Int'l Conf. Security and Management SAM'15 231 Device-based Secure Data Management Scheme in a Smart Home Ho-Seok Ryu 1, and Jin Kwak 2 1 ISAA Lab., Department of Computer Engineering, Ajou University,

Design of Electric Energy Acquisition System on Hadoop

Design of Electric Energy Acquisition System on Hadoop , pp.47-54 http://dx.doi.org/10.14257/ijgdc.2015.8.5.04 Design of Electric Energy Acquisition System on Hadoop Yi Wu 1 and Jianjun Zhou 2 1 School of Information Science and Technology, Heilongjiang University

Indoor Surveillance Security Robot with a Self-Propelled Patrolling Vehicle

Indoor Surveillance Security Robot with a Self-Propelled Patrolling Vehicle C Indoor Surveillance Security Robot with a Self-Propelled Patrolling Vehicle Hou-Tsan Lee, Wei-Chuan Lin, Jing-Siang Huang Department of Information Technology, TakMing University of Science and Technology

Development of cloud computing system based on wireless sensor network protocol and routing

Development of cloud computing system based on wireless sensor network protocol and routing Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 204, 6(7):680-684 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Development of cloud computing system based on wireless

Dual server-based secure data-storage system for cloud storage

Int. J. Engineering Systems Modelling and Simulation, Vol. 6, Nos. 1/2, 2014, p. 86. Woong Go, ISAA Lab, Department of Information Security Engineering,

Big Data Collection Study for Providing Efficient Information

Big Data Collection Study for Providing Efficient Information , pp. 41-50 http://dx.doi.org/10.14257/ijseia.2015.9.12.03 Big Data Collection Study for Providing Efficient Information Jun-soo Yun, Jin-tae Park, Hyun-seo Hwang and Il-young Moon Computer Science and

INTELLIGENT DISTRIBUTION NETWORK ANALYSIS AND INFORMATION ARCHITECTURE DESIGN

Yun Chen, SMEPC, State Grid China, daddygirl@126.com. From the background of intelligent distribution network construction,

Wireless Sensor Networks Database: Data Management and Implementation

Wireless Sensor Networks Database: Data Management and Implementation Sensors & Transducers 2014 by IFSA Publishing, S. L. http://www.sensorsportal.com Wireless Sensor Networks Database: Data Management and Implementation Ping Liu Computer and Information Engineering Institute,

Comprehensive colliery safety monitoring system

Journal of Chemical and Pharmaceutical Research, 2014, 6(5): 647-651. Research Article Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(5): 647-651 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Comprehensive colliery safety monitoring system

SCADA System Overview

SCADA System Overview Introduction SCADA systems are critical to the control and monitoring of complex cyber-physical systems. Now with advanced computer and communications technologies, SCADA systems are connected to networks

E-commerce recommendation system on cloud computing

Journal of Chemical and Pharmaceutical Research, 2015, 7(3):1388-1392. Research Article. E-commerce recommendation system on cloud computing Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2015, 7(3):1388-1392 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 E-commerce recommendation system on cloud computing

Transmitter Station Remote Monitor System Based on Browser/Server Structure

Transmitter Station Remote Monitor System Based on Browser/Server Structure TELKOMNIKA, Vol.11, No.3, March 2013, pp. 1594 ~ 1599 ISSN: 2087-278X 1594 Transmitter Station Remote Monitor System d on Browser/Server Structure Shanshan Li*, Jian Zhou Communication University of China

Best Practices: Extending Enterprise Applications to Mobile Devices

By Kulathumani Hariharan. Summary: Extending enterprise applications to mobile devices is increasingly becoming a priority for organizations

A Scheme for Implementing Load Balancing of Web Server

A Scheme for Implementing Load Balancing of Web Server Journal of Information & Computational Science 7: 3 (2010) 759 765 Available at http://www.joics.com A Scheme for Implementing Load Balancing of Web Server Jianwu Wu School of Politics and Law and Public

A Storage Architecture for High Speed Signal Processing: Embedding RAID 0 on FPGA

A Storage Architecture for High Speed Signal Processing: Embedding RAID 0 on FPGA Journal of Signal and Information Processing, 12, 3, 382-386 http://dx.doi.org/1.4236/jsip.12.335 Published Online August 12 (http://www.scirp.org/journal/jsip) A Storage Architecture for High Speed Signal

Integrated Building Management and Security System. Building Automation & Security. www.coba-group.com

Building Automation & Security, www.coba-group.com. Index: 1 General; 1.1 System Integration

KPI, OEE AND DOWNTIME ANALYTICS. An ICONICS Whitepaper

An ICONICS Whitepaper, 2010. Contents: 1 About This Document; 1.1 Scope of the Document; 2 Introduction; 2.1 ICONICS Tools Provide Downtime Analytics; 3 Determining
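
The whitepaper is only excerpted here, but the OEE metric named in its contents has a standard definition: OEE = Availability x Performance x Quality. The following is a minimal sketch of that conventional arithmetic, not code from the ICONICS document; compute_oee and the sample shift figures are invented for illustration.

```python
# Standard OEE arithmetic (OEE = Availability x Performance x Quality).
# The function name and example figures are illustrative only; they do
# not come from the excerpted ICONICS whitepaper.
def compute_oee(planned_time, downtime, ideal_cycle_time,
                total_count, good_count):
    run_time = planned_time - downtime                     # minutes actually running
    availability = run_time / planned_time                 # share of planned time running
    performance = (ideal_cycle_time * total_count) / run_time  # speed vs. ideal
    quality = good_count / total_count                     # share of good pieces
    return availability * performance * quality


if __name__ == "__main__":
    # Hypothetical shift: 480 min planned, 47 min downtime, 1.0 min ideal
    # cycle time, 400 pieces produced, 388 of them good.
    oee = compute_oee(480.0, 47.0, 1.0, 400, 388)
    print(f"OEE = {oee:.1%}")  # ~90.2% x ~92.4% x 97.0% = ~80.8%
```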

Proposal and Design for DTV Broadcasting Service Applying Cloud Computing Testbed

Jong Won Yang (first author, KISTI, jwyang@kisti.re.kr), Sung Jun Kim (corresponding author, KISTI, sjkim@kisti.re.kr) and Mi-Hye Kim

Reliable Security Solutions

Reliable Security Solutions Reliable Security Solutions Challenger10 The proven solution for access control, intrusion detection and video integration. An integrated security solution for your most challenging environments The Tecom

Modern Agricultural Digital Management Network Information System of Heilongjiang Reclamation Area Farm

Xi Wang, Chun Wang, Wei Dong Zhuang and Hui Yang, Engineering College, Heilongjiang August the First

Mobile Hybrid Cloud Computing Issues and Solutions

Mobile Hybrid Cloud Computing Issues and Solutions , pp.341-345 http://dx.doi.org/10.14257/astl.2013.29.72 Mobile Hybrid Cloud Computing Issues and Solutions Yvette E. Gelogo *1 and Haeng-Kon Kim 1 1 School of Information Technology, Catholic University

Intelligent Fleet Management System Using Active RFID

Rajeshri Prakash Mane, Student, Department of Electronics and Telecommunication Engineering, Rajarambapu Institute of Technology, Rajaramnagar,

Developing Safety Management Systems for Track Workers Using Smart Phone GPS

Developing Safety Management Systems for Track Workers Using Smart Phone GPS , pp.137-148 http://dx.doi.org/10.14257/ijca.2013.6.5.13 Developing Safety Management Systems for Track Workers Using Smart Phone GPS Jin-Hee Ku 1 and Duk-Kyu Park 2 1 Dept of Liberal Education and 2 Dept

A Routing Algorithm Designed for Wireless Sensor Networks: Balanced Load-Latency Convergecast Tree with Dynamic Modification

Sheng-Cong Hu (r00631036@ntu.edu.tw), Jen-Hou Liu (r99631038@ntu.edu.tw), Min-Sheng

CONCEPTUAL DESIGN OF DATA ARCHIVE AND DISTRIBUTION SYSTEM FOR GEO-KOMPSAT-2A

In Jun Kim, Won Chan Jung, Byoung-Sun Lee, Do-Seob Ahn, Taeyoung Kim, Jaedong Jang and Hyunjong Oh, ETRI, 218 Gajeong-ro, Yuseong-gu,

Questions to be responded to by the firm submitting the application

Why do you think this project should receive an award? How does it demonstrate innovation, quality, professional excellence and transparency

GMS GRAPHICAL MANAGEMENT SYSTEM

GMS GRAPHICAL MANAGEMENT SYSTEM GMS GRAPHICAL MANAGEMENT SYSTEM 1 GMS The integrated security management system for multi-site organizations. Pacom s Graphical Management System (GMS) is the modular client-server application that integrates

Success factors for the implementation of ERP to the Agricultural Products Processing Center

Success factors for the implementation of ERP to the Agricultural Products Processing Center , pp.61-66 http://dx.doi.org/10.14257/astl.2015.95.12 Success factors for the implementation of ERP to the Agricultural Products Processing Center Jung Rock Do 1, Jin Hyeung Kim 2, Young Chan Choe 3 1

F2008-SC-027 Applying Ubiquitous Technologies to PLM of an Automotive Die Shop

F2008-SC-027. Park, Yang Ho; Lee, Sang Seok; Kim, Jong Hwan; Joung, Youn-kyoung; Noh, Sang Do. Department of Industrial Engineering,

AUTOMATED AND ADAPTIVE DOWNLOAD SERVICE USING P2P APPROACH IN CLOUD

AUTOMATED AND ADAPTIVE DOWNLOAD SERVICE USING P2P APPROACH IN CLOUD IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 4, Apr 2014, 63-68 Impact Journals AUTOMATED AND ADAPTIVE DOWNLOAD

ENHANCED HYBRID FRAMEWORK OF RELIABILITY ANALYSIS FOR SAFETY CRITICAL NETWORK INFRASTRUCTURE

Chandana Priyanka G. H., Aarthi R. S., Chakaravarthi S., Selvamani K. and Kannan A., Department of Computer

Active RFID Solutions for Asset Tracking and Inventory Management

Introduction: RFID (Radio Frequency Identification) technology is fast replacing ScanCode technology for asset tracking and inventory management.

The Research in Remote and Real-time Monitoring System of Digitized Earthquake Omen Instrument

The Research in Remote and Real-time Monitoring System of Digitized Earthquake Omen Instrument 2010 3rd International Conference on Computer and Electrical Engineering (ICCEE 2010) IPCSIT vol. 53 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V53.No.1.13 The Research in Remote and

MICROCONTROLLER BASED SMART HOME WITH SECURITY USING GSM TECHNOLOGY

F. Shawki, M. El-Shahat Dessouki, A. I. Elbasiouny, A. N. Almazroui and F. M. R. Albeladi. Assistant Professor, Electrical

Mobile 2D Barcode/BIM-based Facilities Maintaining Management System

Yu-Cheng Lin, Yu-Chih Su and Yen-Pei Chen, Department of Civil Engineering, National Taipei University of Technology, No. 1, Chung-Hsiao E.
